EXPLODE
Expands the elements of one or more collection columns (ARRAY or SET) into individual table rows, one row per element. For each exploded collection, the results include two columns, one for the element index, and one for the value at that position. If the function explodes a single collection, these columns are named position
and value
by default. If the function explodes two or more collections, the columns for each collection are named pos_
column-name
and val_
column-name
. You can use an AS clause in the SELECT to change these column names.
EXPLODE and UNNEST both expand collections. They have the following differences:
-
By default, EXPLODE expands only the first collection it is passed and UNNEST expands all of them. See the
explode_count
andexplode_all
parameters. -
By default, EXPLODE returns element positions in an
index
column and UNNEST does not. See thewith_offsets
parameter. -
By default, EXPLODE requires an OVER clause and UNNEST ignores an OVER clause if present. See the
skip_partitioning
parameter.
Behavior type
ImmutableSyntax
EXPLODE (column[,...] [USING PARAMETERS param=value])
[ OVER ( [window-partition-clause] ) ]
Arguments
column
- Column in the table being queried. Unless
explode_all
is true, you must specify at least as many collection columns as the value of theexplode_count
parameter. Columns that are not collections are passed through without modification.Passthrough columns are not needed if
skip_partitioning
is true. OVER(...)
- How to partition and sort input data. The input data is the result set that the query returns after it evaluates FROM, WHERE, GROUP BY, and HAVING clauses. For EXPLODE, use OVER() or OVER(PARTITION BEST).
This clause is ignored if
skip_partitioning
is true.
Parameters
explode_all
(BOOLEAN)- If true, explode all collection columns. When
explode_all
is true, passthrough columns are not permitted.Default: false
explode_count
(INT)- The number of collection columns to explode. The function checks each column, up to this value, and either explodes it if is a collection or passes it through if it is not a collection or if this limit has been reached. If the value of
explode_count
is greater than the number of collection columns specified, the function returns an error.If
explode_all
is true, you cannot specifyexplode_count
.Default: 1
skip_partitioning
(BOOLEAN)- Whether to skip partitioning and ignore the OVER clause if present. EXPLODE translates a single row of input into multiple rows of output, one per collection element. There is, therefore, usually no benefit to partitioning the input first. Skipping partitioning can help a query avoid an expensive sort or merge operation. Even so, setting this parameter can negatively affect performance in rare cases.
Default: false
with_offset
(BOOLEAN)- Whether to return the index of each element.
Default: true
Null-handling
This function expands each element in a collection into a row, including null elements. If the input column is NULL or an empty collection, the function produces no rows for that column:
=> SELECT EXPLODE(ARRAY[1,2,null,4]) OVER();
position | value
----------+-------
0 | 1
1 | 2
2 |
3 | 4
(4 rows)
=> SELECT EXPLODE(ARRAY[]::ARRAY[INT]) OVER();
position | value
----------+-------
(0 rows)
=> SELECT EXPLODE(NULL::ARRAY[INT]) OVER();
position | value
----------+-------
(0 rows)
Joining on results
To use JOIN with this function you must set the skip_partitioning
parameter, either in the function call or as a session parameter.
You can use the output of this function as if it were a relation by using CROSS JOIN or LEFT JOIN LATERAL in a query. Other JOIN types are not supported.
Consider the following table of students and exam scores:
=> SELECT * FROM tests;
student | scores | questions
---------+---------------+-----------------
Bob | [92,78,79] | [20,20,100]
Lee | |
Pat | [] | []
Sam | [97,98,85] | [20,20,100]
Tom | [68,75,82,91] | [20,20,100,100]
(5 rows)
The following query finds the best test scores across all students who have scores:
=> ALTER SESSION SET UDPARAMETER FOR ComplexTypesLib skip_partitioning = true;
=> SELECT student, score FROM tests
CROSS JOIN EXPLODE(scores) AS t (pos, score)
ORDER BY score DESC;
student | score
---------+-------
Sam | 98
Sam | 97
Bob | 92
Tom | 91
Sam | 85
Tom | 82
Bob | 79
Bob | 78
Tom | 75
Tom | 68
(10 rows)
The following query returns maximum and average per-question scores, considering both the exam score and the number of questions:
=> SELECT student, MAX(score/qcount), AVG(score/qcount) FROM tests
CROSS JOIN EXPLODE(scores, questions USING PARAMETERS explode_count=2)
AS t(pos_s, score, pos_q, qcount)
GROUP BY student;
student | MAX | AVG
---------+----------------------+------------------
Bob | 4.600000000000000000 | 3.04333333333333
Sam | 4.900000000000000000 | 3.42222222222222
Tom | 4.550000000000000000 | 2.37
(3 rows)
These queries produce results for three of the five students. One student has a null value for scores and another has an empty array. These rows are not included in the function's output.
To include null and empty arrays in output, use LEFT JOIN LATERAL instead of CROSS JOIN:
=> SELECT student, MIN(score), AVG(score) FROM tests
LEFT JOIN LATERAL EXPLODE(scores) AS t (pos, score)
GROUP BY student;
student | MIN | AVG
---------+-----+------------------
Bob | 78 | 83
Lee | |
Pat | |
Sam | 85 | 93.3333333333333
Tom | 68 | 79
(5 rows)
The LATERAL keyword is required with LEFT JOIN. It is optional for CROSS JOIN.
Examples
Consider an orders table with the following contents:
=> SELECT orderkey, custkey, prodkey, orderprices, email_addrs
FROM orders LIMIT 5;
orderkey | custkey | prodkey | orderprices | email_addrs
------------+---------+-----------------------------------------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------
113-341987 | 342799 | ["MG-7190 ","VA-4028 ","EH-1247 ","MS-7018 "] | ["60.00","67.00","22.00","14.99"] | ["bob@example,com","robert.jones@example.com"]
111-952000 | 342845 | ["ID-2586 ","IC-9010 ","MH-2401 ","JC-1905 "] | ["22.00","35.00",null,"12.00"] | ["br92@cs.example.edu"]
111-345634 | 342536 | ["RS-0731 ","SJ-2021 "] | ["50.00",null] | [null]
113-965086 | 342176 | ["GW-1808 "] | ["108.00"] | ["joe.smith@example.com"]
111-335121 | 342321 | ["TF-3556 "] | ["50.00"] | ["789123@example-isp.com","alexjohnson@example.com","monica@eng.example.com","sara@johnson.example.name",null]
(5 rows)
The following query explodes the order prices for a single customer. The other two columns are passed through and are repeated for each returned row:
=> SELECT EXPLODE(orderprices, custkey, email_addrs
USING PARAMETERS skip_partitioning=true)
AS (position, orderprices, custkey, email_addrs)
FROM orders WHERE custkey='342845' ORDER BY orderprices;
position | orderprices | custkey | email_addrs
----------+-------------+---------+------------------------------
2 | | 342845 | ["br92@cs.example.edu",null]
3 | 12.00 | 342845 | ["br92@cs.example.edu",null]
0 | 22.00 | 342845 | ["br92@cs.example.edu",null]
1 | 35.00 | 342845 | ["br92@cs.example.edu",null]
(4 rows)
The previous example uses the skip_partitioning
parameter. Instead of setting it for each call to EXPLODE, you can set it as a session parameter. EXPLODE is part of the ComplexTypesLib UDx library. The following example returns the same results:
=> ALTER SESSION SET UDPARAMETER FOR ComplexTypesLib skip_partitioning=true;
=> SELECT EXPLODE(orderprices, custkey, email_addrs)
AS (position, orderprices, custkey, email_addrs)
FROM orders WHERE custkey='342845' ORDER BY orderprices;
You can explode more than one column by specifying the explode_count
parameter:
=> SELECT EXPLODE(orderkey, prodkey, orderprices
USING PARAMETERS explode_count=2, skip_partitioning=true)
AS (orderkey,pk_idx,pk_val,ord_idx,ord_val)
FROM orders
WHERE orderkey='113-341987';
orderkey | pk_idx | pk_val | ord_idx | ord_val
------------+--------+----------+---------+---------
113-341987 | 0 | MG-7190 | 0 | 60.00
113-341987 | 0 | MG-7190 | 1 | 67.00
113-341987 | 0 | MG-7190 | 2 | 22.00
113-341987 | 0 | MG-7190 | 3 | 14.99
113-341987 | 1 | VA-4028 | 0 | 60.00
113-341987 | 1 | VA-4028 | 1 | 67.00
113-341987 | 1 | VA-4028 | 2 | 22.00
113-341987 | 1 | VA-4028 | 3 | 14.99
113-341987 | 2 | EH-1247 | 0 | 60.00
113-341987 | 2 | EH-1247 | 1 | 67.00
113-341987 | 2 | EH-1247 | 2 | 22.00
113-341987 | 2 | EH-1247 | 3 | 14.99
113-341987 | 3 | MS-7018 | 0 | 60.00
113-341987 | 3 | MS-7018 | 1 | 67.00
113-341987 | 3 | MS-7018 | 2 | 22.00
113-341987 | 3 | MS-7018 | 3 | 14.99
(16 rows)
The following example uses a multi-dimensional array:
=> SELECT name, pingtimes FROM network_tests;
name | pingtimes
------+-------------------------------------------------------
eng1 | [[24.24,25.27,27.16,24.97],[23.97,25.01,28.12,29.5]]
eng2 | [[27.12,27.91,28.11,26.95],[29.01,28.99,30.11,31.56]]
qa1 | [[23.15,25.11,24.63,23.91],[22.85,22.86,23.91,31.52]]
(3 rows)
=> SELECT EXPLODE(name, pingtimes USING PARAMETERS explode_count=1) OVER()
FROM network_tests;
name | position | value
------+----------+---------------------------
eng1 | 0 | [24.24,25.27,27.16,24.97]
eng1 | 1 | [23.97,25.01,28.12,29.5]
eng2 | 0 | [27.12,27.91,28.11,26.95]
eng2 | 1 | [29.01,28.99,30.11,31.56]
qa1 | 0 | [23.15,25.11,24.63,23.91]
qa1 | 1 | [22.85,22.86,23.91,31.52]
(6 rows)
You can rewrite the previous query as follows to produce the same results:
=> SELECT name, EXPLODE(pingtimes USING PARAMETERS skip_partitioning=true)
FROM network_tests;