This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
WITH clause
A WITH clause defines one or more named common table expressions (CTEs), where each CTE encapsulates a result set that can be referenced by another CTE in the same WITH clause, or by the primary query.
A WITH clause defines one or more named common table expressions (CTEs), where each CTE encapsulates a result set that can be referenced by another CTE in the same WITH clause, or by the primary query. Vertica can evaluate WITH clauses in two ways:
- Inline expansion (default): Vertica evaluates each WITH clause every time it is referenced by the primary query.
- Materialization: Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.
In both cases, WITH clauses can help simplify complicated queries and avoid statement repetition.
Syntax
WITH [ /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ ] [ RECURSIVE ] {
cte-identifier [ ( column-aliases ) ] AS (
[ subordinate-WITH-clause ]
query-expression )
} [,...]
Arguments
/*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/
- Enables materialization of all queries in the current WITH clause. Otherwise, materialization is set by configuration parameter WithClauseMaterialization, by default set to 0 (disabled). If WithClauseMaterialization is disabled, materialization is automatically cleared when the primary query of the WITH clause returns. For details, see Materialization of WITH clause.
RECURSIVE
- Specifies to iterate over the WITH clause's own result set, through repeated execution of an embedded UNION or UNION ALL statement. For details, see WITH clause recursion.
cte-identifier
- Identifies a common table expression (CTE) within a WITH clause. This identifier is available to CTEs of the same WITH clause, and of parent and child WITH clauses (if any). CTE identifiers of the outermost (primary) WITH clause are also available to the primary query.
All CTE identifiers of the same WITH clause must be unique. For example, the following WITH clause defines two CTEs, so they require unique identifiers: regional_sales
and top_regions
:
WITH
-- query sale amounts for each region
regional_sales AS (SELECT ... ),
top_regions AS ( SELECT ... )
)
column-aliases
- A comma-delimited list of result set column aliases. The list of aliases must map to all column expressions in the CTE query. If omitted, result set columns can only be referenced by the names used in the query.
In the following example, the revenue
CTE specifies two column aliases: vkey
and total_revenue
. These map to column vendor_key
and aggregate expression SUM(total_order_cost)
, respectively. The primary query references these aliases:
WITH revenue ( vkey, total_revenue ) AS (
SELECT vendor_key, SUM(total_order_cost)
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY vendor_key)
SELECT v.vendor_name, v.vendor_address, v.vendor_city, r.total_revenue
FROM vendor_dimension v JOIN revenue r ON v.vendor_key = r.vkey
WHERE r.total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
subordinate-WITH-clause
- A WITH clause that is nested within the current one. CTEs of this WITH clause can only reference CTEs of the same clause, and of parent and child WITH clauses.
Important
The primary query can only reference CTEs in the outermost WITH clause. It cannot reference the CTEs of any nested WITH clause.
query-expression
- The query of a given CTE.
Restrictions
WITH clauses only support SELECT and INSERT statements. They do not support UPDATE or DELETE statements.
Examples
Single WITH clause with single CTE
The following SQL defines a WITH clause with one CTE, revenue
, which aggregates data in table store.store_orders_fact
. The primary query references the WITH clause result set twice: in its JOIN
clause and predicate:
-- define WITH clause
WITH revenue ( vkey, total_revenue ) AS (
SELECT vendor_key, SUM(total_order_cost)
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
-- End WITH clause
-- primary query
SELECT v.vendor_name, v.vendor_address, v.vendor_city, r.total_revenue
FROM vendor_dimension v JOIN revenue r ON v.vendor_key = r.vkey
WHERE r.total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
vendor_name | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
Frozen Suppliers | 471 Mission St | Peoria | 49877044
(1 row)
Single WITH clause and multiple CTEs
In the following example, the WITH clause contains two CTEs:
The primary query aggregates sales by region and departments in the top_regions
result set:
WITH
-- query sale amounts for each region
regional_sales (region, total_sales) AS (
SELECT sd.store_region, SUM(of.total_order_cost) AS total_sales
FROM store.store_dimension sd JOIN store.store_orders_fact of ON sd.store_key = of.store_key
GROUP BY store_region ),
-- query previous result set
top_regions AS (
SELECT region, total_sales
FROM regional_sales ORDER BY total_sales DESC LIMIT 3
)
-- primary query
-- aggregate sales in top_regions result set
SELECT sd.store_region AS region, pd.department_description AS department, SUM(of.total_order_cost) AS product_sales
FROM store.store_orders_fact of
JOIN store.store_dimension sd ON sd.store_key = of.store_key
JOIN public.product_dimension pd ON of.product_key = pd.product_key
WHERE sd.store_region IN (SELECT region FROM top_regions)
GROUP BY ROLLUP (region, department) ORDER BY region, product_sales DESC, GROUPING_ID();
region | department | product_sales
---------+----------------------------------+---------------
East | | 1716917786
East | Meat | 189837962
East | Produce | 170607880
East | Photography | 162271618
East | Frozen Goods | 141077867
East | Gifts | 137604397
East | Bakery | 136497842
East | Liquor | 130410463
East | Canned Goods | 128683257
East | Cleaning supplies | 118996326
East | Dairy | 118866901
East | Seafood | 109986665
East | Medical | 100404891
East | Pharmacy | 71671717
MidWest | | 1287550770
MidWest | Meat | 141446607
MidWest | Produce | 125156100
MidWest | Photography | 122666753
MidWest | Frozen Goods | 105893534
MidWest | Gifts | 103088595
MidWest | Bakery | 102844467
MidWest | Canned Goods | 97647270
MidWest | Liquor | 97306898
MidWest | Cleaning supplies | 90775242
MidWest | Dairy | 89065443
MidWest | Seafood | 82541528
MidWest | Medical | 76674814
MidWest | Pharmacy | 52443519
West | | 2159765937
West | Meat | 235841506
West | Produce | 215277204
West | Photography | 205949467
West | Frozen Goods | 178311593
West | Bakery | 172824555
West | Gifts | 172134780
West | Liquor | 164798022
West | Canned Goods | 163330813
West | Cleaning supplies | 148776443
West | Dairy | 145244575
West | Seafood | 139464407
West | Medical | 126184049
West | Pharmacy | 91628523
| | 5164234493
(43 rows)
INSERT statement that includes WITH clause
The following SQL uses a WITH clause to insert data from a JOIN query into table total_store_sales
:
CREATE TABLE total_store_sales (store_key int, region VARCHAR(20), store_sales numeric (12,2));
INSERT INTO total_store_sales
WITH store_sales AS (
SELECT sd.store_key, sd.store_region::VARCHAR(20), SUM (of.total_order_cost)
FROM store.store_dimension sd JOIN store.store_orders_fact of ON sd.store_key = of.store_key
GROUP BY sd.store_region, sd.store_key ORDER BY sd.store_region, sd.store_key)
SELECT * FROM store_sales;
=> SELECT * FROM total_store_sales ORDER BY region, store_key;
store_key | region | store_sales
-----------+-----------+-------------
2 | East | 47668303.00
6 | East | 48136354.00
12 | East | 46673113.00
22 | East | 48711211.00
24 | East | 48603836.00
31 | East | 46836469.00
36 | East | 48461449.00
37 | East | 48018279.00
41 | East | 48713084.00
44 | East | 47808362.00
49 | East | 46990023.00
50 | East | 47643329.00
9 | MidWest | 46851087.00
15 | MidWest | 48787354.00
27 | MidWest | 48497620.00
29 | MidWest | 47639234.00
30 | MidWest | 49013483.00
38 | MidWest | 48856012.00
42 | MidWest | 47297912.00
45 | MidWest | 48544521.00
46 | MidWest | 48887255.00
4 | NorthWest | 47580215.00
39 | NorthWest | 47136892.00
47 | NorthWest | 48477574.00
8 | South | 48131455.00
13 | South | 47605422.00
17 | South | 46054367.00
...
(50 rows)
1 - Inline expansion of WITH clause
By default, Vertica uses inline expansion to evaluate WITH clauses.
By default, Vertica uses inline expansion to evaluate WITH clauses. Vertica evaluates each WITH clause every time it is referenced by the primary query. Inline expansion often works best if the query does not reference the same WITH clause multiple times, or if some local optimizations are possible after inline expansion.
Example
The following example shows a WITH clause that is a good candidate for inline expansion. The WITH clause is used in a query that obtains order information for all 2007 orders shipped between December 01-07:
-- Begin WITH
WITH store_orders_fact_new AS(
SELECT * FROM store.store_orders_fact WHERE date_shipped between '2007-12-01' and '2007-12-07')
-- End WITH
-- Begin primary query
SELECT store_key, product_key, product_version, SUM(quantity_ordered*unit_price) AS total_price
FROM store_orders_fact_new
GROUP BY store_key, product_key, product_version
ORDER BY total_price DESC;
store_key | product_key | product_version | total_price
-----------+-------------+-----------------+-------------
232 | 1855 | 2 | 29008
125 | 8500 | 4 | 28812
139 | 3707 | 2 | 28812
212 | 3203 | 1 | 28000
236 | 8023 | 4 | 27548
123 | 10598 | 2 | 27146
34 | 8888 | 4 | 27100
203 | 2243 | 1 | 27027
117 | 13932 | 2 | 27000
84 | 768 | 1 | 26936
123 | 1038 | 1 | 26885
106 | 18932 | 1 | 26864
93 | 10395 | 3 | 26790
162 | 13073 | 1 | 26754
15 | 3679 | 1 | 26675
52 | 5957 | 5 | 26656
190 | 8114 | 3 | 26611
5 | 7772 | 1 | 26588
139 | 6953 | 3 | 26572
202 | 14735 | 1 | 26404
133 | 2740 | 1 | 26312
198 | 8545 | 3 | 26287
221 | 7582 | 2 | 26280
127 | 9468 | 3 | 26224
63 | 8115 | 4 | 25960
171 | 2088 | 1 | 25650
250 | 11210 | 3 | 25608
...
Vertica processes the query as follows:
-
Expands the WITH clause reference to store_orders_fact_new
within the primary query.
-
After expanding the WITH clause, evaluates the primary query.
2 - Materialization of WITH clause
When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.
When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires. Vertica drops the temporary table after primary query execution completes.
Note
If the primary query returns with an error, temporary tables might be dropped only after the client’s session ends.
Materialization can facilitate better performance when WITH clauses are complex—for example, when the WITH clauses contain JOIN and GROUP BY clauses, and are referenced multiple times in the primary query.
If materialization is enabled, WITH statements perform an auto-commit of the user transaction. This occurs even when using EXPLAIN with the WITH statement.
Enabling WITH clause materialization
WITH materialization is set by configuration parameter WithClauseMaterialization, by default set to 0 (disabled). You can enable and disable materialization by setting WithClauseMaterialization at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively:
-
Database:
=> ALTER DATABASE db-spec SET PARAMETER WithClauseMaterialization={ 0 | 1 };
=> ALTER DATABASE db-spec CLEAR PARAMETER WithClauseMaterialization;
-
Session: Parameter setting remains in effect until you explicitly clear it, or the session ends.
=> ALTER SESSION SET PARAMETER WithClauseMaterialization={ 0 | 1 };
=> ALTER SESSION CLEAR PARAMETER WithClauseMaterialization;
You can also enable WITH materialization for individual queries with the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. Materialization is automatically cleared when the query returns. For example:
=> WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ revenue AS (
SELECT vendor_key, SUM(total_order_cost) AS total_revenue
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
...
Processing WITH clauses using EE5 temp relations
By default, when WITH clause queries are reused, Vertica saves those WITH clause query outputs in EE5 temp relations. However, this option can be changed. EE5 temp relation support is set by configuration parameter EnableWITHTempRelReuseLimit, which can be set in the following ways:
-
0: Disables this feature.
-
1: Force-saves all WITH clause queries into EE5 temp relations, whether or not they are reused.
-
2 (default): Saves only reused WITH clause queries into EE5 temp relations.
-
3 or more: Saves WITH clause queries into EE5 temp relations only when they are used at least this number of times.
EnableWITHTempRelReuseLimit can be set at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively. When WithClauseMaterialization is set to 1, that setting overrides any EnableWITHTempRelReuseLimit settings.
Note that for WITH queries with complex types, temp relations are disabled.
Example
The following example shows a WITH clause that is a good candidate for materialization. The query obtains data for the vendor who has the highest combined order cost for all orders:
-- Enable materialization
=> ALTER SESSION SET PARAMETER WithClauseMaterialization=1;
-- Define WITH clause
=> WITH revenue AS (
SELECT vendor_key, SUM(total_order_cost) AS total_revenue
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
-- End WITH clause
-- Primary query
=> SELECT vendor_name, vendor_address, vendor_city, total_revenue
FROM vendor_dimension v, revenue r
WHERE v.vendor_key = r.vendor_key AND total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
vendor_name | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
Frozen Suppliers | 471 Mission St | Peoria | 49877044
(1 row)
Vertica processes this query as follows:
-
WITH clause revenue
evaluates its SELECT statement from table store.store_orders_fact
.
-
Results of the revenue
clause are stored in a local temporary table.
-
Whenever the revenue
clause statement is referenced, the results stored in the table are used.
-
The temporary table is dropped when query execution is complete.
3 - WITH clause recursion
A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query.
A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query. Recursive queries are useful when working with self-referential data—hierarchies such as manager-subordinate relationships, or tree-structured data such as taxonomies.
The configuration parameter WithClauseRecursionLimit—by default set to 8—sets the maximum depth of recursion. You can set this parameter at database and session scopes with ALTER DATABASE and ALTER SESSION, respectively. Recursion continues until it reaches the configured maximum depth, or until the last iteration returns with no data.
You specify a recursive WITH clause as follows:
WITH [ /*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/ ] RECURSIVE
cte-identifier [ ( column-aliases ) ] AS (
non-recursive-term
UNION [ ALL ]
recursive-term
)
Non-recursive and recursive terms are separated by UNION or UNION ALL:
-
The non-recursive-term
query sets its result set in cte-identifier
, which is subject to recursion in recursive-term
.
-
The UNION statement's recursive-term
recursively iterates over its own output. When recursion is complete, the results of all iterations are compiled and set in cte-identifier
.
For example:
=> ALTER SESSION SET PARAMETER WithClauseRecursionLimit=4; -- maximum recursion depth = 4
=> WITH RECURSIVE nums (n) AS (
SELECT 1 -- non-recursive (base) term
UNION ALL
SELECT n+1 FROM nums -- recursive term
)
SELECT n FROM nums; -- primary query
This simple query executes as follows:
-
Executes the WITH RECURSIVE clause:
-
Evaluates the non-recursive term SELECT 1, and places the result set—1—in nums
.
-
Iterates over the UNION ALL query (SELECT n+1) until the number of iterations is greater than the configuration parameter WithClauseRecursionLimit.
-
Combines the results of all UNION queries and sets the result set in nums
, and then exits to the primary query.
-
Executes the primary query SELECT n FROM nums:
n
---
1
2
3
4
5
(5 rows)
In this case , WITH RECURSIVE clause exits after four iterations as per WithClauseRecursionLimit. If you restore WithClauseRecursionLimit to its default value of 8, then the clause exits after eight iterations:
=> ALTER SESSION CLEAR PARAMETER WithClauseRecursionLimit;
=> WITH RECURSIVE nums (n) AS (
SELECT 1
UNION ALL
SELECT n+1 FROM nums
)
SELECT n FROM nums;
n
---
1
2
3
4
5
6
7
8
9
(9 rows)
Important
Be careful to set WithClauseRecursionLimit only as high as needed to traverse the deepest hierarchies. Vertica sets no limit on this parameter; however, a high value can incur considerable overhead that adversely affects performance and exhausts system resources.
If a high recursion count is required, then consider enabling materialization. For details, see WITH RECURSIVE Materialization.
Restrictions
The following restrictions apply:
-
The SELECT list of a non-recursive term cannot include the wildcard *
(asterisk) or the function MATCH_COLUMNS.
-
A recursive term can reference the target CTE only once.
-
Recursive reference cannot appear within an outer join.
-
Recursive reference cannot appear within a subquery.
-
WITH clauses do not support UNION options ORDER BY, LIMIT, and OFFSET.
Examples
A small software company maintains the following data on employees and their managers:
=> SELECT * FROM personnel.employees ORDER BY emp_id;
emp_id | fname | lname | section_id | section_name | section_leader | leader_id
--------+-----------+-----------+------------+---------------------+------------------+-----------
0 | Stephen | Mulligan | 0 | | |
1 | Michael | North | 201 | Development | Zoe Black | 3
2 | Megan | Berry | 202 | QA | Richard Chan | 18
3 | Zoe | Black | 101 | Product Development | Renuka Patil | 24
4 | Tim | James | 203 | IT | Ebuka Udechukwu | 17
5 | Bella | Tucker | 201 | Development | Zoe Black | 3
6 | Alexandra | Climo | 202 | QA | Richard Chan | 18
7 | Leonard | Gray | 203 | IT | Ebuka Udechukwu | 17
8 | Carolyn | Henderson | 201 | Development | Zoe Black | 3
9 | Ryan | Henderson | 201 | Development | Zoe Black | 3
10 | Frank | Tucker | 205 | Sales | Benjamin Glover | 29
11 | Nathan | Ferguson | 102 | Sales Marketing | Eric Redfield | 28
12 | Kevin | Rampling | 101 | Product Development | Renuka Patil | 24
13 | Tuy Kim | Duong | 201 | Development | Zoe Black | 3
14 | Dwipendra | Sing | 204 | Tech Support | Sarah Feldman | 26
15 | Dylan | Wijman | 206 | Documentation | Kevin Rampling | 12
16 | Tamar | Sasson | 207 | Marketing | Nathan Ferguson | 11
17 | Ebuka | Udechukwu | 101 | Product Development | Renuka Patil | 24
18 | Richard | Chan | 101 | Product Development | Renuka Patil | 24
19 | Maria | del Rio | 201 | Development | Zoe Black | 3
20 | Hua | Song | 204 | Tech Support | Sarah Feldman | 26
21 | Carmen | Lopez | 204 | Tech Support | Sarah Feldman | 26
22 | Edgar | Mejia | 206 | Documentation | Kevin Rampling | 12
23 | Riad | Salim | 201 | Development | Zoe Black | 3
24 | Renuka | Patil | 100 | Executive Office | Stephen Mulligan | 0
25 | Rina | Dsouza | 202 | QA | Richard Chan | 18
26 | Sarah | Feldman | 101 | Product Development | Renuka Patil | 24
27 | Max | Mills | 102 | Sales Marketing | Eric Redfield | 28
28 | Eric | Redfield | 100 | Executive Office | Stephen Mulligan | 0
29 | Benjamin | Glover | 102 | Sales Marketing | Eric Redfield | 28
30 | Dominic | King | 205 | Sales | Benjamin Glover | 29
32 | Ryan | Metcalfe | 206 | Documentation | Kevin Rampling | 12
33 | Piers | Paige | 201 | Development | Zoe Black | 3
34 | Nicola | Kelly | 207 | Marketing | Nathan Ferguson | 11
(34 rows)
You can query this data for employee-manager relationships through WITH RECURSIVE. For example, the following query's WITH RECURSIVE clause gets employee-manager relationships for employee Eric Redfield, including all employees who report directly and indirectly to him:
WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
FROM personnel.employees WHERE fname||' '||lname = 'Eric Redfield'
UNION
SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
JOIN managers m ON m.employeeID = e.leader_id)
SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;
The WITH RECURSIVE clause defines the CTE managers
, and then executes in two phases:
-
The non-recursive term populates managers
with data that it queries from personnel.employees
.
-
The recursive term's UNION query iterates over its own output until, on the fourth cycle, it finds no more data. The results of all iterations are then compiled and set in managers
, and the WITH CLAUSE exits to the primary query.
The primary query returns three levels of data from managers
—one for each recursive iteration:
Similarly, the following query iterates over the same data to get all employee-manager relationships for employee Richard Chan, who is one level lower in the company chain of command:
WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
FROM personnel.employees WHERE fname||' '||lname = 'Richard Chan'
UNION
SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
JOIN managers m ON m.employeeID = e.leader_id)
SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;
The WITH RECURSIVE clause executes as before, except this time it finds no more data after two iterations and exits. Accordingly, the primary query returns two levels of data from managers
:
WITH RECURSIVE materialization
By default, materialization is disabled. In this case, Vertica rewrites the WITH RECURSIVE query into subqueries, as many as necessary for the required level of recursion.
If recursion is very deep, the high number of query rewrites is liable to incur considerable overhead that adversely affects performance and exhausts system resources. In this case, consider enabling materialization, either with the configuration parameter WithClauseMaterialization, or the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. In either case, intermediate result sets from all recursion levels are written to local temporary tables. When recursion is complete, the intermediate results in all temporary tables are compiled and passed on to the primary query.
Note
If materialization is not possible, you can improve throughput on a resource pool that handles recursive queries by setting its
EXECUTIONPARALLELISM parameter to 1.