SELECT statements can also embed various directives, or hints, that let you control how a given query is handled—for example, join hints such as JTYPE, which enforces the join type (merge or hash join).
Returns all columns in the queried tables that match pattern.
expression[[AS]alias]
An expression that typically resolves to column data from the queried tables—for example, names of columns that are specified in the FROM clause; also:
You can optionally assign a temporary alias to each column expression and reference that alias elsewhere in the SELECT statement—for example, in the query predicate or ORDER BY clause. Vertica uses the alias as the column heading in query output.
FOR UPDATE
Specifies to obtain an X lock on all tables specified in the query, most often used from READ COMMITTED isolation.
FOR UPDATE requires update/delete permissions on the queried tables and cannot be issued from a read-only transaction.
Privileges
Non-superusers:
USAGE on the schema
SELECT on the table or view
Note
As view owner, you can grant other users SELECT privilege on the view only if one of the following is true:
You own the view's base table.
You have SELECT...WITH GRANT OPTION privilege on the view's base table.
Examples
When multiple clients run transactions as in the following example query, deadlocks can occur if FOR UPDATE is not used. Two transactions acquire an S lock, and when both attempt to upgrade to an X lock, they encounter deadlocks:
=> SELECT balance FROM accounts WHERE account_id=3476 FOR UPDATE;
...
=> UPDATE accounts SET balance = balance+10 WHERE account_id=3476;
=> COMMIT;
Combines two or more SELECT queries. EXCEPT returns distinct results of the left-hand query that are not also found in the right-hand query.
Note
MINUS is an alias for EXCEPT.
Syntax
SELECT
EXCEPT except-query[...]
[ ORDER BY { column-name | ordinal-number } [ ASC | DESC ] [,...] ]
[ LIMIT { integer | ALL } ]
[ OFFSETinteger ]
Notes
Use the EXCEPT clause to filter out specific results from a SELECT statement. The EXCEPT query operates on the results of two or more SELECT queries. It returns only those rows in the left-hand query that are not also present in the right-hand query.
Vertica evaluates multiple EXCEPT clauses in the same SELECT query from left to right, unless parentheses indicate otherwise.
You cannot use the ALL keyword with an EXCEPT query.
The results of each SELECT statement must be union compatible. Each statement must return the same number of columns, and the corresponding columns must have compatible data types. For example, you cannot use the EXCEPT clause on a column of type INTEGER and a column of type VARCHAR. If statements do not meet these criteria, Vertica returns an error.
Note
The Data type coercion chart lists the data types that can be cast to other data types. If one data type can be cast to the other, those two data types are compatible.
You can use EXCEPT in FROM, WHERE, and HAVING clauses.
You can order the results of an EXCEPT operation by including an ORDER BY operation in the statement. When you write the ORDER BY list, specify the column names from the leftmost SELECT statement, or specify integers that indicate the position of the columns by which to sort.
The rightmost ORDER BY, LIMIT, or OFFSET clauses in an EXCEPT query do not need to be enclosed in parentheses, because the rightmost query specifies that Vertica perform the operation on the results of the EXCEPT operation. Any ORDER BY, LIMIT, or OFFSET clauses contained in SELECT queries that appear earlier in the EXCEPT query must be enclosed in parentheses.
The following query returns the IDs and last names of employees that exist in Company_A, but not in Company_B:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B;
id | emp_lname
------+-----------
1234 | Stephen
5678 | Alice
(2 rows)
The following query sorts the results of the previous query by employee last name:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B
ORDER BY emp_lname ASC;
id | emp_lname
------+-----------
5678 | Alice
1234 | Stephen
(2 rows)
If you order by the column position, the query returns the same results:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B
ORDER BY 2 ASC;
id | emp_lname
------+-----------
5678 | Alice
1234 | Stephen
(2 rows)
The following query returns the IDs and last names of employees that exist in Company_A, but not in Company_B or Company_C:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B
EXCEPT
SELECT id, emp_lname FROM Company_C;
id | emp_lname
------+-----------
5678 | Alice
(1 row)
The following query shows the results of mismatched data types:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT emp_lname, id FROM Company_B;
ERROR 3429: For 'EXCEPT', types int and varchar are inconsistent
DETAIL: Columns: id and emp_lname
Using the VMart example database, the following query returns information about all Connecticut-based customers who bought items through stores and whose purchases amounted to more than $500, except for those customers who paid cash:
=> SELECT customer_key, customer_name FROM public.customer_dimension
WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact
WHERE sales_dollar_amount > 500
EXCEPT
SELECT customer_key FROM store.store_sales_fact
WHERE tender_type = 'Cash')
AND customer_state = 'CT';
customer_key | customer_name
--------------+----------------------
15084 | Doug V. Lampert
21730 | Juanita F. Peterson
24412 | Mary U. Garnett
25840 | Ben Z. Taylor
29940 | Brian B. Dobisz
32225 | Ruth T. McNulty
33127 | Darlene Y. Rodriguez
40000 | Steve L. Lewis
44383 | Amy G. Jones
46495 | Kevin H. Taylor
(10 rows)
Specifies to return a random sampling of records, where percent specifies the approximate sampling size. The percent value must be between 0 and 100, exclusive, and can include decimal values. The number of records returned is not guaranteed to be the exact percentage specified.
All rows of the data have equal opportunities to be selected. Vertica performs sampling before applying other query filters.
Examples
Count all records in customer_dimension table:
=> SELECT COUNT(*) FROM customer_dimension;
COUNT
-------
50000
(1 row)
Return a small sampling of rows in table customer_dimension:
=> SELECT customer_name, customer_state FROM customer_dimension TABLESAMPLE(0.5) WHERE customer_state='IL';
customer_name | customer_state
---------------------+----------------
Amy Y. McNulty | IL
Daniel C. Nguyen | IL
Midori O. Greenwood | IL
Meghan U. Lampert | IL
Tiffany Y. Lang | IL
Laura S. King | IL
Steve T. Nguyen | IL
Craig S. Webber | IL
Luigi A. Lewis | IL
Mark W. Williams | IL
(10 rows)
Use simple random sampling to return an approximate percentage of records. The percentage value must be greater than 0 and less than 100. All rows in the total potential return set are equally eligible to be included in the sampling. Vertica performs this sampling before other filters in the query are applied. The number of records returned is not guaranteed to be exactly percent.
The TABLESAMPLE option is valid only with user-defined tables and Data Collector (DC) tables. Views and system tables are not supported.
ON join-predicate
Specifies the columns to join on.
Invalid for NATURAL and CROSS joins, required for all other join types.
Alternative JOIN syntax options
Vertica supports two older join syntax conventions:
Table joins specified by join predicate in a WHERE clause
The following SELECT statement qualifies its JOIN clause with the TABLESAMPLE option:
=> SELECT user_id.id, user_name.name FROM user_name TABLESAMPLE(50)
JOIN user_id TABLESAMPLE(50) ON user_name.id = user_id.id;
id | name
------+--------
489 | Markus
2234 | Cato
763 | Pompey
(3 rows)
2.2 - Table-reference
A temporary name used for references to table.
Syntax
[[database.]schema.]table[ [AS] alias]
Parameters
[database.]schema
Database and schema. The default schema is public. If you specify a database, it must be the current database.
table
A table in the logical schema.
[AS] alias
A temporary name used for references to table.
3 - GROUP BY clause
Use the GROUP BY clause with aggregate functions in a SELECT statement to collect data across multiple records.
Use the GROUP BY clause with aggregate functions in a SELECT statement to collect data across multiple records. Vertica groups the results into one or more sets of rows that match an expression.
The GROUP BY clause without aggregates is similar to using SELECT DISTINCT.
ROLLUP is an extension to the GROUP BY clause. ROLLUP performs subtotal aggregations.
Syntax
GROUP BY [/*+GBYTYPE(algorithm)*/] { expression | aggregate-expression }[,...]
Specifies which algorithm has precedence for implementing this
GROUP BY clause, over the algorithm the Vertica query optimizer might otherwise choose. You can set algorithm to one of the following values:
Any expression, including constants and column references in the tables specified in the FROM clause. For example:
column,... column, (expression)
aggregate-expression
An ordered list of columns, expressions, CUBE, GROUPING SETS, or ROLLUP aggregates.
You can include CUBE and ROLLUP aggregates within a GROUPING SETS aggregate. CUBE and ROLLUP aggregates can result in a large amount of output. In that case, use GROUPING SETS to return only certain results.
You cannot include any aggregates within a CUBE or ROLLUP expression.
You can append multiple GROUPING SETS, CUBE, or ROLLUP aggregates in the same query. For example:
GROUP BY a,b,c,d, ROLLUP(a,b)
GROUP BY a,b,c,d, CUBE((a,b),c,d)
GROUP BY a,b,c,d, CUBE(a,b), ROLLUP (c,d)
GROUP BY ROLLUP(a), CUBE(b), GROUPING SETS(c)
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),CUBE(a,b))
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),())
Usage considerations
expression cannot include aggregate functions. However, you can use the GROUP BY clause with CUBE, GROUPING SETS, and ROLLUP to return summary values for each group.
When you create a GROUP BY clause, you must include all non-aggregated columns that appear in the SELECT list.
If the GROUP BY clause includes a WHERE clause, Vertica ignores all rows that do not satisfy the WHERE clause.
Examples
This example shows how to use the WHERE clause with GROUP BY. In this case, the example retrieves all employees whose last name begins with S, and ignores all rows that do not meet this criteria. The GROUP BY clause uses the ILIKE function to retrieve only last names beginning with S. The aggregate function SUM computes the total vacation days for each group.
=> SELECT employee_last_name, SUM(vacation_days)
FROM employee_dimension
WHERE employee_last_name ILIKE 'S%'
GROUP BY employee_last_name;
employee_last_name | SUM
--------------------+------
Sanchez | 2892
Smith | 2672
Stein | 2660
(3 rows)
The GROUP BY clause in the following example groups results by vendor region, and vendor region's biggest deal:
=> SELECT vendor_region, MAX(deal_size) AS "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region;
vendor_region | Biggest Deal
---------------+--------------
East | 990889
MidWest | 699163
NorthWest | 76101
South | 854136
SouthWest | 609807
West | 964005
(6 rows)
The following query modifies the previous one with a HAVING clause, which specifies to return only groups whose maximum deal size exceeds $900,000:
=> SELECT vendor_region, MAX(deal_size) as "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region
HAVING MAX(deal_size) > 900000;
vendor_region | Biggest Deal
---------------+--------------
East | 990889
West | 964005
(2 rows)
You can use the GROUP BY clause with one-dimensional arrays of scalar types. In the following example, grants is an ARRAY[VARCHAR] and grant_values is an ARRAY[INT].
=> SELECT department, grants, SUM(apply_sum(grant_values))
FROM employees
GROUP BY grants, department;
department | grants | SUM
------------+--------------------------+--------
Physics | ["US-7376","DARPA-1567"] | 235000
Astronomy | ["US-7376","DARPA-1567"] | 9000
Physics | ["US-7376"] | 30000
(3 rows)
The GROUP BY clause without aggregates is similar to using SELECT DISTINCT. For example, the following two queries return the same results:
=> SELECT DISTINCT household_id FROM customer_dimension;
=> SELECT household_id FROM customer_dimension GROUP BY household_id;
An expression that is not an aggregate or a grouping function that includes constants and column references in FROM-specified tables. For example:
column1, (column2+1), column3+column4
A multilevel expression, one of the following:
ROLLUP
CUBE
GROUPING SETS
Restrictions
GROUP BY CUBE does not order data. If you want to sort data, use the ORDER BY clause. The ORDER BY clause must come after the GROUP BY clause.
You can use CUBE inside a GROUPING SETS expression, but not inside a ROLLUP expression or another CUBE expression.
Levels of CUBE aggregation
If n is the number of grouping columns, CUBE creates 2n levels of aggregations. For example:
CUBE (A, B, C) creates all possible groupings, resulting in eight groups:
(A, B, C)
(A, B)
(A, C)
(B, C)
(A)
(B)
(C)
()
If you increase the number of CUBE columns, the number of CUBE groupings increases exponentially. The CUBE query may be resource intensive and produce combinations that are not of interest. In that case, consider using theGROUPING SETS aggregate, which allows you to choose specific groupings.
Examples
Using CUBE to return all groupings
Suppose you have a table that contains information about family expenses for books and electricity:
The results include subtotals for each category and year, and a grand total ($549.92).
Using CUBE with the HAVING clause
This example shows how you can restrict the GROUP BY results, use the HAVING clause with the CUBE aggregate. This query returns only the category totals and the full total:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY CUBE(Category,Year) HAVING GROUPING(Year)=1;
Category | Year | SUM
-------------+------+--------
Books | | 99.96
Electricity | | 449.96
| | 549.92
The next query returns only the aggregations for the two categories for each year. The GROUPING ID function specifies to omit the grand total ($549.92):
=> SELECT Category, Year, SUM (Amount) FROM expenses
GROUP BY CUBE(Category,Year) HAVING GROUPING_ID(Category,Year)<2
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electrical | 2005 | 109.99
Electrical | 2006 | 109.99
Electrical | 2007 | 229.98
Electrical | | 449.96
To sort data, use the
ORDER BY clause. The ORDER BY clause must follow the GROUP BY clause.
Syntax
GROUP BY group-expression[,...]
Parameters
group-expression``
One or both of the following:
An expression that is not an aggregate or a grouping function that includes constants and column references in FROM-specified tables. For example:
column1, (column2+1), column3+column4
A multilevel expression, one of the following:
ROLLUP
CUBE
GROUPING SETS
Defining the groupings
GROUPING SETS allows you to specify exactly which groupings you want in the results. You can also concatenate the groupings as follows:
The following example clauses result in the groupings shown.
This clause...
Defines groupings...
...GROUP BY GROUPING SETS(A,B,C,D)...
(A), (B), (C), (D)
...GROUP BY GROUPING SETS((A),(B),(C),(D))...
(A), (B), (C), (D)
...GROUP BY GROUPING SETS((A,B,C,D))...
(A, B, C, D)
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(C,D)...
(A, C), (B, C), (A, D), (B, C)
...GROUP BY GROUPING SETS((A,B)),GROUPING SETS(C,D)...
(A, B, C), (A, B, D)
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(ROLLUP(C,D))...
(A,B), (A,B,C), (A,B,C,D)
...GROUP BY A,B,C,GROUPING SETS(ROLLUP(C, D))...
(A, B, C, D), (A, B, C), (A, B, C)
The clause contains two groups (A, B, C). In the HAVING clause, use the GROUP_ID function as a predicate, to eliminate the second grouping.
Example: selecting groupings
This example shows how to select only those groupings you want. Suppose you want to aggregate on columns only, and you do not need the grand total. The first query omits the total. In the second query, you add () to the GROUPING SETS list to get the total. Use the ORDER BY clause to sort the results by grouping:
Automatically performs subtotal aggregations as an extension to the GROUP BY clause.
Automatically performs subtotal aggregations as an extension to the GROUP BY clause. ROLLUP performs these aggregations across multiple dimensions, at different levels, within a single SQL query.
You can use the ROLLUP clause with three grouping functions:
An expression that is not an aggregate or a grouping function that includes constants and column references in FROM-specified tables. For example:
column1, (column2+1), column3+column4
A multilevel expression, one of the following:
ROLLUP
CUBE
GROUPING SETS
Restrictions
GROUP BY ROLLUP does not sort results. To sort data, an ORDER BY clause must follow the GROUP BY clause.
Levels of aggregation
If n is the number of grouping columns, ROLLUP creates n+1 levels of subtotals and grand total. Because ROLLUP removes the right-most column at each step, specify column order carefully.
Suppose that ROLLUP(A, B, C) creates four groups:
(A, B, C)
(A, B)
(A)
()
Because ROLLUP removes the right-most column at each step, there are no groups for (A, C) and (B, C).
If you enclose two or more columns in parentheses, GROUP BY treats them as a single entity. For example:
ROLLUP(A, B, C) creates four groups:
(A, B, C)
(A, B)
(A)
()
ROLLUP((A, B), C) treats (A, B) as a single entity and creates three groups:
(A, B, C)
(A, B)
()
Example: aggregating the full data set
The following example shows how to use the GROUP BY clause to determine family expenses for electricity and books over several years. The SUM aggregate function computes the total amount of money spent in each category per year.
Suppose you have a table that contains information about family expenses for books and electricity:
For the expenses table, ROLLUP computes the subtotals in each category between 2005–2007:
Books: $99.96
Electricity: $449.96
Grand total: $549.92.
Use the ORDER BY clause to sort the results:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category, Year) ORDER BY 1,2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electricity | 2005 | 109.99
Electricity | 2006 | 109.99
Electricity | 2007 | 229.98
Electricity | | 449.96
| | 549.92
Example: using ROLLUP with the HAVING clause
This example shows how to use the HAVING clause with ROLLUP to restrict the GROUP BY results. The following query produces only those ROLLUP categories where year is subtotaled, based on the expression in the GROUPING function:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category,Year) HAVING GROUPING(Year)=1
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | | 99.96
Electricity | | 449.96
| | 549.92
The next example rolls up on (Category, Year), but not on the full results. The GROUPING_ID function specifies to aggregate less than three levels:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category,Year) HAVING GROUPING_ID(Category,Year)<3
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electricity | 2005 | 109.99
Electricity | 2006 | 109.99
Electricity | 2007 | 229.98
Electricity | | 449.96
Filters the results of a GROUP BY clause. Semantically, the HAVING clause occurs after the GROUP BY operation. It was added to the SQL standard because a WHERE clause cannot specify aggregate functions.
Syntax
HAVING condition[,...]
Parameters
condition
Unambiguously references a grouping column, unless the reference appears in an aggregate function.
Examples
The following example returns the employees with salaries greater than $800,000:
=> SELECT employee_last_name, MAX(annual_salary) as highest_salary FROM employee_dimension
GROUP BY employee_last_name HAVING MAX(annual_salary) > 800000 ORDER BY highest_salary DESC;
employee_last_name | highest_salary
--------------------+----------------
Sanchez | 992363
Vogel | 983634
Vu | 977716
Lewis | 957949
Taylor | 953373
King | 937765
Gauthier | 927335
Garnett | 903104
Bauer | 901181
Jones | 885395
Rodriguez | 861647
Young | 846657
Greenwood | 837543
Overstreet | 831317
Garcia | 811231
(15 rows)
5 - INTERSECT clause
Calculates the intersection of the results of two or more SELECT queries.
Calculates the intersection of the results of two or more SELECT queries. INTERSECT returns distinct values by both the query on the left and right sides of the INTERSECT operand.
Use the INTERSECT clause to return all elements that are common to the results of all the SELECT queries. The INTERSECT query operates on the results of two or more SELECT queries. INTERSECT returns only the rows that are returned by all the specified queries.
You cannot use the ALL keyword with an INTERSECT query.
The results of each SELECT query must be union compatible; they must return the same number of columns, and the corresponding columns must have compatible data types. For example, you cannot use the INTERSECT clause on a column of type INTEGER and a column of type VARCHAR. If the SELECT queries do not meet these criteria, Vertica returns an error.
Note
The Data type coercion chart lists the data types that can be cast to other data types. If one data type can be cast to the other, those two data types are compatible.
Order the results of an INTERSECT operation by using an ORDER BY clause. In the ORDER BY list, specify the column names from the leftmost SELECT statement or specify integers that indicate the position of the columns by which to sort.
You can use INTERSECT in FROM, WHERE, and HAVING clauses.
The rightmost ORDER BY, LIMIT, or OFFSET clauses in an INTERSECT query do not need to be enclosed in parentheses because the rightmost query specifies that Vertica perform the operation on the results of the INTERSECT operation. Any ORDER BY, LIMIT, or OFFSET clauses contained in SELECT queries that appear earlier in the INTERSECT query must be enclosed in parentheses.
The order by column names is from the first select.
Vertica supports INTERSECT noncorrelated subquery predicates. For example:
=> SELECT * FROM T1
WHERE T1.x IN
(SELECT MAX(c1) FROM T2
INTERSECT
SELECT MAX(cc1) FROM T3
INTERSECT
SELECT MAX(d1) FROM T4);
Examples
Consider the following three tables:
Company_A
id emp_lname dept sales
------+------------+----------------+-------
1234 | Stephen | auto parts | 1000
5678 | Alice | auto parts | 2500
9012 | Katherine | floral | 500
3214 | Smithson | sporting goods | 1500
Company_B
id emp_lname dept sales
------+------------+-------------+-------
4321 | Marvin | home goods | 250
9012 | Katherine | home goods | 500
8765 | Bob | electronics | 20000
3214 | Smithson | home goods | 1500
The following query returns the IDs and last names of employees that exist in both Company_A and Company_B:
=> SELECT id, emp_lname FROM Company_A
INTERSECT
SELECT id, emp_lname FROM Company_B;
id | emp_lname
------+-----------
3214 | Smithson
9012 | Katherine
(2 rows)
The following query returns the same two employees in descending order of sales:
=> SELECT id, emp_lname, sales FROM Company_A
INTERSECT
SELECT id, emp_lname, sales FROM Company_B
ORDER BY sales DESC;
id | emp_lname | sales
------+-----------+-------
3214 | Smithson | 1500
9012 | Katherine | 500
(2 rows)
The following query returns the employee who works for both companies whose sales in Company_B are greater than 1000:
=> SELECT id, emp_lname, sales FROM Company_A
INTERSECT
(SELECT id, emp_lname, sales FROM company_B WHERE sales > 1000)
ORDER BY sales DESC;
id | emp_lname | sales
------+-----------+-------
3214 | Smithson | 1500
(1 row)
In the following query returns the ID and last name of the employee who works for all three companies:
=> SELECT id, emp_lname FROM Company_A
INTERSECT
SELECT id, emp_lname FROM Company_B
INTERSECT
SELECT id, emp_lname FROM Company_C;
id | emp_lname
------+-----------
3214 | Smithson
(1 row)
The following query shows the results of a mismatched data types; these two queries are not union compatible:
=> SELECT id, emp_lname FROM Company_A
INTERSECT
SELECT emp_lname, id FROM Company_B;
ERROR 3429: For 'INTERSECT', types int and varchar are inconsistent
DETAIL: Columns: id and emp_lname
Using the VMart example database, the following query returns information about all Connecticut-based customers who bought items online and whose purchase amounts were between $400 and $500:
=> SELECT customer_key, customer_name from public.customer_dimension
WHERE customer_key IN (SELECT customer_key
FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 400
INTERSECT
SELECT customer_key FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 500)
AND customer_state = 'CT' ORDER BY customer_key;
customer_key | customer_name
--------------+------------------------
39 | Sarah S. Winkler
44 | Meghan H. Overstreet
70 | Jack X. Cleveland
103 | Alexandra I. Vu
110 | Matt . Farmer
173 | Mary R. Reyes
188 | Steve G. Williams
233 | Theodore V. McNulty
250 | Marcus E. Williams
294 | Samantha V. Young
313 | Meghan P. Pavlov
375 | Sally N. Vu
384 | Emily R. Smith
387 | Emily L. Garcia
...
The previous query and the next one are equivalent, and return the same results:
=> SELECT customer_key,customer_name FROM public.customer_dimension
WHERE customer_key IN (SELECT customer_key
FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 400
AND sales_dollar_amount < 500)
AND customer_state = 'CT' ORDER BY customer_key;
Name of the database or namespace that contains table:
Database name: If specified, it must be the current database.
Namespace name (Eon Mode only): You must specify the namespace of objects in non-default namespaces. If no namespace is provided, Vertica assumes the object is in the default namespace.
schema
Name of the schema, by default public. If you specify the namespace or database name, you must provide the schema name, even if the schema is public.
If you specify a schema name that contains a period, the part before the period cannot be the same as the name of an existing namespace.
table
The name of the table to create.
ON COMMIT { DELETE | PRESERVE } ROWS
Specifies whether data is transaction- or session-scoped:
DELETE (default) marks the temporary table for transaction-scoped data. Vertica removes all table data after each commit.
PRESERVE marks the temporary table for session-scoped data, which is preserved beyond the lifetime of a single transaction. Vertica removes all table data when the session ends.
Examples
The following SELECT statement has an INTO TABLE clause that creates table newTable from customer_dimension:
=> SELECT * INTO TABLE newTable FROM customer_dimension;
The following SELECT statement creates temporary table newTempTable. By default, temporary tables are created at a global scope, so its definition is visible to other sessions and persists until it is explicitly dropped. No customer_dimension data is copied into the new table, and Vertica issues a warning accordingly:
=> SELECT * INTO TEMP TABLE newTempTable FROM customer_dimension;
WARNING 4102: No rows are inserted into table "public"."newTempTable" because
ON COMMIT DELETE ROWS is the default for create temporary table
HINT: Use "ON COMMIT PRESERVE ROWS" to preserve the data in temporary table
CREATE TABLE
The following SELECT statement creates local temporary table newTempTableLocal. This table is visible only to the session in which it was created, and is automatically dropped when the session ends. The INTO TABLE clause includes ON COMMIT PRESERVE ROWS, so Vertica copies all selection data into the new table:
=> SELECT * INTO LOCAL TEMP TABLE newTempTableLocal ON COMMIT PRESERVE ROWS
FROM customer_dimension;
CREATE TABLE
7 - LIMIT clause
Specifies the maximum number of result set rows to return, either from the entire result set, or from windows of a partitioned result set.
Specifies the maximum number of result set rows to return, either from the entire result set, or from windows of a partitioned result set.
LIMIT { num-rows OVER ( PARTITION BYpartition-expr[,...] ORDER BYorder-expr[,...] ) }
| ALL }
Arguments
num-rows
The maximum number of rows to return.
OVER()
Specifies how to partition and sort input data with respect to the current row. The input data is the result set that the query returns after it evaluates FROM, WHERE, GROUP BY, and HAVING clauses.
Column or expression returning a column to partition results by.
order-expr[ ASC | DESC ] [ NULLS { FIRST | LAST } ]
Column or expression returning the column to order results by. Each column can be sorted in ascending (default) or descending order, with nulls first or last.
ALL
Returns all rows, valid only when LIMIT is applied to the entire result set.
Limiting returned rows
LIMIT returns the specified number of rows from the queried dataset. Row precedence is determined by the query's ORDER BY clause. If you do not use an ORDER BY clause, the query returns an undefined subset of the result set.
When a SELECT statement specifies both LIMIT and OFFSET, Vertica first processes the OFFSET, and then applies LIMIT to the remaining rows.
The following query returns the first 10 rows of data, as ordered first by region and then by number of employees:
You can use LIMIT to apply window partitioning on query results and limit the number of rows that are returned in each window.
For example, the following query specifies window partitioning on the result set. LIMIT is set to 2, so each window partition can return no more than two rows. The OVER clause partitions the result set by store_region, so in each region the query returns the two stores with the smallest number of employees:
A SQL extension that lets you screen large amounts of historical data in search of event patterns, the MATCH clause provides subclasses for analytic partitioning and ordering and matches rows from the result table based on a pattern you define.
A SQL extension that lets you screen large amounts of historical data in search of event patterns, the MATCH clause provides subclasses for analytic partitioning and ordering and matches rows from the result table based on a pattern you define.
You specify a pattern as a regular expression, which is composed of event types defined in the DEFINE subclause, where each event corresponds to a row in the input table. Then you can search for the pattern within a sequence of input events. Pattern matching returns the contiguous sequence of rows that conforms to PATTERN subclause. For example, pattern P (A B* C) consist of three event types: A, B, and C. When Vertica finds a match in the input table, the associated pattern instance must be an event of type A followed by 0 or more events of type B, and an event of type C.
Pattern matching is particularly useful for clickstream analysis where you might want to identify users' actions based on their Web browsing behavior (page clicks). For details, see Event series pattern matching.
Syntax
MATCH ( [ PARTITION BY table-column ] ORDER BY table-column
DEFINE event-name AS boolean-expr [,...]
PATTERN pattern-name AS ( regexp)
[ rows-match-clause ] )
Arguments
PARTITION BY
Defines the window data scope in which the pattern, defined in the PATTERN subclause, is matched. The partition clause partitions the data by matched patterns defined in the PATTERN subclause. For each partition, data is sorted by the ORDER BY clause. If the partition clause is omitted, the entire data set is considered a single partition.
ORDER BY
Defines the window data scope in which the pattern, defined in the PATTERN subclause, is matched. For each partition, the order clause specifies how the input data is ordered for pattern matching.
Note
The ORDER BY clause is mandatory.
DEFINE
Defines the boolean expressions that make up the event types in the regular expressions. For example:
DEFINE
Entry AS RefURL NOT ILIKE '%website2.com%' AND PageURL ILIKE
'%website2.com%',
Onsite AS PageURL ILIKE '%website2.com%' AND Action='V',
Purchase AS PageURL ILIKE '%website2.com%' AND Action='P'
Name of the event to evaluate for each row—in the earlier example, Entry, Onsite, Purchase.
Note
Event names are case insensitive and follow the same naming conventions as those used for tables and columns.
boolean-expr
Expression that returns true or false. boolean_expr can include Logical operators and relational (comparison) operators. For example:
Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
PATTERN pattern-name
Name of the pattern defined in the PATTERN subclause; for example, P is the pattern name defined below:
PATTERN P AS (...)
A PATTERN is a search pattern that is comprised of a name and a regular expression.
Note
Vertica supports one pattern per query.
regexp
A regular expression comprised of event types defined in the DEFINE subclause and one or more quantifiers below. When Vertica evaluates the MATCH clause, the regular expression identifies the rows that meet the expression criteria.
*: Match 0 or more times
*?: Match 0 or more times, not greedily
+: Match 1 or more times
+?: Match 1 or more times, not greedily
?: Match 0 or 1 time
??: Match 0 or 1 time, not greedily
*+: Match 0 or more times, possessive
++: Match 1 or more times, possessive
?+: Match 0 or 1 time, possessive
|: Alternation. Matches expression before or after the vertical bar. Similar to a Boolean OR.
rows-match-clause
Specifies how to resolve more than one event evaluating to true for a single row, one of the following:
ROWS MATCH ALL EVENTS: If more than one event evaluates to true for a single row, Vertica returns this error :
ERROR: pattern events must be mutually exclusive
HINT: try using ROWS MATCH FIRST EVENT
ROWS MATCH FIRST EVENT: If more than one event evaluates to true for a given row, Vertica uses the first event in the SQL statement for that row.
Pattern semantic evaluation
The semantic evaluating ordering of the SQL clauses is: FROM -> WHERE -> PATTERN MATCH -> SELECT.
Data is partitioned as specified in the PARTITION BY clause. If the partition clause is omitted, the entire data set is considered a single partition.
For each partition, the order clause specifies how the input data is ordered for pattern matching.
Events are evaluated for each row. A row could have 0, 1, or N events evaluate to true. If more than one event evaluates to true for the same row, Vertica returns a run-time error unless you specify ROWS MATCH FIRST EVENT. If you specify ROWS MATCH FIRST EVENT and more than one event evaluates to TRUE for a single row, Vertica chooses the event that was defined first in the SQL statement to be the event it uses for the row.
Vertica performs pattern matching by finding the contiguous sequence of rows that conforms to the pattern defined in the PATTERN subclause.
For each match, Vertica outputs the rows that contribute to the match. Rows not part of the match (do not satisfy one or more predicates) are not output.
Vertica reports only non-overlapping matches. If an overlap occurs, Vertica chooses the first match found in the input stream. After finding the match, Vertica looks for the next match, starting at the end of the previous match.
Vertica reports the longest possible match, not a subset of a match. For example, consider pattern: AB with input: AAAB. Because A uses the greedy regular expression quantifier (), Vertica reports all A inputs (AAAB), not AAB, AB, or B.
Notes and restrictions
DISTINCT and GROUP BY/HAVING clauses are not allowed in pattern match queries.
The following expressions are not allowed in the DEFINE subclause:
Subqueries, such as DEFINE X AS c IN ELECT c FROM table
Analytic functions, such as DEFINE X AS c <LEA1) OVER (ORDER BY 1)
Aggregate functions, such as DEFINE X AS c <MA1)
You cannot use the same pattern name to define a different event; for example, the following is not allowed for X:
DEFINE X AS c1 < 3
X AS c1 >= 3
Used with MATCH clause, Vertica MATCH clause functions provide additional data about the patterns it finds. For example, you can use the functions to return values representing the name of the event that matched the input row, the sequential number of the match, or a partition-wide unique identifier for the instance of the pattern that matched.
The SQL MATCH clause syntax lets you screen large amounts of historical data in search of event patterns.
The SQL MATCH clause syntax lets you screen large amounts of historical data in search of event patterns. You specify a pattern as a regular expression and can then search for the pattern within a sequence of input events. MATCH provides subclauses for analytic data partitioning and ordering, and the pattern matching occurs on a contiguous set of rows.
Pattern matching is particularly useful for clickstream analysis where you might want to identify users' actions based on their Web browsing behavior (page clicks). A typical online clickstream funnel is:
Company home page -> product home page -> search -> results -> purchase online
Using this clickstream funnel, you can search for a match on the user's sequence of web clicks and identify that user:
Landed on the company home page
Navigated to the product page
Ran a search
Clicked a link from the search results
Made a purchase
Clickstream funnel schema
The examples in this topic use this clickstream funnel and the following clickstream_log table schema:
=> CREATE TABLE clickstream_log (
uid INT, --user ID
sid INT, --browsing session ID, produced by previous sessionization computation
ts TIME, --timestamp that occurred during the user's page visit
refURL VARCHAR(20), --URL of the page referencing PageURL
pageURL VARCHAR(20), --URL of the page being visited
action CHAR(1) --action the user took after visiting the page ('P' = Purchase, 'V' = View)
);
INSERT INTO clickstream_log VALUES (1,100,'12:00','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:01','website2.com/home','website2.com/floby', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:02','website2.com/floby','website2.com/shamwow', 'V');
INSERT INTO clickstream_log values (1,100,'12:03','website2.com/shamwow','website2.com/buy', 'P');
INSERT INTO clickstream_log values (2,100,'12:10','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log values (2,100,'12:11','website2.com/home','website2.com/forks', 'V');
INSERT INTO clickstream_log values (2,100,'12:13','website2.com/forks','website2.com/buy', 'P');
COMMIT;
This example includes the Vertica MATCH clause functions to analyze users' browsing history over website2.com. It identifies patterns where the user performed the following tasks:
Landed on website2.com from another web site (Entry)
Browsed to any number of other pages (Onsite)
Made a purchase (Purchase)
In the following statement, pattern P (Entry Onsite* Purchase) consist of three event types: Entry, Onsite, and Purchase. When Vertica finds a match in the input table, the associated pattern instance must be an event of type Entry followed by 0 or more events of type Onsite, and an event of type Purchase
=> SELECT uid,
sid,
ts,
refurl,
pageurl,
action,
event_name(),
pattern_id(),
match_id()
FROM clickstream_log
MATCH
(PARTITION BY uid, sid ORDER BY ts
DEFINE
Entry AS RefURL NOT ILIKE '%website2.com%' AND PageURL ILIKE '%website2.com%',
Onsite AS PageURL ILIKE '%website2.com%' AND Action='V',
Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
PATTERN
P AS (Entry Onsite* Purchase)
ROWS MATCH FIRST EVENT);
In the output below, the first four rows represent the pattern for user 1's browsing activity, while the following three rows show user 2's browsing habits.
Omits a specified number of rows from the beginning of the result set.
Omits a specified number of rows from the beginning of the result set.
Syntax
OFFSET rows
Parameters
start-row
Specifies the first row to include in the result set. All preceding rows are omitted.
Dependencies
Use an ORDER BY clause with OFFSET. Otherwise, the query returns an undefined subset of the result set.
OFFSET must follow the ORDER BY clause in a SELECT statement or UNION clause.
When a SELECT statement or UNION clause specifies both LIMIT and OFFSET, Vertica first processes the OFFSET statement, and then applies the LIMIT statement to the remaining rows.
Examples
The following query returns 14 rows from the customer_dimension table:
=> SELECT customer_name, customer_gender FROM customer_dimension
WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name;
customer_name | customer_gender
----------------------+-----------------
Amy X. Lang | Female
Anna H. Li | Female
Brian O. Weaver | Male
Craig O. Pavlov | Male
Doug Z. Goldberg | Male
Harold S. Jones | Male
Jack E. Perkins | Male
Joseph W. Overstreet | Male
Kevin . Campbell | Male
Raja Y. Wilson | Male
Samantha O. Brown | Female
Steve H. Gauthier | Male
William . Nielson | Male
William Z. Roy | Male
(14 rows)
If you modify the previous query to specify an offset of 8 (OFFSET 8), Vertica skips the first eight rows of the previous result set. The query returns the following results:
=> SELECT customer_name, customer_gender FROM customer_dimension
WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name OFFSET 8;
customer_name | customer_gender
-------------------+-----------------
Kevin . Campbell | Male
Raja Y. Wilson | Male
Samantha O. Brown | Female
Steve H. Gauthier | Male
William . Nielson | Male
William Z. Roy | Male
(6 rows)
11 - ORDER BY clause
Sorts a query result set on one or more columns or column expressions.
Sorts a query result set on one or more columns or column expressions. Vertica uses the current locale and collation sequence to compare and sort string values.
Note
Vertica projection data is always stored sorted by the ASCII (binary) collating sequence.
Syntax
ORDER BY expression [ ASC | DESC ] [,...]
Parameters
expression
One of the following:
Name or ordinal number of a SELECT list item. The ordinal number refers to the position of the result column, counting from the left beginning at one. Use them to order by a column whose name is not unique. Ordinal numbers are invalid for an ORDER BY clause of an analytic function's OVER clause.
Arbitrary expression formed from columns that do not appear in the SELECT list
The following example uses a transform function. It returns an error because the ORDER BY column is not in the window partition.
=> CREATE TABLE t(geom geometry(200), geog geography(200));
=> SELECT PolygonPoint(geom) OVER(PARTITION BY geom)
AS SEL_0 FROM t ORDER BY geog;
ERROR 2521: Cannot specify anything other than user defined transforms and partitioning expressions in the ORDER BY list
The following example, using the same table, corrects this error.
=> SELECT PolygonPoint(geom) OVER(PARTITION BY geom)
AS SEL_0 FROM t ORDER BY geom;
The following example uses an array in the ORDER BY clause.
=> SELECT * FROM employees
ORDER BY grant_values;
id | department | grants | grant_values
----+------------+--------------------------+----------------
36 | Astronomy | ["US-7376","DARPA-1567"] | [5000,4000]
36 | Physics | ["US-7376","DARPA-1567"] | [10000,25000]
33 | Physics | ["US-7376"] | [30000]
42 | Physics | ["US-7376","DARPA-1567"] | [65000,135000]
(4 rows)
12 - TIMESERIES clause
Provides gap-filling and interpolation (GFI) computation, an important component of time series analytics computation.
Provides gap-filling and interpolation (GFI) computation, an important component of time series analytics computation. See Time series analytics for details and examples.
Syntax
TIMESERIES slice-time AS 'length-and-time-unit-expr' OVER (
[ PARTITION BY column-expr[,...] ] ORDER BY time-expr ) [ ORDER BY table-column[,...] ]
Parameters
slice-time
A time column produced by the TIMESERIES clause, which stores the time slice start times generated from gap filling.
Note: This parameter is an alias, so you can use any name that an alias would take.
length-and-time-unit-expr
An INTERVAL DAY TO SECOND literal that specifies the length of time unit of time slice computation. For example:
`TIMESERIES slice_time AS '3 seconds' ...
OVER()
Specifies partitioning and ordering for the function. OVER() also specifies that the time series function operates on a query result set—that is, the rows that are returned after the FROM, WHERE, GROUP BY, and HAVING clauses are evaluated.
PARTITION BY (column-expr`[,...] )`
Partitions the data by the specified column expressions. Gap filling and interpolation is performed on each partition separately.
ORDER BY time-expr
Sorts the data by the TIMESTAMP expression time-expr, which computes the time information of the time series data.
Note
The TIMESERIES clause requires an ORDER BY operation on the timestamp column.
Notes
If the window-partition-clause is not specified in TIMESERIES OVER(), for each defined time slice, exactly one output record is produced; otherwise, one output record is produced per partition per time slice. Interpolation is computed there.
Given a query block that contains a TIMESERIES clause, the following are the semantic phases of execution (after evaluating the FROM and the optional WHERE clauses):
Compute time-expression.
Perform the same computation as the TIME_SLICE() function on each input record based on the result of time-exp and 'length-and-time-unit-expr'.
Perform gap filling to generate time slices missing from the input.
Name the result of this computation as slice_time, which represents the generated "time series" column (alias) after gap filling.
Partition the data by expression, slice-time. For each partition, do step 4.
Sort the data by time-expr. Interpolation is computed here.
There is semantic overlap between the TIMESERIES clause and the TIME_SLICE function with the following key differences:
TIMESERIES only supports the interval qualifierDAY TO SECOND; it does not allow YEAR TO MONTH.
Unlike TIME_SLICE, the time slice length and time unit expressed in *length-and-time-unit-expr *must be constants so gaps in the time slices are well-defined.
TIMESERIES performs gap filling; the TIME_SLICE function does not.
TIME_SLICE can return the start or end time of a time slice, depending on the value of its fourth input parameter (start-or-end). TIMESERIES, on the other hand, always returns the start time of each time slice. To output the end time of each time slice, write a SELECT statement like the following:
=> SELECT slice_time + <slice_length>;
Restrictions
When the TIMESERIES clause occurs in a SQL query block, only the following clauses can be used in the same query block:
SELECT
FROM
WHERE
ORDER BY
GROUP BY and HAVING clauses are not allowed. If a GROUP BY operation is needed before or after gap-filling and interpolation (GFI), use a subquery and place the GROUP BY In the outer query. For example:
=> SELECT symbol, AVG(first_bid) as avg_bid FROM (
SELECT symbol, slice_time, TS_FIRST_VALUE(bid1) AS first_bid
FROM Tickstore
WHERE symbol IN ('MSFT', 'IBM')
TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts)
) AS resultOfGFI
GROUP BY symbol;
When the TIMESERIES clause is present in the SQL query block, the SELECT list can include only the following:
For example, the following two queries return a syntax error because bid1 is not a PARTITION BY or GROUP BY column:
=> SELECT bid, symbol, TS_FIRST_VALUE(bid) FROM Tickstore
TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts);
ERROR: column "Tickstore.bid" must appear in the PARTITION BY list of Timeseries clause or be used in a Timeseries Output function
=> SELECT bid, symbol, AVG(bid) FROM Tickstore
GROUP BY symbol;
ERROR: column "Tickstore.bid" must appear in the GROUP BY clause or be used in an aggregate function
ALL concatenates all rows, including duplicates. For best performance, use UNION ALL.
UNION ALL supports columns of complex types; UNION DISTINCT does not.
Requirements
Each row of the UNION result set must be in the result set of at least one of its SELECT statements.
Each SELECT statement must specify the same number of columns.
Data types of corresponding SELECT statement columns must be compatible, otherwise Vertica returns an error.
ORDER BY, LIMIT, and OFFSET clauses in UNION
A UNION statement can specify its own ORDER BY, LIMIT, and OFFSET clauses, as in the following example:
=> SELECT id, emp_name FROM company_a UNION ALL SELECT id, emp_name FROM company_b ORDER BY emp_name LIMIT 2;
id | emp_name
------+----------
5678 | Alice
8765 | Bob
(2 rows)
Each SELECT statement in a UNION clause can specify its own ORDER BY, LIMIT, and OFFSET clauses. Vertica processes the SELECT statement clauses before it processes the UNION clauses. In the following example, Vertica processes the individual queries and then concatenates the two result sets:
=> (SELECT id, emp_name FROM company_a ORDER BY emp_name LIMIT 2)
UNION ALL
(SELECT id, emp_name FROM company_b ORDER BY emp_name LIMIT 2);
id | emp_name
------+-----------
5678 | Alice
9012 | Katherine
8765 | Bob
9012 | Katherine
(4 rows)
The following requirements and restrictions determine how Vertica processes a UNION clause that contains ORDER BY, LIMIT, and OFFSET clauses:
A UNION's ORDER BY clause must specify columns from the first (leftmost) SELECT statement.
ORDER BY must precede LIMIT and OFFSET.
When a SELECT or UNION statement specifies both LIMIT and OFFSET, Vertica first processes the OFFSET statement, and then applies the LIMIT statement to the remaining rows.
=> SELECT DISTINCT customer_key, customer_name FROM public.customer_dimension
WHERE customer_key IN
(SELECT customer_key FROM store.store_sales_fact WHERE sales_dollar_amount > 500
UNION ALL
SELECT customer_key FROM online_sales.online_sales_fact WHERE sales_dollar_amount > 500)
AND customer_state = 'CT';
customer_key | customer_name
--------------+------------------------
7021 | Luigi T. Dobisz
1971 | Betty V. Dobisz
46284 | Ben C. Gauthier
33885 | Tanya Y. Taylor
5449 | Sarah O. Robinson
29059 | Sally Z. Fortin
11200 | Foodhope
15582 | John J. McNulty
24638 | Alexandra F. Jones
...
UNION ALL with complex types
You can use UNION ALL with complex types. Consider a table with the following definition:
=> CREATE TABLE restaurants(
name VARCHAR, cuisine VARCHAR,
locations ARRAY[ROW(city VARCHAR(50), state VARCHAR(2)),50],
menu ARRAY[ROW(item VARCHAR(50), price FLOAT),100] );
Suppose you are in a new city looking for a place to eat. The database has information about the following restaurants:
=> SELECT name, cuisine FROM restaurants
WHERE CONTAINS(locations,ROW('Pittsburgh', 'PA'));
name | cuisine
-------------------+----------
Bakersfield Tacos | Mexican
Bob's pizzeria | Italian
Succulent Steaks | American
Sushi House | Asian
Villa Milano | Italian
(5 rows)
Suppose you are hungry for Italian food. If you cannot have Italian, you want something inexpensive. The following query uses two SELECT clauses from the same table, one finding menu items for Italian restaurants and one finding menu items under $10:
=> WITH menu_entries AS
(SELECT name, cuisine,
EXPLODE(menu USING PARAMETERS skip_partitioning=true) AS (idx, menu_entry)
FROM restaurants WHERE CONTAINS(locations,ROW('Pittsburgh', 'PA')))
SELECT name, cuisine, menu_entry FROM menu_entries WHERE cuisine = 'Italian'
UNION ALL
SELECT name, cuisine, menu_entry FROM menu_entries WHERE menu_entry.price <= 10;
name | cuisine | menu_entry
-------------------+---------+--------------------------------------------
Bob's pizzeria | Italian | {"item":"cheese pizza","price":8.25}
Bob's pizzeria | Italian | {"item":"spinach pizza","price":10.5}
Villa Milano | Italian | {"item":"pasta carbonara","price":24.99}
Villa Milano | Italian | {"item":"eggplant parmesan","price":23.49}
Villa Milano | Italian | {"item":"herbed salmon","price":28.99}
Bakersfield Tacos | Mexican | {"item":"veggie taco","price":9.95}
Bob's pizzeria | Italian | {"item":"cheese pizza","price":8.25}
(7 rows)
You cannot use LIMIT OVER with UNION ALL if the selected columns are of complex types. In this case, the statement returns an error like "Multi-value expressions are not supported in this context". You can still use LIMIT OVER in a single SELECT statement by using parentheses to make the scoping explicit.
Examples
The examples that follow use these two tables:
=> SELECT * FROM company_a;
ID emp_name dept sales
------+------------+-------------+-------
1234 | Stephen | auto parts | 1000
5678 | Alice | auto parts | 2500
9012 | Katherine | floral | 500
=> SELECT * FROM company_b;
ID emp_name dept sales
------+------------+-------------+-------
4321 | Marvin | home goods | 250
9012 | Katherine | home goods | 500
8765 | Bob | electronics | 20000
The following query finds all employee IDs and names from the two tables. The UNION statement uses DISTINCT to combine unique IDs and last names of employees. Katherine works for both companies, so she appears only once in the result set. DISTINCT is the default and can be omitted:
=> SELECT id, emp_name FROM company_a
UNION DISTINCT SELECT id, emp_name FROM company_b ORDER BY id;
id | emp_name
------+-----------
1234 | Stephen
4321 | Marvin
5678 | Alice
8765 | Bob
9012 | Katherine
(5 rows)
If the UNION statement instead uses ALL, the query returns two records for Katherine:
=> SELECT id, emp_name FROM company_a
UNION ALL SELECT id, emp_name FROM company_b ORDER BY id;
id | emp_name
------+-----------
1234 | Stephen
5678 | Alice
9012 | Katherine
4321 | Marvin
9012 | Katherine
8765 | Bob
(6 rows)
The following query returns the top two salespeople in each company. Each SELECT statement specifies its own ORDER BY and LIMIT clauses, and the UNION statement concatenates the result sets as returned by each query:
=> (SELECT id, emp_name, sales FROM company_a ORDER BY sales DESC LIMIT 2)
UNION ALL
(SELECT id, emp_name, sales FROM company_b ORDER BY sales DESC LIMIT 2);
id | emp_name | sales
------+-----------+-------
8765 | Bob | 20000
5678 | Alice | 2500
1234 | Stephen | 1000
9012 | Katherine | 500
(4 rows)
The following query returns all employees in both companies with an overall ordering. The ORDER BY clause is part of the UNION statement:
=> SELECT id, emp_name, sales FROM company_a
UNION
SELECT id, emp_name, sales FROM company_b
ORDER BY sales;
id | emp_name | sales
------+-----------+-------
4321 | Marvin | 250
9012 | Katherine | 500
1234 | Stephen | 1000
5678 | Alice | 2500
8765 | Bob | 20000
(5 rows)
The following query groups total sales by department within each company. Each SELECT statement has its own GROUP BY clause. UNION combines the aggregate results from each query:
=> (SELECT 'Company A' as company, dept, SUM(sales) FROM company_a
GROUP BY dept)
UNION
(SELECT 'Company B' as company, dept, SUM(sales) FROM company_b
GROUP BY dept)
ORDER BY 1;
company | dept | sum
-----------+-------------+-------
Company A | auto parts | 3500
Company A | floral | 500
Company B | electronics | 20000
Company B | home goods | 750
(4 rows)
Specifies which rows to include in a query's result set.
Specifies which rows to include in a query's result set.
Syntax
WHERE boolean-expression [ subquery ]...
Arguments
boolean-expression
An expression that returns true or false. The result set only includes rows that evaluate to true. The expression can include boolean operators and the following predicate elements:
Use parentheses to group expressions, predicates, and boolean operators. For example:
... WHERE NOT (A=1 AND B=2) OR C=3;
Examples
The following example returns the names of all customers in the Eastern region whose name starts with the string Amer:
=> SELECT DISTINCT customer_name
FROM customer_dimension
WHERE customer_region = 'East'
AND customer_name ILIKE 'Amer%';
customer_name
---------------
Americare
Americom
Americore
Americorp
Ameridata
Amerigen
Amerihope
Amerimedia
Amerishop
Ameristar
Ameritech
(11 rows)
15 - WITH clause
A WITH clause defines one or more named common table expressions (CTEs), where each CTE encapsulates a result set that can be referenced by another CTE in the same WITH clause, or by the primary query.
A WITH clause defines one or more named common table expressions (CTEs), where each CTE encapsulates a result set that can be referenced by another CTE in the same WITH clause, or by the primary query. Vertica can evaluate WITH clauses in two ways:
Inline expansion (default): Vertica evaluates each WITH clause every time it is referenced by the primary query.
Materialization: Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.
In both cases, WITH clauses can help simplify complicated queries and avoid statement repetition.
Enables materialization of all queries in the current WITH clause. Otherwise, materialization is set by configuration parameter WithClauseMaterialization, by default set to 0 (disabled). If WithClauseMaterialization is disabled, materialization is automatically cleared when the primary query of the WITH clause returns. For details, see Materialization of WITH clause.
RECURSIVE
Specifies to iterate over the WITH clause's own result set, through repeated execution of an embedded UNION or UNION ALL statement. For details, see WITH clause recursion.
cte-identifier
Identifies a common table expression (CTE) within a WITH clause. This identifier is available to CTEs of the same WITH clause, and of parent and child WITH clauses (if any). CTE identifiers of the outermost (primary) WITH clause are also available to the primary query.
All CTE identifiers of the same WITH clause must be unique. For example, the following WITH clause defines two CTEs, so they require unique identifiers: regional_sales and top_regions:
WITH
-- query sale amounts for each region
regional_sales AS (SELECT ... ),
top_regions AS ( SELECT ... )
)
column-aliases
A comma-delimited list of result set column aliases. The list of aliases must map to all column expressions in the CTE query. If omitted, result set columns can only be referenced by the names used in the query.
In the following example, the revenue CTE specifies two column aliases: vkey and total_revenue. These map to column vendor_key and aggregate expression SUM(total_order_cost), respectively. The primary query references these aliases:
WITH revenue ( vkey, total_revenue ) AS (
SELECT vendor_key, SUM(total_order_cost)
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY vendor_key)
SELECT v.vendor_name, v.vendor_address, v.vendor_city, r.total_revenue
FROM vendor_dimension v JOIN revenue r ON v.vendor_key = r.vkey
WHERE r.total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
subordinate-WITH-clause
A WITH clause that is nested within the current one. CTEs of this WITH clause can only reference CTEs of the same clause, and of parent and child WITH clauses.
Important
The primary query can only reference CTEs in the outermost WITH clause. It cannot reference the CTEs of any nested WITH clause.
query-expression
The query of a given CTE.
Restrictions
WITH clauses only support SELECT and INSERT statements. They do not support UPDATE or DELETE statements.
The following SQL defines a WITH clause with one CTE, revenue, which aggregates data in table store.store_orders_fact. The primary query references the WITH clause result set twice: in its JOIN clause and predicate:
-- define WITH clause
WITH revenue ( vkey, total_revenue ) AS (
SELECT vendor_key, SUM(total_order_cost)
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
-- End WITH clause
-- primary query
SELECT v.vendor_name, v.vendor_address, v.vendor_city, r.total_revenue
FROM vendor_dimension v JOIN revenue r ON v.vendor_key = r.vkey
WHERE r.total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
vendor_name | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
Frozen Suppliers | 471 Mission St | Peoria | 49877044
(1 row)
Single WITH clause and multiple CTEs
In the following example, the WITH clause contains two CTEs:
regional_sales totals sales for each region
top_regions uses the result set from regional_sales to identify the three regions with the highest sales:
The primary query aggregates sales by region and departments in the top_regions result set:
WITH
-- query sale amounts for each region
regional_sales (region, total_sales) AS (
SELECT sd.store_region, SUM(of.total_order_cost) AS total_sales
FROM store.store_dimension sd JOIN store.store_orders_fact of ON sd.store_key = of.store_key
GROUP BY store_region ),
-- query previous result set
top_regions AS (
SELECT region, total_sales
FROM regional_sales ORDER BY total_sales DESC LIMIT 3
)
-- primary query
-- aggregate sales in top_regions result set
SELECT sd.store_region AS region, pd.department_description AS department, SUM(of.total_order_cost) AS product_sales
FROM store.store_orders_fact of
JOIN store.store_dimension sd ON sd.store_key = of.store_key
JOIN public.product_dimension pd ON of.product_key = pd.product_key
WHERE sd.store_region IN (SELECT region FROM top_regions)
GROUP BY ROLLUP (region, department) ORDER BY region, product_sales DESC, GROUPING_ID();
region | department | product_sales
---------+----------------------------------+---------------
East | | 1716917786
East | Meat | 189837962
East | Produce | 170607880
East | Photography | 162271618
East | Frozen Goods | 141077867
East | Gifts | 137604397
East | Bakery | 136497842
East | Liquor | 130410463
East | Canned Goods | 128683257
East | Cleaning supplies | 118996326
East | Dairy | 118866901
East | Seafood | 109986665
East | Medical | 100404891
East | Pharmacy | 71671717
MidWest | | 1287550770
MidWest | Meat | 141446607
MidWest | Produce | 125156100
MidWest | Photography | 122666753
MidWest | Frozen Goods | 105893534
MidWest | Gifts | 103088595
MidWest | Bakery | 102844467
MidWest | Canned Goods | 97647270
MidWest | Liquor | 97306898
MidWest | Cleaning supplies | 90775242
MidWest | Dairy | 89065443
MidWest | Seafood | 82541528
MidWest | Medical | 76674814
MidWest | Pharmacy | 52443519
West | | 2159765937
West | Meat | 235841506
West | Produce | 215277204
West | Photography | 205949467
West | Frozen Goods | 178311593
West | Bakery | 172824555
West | Gifts | 172134780
West | Liquor | 164798022
West | Canned Goods | 163330813
West | Cleaning supplies | 148776443
West | Dairy | 145244575
West | Seafood | 139464407
West | Medical | 126184049
West | Pharmacy | 91628523
| | 5164234493
(43 rows)
INSERT statement that includes WITH clause
The following SQL uses a WITH clause to insert data from a JOIN query into table total_store_sales:
CREATE TABLE total_store_sales (store_key int, region VARCHAR(20), store_sales numeric (12,2));
INSERT INTO total_store_sales
WITH store_sales AS (
SELECT sd.store_key, sd.store_region::VARCHAR(20), SUM (of.total_order_cost)
FROM store.store_dimension sd JOIN store.store_orders_fact of ON sd.store_key = of.store_key
GROUP BY sd.store_region, sd.store_key ORDER BY sd.store_region, sd.store_key)
SELECT * FROM store_sales;
=> SELECT * FROM total_store_sales ORDER BY region, store_key;
store_key | region | store_sales
-----------+-----------+-------------
2 | East | 47668303.00
6 | East | 48136354.00
12 | East | 46673113.00
22 | East | 48711211.00
24 | East | 48603836.00
31 | East | 46836469.00
36 | East | 48461449.00
37 | East | 48018279.00
41 | East | 48713084.00
44 | East | 47808362.00
49 | East | 46990023.00
50 | East | 47643329.00
9 | MidWest | 46851087.00
15 | MidWest | 48787354.00
27 | MidWest | 48497620.00
29 | MidWest | 47639234.00
30 | MidWest | 49013483.00
38 | MidWest | 48856012.00
42 | MidWest | 47297912.00
45 | MidWest | 48544521.00
46 | MidWest | 48887255.00
4 | NorthWest | 47580215.00
39 | NorthWest | 47136892.00
47 | NorthWest | 48477574.00
8 | South | 48131455.00
13 | South | 47605422.00
17 | South | 46054367.00
...
(50 rows)
15.1 - Inline expansion of WITH clause
By default, Vertica uses inline expansion to evaluate WITH clauses.
By default, Vertica uses inline expansion to evaluate WITH clauses. Vertica evaluates each WITH clause every time it is referenced by the primary query. Inline expansion often works best if the query does not reference the same WITH clause multiple times, or if some local optimizations are possible after inline expansion.
Example
The following example shows a WITH clause that is a good candidate for inline expansion. The WITH clause is used in a query that obtains order information for all 2007 orders shipped between December 01-07:
Expands the WITH clause reference to store_orders_fact_new within the primary query.
After expanding the WITH clause, evaluates the primary query.
15.2 - Materialization of WITH clause
When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.
When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires. Vertica drops the temporary table after primary query execution completes.
Note
If the primary query returns with an error, temporary tables might be dropped only after the client’s session ends.
Materialization can facilitate better performance when WITH clauses are complex—for example, when the WITH clauses contain JOIN and GROUP BY clauses, and are referenced multiple times in the primary query.
If materialization is enabled, WITH statements perform an auto-commit of the user transaction. This occurs even when using EXPLAIN with the WITH statement.
Enabling WITH clause materialization
WITH materialization is set by configuration parameter WithClauseMaterialization, by default set to 0 (disabled). You can enable and disable materialization by setting WithClauseMaterialization at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively:
Database:
=> ALTER DATABASE db-spec SET PARAMETER WithClauseMaterialization={ 0 | 1 };
=> ALTER DATABASE db-spec CLEAR PARAMETER WithClauseMaterialization;
Session: Parameter setting remains in effect until you explicitly clear it, or the session ends.
=> ALTER SESSION SET PARAMETER WithClauseMaterialization={ 0 | 1 };
=> ALTER SESSION CLEAR PARAMETER WithClauseMaterialization;
You can also enable WITH materialization for individual queries with the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. Materialization is automatically cleared when the query returns. For example:
=> WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ revenue AS (
SELECT vendor_key, SUM(total_order_cost) AS total_revenue
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
...
Processing WITH clauses using EE5 temp relations
By default, when WITH clause queries are reused, Vertica saves those WITH clause query outputs in EE5 temp relations. However, this option can be changed. EE5 temp relation support is set by configuration parameter EnableWITHTempRelReuseLimit, which can be set in the following ways:
0: Disables this feature.
1: Force-saves all WITH clause queries into EE5 temp relations, whether or not they are reused.
2 (default): Saves only reused WITH clause queries into EE5 temp relations.
3 or more: Saves WITH clause queries into EE5 temp relations only when they are used at least this number of times.
EnableWITHTempRelReuseLimit can be set at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively. When WithClauseMaterialization is set to 1, that setting overrides any EnableWITHTempRelReuseLimit settings.
Note that for WITH queries with complex types, temp relations are disabled.
Example
The following example shows a WITH clause that is a good candidate for materialization. The query obtains data for the vendor who has the highest combined order cost for all orders:
-- Enable materialization
=> ALTER SESSION SET PARAMETER WithClauseMaterialization=1;
-- Define WITH clause
=> WITH revenue AS (
SELECT vendor_key, SUM(total_order_cost) AS total_revenue
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
-- End WITH clause
-- Primary query
=> SELECT vendor_name, vendor_address, vendor_city, total_revenue
FROM vendor_dimension v, revenue r
WHERE v.vendor_key = r.vendor_key AND total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
vendor_name | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
Frozen Suppliers | 471 Mission St | Peoria | 49877044
(1 row)
Vertica processes this query as follows:
WITH clause revenue evaluates its SELECT statement from table store.store_orders_fact.
Results of the revenue clause are stored in a local temporary table.
Whenever the revenue clause statement is referenced, the results stored in the table are used.
The temporary table is dropped when query execution is complete.
15.3 - WITH clause recursion
A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query.
A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query. Recursive queries are useful when working with self-referential data—hierarchies such as manager-subordinate relationships, or tree-structured data such as taxonomies.
The configuration parameter WithClauseRecursionLimit—by default set to 8—sets the maximum depth of recursion. You can set this parameter at database and session scopes with ALTER DATABASE and ALTER SESSION, respectively. Recursion continues until it reaches the configured maximum depth, or until the last iteration returns with no data.
You specify a recursive WITH clause as follows:
WITH [ /*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/ ] RECURSIVE
cte-identifier [ ( column-aliases ) ] AS (
non-recursive-term
UNION [ ALL ]
recursive-term
)
Non-recursive and recursive terms are separated by UNION or UNION ALL:
The non-recursive-term query sets its result set in cte-identifier, which is subject to recursion in recursive-term.
The UNION statement's recursive-term recursively iterates over its own output. When recursion is complete, the results of all iterations are compiled and set in cte-identifier.
For example:
=> ALTER SESSION SET PARAMETER WithClauseRecursionLimit=4; -- maximum recursion depth = 4
=> WITH RECURSIVE nums (n) AS (
SELECT 1 -- non-recursive (base) term
UNION ALL
SELECT n+1 FROM nums -- recursive term
)
SELECT n FROM nums; -- primary query
This simple query executes as follows:
Executes the WITH RECURSIVE clause:
Evaluates the non-recursive term SELECT 1, and places the result set—1—in nums.
Iterates over the UNION ALL query (SELECT n+1) until the number of iterations is greater than the configuration parameter WithClauseRecursionLimit.
Combines the results of all UNION queries and sets the result set in nums, and then exits to the primary query.
Executes the primary query SELECT n FROM nums:
n
---
1
2
3
4
5
(5 rows)
In this case , WITH RECURSIVE clause exits after four iterations as per WithClauseRecursionLimit. If you restore WithClauseRecursionLimit to its default value of 8, then the clause exits after eight iterations:
=> ALTER SESSION CLEAR PARAMETER WithClauseRecursionLimit;
=> WITH RECURSIVE nums (n) AS (
SELECT 1
UNION ALL
SELECT n+1 FROM nums
)
SELECT n FROM nums;
n
---
1
2
3
4
5
6
7
8
9
(9 rows)
Important
Be careful to set WithClauseRecursionLimit only as high as needed to traverse the deepest hierarchies. Vertica sets no limit on this parameter; however, a high value can incur considerable overhead that adversely affects performance and exhausts system resources.
If a high recursion count is required, then consider enabling materialization. For details, see WITH RECURSIVE Materialization.
Restrictions
The following restrictions apply:
The SELECT list of a non-recursive term cannot include the wildcard * (asterisk) or the function MATCH_COLUMNS.
A recursive term can reference the target CTE only once.
Recursive reference cannot appear within an outer join.
Recursive reference cannot appear within a subquery.
WITH clauses do not support UNION options ORDER BY, LIMIT, and OFFSET.
Examples
A small software company maintains the following data on employees and their managers:
=> SELECT * FROM personnel.employees ORDER BY emp_id;
emp_id | fname | lname | section_id | section_name | section_leader | leader_id
--------+-----------+-----------+------------+---------------------+------------------+-----------
0 | Stephen | Mulligan | 0 | | |
1 | Michael | North | 201 | Development | Zoe Black | 3
2 | Megan | Berry | 202 | QA | Richard Chan | 18
3 | Zoe | Black | 101 | Product Development | Renuka Patil | 24
4 | Tim | James | 203 | IT | Ebuka Udechukwu | 17
5 | Bella | Tucker | 201 | Development | Zoe Black | 3
6 | Alexandra | Climo | 202 | QA | Richard Chan | 18
7 | Leonard | Gray | 203 | IT | Ebuka Udechukwu | 17
8 | Carolyn | Henderson | 201 | Development | Zoe Black | 3
9 | Ryan | Henderson | 201 | Development | Zoe Black | 3
10 | Frank | Tucker | 205 | Sales | Benjamin Glover | 29
11 | Nathan | Ferguson | 102 | Sales Marketing | Eric Redfield | 28
12 | Kevin | Rampling | 101 | Product Development | Renuka Patil | 24
13 | Tuy Kim | Duong | 201 | Development | Zoe Black | 3
14 | Dwipendra | Sing | 204 | Tech Support | Sarah Feldman | 26
15 | Dylan | Wijman | 206 | Documentation | Kevin Rampling | 12
16 | Tamar | Sasson | 207 | Marketing | Nathan Ferguson | 11
17 | Ebuka | Udechukwu | 101 | Product Development | Renuka Patil | 24
18 | Richard | Chan | 101 | Product Development | Renuka Patil | 24
19 | Maria | del Rio | 201 | Development | Zoe Black | 3
20 | Hua | Song | 204 | Tech Support | Sarah Feldman | 26
21 | Carmen | Lopez | 204 | Tech Support | Sarah Feldman | 26
22 | Edgar | Mejia | 206 | Documentation | Kevin Rampling | 12
23 | Riad | Salim | 201 | Development | Zoe Black | 3
24 | Renuka | Patil | 100 | Executive Office | Stephen Mulligan | 0
25 | Rina | Dsouza | 202 | QA | Richard Chan | 18
26 | Sarah | Feldman | 101 | Product Development | Renuka Patil | 24
27 | Max | Mills | 102 | Sales Marketing | Eric Redfield | 28
28 | Eric | Redfield | 100 | Executive Office | Stephen Mulligan | 0
29 | Benjamin | Glover | 102 | Sales Marketing | Eric Redfield | 28
30 | Dominic | King | 205 | Sales | Benjamin Glover | 29
32 | Ryan | Metcalfe | 206 | Documentation | Kevin Rampling | 12
33 | Piers | Paige | 201 | Development | Zoe Black | 3
34 | Nicola | Kelly | 207 | Marketing | Nathan Ferguson | 11
(34 rows)
You can query this data for employee-manager relationships through WITH RECURSIVE. For example, the following query's WITH RECURSIVE clause gets employee-manager relationships for employee Eric Redfield, including all employees who report directly and indirectly to him:
WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
FROM personnel.employees WHERE fname||' '||lname = 'Eric Redfield'
UNION
SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
JOIN managers m ON m.employeeID = e.leader_id)
SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;
The WITH RECURSIVE clause defines the CTE managers, and then executes in two phases:
The non-recursive term populates managers with data that it queries from personnel.employees.
The recursive term's UNION query iterates over its own output until, on the fourth cycle, it finds no more data. The results of all iterations are then compiled and set in managers, and the WITH CLAUSE exits to the primary query.
The primary query returns three levels of data from managers—one for each recursive iteration:
Similarly, the following query iterates over the same data to get all employee-manager relationships for employee Richard Chan, who is one level lower in the company chain of command:
WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
FROM personnel.employees WHERE fname||' '||lname = 'Richard Chan'
UNION
SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
JOIN managers m ON m.employeeID = e.leader_id)
SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;
The WITH RECURSIVE clause executes as before, except this time it finds no more data after two iterations and exits. Accordingly, the primary query returns two levels of data from managers:
WITH RECURSIVE materialization
By default, materialization is disabled. In this case, Vertica rewrites the WITH RECURSIVE query into subqueries, as many as necessary for the required level of recursion.
If recursion is very deep, the high number of query rewrites is liable to incur considerable overhead that adversely affects performance and exhausts system resources. In this case, consider enabling materialization, either with the configuration parameter WithClauseMaterialization, or the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. In either case, intermediate result sets from all recursion levels are written to local temporary tables. When recursion is complete, the intermediate results in all temporary tables are compiled and passed on to the primary query.
Note
If materialization is not possible, you can improve throughput on a resource pool that handles recursive queries by setting its EXECUTIONPARALLELISM parameter to 1.