This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
SELECT
Returns a result set from one or more data sources—tables, views, joined tables, and named subqueries.
Returns a result set from one or more data sources—tables, views, joined tables, and named subqueries.
Syntax
[ AT epoch ] [ WITH-clause ] SELECT [ ALL | DISTINCT ]
{ * | { MATCH_COLUMNS('pattern') | expression [ [AS] alias ] }[,...] }
[ into-table-clause ]
[ from-clause ]
[ where-clause ]
[ time-series-clause ]
[ group-by-clause[,...] ]
[ having-clause[,...] ]
[ match-clause ]
[ union-clause ]
[ intersect-clause ]
[ except-clause ]
[ order-by-clause [ offset-clause ]]
[ limit-clause ]
[ FOR UPDATE [ OF table-name[,...] ] ]
Note
SELECT statements can also embed various directives, or hints, that let you control how a given query is handled—for example, join hints such as JTYPE, which enforces the join type (merge or hash join).
For details on using Vertica hints, see Hints.
Parameters
AT
epoch
- Returns data from the specified epoch, where
epoch
is one of the following:
-
EPOCH LATEST: Return data up to but not including the current epoch. The result set includes data from the latest committed DML transaction.
-
EPOCH integer
: Return data up to and including the integer
-specified epoch.
-
TIME 'timestamp
': Return data from the timestamp
-specified epoch.
Note
These options are ignored if used to query temporary or external tables.
See Epochs for additional information about how Vertica uses epochs.
For details, see Historical queries.
ALL | DISTINCT
-
The
ALL
or DISTINCT
qualifier must immediately follow the SELECT
keyword. Only one instance of this keyword can appear in the select list.
*
- Lists all columns in the queried tables.
Caution
Selecting all columns from the queried tables can produce a very large wide set, which can adversely affect performance.
-
MATCH_COLUMNS('pattern')
- Returns all columns in the queried tables that match
pattern
.
expression
[[AS]
alias
]
- An expression that typically resolves to column data from the queried tables—for example, names of columns that are specified in the FROM clause; also:
You can optionally assign a temporary alias to each column expression and reference that alias elsewhere in the SELECT statement—for example, in the query predicate or ORDER BY clause. Vertica uses the alias as the column heading in query output.
FOR UPDATE
- Specifies to obtain an X lock on all tables specified in the query, most often used from
READ COMMITTED
isolation.
FOR UPDATE
requires update/delete permissions on the queried tables and cannot be issued from a read-only transaction.
Privileges
Non-superusers:
Note
As view owner, you can grant other users SELECT privilege on the view only if one of the following is true:
Examples
When multiple clients run transactions as in the following example query, deadlocks can occur if FOR UPDATE
is not used. Two transactions acquire an S lock, and when both attempt to upgrade to an X lock, they encounter deadlocks:
=> SELECT balance FROM accounts WHERE account_id=3476 FOR UPDATE;
...
=> UPDATE accounts SET balance = balance+10 WHERE account_id=3476;
=> COMMIT;
See also
1 - EXCEPT clause
Combines two or more SELECT queries.
Combines two or more SELECT queries. EXCEPT returns distinct results of the left-hand query that are not also found in the right-hand query.
Note
MINUS is an alias for EXCEPT.
Syntax
SELECT
EXCEPT except-query[...]
[ ORDER BY { column-name | ordinal-number } [ ASC | DESC ] [,...] ]
[ LIMIT { integer | ALL } ]
[ OFFSET integer ]
Notes
-
Use the EXCEPT clause to filter out specific results from a SELECT statement. The EXCEPT query operates on the results of two or more SELECT queries. It returns only those rows in the left-hand query that are not also present in the right-hand query.
-
Vertica evaluates multiple EXCEPT clauses in the same SELECT query from left to right, unless parentheses indicate otherwise.
-
You cannot use the ALL keyword with an EXCEPT query.
-
The results of each SELECT statement must be union compatible. Each statement must return the same number of columns, and the corresponding columns must have compatible data types. For example, you cannot use the EXCEPT clause on a column of type INTEGER and a column of type VARCHAR. If statements do not meet these criteria, Vertica returns an error.
Note
The
Data type coercion chart lists the data types that can be cast to other data types. If one data type can be cast to the other, those two data types are compatible.
-
You can use EXCEPT in FROM, WHERE, and HAVING clauses.
-
You can order the results of an EXCEPT operation by including an ORDER BY operation in the statement. When you write the ORDER BY list, specify the column names from the leftmost SELECT statement, or specify integers that indicate the position of the columns by which to sort.
-
The rightmost ORDER BY, LIMIT, or OFFSET clauses in an EXCEPT query do not need to be enclosed in parentheses, because the rightmost query specifies that Vertica perform the operation on the results of the EXCEPT operation. Any ORDER BY, LIMIT, or OFFSET clauses contained in SELECT queries that appear earlier in the EXCEPT query must be enclosed in parentheses.
-
Vertica supports EXCEPT noncorrelated subquery predicates. For example:
=> SELECT * FROM T1
WHERE T1.x IN
(SELECT MAX(c1) FROM T2
EXCEPT
SELECT MAX(cc1) FROM T3
EXCEPT
SELECT MAX(d1) FROM T4);
Examples
Consider the following three tables:
Company_A
Id | emp_lname | dept | sales
------+-----------+----------------+-------
1234 | Stephen | auto parts | 1000
5678 | Alice | auto parts | 2500
9012 | Katherine | floral | 500
3214 | Smithson | sporting goods | 1500
(4 rows)
Company_B
Id | emp_lname | dept | sales
------+-----------+-------------+-------
4321 | Marvin | home goods | 250
8765 | Bob | electronics | 20000
9012 | Katherine | home goods | 500
3214 | Smithson | home goods | 1500
(4 rows)
Company_C
Id | emp_lname | dept | sales
------+-----------+----------------+-------
3214 | Smithson | sporting goods | 1500
5432 | Madison | sporting goods | 400
7865 | Cleveland | outdoor | 1500
1234 | Stephen | floral | 1000
(4 rows)
The following query returns the IDs and last names of employees that exist in Company_A, but not in Company_B:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B;
id | emp_lname
------+-----------
1234 | Stephen
5678 | Alice
(2 rows)
The following query sorts the results of the previous query by employee last name:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B
ORDER BY emp_lname ASC;
id | emp_lname
------+-----------
5678 | Alice
1234 | Stephen
(2 rows)
If you order by the column position, the query returns the same results:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B
ORDER BY 2 ASC;
id | emp_lname
------+-----------
5678 | Alice
1234 | Stephen
(2 rows)
The following query returns the IDs and last names of employees that exist in Company_A, but not in Company_B or Company_C:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B
EXCEPT
SELECT id, emp_lname FROM Company_C;
id | emp_lname
------+-----------
5678 | Alice
(1 row)
The following query shows the results of mismatched data types:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT emp_lname, id FROM Company_B;
ERROR 3429: For 'EXCEPT', types int and varchar are inconsistent
DETAIL: Columns: id and emp_lname
Using the VMart example database, the following query returns information about all Connecticut-based customers who bought items through stores and whose purchases amounted to more than $500, except for those customers who paid cash:
=> SELECT customer_key, customer_name FROM public.customer_dimension
WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact
WHERE sales_dollar_amount > 500
EXCEPT
SELECT customer_key FROM store.store_sales_fact
WHERE tender_type = 'Cash')
AND customer_state = 'CT';
customer_key | customer_name
--------------+----------------------
15084 | Doug V. Lampert
21730 | Juanita F. Peterson
24412 | Mary U. Garnett
25840 | Ben Z. Taylor
29940 | Brian B. Dobisz
32225 | Ruth T. McNulty
33127 | Darlene Y. Rodriguez
40000 | Steve L. Lewis
44383 | Amy G. Jones
46495 | Kevin H. Taylor
(10 rows)
See also
2 - FROM clause
A comma-separated list of data sources to query.
A comma-separated list of data sources to query.
Syntax
FROM dataset[,...] [ TABLESAMPLE(percent) ]
Parameters
dataset
``
- A set of data to query, one of the following:
-
`TABLESAMPLE(percent)`
- Specifies to return a random sampling of records, where
percent
specifies the approximate sampling size. The percent
value must be between 0 and 100, exclusive, and can include decimal values. The number of records returned is not guaranteed to be the exact percentage specified.
All rows of the data have equal opportunities to be selected. Vertica performs sampling before applying other query filters.
Examples
Count all records in customer_dimension
table:
=> SELECT COUNT(*) FROM customer_dimension;
COUNT
-------
50000
(1 row)
Return a small sampling of rows in table customer_dimension
:
=> SELECT customer_name, customer_state FROM customer_dimension TABLESAMPLE(0.5) WHERE customer_state='IL';
customer_name | customer_state
---------------------+----------------
Amy Y. McNulty | IL
Daniel C. Nguyen | IL
Midori O. Greenwood | IL
Meghan U. Lampert | IL
Tiffany Y. Lang | IL
Laura S. King | IL
Steve T. Nguyen | IL
Craig S. Webber | IL
Luigi A. Lewis | IL
Mark W. Williams | IL
(10 rows)
2.1 - Joined-table
Specifies how to join tables.
Specifies how to join tables.
Syntax
table-reference [ join-type ] JOIN table-reference [ TABLESAMPLE(percent) ] [ ON join-predicate ]
Arguments
table-reference
- A table name, optionally qualified.
join-type
- One of the following:
TABLESAMPLE(
percent
)
- Use simple random sampling to return an approximate percentage of records. The percentage value must be greater than 0 and less than 100. All rows in the total potential return set are equally eligible to be included in the sampling. Vertica performs this sampling before other filters in the query are applied. The number of records returned is not guaranteed to be exactly
percent
.
The TABLESAMPLE
option is valid only with user-defined tables and Data Collector (DC) tables. Views and system tables are not supported.
ON
join-predicate
- Specifies the columns to join on.
Invalid for
NATURAL
and CROSS
joins, required for all other join types.
Alternative JOIN syntax options
Vertica supports two older join syntax conventions:
For details, see Join Syntax.
Examples
The following SELECT
statement qualifies its JOIN
clause with the TABLESAMPLE
option:
=> SELECT user_id.id, user_name.name FROM user_name TABLESAMPLE(50)
JOIN user_id TABLESAMPLE(50) ON user_name.id = user_id.id;
id | name
------+--------
489 | Markus
2234 | Cato
763 | Pompey
(3 rows)
2.2 - Table-reference
A temporary name used for references to table.
Syntax
[[database.]schema.]table[ [AS] alias]
Parameters
[
database
.]
schema
Database and schema. The default schema is public
. If you specify a database, it must be the current database.
table
- A table in the logical schema.
[AS]
alias
- A temporary name used for references to
table
.
3 - GROUP BY clause
Use the GROUP BY clause with aggregate functions in a SELECT statement to collect data across multiple records.
Use the GROUP BY
clause with aggregate functions in a SELECT
statement to collect data across multiple records. Vertica groups the results into one or more sets of rows that match an expression.
The GROUP BY
clause without aggregates is similar to using SELECT DISTINCT
.
ROLLUP
is an extension to the GROUP BY
clause. ROLLUP
performs subtotal aggregations.
Syntax
GROUP BY [/*+GBYTYPE(algorithm)*/] { expression | aggregate-expression }[,...]
Arguments
-
/*+GBYTYPE(algorithm)*/
- Specifies which algorithm has precedence for implementing this
GROUP BY
clause, over the algorithm the Vertica query optimizer might otherwise choose. You can set algorithm
to one of the following values:
For more information about both algorithms, see GROUP BY implementation options.
expression
- Any expression, including constants and column references in the tables specified in the FROM clause. For example:
column,... column, (expression)
aggregate-expression
- An ordered list of columns, expressions,
CUBE, GROUPING SETS
, or ROLLUP
aggregates.
You can include CUBE
and ROLLUP
aggregates within a GROUPING SETS
aggregate. CUBE
and ROLLUP
aggregates can result in a large amount of output. In that case, use GROUPING SETS
to return only certain results.
You cannot include any aggregates within a CUBE
or ROLLUP
expression.
You can append multiple GROUPING SETS
, CUBE
, or ROLLUP
aggregates in the same query. For example:
GROUP BY a,b,c,d, ROLLUP(a,b)
GROUP BY a,b,c,d, CUBE((a,b),c,d)
GROUP BY a,b,c,d, CUBE(a,b), ROLLUP (c,d)
GROUP BY ROLLUP(a), CUBE(b), GROUPING SETS(c)
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),CUBE(a,b))
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),())
Usage considerations
-
expression cannot include aggregate functions. However, you can use the GROUP BY clause with CUBE, GROUPING SETS, and ROLLUP
to return summary values for each group.
-
When you create a GROUP BY clause, you must include all non-aggregated columns that appear in the SELECT
list.
-
If the GROUP BY
clause includes a WHERE
clause, Vertica ignores all rows that do not satisfy the WHERE
clause.
Examples
This example shows how to use the WHERE
clause with GROUP BY
. In this case, the example retrieves all employees whose last name begins with S, and ignores all rows that do not meet this criteria. The GROUP BY
clause uses the ILIKE
function to retrieve only last names beginning with S. The aggregate function SUM
computes the total vacation days for each group.
=> SELECT employee_last_name, SUM(vacation_days)
FROM employee_dimension
WHERE employee_last_name ILIKE 'S%'
GROUP BY employee_last_name;
employee_last_name | SUM
--------------------+------
Sanchez | 2892
Smith | 2672
Stein | 2660
(3 rows)
The GROUP BY
clause in the following example groups results by vendor region, and vendor region's biggest deal:
=> SELECT vendor_region, MAX(deal_size) AS "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region;
vendor_region | Biggest Deal
---------------+--------------
East | 990889
MidWest | 699163
NorthWest | 76101
South | 854136
SouthWest | 609807
West | 964005
(6 rows)
The following query modifies the previous one with a HAVING
clause, which specifies to return only groups whose maximum deal size exceeds $900,000:
=> SELECT vendor_region, MAX(deal_size) as "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region
HAVING MAX(deal_size) > 900000;
vendor_region | Biggest Deal
---------------+--------------
East | 990889
West | 964005
(2 rows)
You can use the GROUP BY
clause with one-dimensional arrays of scalar types. In the following example, grants is an ARRAY[VARCHAR] and grant_values is an ARRAY[INT].
=> SELECT department, grants, SUM(apply_sum(grant_values))
FROM employees
GROUP BY grants, department;
department | grants | SUM
------------+--------------------------+--------
Physics | ["US-7376","DARPA-1567"] | 235000
Astronomy | ["US-7376","DARPA-1567"] | 9000
Physics | ["US-7376"] | 30000
(3 rows)
The GROUP BY
clause without aggregates is similar to using SELECT DISTINCT
. For example, the following two queries return the same results:
=> SELECT DISTINCT household_id FROM customer_dimension;
=> SELECT household_id FROM customer_dimension GROUP BY household_id;
See also
3.1 - CUBE aggregate
Automatically performs all possible aggregations of the specified columns, as an extension to the GROUP BY clause.
Automatically performs all possible aggregations of the specified columns, as an extension to the GROUP BY clause.
You can use the ROLLUP clause with three grouping functions:
Syntax
GROUP BY group-expression[,...]
Parameters
group-expression
``
- One or both of the following:
-
An expression that is not an aggregate or a grouping function that includes constants and column references in FROM
-specified tables. For example:
column1, (column2+1), column3+column4
-
A multilevel expression, one of the following:
-
ROLLUP
-
CUBE
-
GROUPING SETS
Restrictions
- GROUP BY CUBE does not order data. If you want to sort data, use the ORDER BY clause. The ORDER BY clause must come after the GROUP BY clause.
- You can use CUBE inside a GROUPING SETS expression, but not inside a ROLLUP expression or another CUBE expression.
Levels of CUBE aggregation
If n is the number of grouping columns, CUBE creates 2n levels of aggregations. For example:
CUBE (A, B, C) creates all possible groupings, resulting in eight groups:
- (A, B, C)
- (A, B)
- (A, C)
- (B, C)
- (A)
- (B)
- (C)
- ()
If you increase the number of CUBE columns, the number of CUBE groupings increases exponentially. The CUBE query may be resource intensive and produce combinations that are not of interest. In that case, consider using theGROUPING SETS aggregate, which allows you to choose specific groupings.
Examples
Using CUBE to return all groupings
Suppose you have a table that contains information about family expenses for books and electricity:
=> SELECT * FROM expenses ORDER BY Category, Year;
Year | Category | Amount
------+-------------+--------
2005 | Books | 39.98
2007 | Books | 29.99
2008 | Books | 29.99
2005 | Electricity | 109.99
2006 | Electricity | 109.99
2007 | Electricity | 229.98
To aggregate the data by both Category and Year using the CUBE aggregate:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY CUBE(Category, Year) ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electricity | 2005 | 109.99
Electricity | 2006 | 109.99
Electricity | 2007 | 229.98
Electricity | | 449.96
| 2005 | 149.97
| 2006 | 109.99
| 2007 | 259.97
| 2008 | 29.99
| | 549.92
The results include subtotals for each category and year, and a grand total ($549.92).
Using CUBE with the HAVING clause
This example shows how you can restrict the GROUP BY results, use the HAVING clause with the CUBE aggregate. This query returns only the category totals and the full total:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY CUBE(Category,Year) HAVING GROUPING(Year)=1;
Category | Year | SUM
-------------+------+--------
Books | | 99.96
Electricity | | 449.96
| | 549.92
The next query returns only the aggregations for the two categories for each year. The GROUPING ID function specifies to omit the grand total ($549.92):
=> SELECT Category, Year, SUM (Amount) FROM expenses
GROUP BY CUBE(Category,Year) HAVING GROUPING_ID(Category,Year)<2
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electrical | 2005 | 109.99
Electrical | 2006 | 109.99
Electrical | 2007 | 229.98
Electrical | | 449.96
See also
3.2 - GROUPING SETS aggregate
The GROUPING SETS aggregate is an extension to the GROUP BY clause that automatically performs subtotal aggregations on groupings that you specify.
The GROUPING SETS
aggregate is an extension to the
GROUP BY
clause that automatically performs subtotal aggregations on groupings that you specify.
You can use the GROUPING SETS
clause with three grouping functions:
To sort data, use the
ORDER BY
clause. The ORDER BY
clause must follow the GROUP BY
clause.
Syntax
GROUP BY group-expression[,...]
Parameters
group-expression
``
- One or both of the following:
-
An expression that is not an aggregate or a grouping function that includes constants and column references in FROM
-specified tables. For example:
column1, (column2+1), column3+column4
-
A multilevel expression, one of the following:
-
ROLLUP
-
CUBE
-
GROUPING SETS
Defining the groupings
GROUPING SETS allows you to specify exactly which groupings you want in the results. You can also concatenate the groupings as follows:
The following example clauses result in the groupings shown.
This clause... |
Defines groupings... |
...GROUP BY GROUPING SETS(A,B,C,D)... |
(A), (B), (C), (D) |
...GROUP BY GROUPING SETS((A),(B),(C),(D))... |
(A), (B), (C), (D) |
...GROUP BY GROUPING SETS((A,B,C,D))... |
(A, B, C, D) |
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(C,D)... |
(A, C), (B, C), (A, D), (B, C) |
...GROUP BY GROUPING SETS((A,B)),GROUPING SETS(C,D)... |
(A, B, C), (A, B, D) |
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(ROLLUP(C,D))... |
(A,B), (A,B,C), (A,B,C,D) |
...GROUP BY A,B,C,GROUPING SETS(ROLLUP(C, D))... |
(A, B, C, D), (A, B, C), (A, B, C)
The clause contains two groups (A, B, C). In the HAVING clause, use the GROUP_ID function as a predicate, to eliminate the second grouping.
|
Example: selecting groupings
This example shows how to select only those groupings you want. Suppose you want to aggregate on columns only, and you do not need the grand total. The first query omits the total. In the second query, you add () to the GROUPING SETS list to get the total. Use the ORDER BY clause to sort the results by grouping:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY GROUPING SETS((Category, Year), (Year))
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Electrical | 2005 | 109.99
Electrical | 2006 | 109.99
Electrical | 2007 | 229.98
| 2005 | 149.97
| 2006 | 109.99
| 2007 | 259.97
| 2008 | 29.99
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY GROUPING SETS((Category, Year), (Year), ())
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Electrical | 2005 | 109.99
Electrical | 2006 | 109.99
Electrical | 2007 | 229.98
| 2005 | 149.97
| 2006 | 109.99
| 2007 | 259.97
| 2008 | 29.99
| | 549.92
See also
3.3 - ROLLUP aggregate
Automatically performs subtotal aggregations as an extension to the GROUP BY clause.
Automatically performs subtotal aggregations as an extension to the GROUP BY clause. ROLLUP
performs these aggregations across multiple dimensions, at different levels, within a single SQL query.
You can use the ROLLUP
clause with three grouping functions:
Syntax
ROLLUP
grouping-expression
[,...]
Parameters
group-expression
- One or both of the following:
-
An expression that is not an aggregate or a grouping function that includes constants and column references in FROM
-specified tables. For example:
column1, (column2+1), column3+column4
-
A multilevel expression, one of the following:
-
ROLLUP
-
CUBE
-
GROUPING SETS
Restrictions
GROUP BY ROLLUP
does not sort results. To sort data, an ORDER BY
clause must follow the GROUP BY
clause.
Levels of aggregation
If n is the number of grouping columns, ROLLUP
creates n+1 levels of subtotals and grand total. Because ROLLUP
removes the right-most column at each step, specify column order carefully.
Suppose that ROLLUP(A, B, C)
creates four groups:
Because ROLLUP
removes the right-most column at each step, there are no groups for (A, C)
and (B, C)
.
If you enclose two or more columns in parentheses, GROUP BY
treats them as a single entity. For example:
-
ROLLUP(A, B, C)
creates four groups:
(A, B, C)
(A, B)
(A)
()
-
ROLLUP((A, B), C)
treats (A, B)
as a single entity and creates three groups:
(A, B, C)
(A, B)
()
Example: aggregating the full data set
The following example shows how to use the GROUP BY
clause to determine family expenses for electricity and books over several years. The SUM
aggregate function computes the total amount of money spent in each category per year.
Suppose you have a table that contains information about family expenses for books and electricity:
=> SELECT * FROM expenses ORDER BY Category, Year;
Year | Category | Amount
------+-------------+--------
2005 | Books | 39.98
2007 | Books | 29.99
2008 | Books | 29.99
2005 | Electricity | 109.99
2006 | Electricity | 109.99
2007 | Electricity | 229.98
For the expenses
table, ROLLUP
computes the subtotals in each category between 2005–2007:
-
Books: $99.96
-
Electricity: $449.96
-
Grand total: $549.92.
Use the ORDER BY
clause to sort the results:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category, Year) ORDER BY 1,2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electricity | 2005 | 109.99
Electricity | 2006 | 109.99
Electricity | 2007 | 229.98
Electricity | | 449.96
| | 549.92
Example: using ROLLUP with the HAVING clause
This example shows how to use the HAVING clause with ROLLUP
to restrict the GROUP BY
results. The following query produces only those ROLLUP
categories where year
is subtotaled, based on the expression in the GROUPING
function:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category,Year) HAVING GROUPING(Year)=1
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | | 99.96
Electricity | | 449.96
| | 549.92
The next example rolls up on (Category
, Year
), but not on the full results. The GROUPING_ID
function specifies to aggregate less than three levels:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category,Year) HAVING GROUPING_ID(Category,Year)<3
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electricity | 2005 | 109.99
Electricity | 2006 | 109.99
Electricity | 2007 | 229.98
Electricity | | 449.96
See also
4 - HAVING clause
Filters the results of a GROUP BY clause.
Filters the results of a GROUP BY clause. Semantically, the HAVING clause occurs after the GROUP BY operation. It was added to the SQL standard because a WHERE clause cannot specify aggregate functions.
Syntax
HAVING condition[,...]
Parameters
condition
- Unambiguously references a grouping column, unless the reference appears in an aggregate function.
Examples
The following example returns the employees with salaries greater than $800,000:
=> SELECT employee_last_name, MAX(annual_salary) as highest_salary FROM employee_dimension
GROUP BY employee_last_name HAVING MAX(annual_salary) > 800000 ORDER BY highest_salary DESC;
employee_last_name | highest_salary
--------------------+----------------
Sanchez | 992363
Vogel | 983634
Vu | 977716
Lewis | 957949
Taylor | 953373
King | 937765
Gauthier | 927335
Garnett | 903104
Bauer | 901181
Jones | 885395
Rodriguez | 861647
Young | 846657
Greenwood | 837543
Overstreet | 831317
Garcia | 811231
(15 rows)
5 - INTERSECT clause
Calculates the intersection of the results of two or more SELECT queries.
Calculates the intersection of the results of two or more SELECT queries. INTERSECT returns distinct values by both the query on the left and right sides of the INTERSECT operand.
Syntax
select-stmt
INTERSECT query[...]
[ order-by-clause [ offset-clause ]]
[ limit-clause ]
Notes
-
Use the INTERSECT clause to return all elements that are common to the results of all the SELECT queries. The INTERSECT query operates on the results of two or more SELECT queries. INTERSECT returns only the rows that are returned by all the specified queries.
-
You cannot use the ALL keyword with an INTERSECT query.
-
The results of each SELECT query must be union compatible; they must return the same number of columns, and the corresponding columns must have compatible data types. For example, you cannot use the INTERSECT clause on a column of type INTEGER and a column of type VARCHAR. If the SELECT queries do not meet these criteria, Vertica returns an error.
Note
The
Data type coercion chart lists the data types that can be cast to other data types. If one data type can be cast to the other, those two data types are compatible.
-
Order the results of an INTERSECT operation by using an ORDER BY clause. In the ORDER BY list, specify the column names from the leftmost SELECT statement or specify integers that indicate the position of the columns by which to sort.
-
You can use INTERSECT in FROM, WHERE, and HAVING clauses.
-
The rightmost ORDER BY, LIMIT, or OFFSET clauses in an INTERSECT query do not need to be enclosed in parentheses because the rightmost query specifies that Vertica perform the operation on the results of the INTERSECT operation. Any ORDER BY, LIMIT, or OFFSET clauses contained in SELECT queries that appear earlier in the INTERSECT query must be enclosed in parentheses.
-
The order by column names is from the first select.
-
Vertica supports INTERSECT noncorrelated subquery predicates. For example:
=> SELECT * FROM T1
WHERE T1.x IN
(SELECT MAX(c1) FROM T2
INTERSECT
SELECT MAX(cc1) FROM T3
INTERSECT
SELECT MAX(d1) FROM T4);
Examples
Consider the following three tables:
Company_A
id emp_lname dept sales
------+------------+----------------+-------
1234 | Stephen | auto parts | 1000
5678 | Alice | auto parts | 2500
9012 | Katherine | floral | 500
3214 | Smithson | sporting goods | 1500
Company_B
id emp_lname dept sales
------+------------+-------------+-------
4321 | Marvin | home goods | 250
9012 | Katherine | home goods | 500
8765 | Bob | electronics | 20000
3214 | Smithson | home goods | 1500
Company_C
id | emp_lname | dept | sales
------+-----------+----------------+-------
3214 | Smithson | sporting goods | 1500
5432 | Madison | sporting goods | 400
7865 | Cleveland | outdoor | 1500
1234 | Stephen | floral | 1000
The following query returns the IDs and last names of employees that exist in both Company_A and Company_B:
=> SELECT id, emp_lname FROM Company_A
INTERSECT
SELECT id, emp_lname FROM Company_B;
id | emp_lname
------+-----------
3214 | Smithson
9012 | Katherine
(2 rows)
The following query returns the same two employees in descending order of sales:
=> SELECT id, emp_lname, sales FROM Company_A
INTERSECT
SELECT id, emp_lname, sales FROM Company_B
ORDER BY sales DESC;
id | emp_lname | sales
------+-----------+-------
3214 | Smithson | 1500
9012 | Katherine | 500
(2 rows)
The following query returns the employee who works for both companies whose sales in Company_B are greater than 1000:
=> SELECT id, emp_lname, sales FROM Company_A
INTERSECT
(SELECT id, emp_lname, sales FROM company_B WHERE sales > 1000)
ORDER BY sales DESC;
id | emp_lname | sales
------+-----------+-------
3214 | Smithson | 1500
(1 row)
In the following query returns the ID and last name of the employee who works for all three companies:
=> SELECT id, emp_lname FROM Company_A
INTERSECT
SELECT id, emp_lname FROM Company_B
INTERSECT
SELECT id, emp_lname FROM Company_C;
id | emp_lname
------+-----------
3214 | Smithson
(1 row)
The following query shows the results of a mismatched data types; these two queries are not union compatible:
=> SELECT id, emp_lname FROM Company_A
INTERSECT
SELECT emp_lname, id FROM Company_B;
ERROR 3429: For 'INTERSECT', types int and varchar are inconsistent
DETAIL: Columns: id and emp_lname
Using the VMart example database, the following query returns information about all Connecticut-based customers who bought items online and whose purchase amounts were between $400 and $500:
=> SELECT customer_key, customer_name from public.customer_dimension
WHERE customer_key IN (SELECT customer_key
FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 400
INTERSECT
SELECT customer_key FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 500)
AND customer_state = 'CT' ORDER BY customer_key;
customer_key | customer_name
--------------+------------------------
39 | Sarah S. Winkler
44 | Meghan H. Overstreet
70 | Jack X. Cleveland
103 | Alexandra I. Vu
110 | Matt . Farmer
173 | Mary R. Reyes
188 | Steve G. Williams
233 | Theodore V. McNulty
250 | Marcus E. Williams
294 | Samantha V. Young
313 | Meghan P. Pavlov
375 | Sally N. Vu
384 | Emily R. Smith
387 | Emily L. Garcia
...
The previous query and the next one are equivalent, and return the same results:
=> SELECT customer_key,customer_name FROM public.customer_dimension
WHERE customer_key IN (SELECT customer_key
FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 400
AND sales_dollar_amount < 500)
AND customer_state = 'CT' ORDER BY customer_key;
See also
6 - INTO TABLE clause
Creates a table from a query result set.
Creates a table from a query result set.
Syntax
Permanent table:
INTO [TABLE] [[database.]schema.]table
Temporary table:
INTO [scope] TEMP[ORARY] [TABLE] [[database.]schema.]table
[ ON COMMIT { DELETE | PRESERVE } ROWS ]
Parameters
scope
- Specifies visibility of a temporary table definition:
-
GLOBAL
(default): The table definition is visible to all sessions, and persists until you explicitly drop the table.
-
LOCAL
: The table definition is visible only to the session in which it is created, and is dropped when the session ends.
Regardless of this setting, retention of temporary table data is set by the keywords ON COMMIT DELETE ROWS
and ON COMMIT PRESERVE ROWS
(see below).
For more information, see Creating temporary tables.
[
database
.]
schema
Database and schema. The default schema is public
. If you specify a database, it must be the current database.
table
- The name of the table to create.
-
ON COMMIT { DELETE | PRESERVE } ROWS
- Specifies whether data is transaction- or session-scoped:
-
DELETE
(default) marks the temporary table for transaction-scoped data. Vertica removes all table data after each commit.
-
PRESERVE
marks the temporary table for session-scoped data, which is preserved beyond the lifetime of a single transaction. Vertica removes all table data when the session ends.
Examples
The following SELECT
statement has an INTO TABLE
clause that creates table newTable
from customer_dimension
:
=> SELECT * INTO TABLE newTable FROM customer_dimension;
The following SELECT
statement creates temporary table newTempTable
. By default, temporary tables are created at a global scope, so its definition is visible to other sessions and persists until it is explicitly dropped. No customer_dimension
data is copied into the new table, and Vertica issues a warning accordingly:
=> SELECT * INTO TEMP TABLE newTempTable FROM customer_dimension;
WARNING 4102: No rows are inserted into table "public"."newTempTable" because
ON COMMIT DELETE ROWS is the default for create temporary table
HINT: Use "ON COMMIT PRESERVE ROWS" to preserve the data in temporary table
CREATE TABLE
The following SELECT
statement creates local temporary table newTempTableLocal
. This table is visible only to the session in which it was created, and is automatically dropped when the session ends. The INTO TABLE
clause includes ON COMMIT PRESERVE ROWS
, so Vertica copies all selection data into the new table:
=> SELECT * INTO LOCAL TEMP TABLE newTempTableLocal ON COMMIT PRESERVE ROWS
FROM customer_dimension;
CREATE TABLE
7 - LIMIT clause
Specifies the maximum number of result set rows to return, either from the entire result set, or from windows of a partitioned result set.
Specifies the maximum number of result set rows to return, either from the entire result set, or from windows of a partitioned result set.
Syntax
Applied to entire result set:
LIMIT { num-rows | ALL }
Applied to windows of a partitioned result set:
LIMIT num-rows OVER ( PARTITION BY column-expr-x, ORDER BY column-expr-y [ASC | DESC]
Parameters
num-rows
- The maximum number of rows to return.
ALL
- Returns all rows, valid only when LIMIT is applied to the entire result set.
OVER()
- Specifies how to partition and sort input data with respect to the current row. The input data is the result set that the query returns after it evaluates FROM, WHERE, GROUP BY, and HAVING clauses.
For details, see Using LIMIT with Window Partitioning below.
Limiting returned rows
LIMIT specifies to return only top-k rows from the queried dataset. Row precedence is determined by the query's ORDER BY clause.
Important
The following dependencies apply:
-
Always use an ORDER BY clause with LIMIT. Otherwise, the query returns an undefined subset of the result set. The ORDER BY clause must precede LIMIT.
-
When a SELECT statement specifies both LIMIT and OFFSET, Vertica first processes the OFFSET, and then applies LIMIT to the remaining rows.
For example, the following query returns the first 10 rows of data in table customer_dimension
, as ordered by columns store_region
and number_of_employees
:
=> SELECT store_region, store_city||', '||store_state location, store_name, number_of_employees
FROM store.store_dimension WHERE number_of_employees <= 12 ORDER BY store_region, number_of_employees LIMIT 10;
store_region | location | store_name | number_of_employees
--------------+----------------+------------+---------------------
East | Stamford, CT | Store219 | 12
East | New Haven, CT | Store66 | 12
East | New York, NY | Store122 | 12
MidWest | South Bend, IN | Store134 | 10
MidWest | Evansville, IN | Store30 | 11
MidWest | Green Bay, WI | Store27 | 12
South | Mesquite, TX | Store124 | 10
South | Cape Coral, FL | Store18 | 11
South | Beaumont, TX | Store226 | 11
South | Houston, TX | Store33 | 11
(10 rows)
Using LIMIT with window partitioning
You can use LIMIT to apply window partitioning on query results, and limit the number of rows that are returned in each window:
SELECT ... FROM dataset LIMIT num-rows OVER ( PARTITION BY column-expr-x, ORDER BY column-expr-y [ASC | DESC] )
where querying dataset
returns num-rows
rows in each column-expr-x
partition with the highest or lowest values of column-expr-y
.
For example, the following statement queries table store.store_dimension
and specifies window partitioning on the result set. LIMIT is set to 2, so each window partition can display no more than two rows. The OVER
clause specifies to partition the result set by store_region
, where each partition window displays for one region the two stores with the smallest number of employees:
=> SELECT store_region, store_city||', '||store_state location, store_name, number_of_employees FROM store.store_dimension
LIMIT 2 OVER (PARTITION BY store_region ORDER BY number_of_employees ASC);
store_region | location | store_name | number_of_employees
--------------+---------------------+------------+---------------------
West | Norwalk, CA | Store43 | 10
West | Lancaster, CA | Store95 | 11
East | Stamford, CT | Store219 | 12
East | New York, NY | Store122 | 12
SouthWest | North Las Vegas, NV | Store170 | 10
SouthWest | Phoenix, AZ | Store228 | 11
NorthWest | Bellevue, WA | Store200 | 19
NorthWest | Portland, OR | Store39 | 22
MidWest | South Bend, IN | Store134 | 10
MidWest | Evansville, IN | Store30 | 11
South | Mesquite, TX | Store124 | 10
South | Beaumont, TX | Store226 | 11
(12 rows)
8 - MATCH clause
A SQL extension that lets you screen large amounts of historical data in search of event patterns, the MATCH clause provides subclasses for analytic partitioning and ordering and matches rows from the result table based on a pattern you define.
A SQL extension that lets you screen large amounts of historical data in search of event patterns, the MATCH clause provides subclasses for analytic partitioning and ordering and matches rows from the result table based on a pattern you define.
You specify a pattern as a regular expression, which is composed of event types defined in the DEFINE subclause, where each event corresponds to a row in the input table. Then you can search for the pattern within a sequence of input events. Pattern matching returns the contiguous sequence of rows that conforms to PATTERN subclause. For example, pattern P
(A B* C
) consist of three event types: A, B, and C. When Vertica finds a match in the input table, the associated pattern instance must be an event of type A followed by 0 or more events of type B, and an event of type C.
Pattern matching is particularly useful for clickstream analysis where you might want to identify users' actions based on their Web browsing behavior (page clicks). For details, see Event series pattern matching.
Syntax
MATCH ( [ PARTITION BY table-column ] ORDER BY table-column
DEFINE event-name AS boolean-expr [,...]
PATTERN pattern-name AS ( regexp)
[ rows-match-clause ] )
Arguments
PARTITION BY
- Defines the window data scope in which the pattern, defined in the PATTERN subclause, is matched. The partition clause partitions the data by matched patterns defined in the PATTERN subclause. For each partition, data is sorted by the ORDER BY clause. If the partition clause is omitted, the entire data set is considered a single partition.
ORDER BY
- Defines the window data scope in which the pattern, defined in the PATTERN subclause, is matched. For each partition, the order clause specifies how the input data is ordered for pattern matching.
Note
The ORDER BY clause is mandatory.
DEFINE
- Defines the boolean expressions that make up the event types in the regular expressions. For example:
DEFINE
Entry AS RefURL NOT ILIKE '%website2.com%' AND PageURL ILIKE
'%website2.com%',
Onsite AS PageURL ILIKE '%website2.com%' AND Action='V',
Purchase AS PageURL ILIKE '%website2.com%' AND Action='P'
The DEFINE subclause accepts a maximum of 52 events. See Event series pattern matching for examples.
event-name
- Name of the event to evaluate for each row—in the earlier example,
Entry, Onsite, Purchase
.
Note
Event names are case insensitive and follow the same naming conventions as those used for tables and columns.
boolean-expr
- Expression that returns true or false. boolean_expr can include Logical operators and relational (comparison) operators. For example:
Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
PATTERN
pattern-name
- Name of the pattern defined in the PATTERN subclause; for example, P is the pattern name defined below:
PATTERN P AS (...)
A PATTERN is a search pattern that is comprised of a name and a regular expression.
Note
Vertica supports one pattern per query.
regexp
- A regular expression comprised of event types defined in the
DEFINE
subclause and one or more quantifiers below. When Vertica evaluates the MATCH
clause, the regular expression identifies the rows that meet the expression criteria.
-
*
: Match 0 or more times
-
*?
: Match 0 or more times, not greedily
-
+
: Match 1 or more times
-
+?
: Match 1 or more times, not greedily
-
?
: Match 0 or 1 time
-
??
: Match 0 or 1 time, not greedily
-
*+
: Match 0 or more times, possessive
-
++
: Match 1 or more times, possessive
-
?+
: Match 0 or 1 time, possessive
-
|
: Alternation. Matches expression before or after the vertical bar. Similar to a Boolean OR.
rows-match-clause
- Specifies how to resolve more than one event evaluating to true for a single row, one of the following:
-
ROWS MATCH ALL EVENTS
: If more than one event evaluates to true for a single row, Vertica returns this error :
ERROR: pattern events must be mutually exclusive
HINT: try using ROWS MATCH FIRST EVENT
-
ROWS MATCH FIRST EVENT
: If more than one event evaluates to true for a given row, Vertica uses the first event in the SQL statement for that row.
Pattern semantic evaluation
-
The semantic evaluating ordering of the SQL clauses is: FROM -> WHERE -> PATTERN MATCH -> SELECT.
-
Data is partitioned as specified in the PARTITION BY clause. If the partition clause is omitted, the entire data set is considered a single partition.
-
For each partition, the order clause specifies how the input data is ordered for pattern matching.
-
Events are evaluated for each row. A row could have 0, 1, or N
events evaluate to true. If more than one event evaluates to true for the same row, Vertica returns a run-time error unless you specify ROWS MATCH FIRST EVENT. If you specify ROWS MATCH FIRST EVENT and more than one event evaluates to TRUE for a single row, Vertica chooses the event that was defined first in the SQL statement to be the event it uses for the row.
-
Vertica performs pattern matching by finding the contiguous sequence of rows that conforms to the pattern defined in the PATTERN subclause.
For each match, Vertica outputs the rows that contribute to the match. Rows not part of the match (do not satisfy one or more predicates) are not output.
-
Vertica reports only non-overlapping matches. If an overlap occurs, Vertica chooses the first match found in the input stream. After finding the match, Vertica looks for the next match, starting at the end of the previous match.
-
Vertica reports the longest possible match, not a subset of a match. For example, consider pattern: AB with input: AAAB. Because A uses the greedy regular expression quantifier (), Vertica reports all A inputs (AAAB), not AAB, AB, or B.
Notes and restrictions
-
DISTINCT and GROUP BY/HAVING clauses are not allowed in pattern match queries.
-
The following expressions are not allowed in the DEFINE subclause:
-
Subqueries, such as DEFINE X AS c IN ELECT c FROM table
-
Analytic functions, such as DEFINE X AS c <
LEA1) OVER (ORDER BY 1)
-
Aggregate functions, such as DEFINE X AS c <
MA1)
-
You cannot use the same pattern name to define a different event; for example, the following is not allowed for X:
DEFINE X AS c1 < 3
X AS c1 >= 3
-
Used with MATCH clause, Vertica MATCH clause functions provide additional data about the patterns it finds. For example, you can use the functions to return values representing the name of the event that matched the input row, the sequential number of the match, or a partition-wide unique identifier for the instance of the pattern that matched.
Examples
For examples, see Event series pattern matching.
See also
8.1 - Event series pattern matching
The SQL MATCH clause syntax lets you screen large amounts of historical data in search of event patterns.
The SQL MATCH clause syntax lets you screen large amounts of historical data in search of event patterns. You specify a pattern as a regular expression and can then search for the pattern within a sequence of input events. MATCH provides subclauses for analytic data partitioning and ordering, and the pattern matching occurs on a contiguous set of rows.
Pattern matching is particularly useful for clickstream analysis where you might want to identify users' actions based on their Web browsing behavior (page clicks). A typical online clickstream funnel is:
Company home page -> product home page -> search -> results -> purchase online
Using this clickstream funnel, you can search for a match on the user's sequence of web clicks and identify that user:
-
Landed on the company home page
-
Navigated to the product page
-
Ran a search
-
Clicked a link from the search results
-
Made a purchase
Clickstream funnel schema
The examples in this topic use this clickstream funnel and the following clickstream_log
table schema:
=> CREATE TABLE clickstream_log (
uid INT, --user ID
sid INT, --browsing session ID, produced by previous sessionization computation
ts TIME, --timestamp that occurred during the user's page visit
refURL VARCHAR(20), --URL of the page referencing PageURL
pageURL VARCHAR(20), --URL of the page being visited
action CHAR(1) --action the user took after visiting the page ('P' = Purchase, 'V' = View)
);
INSERT INTO clickstream_log VALUES (1,100,'12:00','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:01','website2.com/home','website2.com/floby', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:02','website2.com/floby','website2.com/shamwow', 'V');
INSERT INTO clickstream_log values (1,100,'12:03','website2.com/shamwow','website2.com/buy', 'P');
INSERT INTO clickstream_log values (2,100,'12:10','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log values (2,100,'12:11','website2.com/home','website2.com/forks', 'V');
INSERT INTO clickstream_log values (2,100,'12:13','website2.com/forks','website2.com/buy', 'P');
COMMIT;
Here's the clickstream_log table's output:
=> SELECT * FROM clickstream_log;
uid | sid | ts | refURL | pageURL | action
-----+-----+----------+----------------------+----------------------+--------
1 | 100 | 12:00:00 | website1.com | website2.com/home | V
1 | 100 | 12:01:00 | website2.com/home | website2.com/floby | V
1 | 100 | 12:02:00 | website2.com/floby | website2.com/shamwow | V
1 | 100 | 12:03:00 | website2.com/shamwow | website2.com/buy | P
2 | 100 | 12:10:00 | website1.com | website2.com/home | V
2 | 100 | 12:11:00 | website2.com/home | website2.com/forks | V
2 | 100 | 12:13:00 | website2.com/forks | website2.com/buy | P
(7 rows)
Examples
This example includes the Vertica MATCH clause functions to analyze users' browsing history over website2.com. It identifies patterns where the user performed the following tasks:
-
Landed on website2.com from another web site (Entry)
-
Browsed to any number of other pages (Onsite)
-
Made a purchase (Purchase)
In the following statement, pattern P (Entry Onsite* Purchase
) consist of three event types: Entry, Onsite, and Purchase. When Vertica finds a match in the input table, the associated pattern instance must be an event of type Entry followed by 0 or more events of type Onsite, and an event of type Purchase
=> SELECT uid,
sid,
ts,
refurl,
pageurl,
action,
event_name(),
pattern_id(),
match_id()
FROM clickstream_log
MATCH
(PARTITION BY uid, sid ORDER BY ts
DEFINE
Entry AS RefURL NOT ILIKE '%website2.com%' AND PageURL ILIKE '%website2.com%',
Onsite AS PageURL ILIKE '%website2.com%' AND Action='V',
Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
PATTERN
P AS (Entry Onsite* Purchase)
ROWS MATCH FIRST EVENT);
In the output below, the first four rows represent the pattern for user 1's browsing activity, while the following three rows show user 2's browsing habits.
uid | sid | ts | refurl | pageurl | action | event_name | pattern_id | match_id
-----+-----+----------+----------------------+----------------------+--------+------------+------------+----------
1 | 100 | 12:00:00 | website1.com | website2.com/home | V | Entry | 1 | 1
1 | 100 | 12:01:00 | website2.com/home | website2.com/floby | V | Onsite | 1 | 2
1 | 100 | 12:02:00 | website2.com/floby | website2.com/shamwow | V | Onsite | 1 | 3
1 | 100 | 12:03:00 | website2.com/shamwow | website2.com/buy | P | Purchase | 1 | 4
2 | 100 | 12:10:00 | website1.com | website2.com/home | V | Entry | 1 | 1
2 | 100 | 12:11:00 | website2.com/home | website2.com/forks | V | Onsite | 1 | 2
2 | 100 | 12:13:00 | website2.com/forks | website2.com/buy | P | Purchase | 1 | 3
(7 rows)
See also
9 - MINUS clause
MINUS is an alias for EXCEPT.
MINUS is an alias for EXCEPT.
10 - OFFSET clause
Omits a specified number of rows from the beginning of the result set.
Omits a specified number of rows from the beginning of the result set.
Syntax
OFFSET rows
Parameters
start-row
- Specifies the first row to include in the result set. All preceding rows are omitted.
Dependencies
-
Use an ORDER BY clause with OFFSET. Otherwise, the query returns an undefined subset of the result set.
-
OFFSET must follow the ORDER BY clause in a SELECT statement or UNION clause.
-
When a SELECT statement or UNION clause specifies both LIMIT and OFFSET, Vertica first processes the OFFSET statement, and then applies the LIMIT statement to the remaining rows.
Examples
The following query returns 14 rows from the customer_dimension
table:
=> SELECT customer_name, customer_gender FROM customer_dimension
WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name;
customer_name | customer_gender
----------------------+-----------------
Amy X. Lang | Female
Anna H. Li | Female
Brian O. Weaver | Male
Craig O. Pavlov | Male
Doug Z. Goldberg | Male
Harold S. Jones | Male
Jack E. Perkins | Male
Joseph W. Overstreet | Male
Kevin . Campbell | Male
Raja Y. Wilson | Male
Samantha O. Brown | Female
Steve H. Gauthier | Male
William . Nielson | Male
William Z. Roy | Male
(14 rows)
If you modify the previous query to specify an offset of 8 (OFFSET 8
), Vertica skips the first eight rows of the previous result set. The query returns the following results:
=> SELECT customer_name, customer_gender FROM customer_dimension
WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name OFFSET 8;
customer_name | customer_gender
-------------------+-----------------
Kevin . Campbell | Male
Raja Y. Wilson | Male
Samantha O. Brown | Female
Steve H. Gauthier | Male
William . Nielson | Male
William Z. Roy | Male
(6 rows)
11 - ORDER BY clause
Sorts a query result set on one or more columns or column expressions.
Sorts a query result set on one or more columns or column expressions. Vertica uses the current locale and collation sequence to compare and sort string values.
Note
Vertica projection data is always stored sorted by the ASCII (binary) collating sequence.
Syntax
ORDER BY expression [ ASC | DESC ] [,...]
Parameters
expression
- One of the following:
-
Name or ordinal number of a SELECT list item. The ordinal number refers to the position of the result column, counting from the left beginning at one. Use them to order by a column whose name is not unique. Ordinal numbers are invalid for an ORDER BY clause of an analytic function's OVER clause.
-
Arbitrary expression formed from columns that do not appear in the SELECT
list
-
CASE expression.
Note
You cannot use DISTINCT on a collection column if it is also included in the sort order.
- ASC | DESC
- Specifies whether to sort values in ascending or descending order. NULL values are either first or last in the sort order, depending on data type:
-
INTEGER, INT, DATE/TIME: NULL has the smallest value.
-
FLOAT, BOOLEAN, CHAR, VARCHAR, ARRAY, SET: NULL has the largest value
Examples
The follow example returns all the city and deal size for customer Metamedia, sorted by deal size in descending order.
=> SELECT customer_city, deal_siz FROM customer_dimension WHERE customer_name = 'Metamedia'
ORDER BY deal_size DESC;
customer_city | deal_size
------------------+-----------
El Monte | 4479561
Athens | 3815416
Ventura | 3792937
Peoria | 3227765
Arvada | 2671849
Coral Springs | 2643674
Fontana | 2374465
Rancho Cucamonga | 2214002
Wichita Falls | 2117962
Beaumont | 1898295
Arvada | 1321897
Waco | 1026854
Joliet | 945404
Hartford | 445795
(14 rows)
The following example uses a transform function. It returns an error because the ORDER BY column is not in the window partition.
=> CREATE TABLE t(geom geometry(200), geog geography(200));
=> SELECT PolygonPoint(geom) OVER(PARTITION BY geom)
AS SEL_0 FROM t ORDER BY geog;
ERROR 2521: Cannot specify anything other than user defined transforms and partitioning expressions in the ORDER BY list
The following example, using the same table, corrects this error.
=> SELECT PolygonPoint(geom) OVER(PARTITION BY geom)
AS SEL_0 FROM t ORDER BY geom;
The following example uses an array in the ORDER BY clause.
=> SELECT * FROM employees
ORDER BY grant_values;
id | department | grants | grant_values
----+------------+--------------------------+----------------
36 | Astronomy | ["US-7376","DARPA-1567"] | [5000,4000]
36 | Physics | ["US-7376","DARPA-1567"] | [10000,25000]
33 | Physics | ["US-7376"] | [30000]
42 | Physics | ["US-7376","DARPA-1567"] | [65000,135000]
(4 rows)
12 - TIMESERIES clause
Provides gap-filling and interpolation (GFI) computation, an important component of time series analytics computation.
Provides gap-filling and interpolation (GFI) computation, an important component of time series analytics computation. See Time series analytics for details and examples.
Syntax
TIMESERIES slice-time AS 'length-and-time-unit-expr' OVER (
[ PARTITION BY column-expr[,...] ] ORDER BY time-expr ) [ ORDER BY table-column[,...] ]
Parameters
slice-time
- A time column produced by the
TIMESERIES
clause, which stores the time slice start times generated from gap filling.
Note: This parameter is an alias, so you can use any name that an alias would take.
length-and-time-unit-expr
- An
INTERVAL DAY TO SECOND
literal that specifies the length of time unit of time slice computation. For example:
`TIMESERIES slice_time AS '3 seconds' ...
OVER()
- Specifies partitioning and ordering for the function.
OVER()
also specifies that the time series function operates on a query result set—that is, the rows that are returned after the FROM
, WHERE
, GROUP BY
, and HAVING
clauses are evaluated.
-
PARTITION BY (column-expr`[,...] )`
- Partitions the data by the specified column expressions. Gap filling and interpolation is performed on each partition separately.
ORDER BY
time-expr
- Sorts the data by the
TIMESTAMP
expression time-expr
, which computes the time information of the time series data.
Note
The TIMESERIES
clause requires an ORDER BY
operation on the timestamp column.
Notes
If the window-partition-clause
is not specified in TIMESERIES OVER(), for each defined time slice, exactly one output record is produced; otherwise, one output record is produced per partition per time slice. Interpolation is computed there.
Given a query block that contains a TIMESERIES clause, the following are the semantic phases of execution (after evaluating the FROM and the optional WHERE clauses):
-
Compute time-expression.
-
Perform the same computation as the TIME_SLICE() function on each input record based on the result of time-exp
and 'length-and-time-unit-expr'
.
-
Perform gap filling to generate time slices missing from the input.
-
Name the result of this computation as slice_time
, which represents the generated "time series" column (alias) after gap filling.
-
Partition the data by expression
, slice-time
. For each partition, do step 4.
-
Sort the data by time-expr
. Interpolation is computed here.
There is semantic overlap between the TIMESERIES clause and the TIME_SLICE function with the following key differences:
-
TIMESERIES
only supports the interval qualifier DAY TO SECOND
; it does not allow YEAR TO MONTH
.
-
Unlike TIME_SLICE
, the time slice length and time unit expressed in *length-and-time-unit-expr
*must be constants so gaps in the time slices are well-defined.
-
TIMESERIES
performs gap filling; the TIME_SLICE
function does not.
-
TIME_SLICE
can return the start or end time of a time slice, depending on the value of its fourth input parameter (start-or-end
). TIMESERIES
, on the other hand, always returns the start time of each time slice. To output the end time of each time slice, write a SELECT
statement like the following:
=> SELECT slice_time + <slice_length>;
Restrictions
-
When the TIMESERIES
clause occurs in a SQL query block, only the following clauses can be used in the same query block:
-
SELECT
-
FROM
-
WHERE
-
ORDER BY
GROUP BY
and HAVING
clauses are not allowed. If a GROUP BY
operation is needed before or after gap-filling and interpolation (GFI), use a subquery and place the GROUP BY
In the outer query. For example:
=> SELECT symbol, AVG(first_bid) as avg_bid FROM (
SELECT symbol, slice_time, TS_FIRST_VALUE(bid1) AS first_bid
FROM Tickstore
WHERE symbol IN ('MSFT', 'IBM')
TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts)
) AS resultOfGFI
GROUP BY symbol;
-
When the TIMESERIES
clause is present in the SQL query block, the SELECT
list can include only the following:
For example, the following two queries return a syntax error because bid1
is not a PARTITION BY
or GROUP BY
column:
=> SELECT bid, symbol, TS_FIRST_VALUE(bid) FROM Tickstore
TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts);
ERROR: column "Tickstore.bid" must appear in the PARTITION BY list of Timeseries clause or be used in a Timeseries Output function
=> SELECT bid, symbol, AVG(bid) FROM Tickstore
GROUP BY symbol;
ERROR: column "Tickstore.bid" must appear in the GROUP BY clause or be used in an aggregate function
Examples
For examples, see Gap filling and interpolation (GFI).
See also
13 - UNION clause
Combines the results of multiple SELECT statements.
Combines the results of multiple SELECT statements. You can include UNION in FROM, WHERE, and HAVING clauses.
Syntax
select { UNION [ ALL | DISTINCT ] select }[...]
[ order-by-clause [ offset-clause ]]
[ limit-clause ]
Arguments
select
- A SELECT statement that returns one or more rows, depending on whether you specify keywords DISTINCT or ALL.
The first SELECT statement can include the LABEL hint. Vertica ignores LABEL hints in subsequent SELECT statements.
Each SELECT statement can specify its own ORDER BY, LIMIT, and OFFSET clauses. A SELECT statement with one or more of these clauses must be enclosed by parentheses. See also: ORDER BY, LIMIT, and OFFSET Clauses in UNION.
DISTINCT
, ALL
- How to return duplicate rows:
-
DISTINCT (default) returns only unique rows.
-
ALL concatenates all rows, including duplicates. For best performance, use UNION ALL.
UNION ALL supports columns of complex types; UNION DISTINCT does not.
Requirements
-
Each row of the UNION result set must be in the result set of at least one of its SELECT statements.
-
Each SELECT statement must specify the same number of columns.
-
Data types of corresponding SELECT statement columns must be compatible, otherwise Vertica returns an error.
ORDER BY, LIMIT, and OFFSET clauses in UNION
A UNION statement can specify its own ORDER BY, LIMIT, and OFFSET clauses, as in the following example:
=> SELECT id, emp_name FROM company_a UNION ALL SELECT id, emp_name FROM company_b ORDER BY emp_name LIMIT 2;
id | emp_name
------+----------
5678 | Alice
8765 | Bob
(2 rows)
Each SELECT statement in a UNION clause can specify its own ORDER BY, LIMIT, and OFFSET clauses. Vertica processes the SELECT statement clauses before it processes the UNION clauses. In the following example, Vertica processes the individual queries and then concatenates the two result sets:
=> (SELECT id, emp_name FROM company_a ORDER BY emp_name LIMIT 2)
UNION ALL
(SELECT id, emp_name FROM company_b ORDER BY emp_name LIMIT 2);
id | emp_name
------+-----------
5678 | Alice
9012 | Katherine
8765 | Bob
9012 | Katherine
(4 rows)
The following requirements and restrictions determine how Vertica processes a UNION clause that contains ORDER BY, LIMIT, and OFFSET clauses:
-
A UNION's ORDER BY clause must specify columns from the first (leftmost) SELECT statement.
-
ORDER BY must precede LIMIT and OFFSET.
-
When a SELECT or UNION statement specifies both LIMIT and OFFSET, Vertica first processes the OFFSET statement, and then applies the LIMIT statement to the remaining rows.
Vertica supports UNION in noncorrelated subquery predicates. For example:
=> SELECT DISTINCT customer_key, customer_name FROM public.customer_dimension
WHERE customer_key IN
(SELECT customer_key FROM store.store_sales_fact WHERE sales_dollar_amount > 500
UNION ALL
SELECT customer_key FROM online_sales.online_sales_fact WHERE sales_dollar_amount > 500)
AND customer_state = 'CT';
customer_key | customer_name
--------------+------------------------
7021 | Luigi T. Dobisz
1971 | Betty V. Dobisz
46284 | Ben C. Gauthier
33885 | Tanya Y. Taylor
5449 | Sarah O. Robinson
29059 | Sally Z. Fortin
11200 | Foodhope
15582 | John J. McNulty
24638 | Alexandra F. Jones
...
UNION ALL with complex types
You can use UNION ALL with complex types. Consider a table with the following definition:
=> CREATE TABLE restaurants(
name VARCHAR, cuisine VARCHAR,
locations ARRAY[ROW(city VARCHAR(50), state VARCHAR(2)),50],
menu ARRAY[ROW(item VARCHAR(50), price FLOAT),100] );
Suppose you are in a new city looking for a place to eat. The database has information about the following restaurants:
=> SELECT name, cuisine FROM restaurants
WHERE CONTAINS(locations,ROW('Pittsburgh', 'PA'));
name | cuisine
-------------------+----------
Bakersfield Tacos | Mexican
Bob's pizzeria | Italian
Succulent Steaks | American
Sushi House | Asian
Villa Milano | Italian
(5 rows)
Suppose you are hungry for Italian food. If you cannot have Italian, you want something inexpensive. The following query uses two SELECT clauses from the same table, one finding menu items for Italian restaurants and one finding menu items under $10:
=> WITH menu_entries AS
(SELECT name, cuisine,
EXPLODE(menu USING PARAMETERS skip_partitioning=true) AS (idx, menu_entry)
FROM restaurants WHERE CONTAINS(locations,ROW('Pittsburgh', 'PA')))
SELECT name, cuisine, menu_entry FROM menu_entries WHERE cuisine = 'Italian'
UNION ALL
SELECT name, cuisine, menu_entry FROM menu_entries WHERE menu_entry.price <= 10;
name | cuisine | menu_entry
-------------------+---------+--------------------------------------------
Bob's pizzeria | Italian | {"item":"cheese pizza","price":8.25}
Bob's pizzeria | Italian | {"item":"spinach pizza","price":10.5}
Villa Milano | Italian | {"item":"pasta carbonara","price":24.99}
Villa Milano | Italian | {"item":"eggplant parmesan","price":23.49}
Villa Milano | Italian | {"item":"herbed salmon","price":28.99}
Bakersfield Tacos | Mexican | {"item":"veggie taco","price":9.95}
Bob's pizzeria | Italian | {"item":"cheese pizza","price":8.25}
(7 rows)
You cannot use LIMIT OVER with UNION ALL if the selected columns are of complex types. In this case, the statement returns an error like "Multi-value expressions are not supported in this context". You can still use LIMIT OVER in a single SELECT statement by using parentheses to make the scoping explicit.
Examples
The examples that follow use these two tables:
=> SELECT * FROM company_a;
ID emp_name dept sales
------+------------+-------------+-------
1234 | Stephen | auto parts | 1000
5678 | Alice | auto parts | 2500
9012 | Katherine | floral | 500
=> SELECT * FROM company_b;
ID emp_name dept sales
------+------------+-------------+-------
4321 | Marvin | home goods | 250
9012 | Katherine | home goods | 500
8765 | Bob | electronics | 20000
The following query finds all employee IDs and names from the two tables. The UNION statement uses DISTINCT to combine unique IDs and last names of employees. Katherine works for both companies, so she appears only once in the result set. DISTINCT is the default and can be omitted:
=> SELECT id, emp_name FROM company_a
UNION DISTINCT SELECT id, emp_name FROM company_b ORDER BY id;
id | emp_name
------+-----------
1234 | Stephen
4321 | Marvin
5678 | Alice
8765 | Bob
9012 | Katherine
(5 rows)
If the UNION statement instead uses ALL, the query returns two records for Katherine:
=> SELECT id, emp_name FROM company_a
UNION ALL SELECT id, emp_name FROM company_b ORDER BY id;
id | emp_name
------+-----------
1234 | Stephen
5678 | Alice
9012 | Katherine
4321 | Marvin
9012 | Katherine
8765 | Bob
(6 rows)
The following query returns the top two salespeople in each company. Each SELECT statement specifies its own ORDER BY and LIMIT clauses, and the UNION statement concatenates the result sets as returned by each query:
=> (SELECT id, emp_name, sales FROM company_a ORDER BY sales DESC LIMIT 2)
UNION ALL
(SELECT id, emp_name, sales FROM company_b ORDER BY sales DESC LIMIT 2);
id | emp_name | sales
------+-----------+-------
8765 | Bob | 20000
5678 | Alice | 2500
1234 | Stephen | 1000
9012 | Katherine | 500
(4 rows)
The following query returns all employees in both companies with an overall ordering. The ORDER BY clause is part of the UNION statement:
=> SELECT id, emp_name, sales FROM company_a
UNION
SELECT id, emp_name, sales FROM company_b
ORDER BY sales;
id | emp_name | sales
------+-----------+-------
4321 | Marvin | 250
9012 | Katherine | 500
1234 | Stephen | 1000
5678 | Alice | 2500
8765 | Bob | 20000
(5 rows)
The following query groups total sales by department within each company. Each SELECT statement has its own GROUP BY clause. UNION combines the aggregate results from each query:
=> (SELECT 'Company A' as company, dept, SUM(sales) FROM company_a
GROUP BY dept)
UNION
(SELECT 'Company B' as company, dept, SUM(sales) FROM company_b
GROUP BY dept)
ORDER BY 1;
company | dept | sum
-----------+-------------+-------
Company A | auto parts | 3500
Company A | floral | 500
Company B | electronics | 20000
Company B | home goods | 750
(4 rows)
See also
14 - WHERE clause
Specifies which rows to include in a query's result set.
Specifies which rows to include in a query's result set.
Syntax
WHERE boolean-expression [ subquery ]...
Arguments
boolean-expression
- An expression that returns true or false. The result set only includes rows that evaluate to true. The expression can include boolean operators and the following predicate elements:
Use parentheses to group expressions, predicates, and boolean operators. For example:
... WHERE NOT (A=1 AND B=2) OR C=3;
Examples
The following example returns the names of all customers in the Eastern region whose name starts with the string Amer
:
=> SELECT DISTINCT customer_name
FROM customer_dimension
WHERE customer_region = 'East'
AND customer_name ILIKE 'Amer%';
customer_name
---------------
Americare
Americom
Americore
Americorp
Ameridata
Amerigen
Amerihope
Amerimedia
Amerishop
Ameristar
Ameritech
(11 rows)
15 - WITH clause
A WITH clause defines one or more named common table expressions (CTEs), where each CTE encapsulates a result set that can be referenced by another CTE in the same WITH clause, or by the primary query.
A WITH clause defines one or more named common table expressions (CTEs), where each CTE encapsulates a result set that can be referenced by another CTE in the same WITH clause, or by the primary query. Vertica can evaluate WITH clauses in two ways:
- Inline expansion (default): Vertica evaluates each WITH clause every time it is referenced by the primary query.
- Materialization: Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.
In both cases, WITH clauses can help simplify complicated queries and avoid statement repetition.
Syntax
WITH [ /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ ] [ RECURSIVE ] {
cte-identifier [ ( column-aliases ) ] AS (
[ subordinate-WITH-clause ]
query-expression )
} [,...]
Arguments
/*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/
- Enables materialization of all queries in the current WITH clause. Otherwise, materialization is set by configuration parameter WithClauseMaterialization, by default set to 0 (disabled). If WithClauseMaterialization is disabled, materialization is automatically cleared when the primary query of the WITH clause returns. For details, see Materialization of WITH clause.
RECURSIVE
- Specifies to iterate over the WITH clause's own result set, through repeated execution of an embedded UNION or UNION ALL statement. For details, see WITH clause recursion.
cte-identifier
- Identifies a common table expression (CTE) within a WITH clause. This identifier is available to CTEs of the same WITH clause, and of parent and child WITH clauses (if any). CTE identifiers of the outermost (primary) WITH clause are also available to the primary query.
All CTE identifiers of the same WITH clause must be unique. For example, the following WITH clause defines two CTEs, so they require unique identifiers: regional_sales
and top_regions
:
WITH
-- query sale amounts for each region
regional_sales AS (SELECT ... ),
top_regions AS ( SELECT ... )
)
column-aliases
- A comma-delimited list of result set column aliases. The list of aliases must map to all column expressions in the CTE query. If omitted, result set columns can only be referenced by the names used in the query.
In the following example, the revenue
CTE specifies two column aliases: vkey
and total_revenue
. These map to column vendor_key
and aggregate expression SUM(total_order_cost)
, respectively. The primary query references these aliases:
WITH revenue ( vkey, total_revenue ) AS (
SELECT vendor_key, SUM(total_order_cost)
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY vendor_key)
SELECT v.vendor_name, v.vendor_address, v.vendor_city, r.total_revenue
FROM vendor_dimension v JOIN revenue r ON v.vendor_key = r.vkey
WHERE r.total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
subordinate-WITH-clause
- A WITH clause that is nested within the current one. CTEs of this WITH clause can only reference CTEs of the same clause, and of parent and child WITH clauses.
Important
The primary query can only reference CTEs in the outermost WITH clause. It cannot reference the CTEs of any nested WITH clause.
query-expression
- The query of a given CTE.
Restrictions
WITH clauses only support SELECT and INSERT statements. They do not support UPDATE or DELETE statements.
Examples
Single WITH clause with single CTE
The following SQL defines a WITH clause with one CTE, revenue
, which aggregates data in table store.store_orders_fact
. The primary query references the WITH clause result set twice: in its JOIN
clause and predicate:
-- define WITH clause
WITH revenue ( vkey, total_revenue ) AS (
SELECT vendor_key, SUM(total_order_cost)
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
-- End WITH clause
-- primary query
SELECT v.vendor_name, v.vendor_address, v.vendor_city, r.total_revenue
FROM vendor_dimension v JOIN revenue r ON v.vendor_key = r.vkey
WHERE r.total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
vendor_name | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
Frozen Suppliers | 471 Mission St | Peoria | 49877044
(1 row)
Single WITH clause and multiple CTEs
In the following example, the WITH clause contains two CTEs:
The primary query aggregates sales by region and departments in the top_regions
result set:
WITH
-- query sale amounts for each region
regional_sales (region, total_sales) AS (
SELECT sd.store_region, SUM(of.total_order_cost) AS total_sales
FROM store.store_dimension sd JOIN store.store_orders_fact of ON sd.store_key = of.store_key
GROUP BY store_region ),
-- query previous result set
top_regions AS (
SELECT region, total_sales
FROM regional_sales ORDER BY total_sales DESC LIMIT 3
)
-- primary query
-- aggregate sales in top_regions result set
SELECT sd.store_region AS region, pd.department_description AS department, SUM(of.total_order_cost) AS product_sales
FROM store.store_orders_fact of
JOIN store.store_dimension sd ON sd.store_key = of.store_key
JOIN public.product_dimension pd ON of.product_key = pd.product_key
WHERE sd.store_region IN (SELECT region FROM top_regions)
GROUP BY ROLLUP (region, department) ORDER BY region, product_sales DESC, GROUPING_ID();
region | department | product_sales
---------+----------------------------------+---------------
East | | 1716917786
East | Meat | 189837962
East | Produce | 170607880
East | Photography | 162271618
East | Frozen Goods | 141077867
East | Gifts | 137604397
East | Bakery | 136497842
East | Liquor | 130410463
East | Canned Goods | 128683257
East | Cleaning supplies | 118996326
East | Dairy | 118866901
East | Seafood | 109986665
East | Medical | 100404891
East | Pharmacy | 71671717
MidWest | | 1287550770
MidWest | Meat | 141446607
MidWest | Produce | 125156100
MidWest | Photography | 122666753
MidWest | Frozen Goods | 105893534
MidWest | Gifts | 103088595
MidWest | Bakery | 102844467
MidWest | Canned Goods | 97647270
MidWest | Liquor | 97306898
MidWest | Cleaning supplies | 90775242
MidWest | Dairy | 89065443
MidWest | Seafood | 82541528
MidWest | Medical | 76674814
MidWest | Pharmacy | 52443519
West | | 2159765937
West | Meat | 235841506
West | Produce | 215277204
West | Photography | 205949467
West | Frozen Goods | 178311593
West | Bakery | 172824555
West | Gifts | 172134780
West | Liquor | 164798022
West | Canned Goods | 163330813
West | Cleaning supplies | 148776443
West | Dairy | 145244575
West | Seafood | 139464407
West | Medical | 126184049
West | Pharmacy | 91628523
| | 5164234493
(43 rows)
INSERT statement that includes WITH clause
The following SQL uses a WITH clause to insert data from a JOIN query into table total_store_sales
:
CREATE TABLE total_store_sales (store_key int, region VARCHAR(20), store_sales numeric (12,2));
INSERT INTO total_store_sales
WITH store_sales AS (
SELECT sd.store_key, sd.store_region::VARCHAR(20), SUM (of.total_order_cost)
FROM store.store_dimension sd JOIN store.store_orders_fact of ON sd.store_key = of.store_key
GROUP BY sd.store_region, sd.store_key ORDER BY sd.store_region, sd.store_key)
SELECT * FROM store_sales;
=> SELECT * FROM total_store_sales ORDER BY region, store_key;
store_key | region | store_sales
-----------+-----------+-------------
2 | East | 47668303.00
6 | East | 48136354.00
12 | East | 46673113.00
22 | East | 48711211.00
24 | East | 48603836.00
31 | East | 46836469.00
36 | East | 48461449.00
37 | East | 48018279.00
41 | East | 48713084.00
44 | East | 47808362.00
49 | East | 46990023.00
50 | East | 47643329.00
9 | MidWest | 46851087.00
15 | MidWest | 48787354.00
27 | MidWest | 48497620.00
29 | MidWest | 47639234.00
30 | MidWest | 49013483.00
38 | MidWest | 48856012.00
42 | MidWest | 47297912.00
45 | MidWest | 48544521.00
46 | MidWest | 48887255.00
4 | NorthWest | 47580215.00
39 | NorthWest | 47136892.00
47 | NorthWest | 48477574.00
8 | South | 48131455.00
13 | South | 47605422.00
17 | South | 46054367.00
...
(50 rows)
15.1 - Inline expansion of WITH clause
By default, Vertica uses inline expansion to evaluate WITH clauses.
By default, Vertica uses inline expansion to evaluate WITH clauses. Vertica evaluates each WITH clause every time it is referenced by the primary query. Inline expansion often works best if the query does not reference the same WITH clause multiple times, or if some local optimizations are possible after inline expansion.
Example
The following example shows a WITH clause that is a good candidate for inline expansion. The WITH clause is used in a query that obtains order information for all 2007 orders shipped between December 01-07:
-- Begin WITH
WITH store_orders_fact_new AS(
SELECT * FROM store.store_orders_fact WHERE date_shipped between '2007-12-01' and '2007-12-07')
-- End WITH
-- Begin primary query
SELECT store_key, product_key, product_version, SUM(quantity_ordered*unit_price) AS total_price
FROM store_orders_fact_new
GROUP BY store_key, product_key, product_version
ORDER BY total_price DESC;
store_key | product_key | product_version | total_price
-----------+-------------+-----------------+-------------
232 | 1855 | 2 | 29008
125 | 8500 | 4 | 28812
139 | 3707 | 2 | 28812
212 | 3203 | 1 | 28000
236 | 8023 | 4 | 27548
123 | 10598 | 2 | 27146
34 | 8888 | 4 | 27100
203 | 2243 | 1 | 27027
117 | 13932 | 2 | 27000
84 | 768 | 1 | 26936
123 | 1038 | 1 | 26885
106 | 18932 | 1 | 26864
93 | 10395 | 3 | 26790
162 | 13073 | 1 | 26754
15 | 3679 | 1 | 26675
52 | 5957 | 5 | 26656
190 | 8114 | 3 | 26611
5 | 7772 | 1 | 26588
139 | 6953 | 3 | 26572
202 | 14735 | 1 | 26404
133 | 2740 | 1 | 26312
198 | 8545 | 3 | 26287
221 | 7582 | 2 | 26280
127 | 9468 | 3 | 26224
63 | 8115 | 4 | 25960
171 | 2088 | 1 | 25650
250 | 11210 | 3 | 25608
...
Vertica processes the query as follows:
-
Expands the WITH clause reference to store_orders_fact_new
within the primary query.
-
After expanding the WITH clause, evaluates the primary query.
15.2 - Materialization of WITH clause
When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.
When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires. Vertica drops the temporary table after primary query execution completes.
Note
If the primary query returns with an error, temporary tables might be dropped only after the client’s session ends.
Materialization can facilitate better performance when WITH clauses are complex—for example, when the WITH clauses contain JOIN and GROUP BY clauses, and are referenced multiple times in the primary query.
If materialization is enabled, WITH statements perform an auto-commit of the user transaction. This occurs even when using EXPLAIN with the WITH statement.
Enabling WITH clause materialization
WITH materialization is set by configuration parameter WithClauseMaterialization, by default set to 0 (disabled). You can enable and disable materialization by setting WithClauseMaterialization at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively:
-
Database:
=> ALTER DATABASE db-spec SET PARAMETER WithClauseMaterialization={ 0 | 1 };
=> ALTER DATABASE db-spec CLEAR PARAMETER WithClauseMaterialization;
-
Session: Parameter setting remains in effect until you explicitly clear it, or the session ends.
=> ALTER SESSION SET PARAMETER WithClauseMaterialization={ 0 | 1 };
=> ALTER SESSION CLEAR PARAMETER WithClauseMaterialization;
You can also enable WITH materialization for individual queries with the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. Materialization is automatically cleared when the query returns. For example:
=> WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ revenue AS (
SELECT vendor_key, SUM(total_order_cost) AS total_revenue
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
...
Processing WITH clauses using EE5 temp relations
By default, when WITH clause queries are reused, Vertica saves those WITH clause query outputs in EE5 temp relations. However, this option can be changed. EE5 temp relation support is set by configuration parameter EnableWITHTempRelReuseLimit, which can be set in the following ways:
-
0: Disables this feature.
-
1: Force-saves all WITH clause queries into EE5 temp relations, whether or not they are reused.
-
2 (default): Saves only reused WITH clause queries into EE5 temp relations.
-
3 or more: Saves WITH clause queries into EE5 temp relations only when they are used at least this number of times.
EnableWITHTempRelReuseLimit can be set at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively. When WithClauseMaterialization is set to 1, that setting overrides any EnableWITHTempRelReuseLimit settings.
Note that for WITH queries with complex types, temp relations are disabled.
Example
The following example shows a WITH clause that is a good candidate for materialization. The query obtains data for the vendor who has the highest combined order cost for all orders:
-- Enable materialization
=> ALTER SESSION SET PARAMETER WithClauseMaterialization=1;
-- Define WITH clause
=> WITH revenue AS (
SELECT vendor_key, SUM(total_order_cost) AS total_revenue
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
-- End WITH clause
-- Primary query
=> SELECT vendor_name, vendor_address, vendor_city, total_revenue
FROM vendor_dimension v, revenue r
WHERE v.vendor_key = r.vendor_key AND total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
vendor_name | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
Frozen Suppliers | 471 Mission St | Peoria | 49877044
(1 row)
Vertica processes this query as follows:
-
WITH clause revenue
evaluates its SELECT statement from table store.store_orders_fact
.
-
Results of the revenue
clause are stored in a local temporary table.
-
Whenever the revenue
clause statement is referenced, the results stored in the table are used.
-
The temporary table is dropped when query execution is complete.
15.3 - WITH clause recursion
A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query.
A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query. Recursive queries are useful when working with self-referential data—hierarchies such as manager-subordinate relationships, or tree-structured data such as taxonomies.
The configuration parameter WithClauseRecursionLimit—by default set to 8—sets the maximum depth of recursion. You can set this parameter at database and session scopes with ALTER DATABASE and ALTER SESSION, respectively. Recursion continues until it reaches the configured maximum depth, or until the last iteration returns with no data.
You specify a recursive WITH clause as follows:
WITH [ /*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/ ] RECURSIVE
cte-identifier [ ( column-aliases ) ] AS (
non-recursive-term
UNION [ ALL ]
recursive-term
)
Non-recursive and recursive terms are separated by UNION or UNION ALL:
-
The non-recursive-term
query sets its result set in cte-identifier
, which is subject to recursion in recursive-term
.
-
The UNION statement's recursive-term
recursively iterates over its own output. When recursion is complete, the results of all iterations are compiled and set in cte-identifier
.
For example:
=> ALTER SESSION SET PARAMETER WithClauseRecursionLimit=4; -- maximum recursion depth = 4
=> WITH RECURSIVE nums (n) AS (
SELECT 1 -- non-recursive (base) term
UNION ALL
SELECT n+1 FROM nums -- recursive term
)
SELECT n FROM nums; -- primary query
This simple query executes as follows:
-
Executes the WITH RECURSIVE clause:
-
Evaluates the non-recursive term SELECT 1, and places the result set—1—in nums
.
-
Iterates over the UNION ALL query (SELECT n+1) until the number of iterations is greater than the configuration parameter WithClauseRecursionLimit.
-
Combines the results of all UNION queries and sets the result set in nums
, and then exits to the primary query.
-
Executes the primary query SELECT n FROM nums:
n
---
1
2
3
4
5
(5 rows)
In this case , WITH RECURSIVE clause exits after four iterations as per WithClauseRecursionLimit. If you restore WithClauseRecursionLimit to its default value of 8, then the clause exits after eight iterations:
=> ALTER SESSION CLEAR PARAMETER WithClauseRecursionLimit;
=> WITH RECURSIVE nums (n) AS (
SELECT 1
UNION ALL
SELECT n+1 FROM nums
)
SELECT n FROM nums;
n
---
1
2
3
4
5
6
7
8
9
(9 rows)
Important
Be careful to set WithClauseRecursionLimit only as high as needed to traverse the deepest hierarchies. Vertica sets no limit on this parameter; however, a high value can incur considerable overhead that adversely affects performance and exhausts system resources.
If a high recursion count is required, then consider enabling materialization. For details, see WITH RECURSIVE Materialization.
Restrictions
The following restrictions apply:
-
The SELECT list of a non-recursive term cannot include the wildcard *
(asterisk) or the function MATCH_COLUMNS.
-
A recursive term can reference the target CTE only once.
-
Recursive reference cannot appear within an outer join.
-
Recursive reference cannot appear within a subquery.
-
WITH clauses do not support UNION options ORDER BY, LIMIT, and OFFSET.
Examples
A small software company maintains the following data on employees and their managers:
=> SELECT * FROM personnel.employees ORDER BY emp_id;
emp_id | fname | lname | section_id | section_name | section_leader | leader_id
--------+-----------+-----------+------------+---------------------+------------------+-----------
0 | Stephen | Mulligan | 0 | | |
1 | Michael | North | 201 | Development | Zoe Black | 3
2 | Megan | Berry | 202 | QA | Richard Chan | 18
3 | Zoe | Black | 101 | Product Development | Renuka Patil | 24
4 | Tim | James | 203 | IT | Ebuka Udechukwu | 17
5 | Bella | Tucker | 201 | Development | Zoe Black | 3
6 | Alexandra | Climo | 202 | QA | Richard Chan | 18
7 | Leonard | Gray | 203 | IT | Ebuka Udechukwu | 17
8 | Carolyn | Henderson | 201 | Development | Zoe Black | 3
9 | Ryan | Henderson | 201 | Development | Zoe Black | 3
10 | Frank | Tucker | 205 | Sales | Benjamin Glover | 29
11 | Nathan | Ferguson | 102 | Sales Marketing | Eric Redfield | 28
12 | Kevin | Rampling | 101 | Product Development | Renuka Patil | 24
13 | Tuy Kim | Duong | 201 | Development | Zoe Black | 3
14 | Dwipendra | Sing | 204 | Tech Support | Sarah Feldman | 26
15 | Dylan | Wijman | 206 | Documentation | Kevin Rampling | 12
16 | Tamar | Sasson | 207 | Marketing | Nathan Ferguson | 11
17 | Ebuka | Udechukwu | 101 | Product Development | Renuka Patil | 24
18 | Richard | Chan | 101 | Product Development | Renuka Patil | 24
19 | Maria | del Rio | 201 | Development | Zoe Black | 3
20 | Hua | Song | 204 | Tech Support | Sarah Feldman | 26
21 | Carmen | Lopez | 204 | Tech Support | Sarah Feldman | 26
22 | Edgar | Mejia | 206 | Documentation | Kevin Rampling | 12
23 | Riad | Salim | 201 | Development | Zoe Black | 3
24 | Renuka | Patil | 100 | Executive Office | Stephen Mulligan | 0
25 | Rina | Dsouza | 202 | QA | Richard Chan | 18
26 | Sarah | Feldman | 101 | Product Development | Renuka Patil | 24
27 | Max | Mills | 102 | Sales Marketing | Eric Redfield | 28
28 | Eric | Redfield | 100 | Executive Office | Stephen Mulligan | 0
29 | Benjamin | Glover | 102 | Sales Marketing | Eric Redfield | 28
30 | Dominic | King | 205 | Sales | Benjamin Glover | 29
32 | Ryan | Metcalfe | 206 | Documentation | Kevin Rampling | 12
33 | Piers | Paige | 201 | Development | Zoe Black | 3
34 | Nicola | Kelly | 207 | Marketing | Nathan Ferguson | 11
(34 rows)
You can query this data for employee-manager relationships through WITH RECURSIVE. For example, the following query's WITH RECURSIVE clause gets employee-manager relationships for employee Eric Redfield, including all employees who report directly and indirectly to him:
WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
FROM personnel.employees WHERE fname||' '||lname = 'Eric Redfield'
UNION
SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
JOIN managers m ON m.employeeID = e.leader_id)
SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;
The WITH RECURSIVE clause defines the CTE managers
, and then executes in two phases:
-
The non-recursive term populates managers
with data that it queries from personnel.employees
.
-
The recursive term's UNION query iterates over its own output until, on the fourth cycle, it finds no more data. The results of all iterations are then compiled and set in managers
, and the WITH CLAUSE exits to the primary query.
The primary query returns three levels of data from managers
—one for each recursive iteration:
Similarly, the following query iterates over the same data to get all employee-manager relationships for employee Richard Chan, who is one level lower in the company chain of command:
WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
FROM personnel.employees WHERE fname||' '||lname = 'Richard Chan'
UNION
SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
JOIN managers m ON m.employeeID = e.leader_id)
SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;
The WITH RECURSIVE clause executes as before, except this time it finds no more data after two iterations and exits. Accordingly, the primary query returns two levels of data from managers
:
WITH RECURSIVE materialization
By default, materialization is disabled. In this case, Vertica rewrites the WITH RECURSIVE query into subqueries, as many as necessary for the required level of recursion.
If recursion is very deep, the high number of query rewrites is liable to incur considerable overhead that adversely affects performance and exhausts system resources. In this case, consider enabling materialization, either with the configuration parameter WithClauseMaterialization, or the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. In either case, intermediate result sets from all recursion levels are written to local temporary tables. When recursion is complete, the intermediate results in all temporary tables are compiled and passed on to the primary query.
Note
If materialization is not possible, you can improve throughput on a resource pool that handles recursive queries by setting its
EXECUTIONPARALLELISM parameter to 1.