This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Subqueries
A subquery is a SELECT statement embedded within another SELECT statement.
A subquery is a SELECT
statement embedded within another SELECT
statement. The embedded subquery is often referenced as the query's inner statement, while the containing query is typically referenced as the query's statement, or outer query block. A subquery returns data that the outer query uses as a condition to determine what data to retrieve. There is no limit to the number of nested subqueries you can create.
Like any query, a subquery returns records from a table that might consist of a single column and record, a single column with multiple records, or multiple columns and records. Queries can be noncorrelated or correlated. You can even use them to update or delete records in a table based on values that are stored in other database tables.
1 - Subqueries used in search conditions
Subqueries are used as search conditions in order to filter results.
Subqueries are used as search conditions in order to filter results. They specify the conditions for the rows returned from the containing query's select-list, a query expression, or the subquery itself. The operation evaluates to TRUE, FALSE, or UNKNOWN (NULL).
Syntax
search-condition {
[ { AND | OR | NOT } { predicate | ( search-condition ) } ]
}[,... ]
predicate
{ expression comparison-operator expression
| string-expression [ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB } string-expression
| expression IS [ NOT ] NULL
| expression [ NOT ] IN ( subquery | expression[,... ] )
| expression comparison-operator [ ANY | SOME ] ( subquery )
| expression comparison-operator ALL ( subquery )
| expression OR ( subquery )
| [ NOT ] EXISTS ( subquery )
| [ NOT ] IN ( subquery )
}
Parameters
search-condition |
Specifies the search conditions for the rows returned from one of the following:
If the subquery is used with an UPDATE or DELETE statement, UPDATE specifies the rows to update and DELETE specifies the rows to delete.
|
{ AND | OR | NOT } |
Keywords that specify the logical operators that combine conditions, or in the case of NOT, negate conditions.
-
AND : Combines two conditions and evaluates to TRUE when both of the conditions are TRUE.
-
OR : Combines two conditions and evaluates to TRUE when either condition is TRUE.
-
NOT : Negates the Boolean expression specified by the predicate.
|
predicate |
An expression that returns TRUE, FALSE, or UNKNOWN (NULL). |
expression |
A column name, constant, function, or scalar subquery, or combination of column names, constants, and functions connected by operators or subqueries. |
comparison-operator |
An operator that tests conditions between two expressions, one of the following:
This operator... |
Tests whether... |
< |
The first expression is less than the second. |
> |
The first expression is greater than the second. |
<= |
The first expression is less than or equal to the second expression. |
>= |
The first expression is greater than or equal to the second expression. |
= |
Two expressions are equal. |
<=> |
Like the = operator, but returns TRUE (instead of UNKNOWN) if both expressions evaluate to UNKNOWN, and FALSE (instead of UNKNOWN) if one expression evaluates to UNKNOWN. |
<>
|
Two expressions are not equal. |
!=
|
|
string-expression |
A character string with optional wildcard (* ) characters. |
[ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB } |
Indicates that the character string following the predicate is to be used (or not used) for pattern matching. |
IS [ NOT ] NULL |
Searches for values that are null or are not null. |
ALL |
Used with a comparison operator and a subquery. Returns TRUE for the left-hand predicate if all values returned by the subquery satisfy the comparison operation, or FALSE if not all values satisfy the comparison or if the subquery returns no rows to the outer query block. |
ANY | SOME |
ANY and SOME are synonyms and are used with a comparison operator and a subquery. Either returns TRUE for the left-hand predicate if any value returned by the subquery satisfies the comparison operation, or FALSE if no values in the subquery satisfy the comparison or if the subquery returns no rows to the outer query block. Otherwise, the expression is UNKNOWN. |
[ NOT ] EXISTS |
Used with a subquery to test for the existence of records that the subquery returns. |
[ NOT ] IN |
Searches for an expression on the basis of an expression's exclusion or inclusion from a list. The list of values is enclosed in parentheses and can be a subquery or a set of constants. |
1.1 - Logical operators AND and OR
The AND and OR logical operators combine two conditions.
The AND and OR logical operators combine two conditions. AND evaluates to TRUE when both of the conditions joined by the AND keyword are matched, and OR evaluates to TRUE when either condition joined by OR is matched.
OR subqueries (complex expressions)
Vertica supports subqueries in more complex expressions using OR; for example:
-
More than one subquery in the conjunct expression:
(SELECT MAX(b) FROM t1) + SELECT (MAX FROM t2) a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2)
-
An OR clause in the conjunct expression involves at least one subquery:
a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2) a IN (SELECT a from t1) OR b = 5
a = (SELECT MAX FROM t2) OR b = 5
-
One subquery is present but it is part of a another expression:
x IN (SELECT a FROM t1) = (x = (SELECT MAX FROM t2) (x IN (SELECT a FROM t1) IS NULL
How AND queries are evaluated
Vertica treats expressions separated by AND (conjunctive) operators individually. For example if the WHERE clause were:
WHERE (a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2)) AND (c IN (SELECT a FROM t1))
the query would be interpreted as two conjunct expressions:
-
(a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2))
-
(c IN (SELECT a FROM t1))
The first expression is considered a complex subquery, whereas the second expression is not.
Examples
The following list shows some of the ways you can filter complex conditions in the WHERE clause:
-
OR expression between a subquery and a non-subquery condition:
=> SELECT x FROM t WHERE x > (SELECT SUM(DISTINCT x) FROM t GROUP BY y) OR x < 9;
-
OR expression between two subqueries:
=> SELECT * FROM t WHERE x=(SELECT x FROM t) OR EXISTS(SELECT x FROM tt);
-
Subquery expression:
=> SELECT * FROM t WHERE x=(SELECT x FROM t)+1 OR x<>(SELECT x FROM t)+1;
-
OR expression with [NOT] IN subqueries:
=> SELECT * FROM t WHERE NOT (EXISTS (SELECT x FROM t)) OR x >9;
-
OR expression with IS [NOT] NULL subqueries:
=> SELECT * FROM t WHERE (SELECT * FROM t)IS NULL OR (SELECT * FROM tt)IS NULL;
-
OR expression with boolean column and subquery that returns Boolean data type:
=> SELECT * FROM t2 WHERE x = (SELECT x FROM t2) OR x;
Note
To return TRUE, the argument of OR must be a Boolean data type.
-
OR expression in the CASE statement:
=> SELECT * FROM t WHERE CASE WHEN x=1 THEN x > (SELECT * FROM t)
OR x < (SELECT * FROM t2) END ;
-
Analytic function, NULL-handling function, string function, math function, and so on:
=> SELECT x FROM t WHERE x > (SELECT COALESCE (x,y) FROM t GROUP BY x,y) OR
x < 9;
-
In user-defined functions (assuming f()
is one):
=> SELECT * FROM t WHERE x > 5 OR x = (SELECT f(x) FROM t);
-
Use of parentheses at different places to restructure the queries:
=> SELECT x FROM t WHERE (x = (SELECT x FROM t) AND y = (SELECT y FROM t))
OR (SELECT x FROM t) =1;
-
Multicolumn subqueries:
=> SELECT * FROM t WHERE (x,y) = (SELECT x,y FROM t) OR x > 5;
-
Constant/NULL on lefthand side of subquery:
=> SELECT * FROM t WHERE x > 5 OR 5 = (SELECT x FROM t);
See also
1.2 - In place of an expression
Subqueries that return a single value (unlike a list of values returned by IN subqueries) can be used just about anywhere an expression is allowed in SQL.
Subqueries that return a single value (unlike a list of values returned by IN subqueries) can be used just about anywhere an expression is allowed in SQL. It can be a column name, a constant, a function, a scalar subquery, or a combination of column names, constants, and functions connected by operators or subqueries.
For example:
=> SELECT c1 FROM t1 WHERE c1 = ANY (SELECT c1 FROM t2) ORDER BY c1;
=> SELECT c1 FROM t1 WHERE COALESCE((t1.c1 > ANY (SELECT c1 FROM t2)), TRUE);
=> SELECT c1 FROM t1 GROUP BY c1 HAVING
COALESCE((t1.c1 <> ALL (SELECT c1 FROM t2)), TRUE);
Multi-column expressions are also supported:
=> SELECT c1 FROM t1 WHERE (t1.c1, t1.c2) = ALL (SELECT c1, c2 FROM t2);
=> SELECT c1 FROM t1 WHERE (t1.c1, t1.c2) <> ANY (SELECT c1, c2 FROM t2);
Vertica returns an error on queries where more than one row would be returned by any subquery used as an expression:
=> SELECT c1 FROM t1 WHERE c1 = (SELECT c1 FROM t2) ORDER BY c1;
ERROR: more than one row returned by a subquery used as an expression
See also
1.3 - Comparison operators
Vertica supports Boolean subquery expressions in the WHERE clause with any of the following operators:.
Vertica supports Boolean subquery expressions in the WHERE
clause with any of the following operators:
>
<
>=
<=
=
<>
<=>
WHERE
clause subqueries filter results and take the following form:
SELECT <column, ...> FROM <table>
WHERE <condition> (SELECT <column, ...> FROM <table> WHERE <condition>);
These conditions are available for all data types where comparison makes sense. All Comparison operators are binary operators that return values of TRUE, FALSE, or UNKNOWN (NULL).
Expressions that correlate to just one outer table in the outer query block are supported, and these correlated expressions can be comparison operators.
The following subquery scenarios are supported:
SELECT * FROM T1 WHERE T1.x = (SELECT MAX(c1) FROM T2);
SELECT * FROM T1 WHERE T1.x >= (SELECT MAX(c1) FROM T2 WHERE T1.y = T2.c2);
SELECT * FROM T1 WHERE T1.x <= (SELECT MAX(c1) FROM T2 WHERE T1.y = T2.c2);
See also
Subquery restrictions
1.4 - LIKE pattern matching
Vertica supports LIKE pattern-matching conditions in subqueries and take the following form:.
Vertica supports LIKE
pattern-matching conditions in subqueries and take the following form:
string-expression [ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB } string-expression
The following command searches for customers whose company name starts with "Ev" and returns the total count:
=> SELECT COUNT(*) FROM customer_dimension WHERE customer_name LIKE
(SELECT 'Ev%' FROM customer_dimension LIMIT 1);
count
-------
153
(1 row)
Vertica also supports single-row subqueries as the pattern argument for LIKEB and ILIKEB predicates; for example:
=> SELECT * FROM t1 WHERE t1.x LIKEB (SELECT t2.x FROM t2);
The following symbols are substitutes for the LIKE keywords:
~~ LIKE
~# LIKEB
~~* ILIKE
~#* ILIKEB
!~~ NOT LIKE
!~# NOT LIKEB
!~~* NOT ILIKE
!~#* NOT IILIKEB
Note
The ESCAPE
keyword is not valid for the above symbols.
See LIKE predicate for additional examples.
1.5 - ANY and ALL
You typically use comparison operators (=, >, < , etc.) only on subqueries that return one row.
You typically use comparison operators (=
, >
, <
, etc.) only on subqueries that return one row. With ANY
and ALL
operators, you can make comparisons on subqueries that return multiple rows.
These subqueries take the following form:
expression comparison-operator { ANY | ALL } (subquery)
ANY
and ALL
evaluate whether any or all of the values returned by a subquery match the left-hand expression.
Equivalent operators
You can use following operators instead of ANY
or ALL
:
This operator... |
Is equivalent to: |
SOME |
ANY |
IN |
= ANY |
NOT IN |
<> ALL |
Example data
Examples below use the following tables and data:
CREATE TABLE t1 (c1 int, c2 VARCHAR(8)); |
CREATE TABLE t2 (c1 int, c2 VARCHAR(8)); |
=> SELECT * FROM t1 ORDER BY c1;
c1 | c2
----+-----
1 | cab
1 | abc
2 | fed
2 | def
3 | ihg
3 | ghi
4 | jkl
5 | mno
(8 rows)
|
=> SELECT * FROM t2 ORDER BY c1;
c1 | c2
----+-----
1 | abc
2 | fed
3 | jkl
3 | stu
3 | zzz
(5 rows)
|
ANY subqueries
Subqueries that use the ANY
keyword return true when any value retrieved in the subquery matches the value of the left-hand expression.
Examples
An ANY
subquery within an expression:
=> SELECT c1, c2 FROM t1 WHERE COALESCE((t1.c1 > ANY (SELECT c1 FROM t2)));
c1 | c2
----+-----
2 | fed
2 | def
3 | ihg
3 | ghi
4 | jkl
5 | mno
(6 rows)
ANY
noncorrelated subqueries without aggregates:
=> SELECT c1 FROM t1 WHERE c1 = ANY (SELECT c1 FROM t2) ORDER BY c1;
c1
----
1
1
2
2
3
3
(6 rows)
ANY
noncorrelated subqueries with aggregates:
=> SELECT c1, c2 FROM t1 WHERE c1 <> ANY (SELECT MAX(c1) FROM t2) ORDER BY c1;
c1 | c2
----+-----
1 | cab
1 | abc
2 | fed
2 | def
4 | jkl
5 | mno
(6 rows)
=> SELECT c1 FROM t1 GROUP BY c1 HAVING c1 <> ANY (SELECT MAX(c1) FROM t2) ORDER BY c1;
c1
----
1
2
4
5
(4 rows)
ANY
noncorrelated subqueries with aggregates and a GROUP BY
clause:
=> SELECT c1, c2 FROM t1 WHERE c1 <> ANY (SELECT MAX(c1) FROM t2 GROUP BY c2) ORDER BY c1;
c1 | c2
----+-----
1 | cab
1 | abc
2 | fed
2 | def
3 | ihg
3 | ghi
4 | jkl
5 | mno
(8 rows)
ANY
noncorrelated subqueries with a GROUP BY
clause:
=> SELECT c1, c2 FROM t1 WHERE c1 <=> ANY (SELECT c1 FROM t2 GROUP BY c1) ORDER BY c1;
c1 | c2
----+-----
1 | cab
1 | abc
2 | fed
2 | def
3 | ihg
3 | ghi
(6 rows)
ANY
correlated subqueries with no aggregates or GROUP BY
clause:
=> SELECT c1, c2 FROM t1 WHERE c1 >= ANY (SELECT c1 FROM t2 WHERE t2.c2 = t1.c2) ORDER BY c1;
c1 | c2
----+-----
1 | abc
2 | fed
4 | jkl
(3 rows)
ALL subqueries
A subquery that uses the ALL
keyword returns true when all values retrieved by the subquery match the left-hand expression, otherwise it returns false.
Examples
ALL
noncorrelated subqueries without aggregates:
=> SELECT c1, c2 FROM t1 WHERE c1 >= ALL (SELECT c1 FROM t2) ORDER BY c1;
c1 | c2
----+-----
3 | ihg
3 | ghi
4 | jkl
5 | mno
(4 rows)
ALL
noncorrelated subqueries with aggregates:
=> SELECT c1, c2 FROM t1 WHERE c1 = ALL (SELECT MAX(c1) FROM t2) ORDER BY c1;
c1 | c2
----+-----
3 | ihg
3 | ghi
(2 rows)
=> SELECT c1 FROM t1 GROUP BY c1 HAVING c1 <> ALL (SELECT MAX(c1) FROM t2) ORDER BY c1;
c1
----
1
2
4
5
(4 rows)
ALL
noncorrelated subqueries with aggregates and a GROUP BY
clause:
=> SELECT c1, c2 FROM t1 WHERE c1 <= ALL (SELECT MAX(c1) FROM t2 GROUP BY c2) ORDER BY c1;
c1 | c2
----+-----
1 | cab
1 | abc
(2 rows)
ALL
noncorrelated subqueries with a GROUP BY
clause:
=> SELECT c1, c2 FROM t1 WHERE c1 <> ALL (SELECT c1 FROM t2 GROUP BY c1) ORDER BY c1;
c1 | c2
----+-----
4 | jkl
5 | mno
(2 rows)
NULL handling
Vertica supports multicolumn <> ALL
subqueries where the columns are not marked NOT NULL
. If any column contains a NULL
value, Vertica returns a run-time error.
Vertica does not support = ANY
subqueries that are nested within another expression if any column values are NULL
.
See also
Subquery restrictions
1.6 - EXISTS and NOT EXISTS
The EXISTS predicate is one of the most common predicates used to build conditions that use noncorrelated and correlated subqueries.
The EXISTS
predicate is one of the most common predicates used to build conditions that use noncorrelated and correlated subqueries. Use EXISTS
to identify the existence of a relationship without regard for the quantity. For example, EXISTS
returns true if the subquery returns any rows, and [NOT] EXISTS
returns true if the subquery returns no rows.
[NOT] EXISTS
subqueries take the following form:
expression [ NOT ] EXISTS ( subquery )
The EXISTS
condition is considered to be met if the subquery returns at least one row. Since the result depends only on whether any records are returned, and not on the contents of those records, the output list of the subquery is normally uninteresting. A common coding convention is to write all EXISTS
tests as follows:
EXISTS (SELECT 1 WHERE ...)
In the above fragment, SELECT 1
returns the value 1 for every record in the query. If the query returns, for example, five records, it returns 5 ones. The system doesn't care about the real values in those records; it just wants to know if a row is returned.
Alternatively, a subquery’s select list that uses EXISTS
might consist of the asterisk (*). You do not need to specify column names, because the query tests for the existence or nonexistence of records that meet the conditions specified in the subquery.
EXISTS (SELECT * WHERE ...)
Notes
-
If EXISTS (
subquery
)
returns at least 1 row, the result is TRUE.
-
If EXISTS (
subquery
)
returns no rows, the result is FALSE.
-
If NOT EXISTS (subquery)
returns at least 1 row, the result is FALSE.
-
If NOT EXISTS (subquery)
returns no rows, the result is TRUE.
Examples
The following query retrieves the list of all the customers who purchased anything from any of the stores amounting to more than 550 dollars:
=> SELECT customer_key, customer_name, customer_state
FROM public.customer_dimension WHERE EXISTS
(SELECT 1 FROM store.store_sales_fact
WHERE customer_key = public.customer_dimension.customer_key
AND sales_dollar_amount > 550)
AND customer_state = 'MA' ORDER BY customer_key;
customer_key | customer_name | customer_state
--------------+--------------------+----------------
14818 | William X. Nielson | MA
18705 | James J. Goldberg | MA
30231 | Sarah N. McCabe | MA
48353 | Mark L. Brown | MA
(4 rows)
Whether you use EXISTS or IN subqueries depends on which predicates you select in outer and inner query blocks. For example, to get a list of all the orders placed by all stores on January 2, 2003 for vendors with records in the vendor table:
=> SELECT store_key, order_number, date_ordered
FROM store.store_orders_fact WHERE EXISTS
(SELECT 1 FROM public.vendor_dimension
WHERE public.vendor_dimension.vendor_key = store.store_orders_fact.vendor_key)
AND date_ordered = '2012-01-02';
store_key | order_number | date_ordered
-----------+--------------+--------------
37 | 2559 | 2012-01-02
16 | 552 | 2012-01-02
35 | 1156 | 2012-01-02
13 | 3885 | 2012-01-02
25 | 554 | 2012-01-02
21 | 2687 | 2012-01-02
49 | 3251 | 2012-01-02
19 | 2922 | 2012-01-02
26 | 1329 | 2012-01-02
40 | 1183 | 2012-01-02
(10 rows)
The above query looks for existence of the vendor and date ordered. To return a particular value, rather than simple existence, the query looks for orders placed by the vendor who got the best deal on January 4, 2004:
=> SELECT store_key, order_number, date_ordered
FROM store.store_orders_fact ord, public.vendor_dimension vd
WHERE ord.vendor_key = vd.vendor_key AND vd.deal_size IN
(SELECT MAX(deal_size) FROM public.vendor_dimension)
AND date_ordered = '2013-01-04';
store_key | order_number | date_ordered
-----------+--------------+--------------
166 | 36008 | 2013-01-04
113 | 66017 | 2013-01-04
198 | 75716 | 2013-01-04
27 | 150241 | 2013-01-04
148 | 182207 | 2013-01-04
9 | 188567 | 2013-01-04
45 | 202416 | 2013-01-04
24 | 250295 | 2013-01-04
121 | 251417 | 2013-01-04
(9 rows)
See also
1.7 - IN and NOT IN
While you cannot equate a single value to a set of values, you can check to see if a single value is found within that set of values.
While you cannot equate a single value to a set of values, you can check to see if a single value is found within that set of values. Use the IN
clause for multiple-record, single-column subqueries. After the subquery returns results introduced by IN
or NOT IN
, the outer query uses them to return the final result.
[NOT] IN
subqueries take the following form:
{ expression [ NOT ] IN ( subquery )| expression [ NOT ] IN ( expression ) }
There is no limit to the number of parameters passed to the IN
clause of the SELECT
statement; for example:
=> SELECT * FROM tablename WHERE column IN (a, b, c, d, e, ...);
Vertica also supports queries where two or more outer expressions refer to different inner expressions:
=> SELECT * FROM A WHERE (A.x,A.x) IN (SELECT B.x, B.y FROM B);
Examples
The following query uses the VMart schema to illustrate the use of outer expressions referring to different inner expressions:
=> SELECT product_description, product_price FROM product_dimension
WHERE (product_dimension.product_key, product_dimension.product_key) IN
(SELECT store.store_orders_fact.order_number,
store.store_orders_fact.quantity_ordered
FROM store.store_orders_fact);
product_description | product_price
-----------------------------+---------------
Brand #72 box of candy | 326
Brand #71 vanilla ice cream | 270
(2 rows)
To find all products supplied by stores in MA, first create the inner query and run it to ensure that it works as desired. The following query returns all stores located in MA:
=> SELECT store_key FROM store.store_dimension WHERE store_state = 'MA';
store_key
-----------
13
31
(2 rows)
Then create the outer or main query that specifies all distinct products that were sold in stores located in MA. This statement combines the inner and outer queries using the IN
predicate:
=> SELECT DISTINCT s.product_key, p.product_description
FROM store.store_sales_fact s, public.product_dimension p
WHERE s.product_key = p.product_key
AND s.product_version = p.product_version
AND s.store_key IN
(SELECT store_key
FROM store.store_dimension
WHERE store_state = 'MA')
ORDER BY s.product_key;
product_key | product_description
-------------+---------------------------------------
1 | Brand #1 white bread
1 | Brand #4 vegetable soup
3 | Brand #9 wheelchair
5 | Brand #15 cheddar cheese
5 | Brand #19 bleach
7 | Brand #22 canned green beans
7 | Brand #23 canned tomatoes
8 | Brand #24 champagne
8 | Brand #25 chicken nuggets
11 | Brand #32 sausage
... ...
(281 rows)
When using NOT IN
, the subquery returns a list of zero or more values in the outer query where the comparison column does not match any of the values returned from the subquery. Using the previous example, NOT IN
returns all the products that are not supplied from MA.
Notes
Vertica supports multicolumn NOT IN
subqueries in which the columns are not marked NOT NULL
. If one of the columns is found to contain a NULL
value during query execution, Vertica returns a run-time error.
Similarly, IN
subqueries nested within another expression are not supported if any of the column values are NULL
. For example, if in the following statement column x
from either table contained a NULL
value, Vertica returns a run-time error:
=> SELECT * FROM t1 WHERE (x IN (SELECT x FROM t2)) IS FALSE;
ERROR: NULL value found in a column used by a subquery
See also
2 - Subqueries in the SELECT list
Subqueries can occur in the select list of the containing query.
Subqueries can occur in the select list of the containing query. The results from the following statement are ordered by the first column (customer_name). You could also write ORDER BY 2
and specify that the results be ordered by the select-list subquery.
=> SELECT c.customer_name, (SELECT AVG(annual_income) FROM customer_dimension
WHERE deal_size = c.deal_size) AVG_SAL_DEAL FROM customer_dimension c
ORDER BY 1;
customer_name | AVG_SAL_DEAL
---------------+--------------
Goldstar | 603429
Metatech | 628086
Metadata | 666728
Foodstar | 695962
Verihope | 715683
Veridata | 868252
Bettercare | 879156
Foodgen | 958954
Virtacom | 991551
Inicorp | 1098835
...
Notes
-
Scalar subqueries in the select-list return a single row/column value. These subqueries use Boolean comparison operators: =, >, <, <>, <=, >=.
If the query is correlated, it returns NULL if the correlation results in 0 rows. If the query returns more than one row, the query errors out at run time and Vertica displays an error message that the scalar subquery must only return 1 row.
-
Subquery expressions such as [NOT] IN, [NOT] EXISTS, ANY/SOME, or ALL always return a single Boolean value that evaluates to TRUE, FALSE, or UNKNOWN; the subquery itself can have many rows. Most of these queries can be correlated or noncorrelated.
Note
ALL subqueries cannot be correlated.
-
Subqueries in the ORDER BY and GROUP BY clauses are supported; for example, the following statement says to order by the first column, which is the select-list subquery:
=> SELECT (SELECT MAX(x) FROM t2 WHERE y=t1.b) FROM t1 ORDER BY 1;
See also
3 - WITH clauses
WITH clauses are concomitant queries within a larger, primary query.
WITH clauses are concomitant queries within a larger, primary query. Vertica can evaluate WITH clauses in two ways:
-
Inline expansion (default): Vertica evaluates each WITH clause every time it is referenced by the primary query.
-
Materialization: Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.
See WITH clause for details on syntax options and requirements.
3.1 - Inline expansion of WITH clause
By default, Vertica uses inline expansion to evaluate WITH clauses.
By default, Vertica uses inline expansion to evaluate WITH clauses. Vertica evaluates each WITH clause every time it is referenced by the primary query. Inline expansion often works best if the query does not reference the same WITH clause multiple times, or if some local optimizations are possible after inline expansion.
Example
The following example shows a WITH clause that is a good candidate for inline expansion. The WITH clause is used in a query that obtains order information for all 2018 orders shipped between December 01-07:
-- Enable materialization
ALTER SESSION SET PARAMETER WithClauseMaterialization=1;
-- Begin WITH
WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */
store_orders_fact_new AS(
SELECT * FROM store.store_orders_fact WHERE date_shipped between '2018-12-01' and '2018-12-07')
-- End WITH
-- Begin primary query
SELECT store_key, product_key, product_version, SUM(quantity_ordered*unit_price) AS total_price
FROM store_orders_fact_new
GROUP BY store_key, product_key, product_version
ORDER BY total_price DESC;
store_key | product_key | product_version | total_price
-----------+-------------+-----------------+-------------
135 | 14815 | 2 | 30000
154 | 19202 | 1 | 29106
232 | 1855 | 2 | 29008
20 | 4804 | 3 | 28500
11 | 16741 | 3 | 28200
169 | 12374 | 1 | 28120
50 | 9395 | 5 | 27538
34 | 8888 | 4 | 27100
142 | 10331 | 2 | 27027
106 | 18932 | 1 | 26864
190 | 8229 | 1 | 26460
198 | 8545 | 3 | 26287
38 | 17426 | 1 | 26280
5 | 10095 | 1 | 26224
41 | 2342 | 1 | 25920
87 | 5574 | 1 | 25443
219 | 15271 | 1 | 25146
60 | 14223 | 1 | 25026
97 | 16324 | 2 | 24864
234 | 17681 | 1 | 24795
195 | 16532 | 1 | 24794
83 | 9597 | 2 | 24661
149 | 7164 | 5 | 24518
142 | 11022 | 4 | 24400
202 | 12712 | 1 | 24380
13 | 18154 | 1 | 24273
7 | 3793 | 3 | 24250
...
Vertica processes the query as follows:
-
Expands the WITH clause reference to store_orders_fact_new
within the primary query.
-
After expanding the WITH clause, evaluates the primary query.
3.2 - Materialization of WITH clause
When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.
When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires. Vertica drops the temporary table after primary query execution completes.
Note
If the primary query returns with an error, temporary tables might be dropped only after the client’s session ends.
Materialization can facilitate better performance when WITH clauses are complex—for example, when the WITH clauses contain JOIN and GROUP BY clauses, and are referenced multiple times in the primary query.
If materialization is enabled, WITH statements perform an auto-commit of the user transaction. This occurs even when using EXPLAIN with the WITH statement.
Enabling materialization
WITH materialization is set by configuration parameter WithClauseMaterialization, by default set to 0 (disabled). You can enable and disable materialization by setting WithClauseMaterialization at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively:
-
Database:
=> ALTER DATABASE db-spec SET PARAMETER WithClauseMaterialization={ 0 | 1 };
=> ALTER DATABASE db-spec CLEAR PARAMETER WithClauseMaterialization;
-
Session: Parameter setting remains in effect until you explicitly clear it, or the session ends.
=> ALTER SESSION SET PARAMETER WithClauseMaterialization={ 0 | 1 };
=> ALTER SESSION CLEAR PARAMETER WithClauseMaterialization;
You can also enable WITH materialization for individual queries with the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. Materialization is automatically cleared when the query returns. For example:
=> WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ revenue AS (
SELECT vendor_key, SUM(total_order_cost) AS total_revenue
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
...
EE5 temp relation support for materialized WITH clause
By default, when WITH clause queries are reused, Vertica saves those WITH clause query outputs in EE5 temp relations. However, this option can be changed. EE5 temp relation support for WITH materialization is set by configuration parameter EnableWITHTempRelReuseLimit, which can be set in the following ways:
-
0: Disables this feature.
-
1: Force-saves all WITH clause queries into EE5 temp relations, whether or not they are reused.
-
2 (default): Enables this feature only when queries are reused.
EnableWITHTempRelReuseLimit can be set at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively.
Example
The following example shows a WITH clause that is a good candidate for materialization. The query obtains data for the vendor who has the highest combined order cost for all orders:
-- Enable materialization
=> ALTER SESSION SET PARAMETER WithClauseMaterialization=1;
-- Define WITH clause
=> WITH revenue AS (
SELECT vendor_key, SUM(total_order_cost) AS total_revenue
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
-- End WITH clause
-- Primary query
=> SELECT vendor_name, vendor_address, vendor_city, total_revenue
FROM vendor_dimension v, revenue r
WHERE v.vendor_key = r.vendor_key AND total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
vendor_name | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
Frozen Suppliers | 471 Mission St | Peoria | 49877044
(1 row)
Vertica processes this query as follows:
-
WITH clause revenue
evaluates its SELECT statement from table store.store_orders_fact
.
-
Results of the revenue
clause are stored in a local temporary table.
-
Whenever the revenue
clause statement is referenced, the results stored in the table are used.
-
The temporary table is dropped when query execution is complete.
3.3 - WITH clause recursion
For example:.
A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query. Recursive queries are useful when working with self-referential data—hierarchies such as manager-subordinate relationships, or tree-structured data such as taxonomies.
The configuration parameter WithClauseRecursionLimit—by default set to 8—sets the maximum depth of recursion. You can set this parameter at database and session scopes with ALTER DATABASE and ALTER SESSION, respectively. Recursion continues until it reaches the configured maximum depth, or until the last iteration returns with no data.
You specify a recursive WITH clause as follows:
WITH [ /*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/ ] RECURSIVE
cte-identifier [ ( column-aliases ) ] AS (
non-recursive-term
UNION [ ALL ]
recursive-term
)
Non-recursive and recursive terms are separated by UNION or UNION ALL:
-
The non-recursive-term
query sets its result set in cte-identifier
, which is subject to recursion in recursive-term
.
-
The UNION statement's recursive-term
recursively iterates over its own output. When recursion is complete, the results of all iterations are compiled and set in cte-identifier
.
For example:
=> ALTER SESSION SET PARAMETER WithClauseRecursionLimit=4; -- maximum recursion depth = 4
=> WITH RECURSIVE nums (n) AS (
SELECT 1 -- non-recursive (base) term
UNION ALL
SELECT n+1 FROM nums -- recursive term
)
SELECT n FROM nums; -- primary query
This simple query executes as follows:
-
Executes the WITH RECURSIVE clause:
-
Evaluates the non-recursive term SELECT 1, and places the result set—1—in nums
.
-
Iterates over the UNION ALL query (SELECT n+1) until the number of iterations is greater than the configuration parameter WithClauseRecursionLimit.
-
Combines the results of all UNION queries and sets the result set in nums
, and then exits to the primary query.
-
Executes the primary query SELECT n FROM nums:
n
---
1
2
3
4
5
(5 rows)
In this case , WITH RECURSIVE clause exits after four iterations as per WithClauseRecursionLimit. If you restore WithClauseRecursionLimit to its default value of 8, then the clause exits after eight iterations:
=> ALTER SESSION CLEAR PARAMETER WithClauseRecursionLimit;
=> WITH RECURSIVE nums (n) AS (
SELECT 1
UNION ALL
SELECT n+1 FROM nums
)
SELECT n FROM nums;
n
---
1
2
3
4
5
6
7
8
9
(9 rows)
Important
Be careful to set WithClauseRecursionLimit only as high as needed to traverse the deepest hierarchies. Vertica sets no limit on this parameter; however, a high value can incur considerable overhead that adversely affects performance and exhausts system resources.
If a high recursion count is required, then consider enabling materialization. For details, see WITH RECURSIVE Materialization.
Restrictions
The following restrictions apply:
-
The SELECT list of a non-recursive term cannot include the wildcard *
(asterisk) or the function MATCH_COLUMNS.
-
A recursive term can reference the target CTE only once.
-
Recursive reference cannot appear within an outer join.
-
Recursive reference cannot appear within a subquery.
-
WITH clauses do not support UNION options ORDER BY, LIMIT, and OFFSET.
Examples
A small software company maintains the following data on employees and their managers:
=> SELECT * FROM personnel.employees ORDER BY emp_id;
emp_id | fname | lname | section_id | section_name | section_leader | leader_id
--------+-----------+-----------+------------+---------------------+------------------+-----------
0 | Stephen | Mulligan | 0 | | |
1 | Michael | North | 201 | Development | Zoe Black | 3
2 | Megan | Berry | 202 | QA | Richard Chan | 18
3 | Zoe | Black | 101 | Product Development | Renuka Patil | 24
4 | Tim | James | 203 | IT | Ebuka Udechukwu | 17
5 | Bella | Tucker | 201 | Development | Zoe Black | 3
6 | Alexandra | Climo | 202 | QA | Richard Chan | 18
7 | Leonard | Gray | 203 | IT | Ebuka Udechukwu | 17
8 | Carolyn | Henderson | 201 | Development | Zoe Black | 3
9 | Ryan | Henderson | 201 | Development | Zoe Black | 3
10 | Frank | Tucker | 205 | Sales | Benjamin Glover | 29
11 | Nathan | Ferguson | 102 | Sales Marketing | Eric Redfield | 28
12 | Kevin | Rampling | 101 | Product Development | Renuka Patil | 24
13 | Tuy Kim | Duong | 201 | Development | Zoe Black | 3
14 | Dwipendra | Sing | 204 | Tech Support | Sarah Feldman | 26
15 | Dylan | Wijman | 206 | Documentation | Kevin Rampling | 12
16 | Tamar | Sasson | 207 | Marketing | Nathan Ferguson | 11
17 | Ebuka | Udechukwu | 101 | Product Development | Renuka Patil | 24
18 | Richard | Chan | 101 | Product Development | Renuka Patil | 24
19 | Maria | del Rio | 201 | Development | Zoe Black | 3
20 | Hua | Song | 204 | Tech Support | Sarah Feldman | 26
21 | Carmen | Lopez | 204 | Tech Support | Sarah Feldman | 26
22 | Edgar | Mejia | 206 | Documentation | Kevin Rampling | 12
23 | Riad | Salim | 201 | Development | Zoe Black | 3
24 | Renuka | Patil | 100 | Executive Office | Stephen Mulligan | 0
25 | Rina | Dsouza | 202 | QA | Richard Chan | 18
26 | Sarah | Feldman | 101 | Product Development | Renuka Patil | 24
27 | Max | Mills | 102 | Sales Marketing | Eric Redfield | 28
28 | Eric | Redfield | 100 | Executive Office | Stephen Mulligan | 0
29 | Benjamin | Glover | 102 | Sales Marketing | Eric Redfield | 28
30 | Dominic | King | 205 | Sales | Benjamin Glover | 29
32 | Ryan | Metcalfe | 206 | Documentation | Kevin Rampling | 12
33 | Piers | Paige | 201 | Development | Zoe Black | 3
34 | Nicola | Kelly | 207 | Marketing | Nathan Ferguson | 11
(34 rows)
You can query this data for employee-manager relationships through WITH RECURSIVE. For example, the following query's WITH RECURSIVE clause gets employee-manager relationships for employee Eric Redfield, including all employees who report directly and indirectly to him:
WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
FROM personnel.employees WHERE fname||' '||lname = 'Eric Redfield'
UNION
SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
JOIN managers m ON m.employeeID = e.leader_id)
SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;
The WITH RECURSIVE clause defines the CTE managers
, and then executes in two phases:
-
The non-recursive term populates managers
with data that it queries from personnel.employees
.
-
The recursive term's UNION query iterates over its own output until, on the fourth cycle, it finds no more data. The results of all iterations are then compiled and set in managers
, and the WITH CLAUSE exits to the primary query.
The primary query returns three levels of data from managers
—one for each recursive iteration:
Similarly, the following query iterates over the same data to get all employee-manager relationships for employee Richard Chan, who is one level lower in the company chain of command:
WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
FROM personnel.employees WHERE fname||' '||lname = 'Richard Chan'
UNION
SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
JOIN managers m ON m.employeeID = e.leader_id)
SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;
The WITH RECURSIVE clause executes as before, except this time it finds no more data after two iterations and exits. Accordingly, the primary query returns two levels of data from managers
:
WITH RECURSIVE materialization
By default, materialization is disabled. In this case, Vertica rewrites the WITH RECURSIVE query into subqueries, as many as necessary for the required level of recursion.
If recursion is very deep, the high number of query rewrites is liable to incur considerable overhead that adversely affects performance and exhausts system resources. In this case, consider enabling materialization, either with the configuration parameter WithClauseMaterialization, or the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. In either case, intermediate result sets from all recursion levels are written to local temporary tables. When recursion is complete, the intermediate results in all temporary tables are compiled and passed on to the primary query.
Note
If materialization is not possible, you can improve throughput on a resource pool that handles recursive queries by setting its
EXECUTIONPARALLELISM parameter to 1.
4 - Noncorrelated and correlated subqueries
Subqueries can be categorized into two types:.
Subqueries can be categorized into two types:
A noncorrelated subquery executes independently of the outer query. The subquery executes first, and then passes its results to the outer query, For example:
=> SELECT name, street, city, state FROM addresses WHERE state IN (SELECT state FROM states);
Vertica executes this query as follows:
-
Executes the subquery SELECT state FROM states
(in bold).
-
Passes the subquery results to the outer query.
A query's WHERE
and HAVING
clauses can specify noncorrelated subqueries if the subquery resolves to a single row, as shown below:
In WHERE clause
=> SELECT COUNT(*) FROM SubQ1 WHERE SubQ1.a = (SELECT y from SubQ2);
In HAVING clause
=> SELECT COUNT(*) FROM SubQ1 GROUP BY SubQ1.a HAVING SubQ1.a = (SubQ1.a & (SELECT y from SubQ2)
A correlated subquery typically obtains values from its outer query before it executes. When the subquery returns, it passes its results to the outer query.
Note
You can use an outer join to obtain the same effect as a correlated subquery.
In the following example, the subquery needs values from the addresses.state
column in the outer query:
=> SELECT name, street, city, state FROM addresses
WHERE EXISTS (SELECT * FROM states WHERE states.state = addresses.state);
Vertica executes this query as follows:
-
The query extracts and evaluates each addresses.state
value in the outer subquery records.
-
Then the query—using the EXISTS predicate—checks the addresses in the inner (correlated) subquery.
-
Because it uses the EXISTS predicate, the query stops processing when it finds the first match.
When Vertica executes this query, it translates the full query into a JOIN WITH SIPS.
5 - Flattening FROM clause subqueries
FROM clause subqueries are always evaluated before their containing query.
FROM
clause subqueries are always evaluated before their containing query. In some cases, the optimizer flattens FROM
clause subqueries so the query can execute more efficiently.
For example, in order to create a query plan for the following statement, the Vertica query optimizer evaluates all records in table t1
before it evaluates the records in table t0
:
=> SELECT * FROM (SELECT a, MAX(a) AS max FROM (SELECT * FROM t1) AS t0 GROUP BY a);
Given the previous query, the optimizer can internally flatten it as follows:
=> SELECT * FROM (SELECT a, MAX(a) FROM t1 GROUP BY a) AS t0;
Both queries return the same results, but the flattened query runs more quickly.
Flattening views
When a query's FROM
clause specifies a view, the optimizer expands the view by replacing it with the query that the view encapsulates. If the view contains subqueries that are eligible for flattening, the optimizer produces a query plan that flattens those subqueries.
Flattening restrictions
The optimizer cannot create a flattened query plan if a subquery or view contains one of the following elements:
-
Aggregate function
-
Analytic function
-
Outer join (left, right or full)
-
GROUP BY
, ORDER BY
, or HAVING
clause
-
DISTINCT
keyword
-
LIMIT
or OFFSET
clause
-
UNION
, EXCEPT
, or INTERSECT
clause
-
EXISTS
subquery
Examples
If a predicate applies to a view or subquery, the flattening operation can allow for optimizations by evaluating the predicates before the flattening takes place. Two examples follow.
View flattening
In this example, view v1
is defined as follows:
=> CREATE VIEW v1 AS SELECT * FROM a;
The following query specifies this view:
=> SELECT * FROM v1 JOIN b ON x=y WHERE x > 10;
Without flattening, the optimizer evaluates the query as follows:
-
Evalutes the subquery.
-
Applies the predicate WHERE x > 10
.
In contrast, the optimizer can create a flattened query plan by applying the predicate before evaluating the subquery. This reduces the optimizer's work because it returns only the records WHERE x > 10
to the containing query.
Vertica internally transforms the previous query as follows:
=> SELECT * FROM (SELECT * FROM a) AS t1 JOIN b ON x=y WHERE x > 10;
The optimizer then flattens the query:
=> SELECT * FROM a JOIN b ON x=y WHERE x > 10;
Subquery flattening
The following example shows how Vertica transforms FROM
clause subqueries within a WHERE
clause IN
subquery. Given the following query:
=> SELECT * FROM a
WHERE b IN (SELECT b FROM (SELECT * FROM t2)) AS D WHERE x=1;
The optimizer flattens it as follows:
=> SELECT * FROM a
WHERE b IN (SELECT b FROM t2) AS D WHERE x=1;
See also
Subquery restrictions
6 - Subqueries in UPDATE and DELETE statements
You can nest subqueries within UPDATE and DELETE statements.
You can nest subqueries within UPDATE and DELETE statements.
UPDATE subqueries
You can update records in one table according to values in others, by nesting a subquery within an UPDATE
statement. The example below illustrates this through a couple of noncorrelated subqueries. You can reproduce this example with the following tables:
=> CREATE TABLE addresses(cust_id INTEGER, address VARCHAR(2000));
CREATE TABLE
dbadmin=> INSERT INTO addresses VALUES(20,'Lincoln Street'),(30,'Booth Hill Road'),(30,'Beach Avenue'),(40,'Mt. Vernon Street'),(50,'Hillside Avenue');
OUTPUT
--------
5
(1 row)
=> CREATE TABLE new_addresses(new_cust_id integer, new_address Boolean DEFAULT 'T');
CREATE TABLE
dbadmin=> INSERT INTO new_addresses VALUES (20),(30),(80);
OUTPUT
--------
3
(1 row)
=> INSERT INTO new_addresses VALUES (60,'F');
OUTPUT
--------
1
=> COMMIT;
COMMIT
Queries on these tables return the following results:
=> SELECT * FROM addresses;
cust_id | address
---------+-------------------
20 | Lincoln Street
30 | Beach Avenue
30 | Booth Hill Road
40 | Mt. Vernon Street
50 | Hillside Avenue
(5 rows)
=> SELECT * FROM new_addresses;
new_cust_id | new_address
-------------+-------------
20 | t
30 | t
80 | t
60 | f
(4 rows)
-
The following UPDATE statement uses a noncorrelated subquery to join new_addresses
and addresses
records on customer IDs. UPDATE sets the value 'New Address' in the joined addresses
records. The statement output indicates that three rows were updated:
=> UPDATE addresses SET address='New Address'
WHERE cust_id IN (SELECT new_cust_id FROM new_addresses WHERE new_address='T');
OUTPUT
--------
3
(1 row)
-
Query the addresses
table to see the changes for matching customer ID 20 and 30. Addresses for customer ID 40 and 50 are not updated:
=> SELECT * FROM addresses;
cust_id | address
---------+-------------------
40 | Mt. Vernon Street
50 | Hillside Avenue
20 | New Address
30 | New Address
30 | New Address
(5 rows)
=>COMMIT;
COMMIT
DELETE subqueries
You can delete records in one table based according to values in others by nesting a subquery within a DELETE statement.
For example, you want to remove records from new_addresses
that were used earlier to update records in addresses
. The following DELETE statement uses a noncorrelated subquery to join new_addresses
and addresses
records on customer IDs. It then deletes the joined records from table new_addresses
:
=> DELETE FROM new_addresses
WHERE new_cust_id IN (SELECT cust_id FROM addresses WHERE address='New Address');
OUTPUT
--------
2
(1 row)
=> COMMIT;
COMMIT
Querying new_addresses
confirms that the records were deleted:
=> SELECT * FROM new_addresses;
new_cust_id | new_address
-------------+-------------
60 | f
80 | t
(2 rows)
7 - Subquery examples
This topic illustrates some of the subqueries you can write.
This topic illustrates some of the subqueries you can write. The examples use the VMart example database.
Single-row subqueries
Single-row subqueries are used with single-row comparison operators (=, >=, <=, <>, and <=>) and return exactly one row.
For example, the following query retrieves the name and hire date of the oldest employee in the Vmart database:
=> SELECT employee_key, employee_first_name, employee_last_name, hire_date
FROM employee_dimension
WHERE hire_date = (SELECT MIN(hire_date) FROM employee_dimension);
employee_key | employee_first_name | employee_last_name | hire_date
--------------+---------------------+--------------------+------------
2292 | Mary | Bauer | 1956-01-11
(1 row)
Multiple-row subqueries
Multiple-row subqueries return multiple records.
For example, the following IN clause subquery returns the names of the employees making the highest salary in each of the six regions:
=> SELECT employee_first_name, employee_last_name, annual_salary, employee_region
FROM employee_dimension WHERE annual_salary IN
(SELECT MAX(annual_salary) FROM employee_dimension GROUP BY employee_region)
ORDER BY annual_salary DESC;
employee_first_name | employee_last_name | annual_salary | employee_region
---------------------+--------------------+---------------+-------------------
Alexandra | Sanchez | 992363 | West
Mark | Vogel | 983634 | South
Tiffany | Vu | 977716 | SouthWest
Barbara | Lewis | 957949 | MidWest
Sally | Gauthier | 927335 | East
Wendy | Nielson | 777037 | NorthWest
(6 rows)
Multicolumn subqueries
Multicolumn subqueries return one or more columns. Sometimes a subquery's result set is evaluated in the containing query in column-to-column and row-to-row comparisons.
Note
Multicolumn subqueries can use the <>, !=, and = operators but not the <, >, <=, >= operators.
You can substitute some multicolumn subqueries with a join, with the reverse being true as well. For example, the following two queries ask for the sales transactions of all products sold online to customers located in Massachusetts and return the same result set. The only difference is the first query is written as a join and the second is written as a subquery.
Join query: |
Subquery: |
=> SELECT *
FROM online_sales.online_sales_fact
INNER JOIN public.customer_dimension
USING (customer_key)
WHERE customer_state = 'MA';
|
=> SELECT *
FROM online_sales.online_sales_fact
WHERE customer_key IN
(SELECT customer_key
FROM public.customer_dimension
WHERE customer_state = 'MA');
|
The following query returns all employees in each region whose salary is above the average:
=> SELECT e.employee_first_name, e.employee_last_name, e.annual_salary,
e.employee_region, s.average
FROM employee_dimension e,
(SELECT employee_region, AVG(annual_salary) AS average
FROM employee_dimension GROUP BY employee_region) AS s
WHERE e.employee_region = s.employee_region AND e.annual_salary > s.average
ORDER BY annual_salary DESC;
employee_first_name | employee_last_name | annual_salary | employee_region | average
---------------------+--------------------+---------------+-----------------+------------------
Doug | Overstreet | 995533 | East | 61192.786013986
Matt | Gauthier | 988807 | South | 57337.8638902996
Lauren | Nguyen | 968625 | West | 56848.4274914089
Jack | Campbell | 963914 | West | 56848.4274914089
William | Martin | 943477 | NorthWest | 58928.2276119403
Luigi | Campbell | 939255 | MidWest | 59614.9170454545
Sarah | Brown | 901619 | South | 57337.8638902996
Craig | Goldberg | 895836 | East | 61192.786013986
Sam | Vu | 889841 | MidWest | 59614.9170454545
Luigi | Sanchez | 885078 | MidWest | 59614.9170454545
Michael | Weaver | 882685 | South | 57337.8638902996
Doug | Pavlov | 881443 | SouthWest | 57187.2510548523
Ruth | McNulty | 874897 | East | 61192.786013986
Luigi | Dobisz | 868213 | West | 56848.4274914089
Laura | Lang | 865829 | East | 61192.786013986
...
You can also use the EXCEPT, INTERSECT, and UNION [ALL] keywords in FROM, WHERE, and HAVING clauses.
The following subquery returns information about all Connecticut-based customers who bought items through either stores or online sales channel and whose purchases amounted to more than 500 dollars:
=> SELECT DISTINCT customer_key, customer_name FROM public.customer_dimension
WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact
WHERE sales_dollar_amount > 500
UNION ALL
SELECT customer_key FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 500)
AND customer_state = 'CT';
customer_key | customer_name
--------------+------------------
200 | Carla Y. Kramer
733 | Mary Z. Vogel
931 | Lauren X. Roy
1533 | James C. Vu
2948 | Infocare
4909 | Matt Z. Winkler
5311 | John Z. Goldberg
5520 | Laura M. Martin
5623 | Daniel R. Kramer
6759 | Daniel Q. Nguyen
...
HAVING clause subqueries
A HAVING clause is used in conjunction with the GROUP BY clause to filter the select-list records that a GROUP BY returns. HAVING clause subqueries must use Boolean comparison operators: =, >, <, <>, <=, >= and take the following form:
SELECT <column, ...>
FROM <table>
GROUP BY <expression>
HAVING <expression>
(SELECT <column, ...>
FROM <table>
HAVING <expression>);
For example, the following statement uses the VMart database and returns the number of customers who purchased lowfat products. Note that the GROUP BY clause is required because the query uses an aggregate (COUNT).
=> SELECT s.product_key, COUNT(s.customer_key) FROM store.store_sales_fact s
GROUP BY s.product_key HAVING s.product_key IN
(SELECT product_key FROM product_dimension WHERE diet_type = 'Low Fat');
The subquery first returns the product keys for all low-fat products, and the outer query then counts the total number of customers who purchased those products.
product_key | count
-------------+-------
15 | 2
41 | 1
66 | 1
106 | 1
118 | 1
169 | 1
181 | 2
184 | 2
186 | 2
211 | 1
229 | 1
267 | 1
289 | 1
334 | 2
336 | 1
(15 rows)
8 - Subquery restrictions
The following restrictions apply to Vertica subqueries:.
The following restrictions apply to Vertica subqueries:
-
Subqueries are not allowed in the defining query of a
CREATE PROJECTION
statement.
-
Subqueries can be used in the SELECT
list, but GROUP BY
or aggregate functions are not allowed in the query if the subquery is not part of the GROUP BY
clause in the containing query. For example, the following two statement returns an error message:
=> SELECT y, (SELECT MAX(a) FROM t1) FROM t2 GROUP BY y;
ERROR: subqueries in the SELECT or ORDER BY are not supported if the
subquery is not part of the GROUP BY
=> SELECT MAX(y), (SELECT MAX(a) FROM t1) FROM t2;
ERROR: subqueries in the SELECT or ORDER BY are not supported if the
query has aggregates and the subquery is not part of the GROUP BY
-
Subqueries are supported within UPDATE
statements with the following exceptions:
-
FROM
clause subqueries require an alias but tables do not. If the table has no alias, the query must refer its columns as table-name
.column-name
. However, column names that are unique among all tables in the query do not need to be qualified by their table name.
-
If the ORDER BY
clause is inside a FROM
clause subquery, rather than in the containing query, the query is liable to return unexpected sort results. This occurs because Vertica data comes from multiple nodes, so sort order cannot be guaranteed unless the outer query block specifies an ORDER BY
clause. This behavior complies with the SQL standard, but it might differ from other databases.
-
Multicolumn subqueries cannot use the <, >, <=, >= comparison operators. They can use <>, !=, and = operators.
-
WHERE
and HAVING
clause subqueries must use Boolean comparison operators: =, >, <, <>, <=, >=. Those subqueries can be noncorrelated and correlated.
-
[NOT] IN
and ANY
subqueries nested in another expression are not supported if any of the column values are NULL. In the following statement, for example, if column x from either table t1
or t2
contains a NULL value, Vertica returns a run-time error:
=> SELECT * FROM t1 WHERE (x IN (SELECT x FROM t2)) IS FALSE;
ERROR: NULL value found in a column used by a subquery
-
Vertica returns an error message during subquery run time on scalar subqueries that return more than one row.
-
Aggregates and GROUP BY clauses are allowed in subqueries, as long as those subqueries are not correlated.
-
Correlated expressions under ALL
and [NOT] IN
are not supported.
-
Correlated expressions under OR
are not supported.
-
Multiple correlations are allowed only for subqueries that are joined with an equality (=) predicate. However, IN
/NOT IN
, EXISTS
/NOT EXISTS
predicates within correlated subqueries are not allowed:
=> SELECT t2.x, t2.y, t2.z FROM t2 WHERE t2.z NOT IN
(SELECT t1.z FROM t1 WHERE t1.x = t2.x);
ERROR: Correlated subquery with NOT IN is not supported
-
Up to one level of correlated subqueries is allowed in the WHERE
clause if the subquery references columns in the immediate outer query block. For example, the following query is not supported because the t2.x = t3.x
subquery can only refer to table t1
in the outer query, making it a correlated expression because t3.x
is two levels out:
=> SELECT t3.x, t3.y, t3.z FROM t3 WHERE t3.z IN (
SELECT t1.z FROM t1 WHERE EXISTS (
SELECT 'x' FROM t2 WHERE t2.x = t3.x) AND t1.x = t3.x);
ERROR: More than one level correlated subqueries are not supported
The query is supported if it is rewritten as follows:
=> SELECT t3.x, t3.y, t3.z FROM t3 WHERE t3.z IN
(SELECT t1.z FROM t1 WHERE EXISTS
(SELECT 'x' FROM t2 WHERE t2.x = t1.x)
AND t1.x = t3.x);