This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Subqueries

A subquery is a SELECT statement embedded within another SELECT statement.

A subquery is a SELECT statement embedded within another SELECT statement. The embedded subquery is often referenced as the query's inner statement, while the containing query is typically referenced as the query's statement, or outer query block. A subquery returns data that the outer query uses as a condition to determine what data to retrieve. There is no limit to the number of nested subqueries you can create.

Like any query, a subquery returns records from a table that might consist of a single column and record, a single column with multiple records, or multiple columns and records. Subqueries can be noncorrelated or correlated. You can also use them to update or delete table records, based on values in other database tables.

1 - Subqueries used in search conditions

Subqueries are used as search conditions in order to filter results.

Subqueries are used as search conditions in order to filter results. They specify the conditions for the rows returned from the containing query's select-list, a query expression, or the subquery itself. The operation evaluates to TRUE, FALSE, or UNKNOWN (NULL).

Syntax

search-condition {
    [ { AND | OR | NOT } {  predicate | ( search-condition ) } ]
   }[,... ]
 predicate
     { expression comparison-operator expression
         | string-expression [ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB } string-expression
         | expression IS [ NOT ] NULL
         | expression [ NOT ] IN ( subquery | expression[,... ] )
         | expression comparison-operator [ ANY | SOME ] ( subquery )
         | expression comparison-operator ALL ( subquery )
         | expression OR ( subquery )
         | [ NOT ] EXISTS ( subquery )
         | [ NOT ] IN ( subquery )
     }

Arguments

search-condition

Specifies the search conditions for the rows returned from one of the following:

  • Containing query's select-list

  • Query expression

  • Subquery

If the subquery is used with an UPDATE or DELETE statement, UPDATE specifies the rows to update and DELETE specifies the rows to delete.

{ AND | OR | NOT }

Logical operators:

  • AND: Combines two conditions and evaluates to TRUE when both of the conditions are TRUE.

  • OR: Combines two conditions and evaluates to TRUE when either condition is TRUE.

  • NOT: Negates the Boolean expression specified by the predicate.

predicate An expression that returns TRUE, FALSE, or UNKNOWN (NULL).
expression A column name, constant, function, or scalar subquery, or combination of column names, constants, and functions connected by operators or subqueries.
comparison-operator

An operator that tests conditions between two expressions, one of the following:

  • <: less than

  • >: greater than

  • <=: less than or equal

  • >=: greater than or equal

  • =: equal; returns UNKNOWN if either expression does

  • <=>: Like the = operator, but returns TRUE (instead of UNKNOWN) if both expressions evaluate to UNKNOWN, and FALSE (instead of UNKNOWN) if one expression evaluates to UNKNOWN.

  • <>: not equal

  • !=: not equal

string-expression A character string with optional wildcard (*) characters.
[ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB } Indicates that the character string following the predicate is to be used (or not used) for pattern matching.
IS [ NOT ] NULL Searches for values that are null or are not null.
ALL Used with a comparison operator and a subquery. Returns TRUE for the left-hand predicate if all values returned by the subquery satisfy the comparison operation, or FALSE if not all values satisfy the comparison or if the subquery returns no rows to the outer query block.
ANY | SOME ANY and SOME are synonyms and are used with a comparison operator and a subquery. Either returns TRUE for the left-hand predicate if any value returned by the subquery satisfies the comparison operation, or FALSE if no values in the subquery satisfy the comparison or if the subquery returns no rows to the outer query block. Otherwise, the expression is UNKNOWN.
[ NOT ] EXISTS Used with a subquery to test for the existence of records that the subquery returns.
[ NOT ] IN Searches for an expression on the basis of an expression's exclusion or inclusion from a list. The list of values is enclosed in parentheses and can be a subquery or a set of constants.

Expressions as subqueries

Subqueries that return a single value (unlike a list of values returned by IN subqueries) can generally be used anywhere an expression is allowed in SQL: a column name, constant, function, scalar subquery, or a combination of column names, constants, and functions connected by operators or subqueries.

For example:

=> SELECT c1 FROM t1 WHERE c1 = ANY (SELECT c1 FROM t2) ORDER BY c1;
=> SELECT c1 FROM t1 WHERE COALESCE((t1.c1 > ANY (SELECT c1 FROM t2)), TRUE);
=> SELECT c1 FROM t1 GROUP BY c1 HAVING
     COALESCE((t1.c1 <> ALL (SELECT c1 FROM t2)), TRUE);

Multi-column expressions are also supported:

=> SELECT c1 FROM t1 WHERE (t1.c1, t1.c2) = ALL (SELECT c1, c2 FROM t2);
=> SELECT c1 FROM t1 WHERE (t1.c1, t1.c2) <> ANY (SELECT c1, c2 FROM t2);

Vertica returns an error on queries where more than one row would be returned by any subquery used as an expression:

=> SELECT c1 FROM t1 WHERE c1 = (SELECT c1 FROM t2) ORDER BY c1;
   ERROR:  more than one row returned by a subquery used as an expression

See also

2 - Subqueries in the SELECT list

Subqueries can occur in the select list of the containing query.

Subqueries can occur in the select list of the containing query. The results from the following statement are ordered by the first column (customer_name). You could also write ORDER BY 2 and specify that the results be ordered by the select-list subquery.

=> SELECT c.customer_name, (SELECT AVG(annual_income) FROM customer_dimension
    WHERE deal_size = c.deal_size) AVG_SAL_DEAL FROM customer_dimension c
     ORDER BY 1;
 customer_name | AVG_SAL_DEAL
---------------+--------------
 Goldstar      |       603429
 Metatech      |       628086
 Metadata      |       666728
 Foodstar      |       695962
 Verihope      |       715683
 Veridata      |       868252
 Bettercare    |       879156
 Foodgen       |       958954
 Virtacom      |       991551
 Inicorp       |      1098835
...

Notes

  • Scalar subqueries in the select-list return a single row/column value. These subqueries use Boolean comparison operators: =, >, <, <>, <=, >=.

    If the query is correlated, it returns NULL if the correlation results in 0 rows. If the query returns more than one row, the query errors out at run time and Vertica displays an error message that the scalar subquery must only return 1 row.

  • Subquery expressions such as [NOT] IN, [NOT] EXISTS, ANY/SOME, or ALL always return a single Boolean value that evaluates to TRUE, FALSE, or UNKNOWN; the subquery itself can have many rows. Most of these queries can be correlated or noncorrelated.

  • Subqueries in the ORDER BY and GROUP BY clauses are supported; for example, the following statement says to order by the first column, which is the select-list subquery:

    => SELECT (SELECT MAX(x) FROM t2 WHERE y=t1.b) FROM t1 ORDER BY 1;
    

See also

3 - Noncorrelated and correlated subqueries

Subqueries can be categorized into two types:.

Subqueries can be categorized into two types:

  • A noncorrelated subquery obtains its results independently of its containing (outer) statement.

  • A correlated subquery requires values from its outer query in order to execute.

Noncorrelated subqueries

A noncorrelated subquery executes independently of the outer query. The subquery executes first, and then passes its results to the outer query, For example:

=> SELECT name, street, city, state FROM addresses WHERE state IN (SELECT state FROM states);

Vertica executes this query as follows:

  1. Executes the subquery SELECT state FROM states (in bold).

  2. Passes the subquery results to the outer query.

A query's WHERE and HAVING clauses can specify noncorrelated subqueries if the subquery resolves to a single row, as shown below:

In WHERE clause

=> SELECT COUNT(*) FROM SubQ1 WHERE SubQ1.a = (SELECT y from SubQ2);

In HAVING clause

=> SELECT COUNT(*) FROM SubQ1 GROUP BY SubQ1.a HAVING SubQ1.a = (SubQ1.a & (SELECT y from SubQ2)

Correlated subqueries

A correlated subquery typically obtains values from its outer query before it executes. When the subquery returns, it passes its results to the outer query. Correlated subqueries generally conform to the following format:

SELECT outer-column[,...] FROM t1 outer
   WHERE outer-column comparison-operator
      (SELECT sq-column[,...] FROM t2 sq
         WHERE sq.expr = outer.expr);

In the following example, the subquery needs values from the addresses.state column in the outer query:

=> SELECT name, street, city, state FROM addresses
     WHERE EXISTS (SELECT * FROM states WHERE states.state = addresses.state);

Vertica executes this query as follows:

  1. Extracts and evaluates each addresses.state value in the outer subquery records.
  2. Using the EXISTS predicate, checks addresses in the inner (correlated) subquery.
  3. Stops processing when it finds the first match.

When Vertica executes this query, it translates the full query into a JOIN WITH SIPS.

4 - Flattening FROM clause subqueries

FROM clause subqueries are always evaluated before their containing query.

FROM clause subqueries are always evaluated before their containing query. In some cases, the optimizer flattens FROM clause subqueries so the query can execute more efficiently.

For example, in order to create a query plan for the following statement, the Vertica query optimizer evaluates all records in table t1 before it evaluates the records in table t0:

=> SELECT * FROM (SELECT a, MAX(a) AS max FROM (SELECT * FROM t1) AS t0 GROUP BY a);

Given the previous query, the optimizer can internally flatten it as follows:

=> SELECT * FROM (SELECT a, MAX(a) FROM t1 GROUP BY a) AS t0;

Both queries return the same results, but the flattened query runs more quickly.

Flattening views

When a query's FROM clause specifies a view, the optimizer expands the view by replacing it with the query that the view encapsulates. If the view contains subqueries that are eligible for flattening, the optimizer produces a query plan that flattens those subqueries.

Flattening restrictions

The optimizer cannot create a flattened query plan if a subquery or view contains one of the following elements:

  • Aggregate function

  • Analytic function

  • Outer join (left, right or full)

  • GROUP BY, ORDER BY, or HAVING clause

  • DISTINCT keyword

  • LIMIT or OFFSET clause

  • UNION, EXCEPT, or INTERSECT clause

  • EXISTS subquery

Examples

If a predicate applies to a view or subquery, the flattening operation can allow for optimizations by evaluating the predicates before the flattening takes place. Two examples follow.

View flattening

In this example, view v1 is defined as follows:

=> CREATE VIEW v1 AS SELECT * FROM a;

The following query specifies this view:

=> SELECT * FROM v1 JOIN b ON x=y WHERE x > 10;

Without flattening, the optimizer evaluates the query as follows:

  1. Evalutes the subquery.

  2. Applies the predicate WHERE x > 10.

In contrast, the optimizer can create a flattened query plan by applying the predicate before evaluating the subquery. This reduces the optimizer's work because it returns only the records WHERE x > 10 to the containing query.

Vertica internally transforms the previous query as follows:

=> SELECT * FROM (SELECT * FROM a) AS t1 JOIN b ON x=y WHERE x > 10;

The optimizer then flattens the query:

=> SELECT * FROM a JOIN b ON x=y WHERE x > 10;

Subquery flattening

The following example shows how Vertica transforms FROM clause subqueries within a WHERE clause IN subquery. Given the following query:

=> SELECT * FROM a
   WHERE b IN (SELECT b FROM (SELECT * FROM t2)) AS D WHERE x=1;

The optimizer flattens it as follows:

=> SELECT * FROM a
   WHERE b IN (SELECT b FROM t2) AS D WHERE x=1;

See also

Subquery restrictions

5 - Subqueries in UPDATE and DELETE statements

You can nest subqueries within UPDATE and DELETE statements.

You can nest subqueries within UPDATE and DELETE statements.

UPDATE subqueries

You can update records in one table according to values in others, by nesting a subquery within an UPDATE statement. The example below illustrates this through a couple of noncorrelated subqueries. You can reproduce this example with the following tables:

=> CREATE TABLE addresses(cust_id INTEGER, address VARCHAR(2000));
CREATE TABLE
dbadmin=> INSERT INTO addresses VALUES(20,'Lincoln Street'),(30,'Booth Hill Road'),(30,'Beach Avenue'),(40,'Mt. Vernon Street'),(50,'Hillside Avenue');
 OUTPUT
--------
      5
(1 row)

=> CREATE TABLE new_addresses(new_cust_id integer, new_address Boolean DEFAULT 'T');
CREATE TABLE
dbadmin=> INSERT INTO new_addresses VALUES (20),(30),(80);
 OUTPUT
--------
      3
(1 row)

=> INSERT INTO new_addresses VALUES (60,'F');
 OUTPUT
--------
      1

=> COMMIT;
COMMIT

Queries on these tables return the following results:

=> SELECT * FROM addresses;
 cust_id |      address
---------+-------------------
      20 | Lincoln Street
      30 | Beach Avenue
      30 | Booth Hill Road
      40 | Mt. Vernon Street
      50 | Hillside Avenue
(5 rows)

=> SELECT * FROM new_addresses;
 new_cust_id | new_address
-------------+-------------
          20 | t
          30 | t
          80 | t
          60 | f
(4 rows)
  1. The following UPDATE statement uses a noncorrelated subquery to join new_addresses and addresses records on customer IDs. UPDATE sets the value 'New Address' in the joined addresses records. The statement output indicates that three rows were updated:

    => UPDATE addresses SET address='New Address'
       WHERE cust_id IN (SELECT new_cust_id FROM new_addresses WHERE new_address='T');
     OUTPUT
    --------
    3
    (1 row)
    
  2. Query the addresses table to see the changes for matching customer ID 20 and 30. Addresses for customer ID 40 and 50 are not updated:

    => SELECT * FROM addresses;
     cust_id |      address
    ---------+-------------------
          40 | Mt. Vernon Street
          50 | Hillside Avenue
          20 | New Address
          30 | New Address
          30 | New Address
    (5 rows)
    
    =>COMMIT;
    COMMIT
    

DELETE subqueries

You can delete records in one table based according to values in others by nesting a subquery within a DELETE statement.

For example, you want to remove records from new_addresses that were used earlier to update records in addresses. The following DELETE statement uses a noncorrelated subquery to join new_addresses and addresses records on customer IDs. It then deletes the joined records from table new_addresses:

=> DELETE FROM new_addresses
    WHERE new_cust_id IN (SELECT cust_id FROM addresses WHERE address='New Address');
 OUTPUT
--------
      2
(1 row)
=> COMMIT;
COMMIT

Querying new_addresses confirms that the records were deleted:

=> SELECT * FROM new_addresses;
 new_cust_id | new_address
-------------+-------------
          60 | f
          80 | t
(2 rows)

6 - Subquery examples

This topic illustrates some of the subqueries you can write.

This topic illustrates some of the subqueries you can write. The examples use the VMart example database.

Single-row subqueries

Single-row subqueries are used with single-row comparison operators (=, >=, <=, <>, and <=>) and return exactly one row.

For example, the following query retrieves the name and hire date of the oldest employee in the Vmart database:

=> SELECT employee_key, employee_first_name, employee_last_name, hire_date
   FROM employee_dimension
   WHERE hire_date = (SELECT MIN(hire_date) FROM employee_dimension);
 employee_key | employee_first_name | employee_last_name | hire_date
--------------+---------------------+--------------------+------------
         2292 | Mary                | Bauer              | 1956-01-11
(1 row)

Multiple-row subqueries

Multiple-row subqueries return multiple records.

For example, the following IN clause subquery returns the names of the employees making the highest salary in each of the six regions:

=> SELECT employee_first_name, employee_last_name, annual_salary, employee_region
    FROM employee_dimension WHERE annual_salary IN
     (SELECT MAX(annual_salary) FROM employee_dimension GROUP BY employee_region)
   ORDER BY annual_salary DESC;
 employee_first_name | employee_last_name | annual_salary |  employee_region
---------------------+--------------------+---------------+-------------------
 Alexandra           | Sanchez            |        992363 | West
 Mark                | Vogel              |        983634 | South
 Tiffany             | Vu                 |        977716 | SouthWest
 Barbara             | Lewis              |        957949 | MidWest
 Sally               | Gauthier           |        927335 | East
 Wendy               | Nielson            |        777037 | NorthWest
(6 rows)

Multicolumn subqueries

Multicolumn subqueries return one or more columns. Sometimes a subquery's result set is evaluated in the containing query in column-to-column and row-to-row comparisons.

You can substitute some multicolumn subqueries with a join, with the reverse being true as well. For example, the following two queries ask for the sales transactions of all products sold online to customers located in Massachusetts and return the same result set. The only difference is the first query is written as a join and the second is written as a subquery.

Join query: Subquery:
=> SELECT *
   FROM online_sales.online_sales_fact
   INNER JOIN public.customer_dimension
   USING (customer_key)
   WHERE customer_state = 'MA';
=> SELECT *
   FROM online_sales.online_sales_fact
   WHERE customer_key IN
     (SELECT customer_key
      FROM public.customer_dimension
      WHERE customer_state = 'MA');

The following query returns all employees in each region whose salary is above the average:

=> SELECT e.employee_first_name, e.employee_last_name, e.annual_salary,
      e.employee_region, s.average
   FROM employee_dimension e,
     (SELECT employee_region, AVG(annual_salary) AS average
      FROM employee_dimension GROUP BY employee_region) AS s
   WHERE  e.employee_region = s.employee_region AND e.annual_salary > s.average
   ORDER BY annual_salary DESC;
 employee_first_name | employee_last_name | annual_salary | employee_region |     average
---------------------+--------------------+---------------+-----------------+------------------
 Doug                | Overstreet         |        995533 | East            |  61192.786013986
 Matt                | Gauthier           |        988807 | South           | 57337.8638902996
 Lauren              | Nguyen             |        968625 | West            | 56848.4274914089
 Jack                | Campbell           |        963914 | West            | 56848.4274914089
 William             | Martin             |        943477 | NorthWest       | 58928.2276119403
 Luigi               | Campbell           |        939255 | MidWest         | 59614.9170454545
 Sarah               | Brown              |        901619 | South           | 57337.8638902996
 Craig               | Goldberg           |        895836 | East            |  61192.786013986
 Sam                 | Vu                 |        889841 | MidWest         | 59614.9170454545
 Luigi               | Sanchez            |        885078 | MidWest         | 59614.9170454545
 Michael             | Weaver             |        882685 | South           | 57337.8638902996
 Doug                | Pavlov             |        881443 | SouthWest       | 57187.2510548523
 Ruth                | McNulty            |        874897 | East            |  61192.786013986
 Luigi               | Dobisz             |        868213 | West            | 56848.4274914089
 Laura               | Lang               |        865829 | East            |  61192.786013986
 ...

You can also use the EXCEPT, INTERSECT, and UNION [ALL] keywords in FROM, WHERE, and HAVING clauses.

The following subquery returns information about all Connecticut-based customers who bought items through either stores or online sales channel and whose purchases amounted to more than 500 dollars:

=> SELECT DISTINCT customer_key, customer_name FROM public.customer_dimension
   WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact
      WHERE sales_dollar_amount > 500
      UNION ALL
      SELECT customer_key FROM online_sales.online_sales_fact
      WHERE sales_dollar_amount > 500)
   AND customer_state = 'CT';
 customer_key |  customer_name
--------------+------------------
          200 | Carla Y. Kramer
          733 | Mary Z. Vogel
          931 | Lauren X. Roy
         1533 | James C. Vu
         2948 | Infocare
         4909 | Matt Z. Winkler
         5311 | John Z. Goldberg
         5520 | Laura M. Martin
         5623 | Daniel R. Kramer
         6759 | Daniel Q. Nguyen
 ...

HAVING clause subqueries

A HAVING clause is used in conjunction with the GROUP BY clause to filter the select-list records that a GROUP BY returns. HAVING clause subqueries must use Boolean comparison operators: =, >, <, <>, <=, >= and take the following form:

SELECT <column, ...>
FROM <table>
GROUP BY <expression>
HAVING <expression>
  (SELECT <column, ...>
   FROM <table>
   HAVING <expression>);

For example, the following statement uses the VMart database and returns the number of customers who purchased lowfat products. Note that the GROUP BY clause is required because the query uses an aggregate (COUNT).

=> SELECT s.product_key, COUNT(s.customer_key) FROM store.store_sales_fact s
   GROUP BY s.product_key HAVING s.product_key IN
     (SELECT product_key FROM product_dimension WHERE diet_type = 'Low Fat');

The subquery first returns the product keys for all low-fat products, and the outer query then counts the total number of customers who purchased those products.

 product_key | count
 -------------+-------
          15 |     2
          41 |     1
          66 |     1
         106 |     1
         118 |     1
         169 |     1
         181 |     2
         184 |     2
         186 |     2
         211 |     1
         229 |     1
         267 |     1
         289 |     1
         334 |     2
         336 |     1
(15 rows)

7 - Subquery restrictions

The following restrictions apply to Vertica subqueries:.

The following restrictions apply to Vertica subqueries:

  • Subqueries are not allowed in the defining query of a CREATE PROJECTION statement.

  • Subqueries can be used in the SELECT list, but GROUP BY or aggregate functions are not allowed in the query if the subquery is not part of the GROUP BY clause in the containing query. For example, the following two statement returns an error message:

    => SELECT y, (SELECT MAX(a) FROM t1) FROM t2 GROUP BY y;
       ERROR:  subqueries in the SELECT or ORDER BY are not supported if the
       subquery is not part of the GROUP BY
    => SELECT MAX(y), (SELECT MAX(a) FROM t1) FROM t2;
       ERROR:  subqueries in the SELECT or ORDER BY are not supported if the
       query has aggregates and the subquery is not part of the GROUP BY
    
  • Subqueries are supported within UPDATE statements with the following exceptions:

    • You cannot use SET column = {expression} to specify a subquery.

    • The table specified in the UPDATE list cannot also appear in the FROM clause (no self joins).

  • FROM clause subqueries require an alias but tables do not. If the table has no alias, the query must refer its columns as table-name.column-name. However, column names that are unique among all tables in the query do not need to be qualified by their table name.

  • If the ORDER BY clause is inside a FROM clause subquery, rather than in the containing query, the query is liable to return unexpected sort results. This occurs because Vertica data comes from multiple nodes, so sort order cannot be guaranteed unless the outer query block specifies an ORDER BY clause. This behavior complies with the SQL standard, but it might differ from other databases.

  • Multicolumn subqueries cannot use the <, >, <=, >= comparison operators. They can use <>, !=, and = operators.

  • WHERE and HAVING clause subqueries must use Boolean comparison operators: =, >, <, <>, <=, >=. Those subqueries can be noncorrelated and correlated.

  • [NOT] IN and ANY subqueries nested in another expression are not supported if any of the column values are NULL. In the following statement, for example, if column x from either table t1 or t2 contains a NULL value, Vertica returns a run-time error:

    => SELECT * FROM t1 WHERE (x IN (SELECT x FROM t2)) IS FALSE;
    ERROR:  NULL value found in a column used by a subquery
    
  • Vertica returns an error message during subquery run time on scalar subqueries that return more than one row.

  • Aggregates and GROUP BY clauses are allowed in subqueries, as long as those subqueries are not correlated.

  • Correlated expressions under ALL and [NOT] IN are not supported.

  • Correlated expressions under OR are not supported.

  • Multiple correlations are allowed only for subqueries that are joined with an equality (=) predicate. However, IN/NOT IN, EXISTS/NOT EXISTS predicates within correlated subqueries are not allowed:

    => SELECT t2.x, t2.y, t2.z FROM t2 WHERE t2.z NOT IN
           (SELECT t1.z FROM t1 WHERE t1.x = t2.x);
       ERROR: Correlated subquery with NOT IN is not supported
    
  • Up to one level of correlated subqueries is allowed in the WHERE clause if the subquery references columns in the immediate outer query block. For example, the following query is not supported because the t2.x = t3.x subquery can only refer to table t1 in the outer query, making it a correlated expression because t3.x is two levels out:

    => SELECT t3.x, t3.y, t3.z FROM t3 WHERE t3.z IN (
         SELECT t1.z FROM t1 WHERE EXISTS (
            SELECT 'x' FROM t2 WHERE t2.x = t3.x) AND t1.x = t3.x);
    ERROR:  More than one level correlated subqueries are not supported
    

    The query is supported if it is rewritten as follows:

    => SELECT t3.x, t3.y, t3.z FROM t3 WHERE t3.z IN
           (SELECT t1.z FROM t1 WHERE EXISTS
             (SELECT 'x' FROM t2 WHERE t2.x = t1.x)
       AND t1.x = t3.x);