This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Queries

Queries are database operations that retrieve data from one or more tables or views.

Queries are database operations that retrieve data from one or more tables or views. In Vertica, the top-level SELECT statement is the query, and a query nested within another SQL statement is called a subquery.

Vertica is designed to run the same SQL standard queries that run on other databases. However, there are some differences between Vertica queries and queries used in other relational database management systems.

The Vertica transaction model is different from the SQL standard in a way that has a profound effect on query performance. You can:

  • Run a query on a static backup of the database from any specific date and time. Doing so avoids holding locks or blocking other database operations.

  • Use a subset of the standard SQL isolation levels and access modes (read/write or read-only) for a user session.

In Vertica, the primary structure of a SQL query is its statement. Each statement ends with a semicolon, and you can write multiple queries separated by semicolons; for example:

=> CREATE TABLE t1( ..., date_col date NOT NULL, ...);
=> CREATE TABLE t2( ..., state VARCHAR NOT NULL, ...);

1 - Historical queries

Vertica can execute historical queries, which execute on a snapshot of the database taken at a specific timestamp or epoch.

Vertica can execute historical queries, which execute on a snapshot of the database taken at a specific timestamp or epoch. Historical queries can be used to evaluate and possibly recover data that was deleted but has not yet been purged.

You specify a historical query by qualifying the SELECT statement with an AT epoch clause, where epoch is one of the following:

  • EPOCH LATEST: Return data up to but not including the current epoch. The result set includes data from the latest committed DML transaction.

  • EPOCH integer: Return data up to and including the integer-specified epoch.

  • TIME 'timestamp': Return data from the timestamp-specified epoch.

See Epochs for additional information about how Vertica uses epochs.

Historical queries return data only from the specified epoch. Because they do not return the latest data, historical queries hold no locks or blocking write operations.

Query results are private to the transaction and valid only for the length of the transaction. Query execution is the same regardless of the transaction isolation level.

Restrictions

  • The specified epoch, or epoch of the specified timestamp, cannot be less than the Ancient History Mark epoch.

  • Vertica does not support running historical queries on temporary tables.

2 - Temporary tables

You can use the CREATE TEMPORARY TABLE statement to implement certain queries using multiple steps:.

You can use the CREATE TEMPORARY TABLE statement to implement certain queries using multiple steps:

  1. Create one or more temporary tables.

  2. Execute queries and store the result sets in the temporary tables.

  3. Execute the main query using the temporary tables as if they were a normal part of the logical schema.

See CREATE TEMPORARY TABLE in the SQL Reference Manual for details.

3 - SQL queries

All DML (Data Manipulation Language) statements can contain queries.

All DML (Data Manipulation Language) statements can contain queries. This section introduces some of the query types in Vertica, with additional details in later sections.

Simple queries

Simple queries contain a query against one table. Minimal effort is required to process the following query, which looks for product keys and SKU numbers in the product table:

=> SELECT product_key, sku_number FROM public.product_dimension;
product_key  | sku_number
-------------+-----------
43           | SKU-#129
87           | SKU-#250
42           | SKU-#125
49           | SKU-#154
37           | SKU-#107
36           | SKU-#106
86           | SKU-#248
41           | SKU-#121
88           | SKU-#257
40           | SKU-#120
(10 rows)

Tables can contain arrays. You can select the entire array column, an index into it, or the results of a function applied to the array. For more information, see Arrays and sets (collections).

Joins

Joins use a relational operator that combines information from two or more tables. The query's ON clause specifies how tables are combined, such as by matching foreign keys to primary keys. In the following example, the query requests the names of stores with transactions greater than 70 by joining the store key ID from the store schema's sales fact and sales tables:

=> SELECT store_name, COUNT(*) FROM store.store_sales_fact
   JOIN store.store_dimension ON store.store_sales_fact.store_key = store.store_dimension.store_key
   GROUP BY store_name HAVING COUNT(*) > 70 ORDER BY store_name;

 store_name | count
------------+-------
 Store49    |    72
 Store83    |    78
(2 rows)

For more detailed information, see Joins. See also the Multicolumn subqueries section in Subquery examples.

Cross joins

Also known as the Cartesian product, a cross join is the result of joining every record in one table with every record in another table. A cross join occurs when there is no join key between tables to restrict records. The following query, for example, returns all instances of vendor and store names in the vendor and store tables:

=> SELECT vendor_name, store_name FROM public.vendor_dimension
    CROSS JOIN store.store_dimension;
vendor_name         | store_name
--------------------+------------
Deal Warehouse      | Store41
Deal Warehouse      | Store12
Deal Warehouse      | Store46
Deal Warehouse      | Store50
Deal Warehouse      | Store15
Deal Warehouse      | Store48
Deal Warehouse      | Store39
Sundry Wholesale    | Store41
Sundry Wholesale    | Store12
Sundry Wholesale    | Store46
Sundry Wholesale    | Store50
Sundry Wholesale    | Store15
Sundry Wholesale    | Store48
Sundry Wholesale    | Store39
Market Discounters  | Store41
Market Discounters  | Store12
Market Discounters  | Store46
Market Discounters  | Store50
Market Discounters  | Store15
Market Discounters  | Store48
Market Discounters  | Store39
Market Suppliers    | Store41
Market Suppliers    | Store12
Market Suppliers    | Store46
Market Suppliers    | Store50
Market Suppliers    | Store15
Market Suppliers    | Store48
Market Suppliers    | Store39
...                 | ...
(4000 rows)

This example's output is truncated because this particular cross join returned several thousand rows. See also Cross joins.

Subqueries

A subquery is a query nested within another query. In the following example, we want a list of all products containing the highest fat content. The inner query (subquery) returns the product containing the highest fat content among all food products to the outer query block (containing query). The outer query then uses that information to return the names of the products containing the highest fat content.

=> SELECT product_description, fat_content FROM public.product_dimension
   WHERE fat_content IN
     (SELECT MAX(fat_content) FROM public.product_dimension
      WHERE category_description = 'Food' AND department_description = 'Bakery')
   LIMIT 10;
         product_description         | fat_content
-------------------------------------+-------------
 Brand #59110 hotdog buns            |          90
 Brand #58107 english muffins        |          90
 Brand #57135 english muffins        |          90
 Brand #54870 cinnamon buns          |          90
 Brand #53690 english muffins        |          90
 Brand #53096 bagels                 |          90
 Brand #50678 chocolate chip cookies |          90
 Brand #49269 wheat bread            |          90
 Brand #47156 coffee cake            |          90
 Brand #43844 corn muffins           |          90
(10 rows)

For more information, see Subqueries.

Sorting queries

Use the ORDER BY clause to order the rows that a query returns.

Special note about query results

You could get different results running certain queries on one machine or another for the following reasons:

  • Partitioning on a FLOAT type could return nondeterministic results because of the precision, especially when the numbers are close to one another, such as results from the RADIANS() function, which has a very small range of output.

    To get deterministic results, use NUMERIC if you must partition by data that is not an INTEGER type.

  • Most analytics (with analytic aggregations, such as MIN()/MAX()/SUM()/COUNT()/AVG() as exceptions) rely on a unique order of input data to get deterministic result. If the analytic window-order clause cannot resolve ties in the data, results could be different each time you run the query.

    For example, in the following query, the analytic ORDER BY does not include the first column in the query, promotion_key. So for a tie of AVG(RADIANS(cost_dollar_amount)), product_version, the same promotion_key could have different positions within the analytic partition, resulting in a different NTILE() number. Thus, DISTINCT could also have a different result:

    => SELECT COUNT(*) FROM
          (SELECT DISTINCT SIN(FLOOR(MAX(store.store_sales_fact.promotion_key))),
       NTILE(79) OVER(PARTITION BY AVG (RADIANS
          (store.store_sales_fact.cost_dollar_amount ))
       ORDER BY store.store_sales_fact.product_version)
       FROM store.store_sales_fact
       GROUP BY store.store_sales_fact.product_version,
             store.store_sales_fact.sales_dollar_amount ) AS store;
     count
    -------
      1425
    (1 row)
    

    If you add MAX(promotion_key) to analytic ORDER BY, the results are the same on any machine:

    => SELECT COUNT(*) FROM (SELECT DISTINCT MAX(store.store_sales_fact.promotion_key),
        NTILE(79) OVER(PARTITION BY MAX(store.store_sales_fact.cost_dollar_amount)
       ORDER BY store.store_sales_fact.product_version,
       MAX(store.store_sales_fact.promotion_key))
       FROM store.store_sales_fact
       GROUP BY store.store_sales_fact.product_version,
         store.store_sales_fact.sales_dollar_amount) AS store;
    

4 - Arrays and sets (collections)

Tables can include collections (arrays or sets).

Tables can include collections (arrays or sets). An array is an ordered collection of elements that allows duplicate values, and a set is an unordered collection of unique values.

Consider an orders table with columns for product keys, customer keys, order prices, and order date, with some containing arrays. A basic query in Vertica results in the following:

=> SELECT * from orders LIMIT 5;
 orderkey | custkey |        prodkey         |         orderprices         | orderdate
----------+---------+------------------------+-----------------------------+------------
    19626 |      91 | ["P1262","P68","P101"] | ["192.59","49.99","137.49"] | 2021-03-14
    25646 |     716 | ["P997","P31","P101"]  | ["91.39","29.99","147.49"]  | 2021-03-14
    25647 |     716 | ["P12"]                | ["8.99"]                    | 2021-03-14
    19743 |     161 | ["P68","P101"]         | ["49.99","137.49"]          | 2021-03-15
    19888 |     241 | ["P1262","P101"]       | ["197.59","142.49"]         | 2021-03-15
(5 rows)

As shown in this example, array values are returned in JSON format.

Set values are also returned in JSON array format:

=> SELECT custkey,email_addrs FROM customers LIMIT 4;
 custkey |                           email_addrs
---------+------------------------------------------------------------------------
 342176  | ["joe.smith@example.com"]
 342799  | ["bob@example,com","robert.jones@example.com"]
 342845  | ["br92@cs.example.edu"]
 342321  | ["789123@example-isp.com","sjohnson@eng.example.com","sara@johnson.example.name"]

You can access elements of nested arrays (multi-dimensional arrays) with multiple indexes, as shown in the following examples:

=> SELECT host, pingtimes FROM network_tests;
 host |                 pingtimes
------+-------------------------------------------------------
 eng1 | [[24.24,25.27,27.16,24.97], [23.97,25.01,28.12,29.50]]
 eng2 | [[27.12,27.91,28.11,26.95], [29.01,28.99,30.11,31.56]]
 qa1  | [[23.15,25.11,24.63,23.91], [22.85,22.86,23.91,31.52]]
(3 rows)

=> SELECT pingtimes[0] FROM network_tests;
      pingtimes
-------------------------
[24.24,25.27,27.16,24.97]
[27.12,27.91,28.11,26.95]
[23.15,25.11,24.63,23.91]
(3 rows)

=> SELECT pingtimes[0][0] FROM network_tests;
 pingtimes
-----------
24.24
27.12
23.15
(3 rows)

Vertica supports several functions to manipulate arrays and sets.

Consider the same orders table which includes an array of product keys for all items purchased in a single order. You can use the APPLY_COUNT_ELEMENTS function to find out how many items each order contains. The function identifies the number of non-null elements in the prodkey array:

=> SELECT apply_count_elements(prodkey) FROM orders LIMIT 5;
apply_count_elements
--------------------
3
2
2
3
1
(5 rows)

Vertica also supports aggregate functions for collection elements. Now, consider a column in the same table which includes an array of prices for each item purchased in a single order. You can use the APPLY_SUM function to find the total amount spent for each order:

=> SELECT apply_sum(orderprices) from orders LIMIT 5;
apply_sum
-----------
380.07
187.48
340.08
268.87
  8.99
(5 rows)

Most of the array functions operate only on one-dimensional arrays. To use them with multi-dimensional arrays, first dereference one dimension:

=> SELECT apply_max(pingtimes[0]) FROM network_tests;
 apply_max
-----------
 27.16
 28.11
 25.11
(3 rows)

See Collection functions for a comprehensive list of functions.

You can include both column names and literal values in queries. The following example returns the product keys for orders where the number of items in each order is greater than three:

=> SELECT prodkey FROM orders WHERE apply_count_elements(prodkey)>2;
      prodkey
------------------------
 ["P1262","P68","P101"]
 ["P997","P31","P101"]
(2 rows)

Consider a more complex query that returns the customer key, name, email, order key, and product key by joining two tables, cust and orders, for orders that satisfy the condition where the total is greater than 150:

=> SELECT custkey, cust_custname, cust_email, orderkey, prodkey, orderprices from orders
 JOIN cust ON custkey = cust_custkey
 WHERE apply_sum(orderprices)>150 ;
custkey|  cust_custname   |        cust_email         |   orderkey   |                  prodkey                  |        orderprices
-------+------------------+---------------------------+--------------+--------------------------------========---+---------------------------
342799 | "Ananya Patel"   | "ananyapatel98@gmail.com" | "113-341987" | ["MG-7190","VA-4028","EH-1247","MS-7018"] | [60.00,67.00,22.00,14.99]
342845 | "Molly Benton"   | "molly_benton@gmail.com"  | "111-952000" | ["ID-2586","IC-9010","MH-2401","JC-1905"] | [22.00,35.00,90.00,12.00]
342989 | "Natasha Abbasi" | "natsabbasi@live.com"     | "111-685238" | ["HP-4024"]                               | [650.00]
342176 | "Jose Martinez"  | "jmartinez@hotmail.com"   | "113-672238" | ["HP-4768","IC-9010"]                     | [899.00,60.00]
342845 | "Molly Benton"   | "molly_benton@gmail.com"  | "113-864153" | ["AE-7064","VA-4028","GW-1808"]           | [72.00,99.00,185.00]
(5 rows)

Arrays with complex elements

Arrays can contain arrays and structs in any combination, as in the following example:

=> CREATE TABLE orders(
  orderid INT,
  accountid INT,
  shipments ARRAY[
    ROW(
      shipid INT,
      address ROW(
        street VARCHAR,
        city VARCHAR,
        zip INT
        ),
      shipdate DATE
    )
  ]
 );

Some orders consist of more than one shipment. Line breaks have been inserted into the following output for legibility:

=> SELECT * FROM orders;
 orderid | accountid |                          shipments
---------+-----------+---------------------------------------------------------------------------------------------------------------
   99123 |        17 | [{"shipid":1,"address":{"street":"911 San Marcos St","city":"Austin","zip":73344},"shipdate":"2020-11-05"},
            {"shipid":2,"address":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipdate":"2020-11-06"}]
   99149 |       139 | [{"shipid":3,"address":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipdate":"2020-11-06"}]
   99162 |       139 | [{"shipid":4,"address":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipdate":"2020-11-04"},
            {"shipid":5,"address":{"street":"100 Main St Apt 4A","city":"Pasadena","zip":91001},"shipdate":"2020-11-11"}]
(3 rows)

You can use array indexing and struct field selection together in queries:

=> SELECT orderid, shipments[0].shipdate AS ship1, shipments[1].shipdate AS ship2 FROM orders;
 orderid |   ship1    |   ship2
---------+------------+------------
   99123 | 2020-11-05 | 2020-11-06
   99149 | 2020-11-06 |
   99162 | 2020-11-04 | 2020-11-11
(3 rows)

This example selects specific array indices. To access all entries, use EXPLODE.

Some data formats have a map type, which is a set of key/value pairs. Vertica does not directly support querying maps, but you can define a map column as an array of structs and query that. In the following example, the "prods" column in the data is a map:

=> CREATE EXTERNAL TABLE orders
 (orderkey INT,
  custkey INT,
  prods ARRAY[ROW(key VARCHAR(10), value DECIMAL(12,2))],
  orderdate DATE
 ) AS COPY FROM '...' PARQUET;

=> SELECT orderkey, prods FROM orders;
 orderkey |                                              prods
----------+--------------------------------------------------------------------------------------------------
    19626 | [{"key":"P68","value":"49.99"},{"key":"P1262","value":"192.59"},{"key":"P101","value":"137.49"}]
    25646 | [{"key":"P997","value":"91.39"},{"key":"P101","value":"147.49"},{"key":"P31","value":"29.99"}]
    25647 | [{"key":"P12","value":"8.99"}]
    19743 | [{"key":"P68","value":"49.99"},{"key":"P101","value":"137.49"}]
    19888 | [{"key":"P1262","value":"197.59"},{"key":"P101","value":"142.49"}]
(5 rows)

You cannot use mixed columns in CREATE TABLE AS SELECT (CTAS) or in views. This restriction applies for the entire column or for field selection within it.

Ordering and grouping

You can use Comparison operators with collections. Null collections are ordered last. Otherwise, collections are compared element by element until there is a mismatch, and then they are ordered based on the non-matching elements. If all elements are equal up to the length of the shorter one, then the shorter one is ordered first.

You can use collections in the ORDER BY and GROUP BY clauses of queries. The following example shows ordering query results by an array column:

=> CREATE TABLE employees (id INT, department VARCHAR(50), grants ARRAY[VARCHAR], grant_values ARRAY[INT]);

=> COPY employees FROM STDIN;
42|Physics|[US-7376,DARPA-1567]|[65000,135000]
36|Physics|[US-7376,DARPA-1567]|[10000,25000]
33|Physics|[US-7376]|[30000]
36|Astronomy|[US-7376,DARPA-1567]|[5000,4000]
\.

=> SELECT * FROM employees ORDER BY grant_values;
 id | department |          grants          |  grant_values
----+------------+--------------------------+----------------
 36 | Astronomy  | ["US-7376","DARPA-1567"] | [5000,4000]
 36 | Physics    | ["US-7376","DARPA-1567"] | [10000,25000]
 33 | Physics    | ["US-7376"]              | [30000]
 42 | Physics    | ["US-7376","DARPA-1567"] | [65000,135000]
(4 rows)

The following example queries the same table using GROUP BY:

=> CREATE TABLE employees (id INT, department VARCHAR(50), grants ARRAY[VARCHAR], grant_values ARRAY[INT]);

=> COPY employees FROM STDIN;
42|Physics|[US-7376,DARPA-1567]|[65000,135000]
36|Physics|[US-7376,DARPA-1567]|[10000,25000]
33|Physics|[US-7376]|[30000]
36|Astronomy|[US-7376,DARPA-1567]|[5000,4000]
\.

=> SELECT department, grants, SUM(apply_sum(grant_values)) FROM employees GROUP BY grants, department;
 department |          grants          |  SUM
------------+--------------------------+--------
 Physics    | ["US-7376","DARPA-1567"] | 235000
 Astronomy  | ["US-7376","DARPA-1567"] |   9000
 Physics    | ["US-7376"]              |  30000
(3 rows)

See the "Functions and Operators" section on the ARRAY reference page for information on how Vertica orders collections. (The same information is also on the SET reference page.)

Null-handling

Null semantics for collections are consistent with normal columns in most regards. See NULL sort order for more information on null-handling.

The null-safe equality operator (<=>) behaves differently from equality (=) when the collection is null rather than empty. Comparing a collection to NULL strictly is undefined.

=> SELECT ARRAY[1,3] = NULL;
?column?
----------

(1 row)

=> SELECT ARRAY[1,3] <=> NULL;
 ?column?
----------
 f
(1 row)

In the following example, the grants column in the table is null for employee 99.

=> SELECT grants = NULL FROM employees WHERE id=99;
 ?column?
----------

(1 row)

=> SELECT grants <=> NULL FROM employees WHERE id=99;
 ?column?
----------
 t
(1 row)

Empty collections are not null and behave as expected.

=> SELECT ARRAY[]::ARRAY[INT] = ARRAY[]::ARRAY[INT];
 ?column?
----------
 t
(1 row)

Collections are compared element by element. If a comparison depends on a null element, the result is unknown (null), not false. For example, ARRAY[1,2,null]=ARRAY[1,2,null] and ARRAY[1,2,null]=ARRAY[1,2,3] both return null, but ARRAY[1,2,null]=ARRAY[1,4,null] returns false because the second elements do not match.

Out-of-bound indexes into collections return NULL.

=> SELECT prodkey[2] from orders LIMIT 4;
prodkey
---------

"EH-1247"
"MH-2401"

(4 rows)

The results of the query return NULL for two out of four rows, the first and the fourth, since the specified index is greater than the size of those arrays.

Casting

When the data type of an expression value is unambiguous, it is implicitly coerced to match the expected data type. However, there can be ambiguity about the data type of an expression. For example, a date can be interpreted as either a string or a timestamp. Write an explicit cast to avoid the default:

=> SELECT apply_count_elements(ARRAY['2019-01-20','2019-02-12','2019-03-23']::ARRAY[TIMESTAMP]);
apply_count_elements
--------------------
 3
(1 row)

You can cast arrays or sets of one scalar type to arrays or sets of other (compatible) types, following the same rules as for casting scalar values. Casting a collection casts each element of that collection. Casting an array to a set also removes any duplicates.

You can cast arrays (but not sets) with elements that are arrays or structs (or combinations):

=> SELECT shipments::ARRAY[ROW(id INT,addr ROW(VARCHAR,VARCHAR,INT),shipped DATE)]
FROM orders;
                  shipments
---------------------------------------------------------------------------
 [{"id":1,"addr":{"street":"911 San Marcos St","city":"Austin","zip":73344},"shipped":"2020-11-05"},
  {"id":2,"addr":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipped":"2020-11-06"}]
 [{"id":3,"addr":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipped":"2020-11-06"}]
 [{"id":4,"addr":{"street":"100 main St Apt 4B","city":"Pasadena","zip":91001},"shipped":"2020-11-04"},
  {"id":5,"addr":{"street":"100 Main St Apt 4A","city":"Pasadena","zip":91001},"shipped":"2020-11-11"}]
(3 rows)

You can change the bound of an array or set by casting. When casting to a bounded native array, inputs that are too long are truncated. When casting to a multi-dimensional array, if the new bounds are too small for the data the cast fails:

=> SELECT ARRAY[1,2,3]::ARRAY[VARCHAR,2];
   array
-----------
 ["1","2"]
(1 row)

=> SELECT ARRAY[ARRAY[1,2,3],ARRAY[4,5,6]]::ARRAY[ARRAY[VARCHAR,2],2];
ERROR 9227:  Output array isn't big enough
DETAIL:  Type limit is 4 elements, but value has 6 elements

If you cast to a bounded multi-dimensional array, you must specify the bounds at all levels.

An array or set with a single null element must be explicitly cast because no type can be inferred.

See Data type coercion for more information on casting for data types.

Exploding array columns

You can simplify queries on elements stored in arrays with EXPLODE, a function that takes array columns from a table and expands them. For each exploded array, the results include two columns, one for the array element index, and one for the value at that position. If the function explodes a single array, these columns are named position and value by default. If the function explodes two or more arrays, the columns for each array are named pos_column-name and val_column-name.

The function explodes the first N array columns in the parameter list, defaulting to one, and passes all other columns through.

The following examples illustrate using EXPLODE() with the OVER(PARTITION BEST) clause.

Consider an orders table with columns for order keys, customer keys, product keys, order prices, and email addresses, with some containing arrays. A basic query in Vertica results in the following:

=> SELECT orderkey, custkey, prodkey, orderprices, email_addrs FROM orders LIMIT 5;
  orderkey  | custkey |                    prodkey                    |            orderprices            |                                                  email_addrs
------------+---------+-----------------------------------------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------
 113-341987 |  342799 | ["MG-7190 ","VA-4028 ","EH-1247 ","MS-7018 "] | ["60.00","67.00","22.00","14.99"] | ["bob@example,com","robert.jones@example.com"]
 111-952000 |  342845 | ["ID-2586 ","IC-9010 ","MH-2401 ","JC-1905 "] | ["22.00","35.00",null,"12.00"]    | ["br92@cs.example.edu"]
 111-345634 |  342536 | ["RS-0731 ","SJ-2021 "]                       | ["50.00",null]                    | [null]
 113-965086 |  342176 | ["GW-1808 "]                                  | ["108.00"]                        | ["joe.smith@example.com"]
 111-335121 |  342321 | ["TF-3556 "]                                  | ["50.00"]                         | ["789123@example-isp.com","alexjohnson@example.com","monica@eng.example.com","sara@johnson.example.name",null]
(5 rows)

This example expands the orderprices column for a specified customer, in ascending order. The custkey and email_addrs columns are repeated for each array element.

=> SELECT EXPLODE(orderprices, custkey, email_addrs) OVER(PARTITION BEST) AS (position, orderprices, custkey, email_addrs)
   FROM orders WHERE custkey='342845' ORDER BY orderprices;
 position | orderprices | custkey |         email_addrs
----------+-------------+---------+------------------------------
        2 |             |  342845 | ["br92@cs.example.edu",null]
        3 |       12.00 |  342845 | ["br92@cs.example.edu",null]
        0 |       22.00 |  342845 | ["br92@cs.example.edu",null]
        1 |       35.00 |  342845 | ["br92@cs.example.edu",null]
(4 rows)

When you explode a column that contains null values, the null values are displayed as empty.

You can explode more than one column by specifying the explode_count parameter.

=> SELECT EXPLODE(orderkey, prodkey, orderprices USING PARAMETERS explode_count=2)
OVER(PARTITION BEST)
AS (orderkey,pk_idx,pk_val,ord_idx,ord_val)
FROM orders
WHERE orderkey='113-341987';
  orderkey  | pk_idx |  pk_val  | ord_idx | ord_val
------------+--------+----------+---------+---------
 113-341987 |      0 | MG-7190  |       0 |   60.00
 113-341987 |      0 | MG-7190  |       1 |   67.00
 113-341987 |      0 | MG-7190  |       2 |   22.00
 113-341987 |      0 | MG-7190  |       3 |   14.99
 113-341987 |      1 | VA-4028  |       0 |   60.00
 113-341987 |      1 | VA-4028  |       1 |   67.00
 113-341987 |      1 | VA-4028  |       2 |   22.00
 113-341987 |      1 | VA-4028  |       3 |   14.99
 113-341987 |      2 | EH-1247  |       0 |   60.00
 113-341987 |      2 | EH-1247  |       1 |   67.00
 113-341987 |      2 | EH-1247  |       2 |   22.00
 113-341987 |      2 | EH-1247  |       3 |   14.99
 113-341987 |      3 | MS-7018  |       0 |   60.00
 113-341987 |      3 | MS-7018  |       1 |   67.00
 113-341987 |      3 | MS-7018  |       2 |   22.00
 113-341987 |      3 | MS-7018  |       3 |   14.99
(16 rows)

The following example uses a multi-dimensional array:

=> SELECT name, pingtimes FROM network_tests;
 name |                       pingtimes
------+-------------------------------------------------------
 eng1 | [[24.24,25.27,27.16,24.97],[23.97,25.01,28.12,29.5]]
 eng2 | [[27.12,27.91,28.11,26.95],[29.01,28.99,30.11,31.56]]
 qa1  | [[23.15,25.11,24.63,23.91],[22.85,22.86,23.91,31.52]]
(3 rows)

=> SELECT EXPLODE(name, pingtimes USING PARAMETERS explode_count=1) OVER()
FROM network_tests;
 name | position |           value
------+----------+---------------------------
 eng1 |        0 | [24.24,25.27,27.16,24.97]
 eng1 |        1 | [23.97,25.01,28.12,29.5]
 eng2 |        0 | [27.12,27.91,28.11,26.95]
 eng2 |        1 | [29.01,28.99,30.11,31.56]
 qa1  |        0 | [23.15,25.11,24.63,23.91]
 qa1  |        1 | [22.85,22.86,23.91,31.52]
(6 rows)

See ARRAY and SET for more information on the implementation of these data types in Vertica.

Imploding and filtering arrays

The IMPLODE function is the inverse of EXPLODE: it takes a column and produces an array containing the column's values. Combined with GROUP BY, it can be used to reverse an explode operation.

You can use EXPLODE and IMPLODE together to filter array values. For example, in a set of orders where prices are an array value, you might want to query only the orders with prices below a certain threshold. Consider the following table:

=> SELECT * FROM orders;

 key |      prices
-----+-------------------
 567 | [27.99,18.99]
 789 | [108.0]
 345 | [14.99,35.99]
 123 | [60.0,67.0,14.99]
(4 rows)

You can use EXPLODE to expand the arrays. For clarity, this example creates a new table to hold the results. More typically, you would use EXPLODE and IMPLODE in subqueries instead of creating intermediate tables.

=> CREATE TABLE exploded AS
SELECT EXPLODE(prices,key) OVER (PARTITION BEST)
AS (position, itemprice, itemkey) FROM orders;

=> SELECT * FROM exploded;

 position | itemprice | itemkey
----------+-----------+---------
        0 |       108 |     789
        1 |     35.99 |     345
        0 |     14.99 |     345
        0 |     27.99 |     567
        0 |        60 |     123
        1 |     18.99 |     567
        1 |        67 |     123
        2 |     14.99 |     123
(8 rows)

You can now filter the exploded prices:

=> CREATE TABLE filtered AS
    SELECT position, itemprice, itemkey FROM orders WHERE itemprice < 50.00;

=> SELECT * FROM filtered;

 position | itemprice | itemkey
----------+-----------+---------
        0 |     14.99 |     345
        0 |     27.99 |     567
        1 |     18.99 |     567
        1 |     35.99 |     345
        2 |     14.99 |     123
(5 rows)

Finally, you can use IMPLODE to reconstruct the arrays:

=> SELECT itemkey AS key, IMPLODE(itemprice) AS prices
    FROM filtered GROUP BY itemkey ORDER BY itemkey;
 key |      prices
-----+-------------------
 123 | ["14.99"]
 345 | ["35.99","14.99"]
 567 | ["27.99","18.99"]
(3 rows)

You can modify this query by including a WITHIN GROUP ORDER BY clause, which specifies how to sort array elements within each group:

=> SELECT itemkey AS key, IMPLODE(itemprice) WITHIN GROUP (ORDER BY itemprice) AS prices
    FROM filtered GROUP BY itemkey ORDER BY itemkey;
 key |      prices
-----+-------------------
 123 | ["14.99"]
 345 | ["14.99","35.99"]
 567 | ["18.99","27.99"]
(3 rows)

If IMPLODE were to return an array too large for the column, the function returns an error. To avoid this, you can set the allow_truncate parameter to omit some elements from the results. Truncation never applies to individual elements; for example, the function does not shorten strings.

5 - Rows (structs)

Tables can include columns of the ROW data type.

Tables can include columns of the ROW data type. A ROW, sometimes called a struct, is a set of typed property-value pairs.

Consider a table of customers with columns for name, address, and an ID. The address is a ROW with fields for the elements of an address (street, city, and postal code). As shown in this example, ROW values are returned in JSON format:

=> SELECT * FROM customers ORDER BY accountID;
        name        |                              address                               | accountID
--------------------+--------------------------------------------------------------------+-----------
 Missy Cooper       | {"street":"911 San Marcos St","city":"Austin","zipcode":73344}     |        17
 Sheldon Cooper     | {"street":"100 Main St Apt 4B","city":"Pasadena","zipcode":91001}  |       139
 Leonard Hofstadter | {"street":"100 Main St Apt 4A","city":"Pasadena","zipcode":91001}  |       142
 Leslie Winkle      | {"street":"23 Fifth Ave Apt 8C","city":"Pasadena","zipcode":91001} |       198
 Raj Koothrappali   | {"street":null,"city":"Pasadena","zipcode":91001}                  |       294
 Stuart Bloom       |                                                                    |       482
(6 rows)

Most values are cast to UTF-8 strings, as shown for street and city here. Integers and booleans are cast to JSON Numerics and thus not quoted.

Use dot notation (column.field) to access individual fields:

=> SELECT address.city FROM customers;
   city
----------
 Pasadena
 Pasadena
 Pasadena
 Pasadena
 Austin

(6 rows)

In the following example, the contact information in the customers table has an email field, which is an array of addresses:

=> SELECT name, contact.email FROM customers;
        name        |                    email
--------------------+---------------------------------------------
 Missy Cooper       | ["missy@mit.edu","mcooper@cern.gov"]
 Sheldon Cooper     | ["shelly@meemaw.name","cooper@caltech.edu"]
 Leonard Hofstadter | ["hofstadter@caltech.edu"]
 Leslie Winkle      | []
 Raj Koothrappali   | ["raj@available.com"]
 Stuart Bloom       |
(6 rows)

You can use ROW columns or specific fields to restrict queries, as in the following example:


=> SELECT address FROM customers WHERE address.city ='Pasadena';
                              address
--------------------------------------------------------------------
 {"street":"100 Main St Apt 4B","city":"Pasadena","zipcode":91001}
 {"street":"100 Main St Apt 4A","city":"Pasadena","zipcode":91001}
 {"street":"23 Fifth Ave Apt 8C","city":"Pasadena","zipcode":91001}
 {"street":null,"city":"Pasadena","zipcode":91001}
(4 rows)

You can use the ROW syntax to specify literal values, such as the address in the WHERE clause in the following example:


=> SELECT name,address FROM customers
   WHERE address = ROW('100 Main St Apt 4A','Pasadena',91001);
        name        |                              address
--------------------+-------------------------------------------------------------------
 Leonard Hofstadter | {"street":"100 Main St Apt 4A","city":"Pasadena","zipcode":91001}
(1 row)

You can join on field values as you would from any other column:

=> SELECT accountID,department from customers JOIN employees
   ON customers.name=employees.personal.name;
 accountID | department
-----------+------------
       139 | Physics
       142 | Physics
       294 | Astronomy

You can join on full structs. The following example joins the addresses in the employees and customers tables:

=> SELECT employees.personal.name,customers.accountID FROM employees
JOIN customers ON employees.personal.address=customers.address;
        name        | accountID
--------------------+-----------
 Sheldon Cooper     |       139
 Leonard Hofstadter |       142
(2 rows)

You can cast structs, optionally specifying new field names:

=> SELECT contact::ROW(str VARCHAR, city VARCHAR, zip VARCHAR, email ARRAY[VARCHAR,
20]) FROM customers;
                                                     contact

--------------------------------------------------------------------------------
----------------------------------
 {"str":"911 San Marcos St","city":"Austin","zip":"73344","email":["missy@mit.ed
u","mcooper@cern.gov"]}
 {"str":"100 Main St Apt 4B","city":"Pasadena","zip":"91001","email":["shelly@me
emaw.name","cooper@caltech.edu"]}
 {"str":"100 Main St Apt 4A","city":"Pasadena","zip":"91001","email":["hofstadte
r@caltech.edu"]}
 {"str":"23 Fifth Ave Apt 8C","city":"Pasadena","zip":"91001","email":[]}
 {"str":null,"city":"Pasadena","zip":"91001","email":["raj@available.com"]}

(6 rows)

You can use structs in views and in subqueries, as in the following example:

=> CREATE VIEW neighbors (num_neighbors, area(city, zipcode))
AS SELECT count(*), ROW(address.city, address.zipcode)
FROM customers GROUP BY address.city, address.zipcode;
CREATE VIEW

=> SELECT employees.personal.name, neighbors.area FROM neighbors, employees
WHERE employees.personal.address.zipcode=neighbors.area.zipcode AND neighbors.nu
m_neighbors > 1;
        name        |                area
--------------------+-------------------------------------
 Sheldon Cooper     | {"city":"Pasadena","zipcode":91001}
 Leonard Hofstadter | {"city":"Pasadena","zipcode":91001}
(2 rows)

If a reference is ambiguous, Vertica prefers column names over field names.

You can use many operators and predicates with ROW columns, including JOIN, GROUP BY, ORDER BY, IS [NOT] NULL, and comparison operations in nullable filters. Some operators do not logically apply to structured data and are not supported. See the ROW reference page for a complete list.

6 - Subqueries

A subquery is a SELECT statement embedded within another SELECT statement.

A subquery is a SELECT statement embedded within another SELECT statement. The embedded subquery is often referenced as the query's inner statement, while the containing query is typically referenced as the query's statement, or outer query block. A subquery returns data that the outer query uses as a condition to determine what data to retrieve. There is no limit to the number of nested subqueries you can create.

Like any query, a subquery returns records from a table that might consist of a single column and record, a single column with multiple records, or multiple columns and records. Queries can be noncorrelated or correlated. You can even use them to update or delete records in a table based on values that are stored in other database tables.

6.1 - Subqueries used in search conditions

Subqueries are used as search conditions in order to filter results.

Subqueries are used as search conditions in order to filter results. They specify the conditions for the rows returned from the containing query's select-list, a query expression, or the subquery itself. The operation evaluates to TRUE, FALSE, or UNKNOWN (NULL).

Syntax

search-condition {
    [ { AND | OR | NOT } {  predicate | ( search-condition ) } ]
   }[,... ]
 predicate
     { expression comparison-operator expression
         | string-expression [ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB } string-expression
         | expression IS [ NOT ] NULL
         | expression [ NOT ] IN ( subquery | expression[,... ] )
         | expression comparison-operator [ ANY | SOME ] ( subquery )
         | expression comparison-operator ALL ( subquery )
         | expression OR ( subquery )
         | [ NOT ] EXISTS ( subquery )
         | [ NOT ] IN ( subquery )
     }

Parameters

search-condition

Specifies the search conditions for the rows returned from one of the following:

  • Containing query's select-list

  • Query expression

  • Subquery

If the subquery is used with an UPDATE or DELETE statement, UPDATE specifies the rows to update and DELETE specifies the rows to delete.

{ AND | OR | NOT }

Keywords that specify the logical operators that combine conditions, or in the case of NOT, negate conditions.

  • AND: Combines two conditions and evaluates to TRUE when both of the conditions are TRUE.

  • OR: Combines two conditions and evaluates to TRUE when either condition is TRUE.

  • NOT: Negates the Boolean expression specified by the predicate.

predicate An expression that returns TRUE, FALSE, or UNKNOWN (NULL).
expression A column name, constant, function, or scalar subquery, or combination of column names, constants, and functions connected by operators or subqueries.
comparison-operator

An operator that tests conditions between two expressions, one of the following:

This operator... Tests whether...
< The first expression is less than the second.
> The first expression is greater than the second.
<= The first expression is less than or equal to the second expression.
>= The first expression is greater than or equal to the second expression.
= Two expressions are equal.
<=> Like the = operator, but returns TRUE (instead of UNKNOWN) if both expressions evaluate to UNKNOWN, and FALSE (instead of UNKNOWN) if one expression evaluates to UNKNOWN.

<>

Two expressions are not equal.

!=

string-expression A character string with optional wildcard (*) characters.
[ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB } Indicates that the character string following the predicate is to be used (or not used) for pattern matching.
IS [ NOT ] NULL Searches for values that are null or are not null.
ALL Used with a comparison operator and a subquery. Returns TRUE for the left-hand predicate if all values returned by the subquery satisfy the comparison operation, or FALSE if not all values satisfy the comparison or if the subquery returns no rows to the outer query block.
ANY | SOME ANY and SOME are synonyms and are used with a comparison operator and a subquery. Either returns TRUE for the left-hand predicate if any value returned by the subquery satisfies the comparison operation, or FALSE if no values in the subquery satisfy the comparison or if the subquery returns no rows to the outer query block. Otherwise, the expression is UNKNOWN.
[ NOT ] EXISTS Used with a subquery to test for the existence of records that the subquery returns.
[ NOT ] IN Searches for an expression on the basis of an expression's exclusion or inclusion from a list. The list of values is enclosed in parentheses and can be a subquery or a set of constants.

6.1.1 - Logical operators AND and OR

The AND and OR logical operators combine two conditions.

The AND and OR logical operators combine two conditions. AND evaluates to TRUE when both of the conditions joined by the AND keyword are matched, and OR evaluates to TRUE when either condition joined by OR is matched.

OR subqueries (complex expressions)

Vertica supports subqueries in more complex expressions using OR; for example:

  • More than one subquery in the conjunct expression:

    (SELECT MAX(b) FROM t1) + SELECT (MAX FROM t2) a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2)
    
  • An OR clause in the conjunct expression involves at least one subquery:

    a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2) a IN (SELECT a from t1) OR b = 5
    a = (SELECT MAX FROM t2) OR b = 5
    
  • One subquery is present but it is part of a another expression:

    x IN (SELECT a FROM t1) = (x = (SELECT MAX FROM t2) (x IN (SELECT a FROM t1) IS NULL
    

How AND queries are evaluated

Vertica treats expressions separated by AND (conjunctive) operators individually. For example if the WHERE clause were:

  WHERE (a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2)) AND (c IN (SELECT a FROM t1))

the query would be interpreted as two conjunct expressions:

  1. (a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2))

  2. (c IN (SELECT a FROM t1))

The first expression is considered a complex subquery, whereas the second expression is not.

Examples

The following list shows some of the ways you can filter complex conditions in the WHERE clause:

  • OR expression between a subquery and a non-subquery condition:

    => SELECT x FROM t WHERE x > (SELECT SUM(DISTINCT x) FROM t GROUP BY y) OR x < 9;
    
  • OR expression between two subqueries:

    => SELECT * FROM t WHERE x=(SELECT x FROM t) OR EXISTS(SELECT x FROM tt);
    
  • Subquery expression:

    => SELECT * FROM t WHERE x=(SELECT x FROM t)+1 OR x<>(SELECT x FROM t)+1;
    
  • OR expression with [NOT] IN subqueries:

    => SELECT * FROM t WHERE NOT (EXISTS (SELECT x FROM t)) OR x >9;
    
  • OR expression with IS [NOT] NULL subqueries:

    => SELECT * FROM t WHERE (SELECT * FROM t)IS NULL OR (SELECT * FROM tt)IS NULL;
    
  • OR expression with boolean column and subquery that returns Boolean data type:

    => SELECT * FROM t2 WHERE x = (SELECT x FROM t2) OR x;
    
  • OR expression in the CASE statement:

    => SELECT * FROM t WHERE CASE WHEN x=1 THEN x > (SELECT * FROM t)
           OR x < (SELECT * FROM t2) END ;
    
  • Analytic function, NULL-handling function, string function, math function, and so on:

    => SELECT x FROM t WHERE x > (SELECT COALESCE (x,y) FROM t GROUP BY x,y) OR
           x < 9;
    
  • In user-defined functions (assuming f() is one):

    => SELECT * FROM t WHERE x > 5 OR x = (SELECT f(x) FROM t);
    
  • Use of parentheses at different places to restructure the queries:

    => SELECT x FROM t WHERE (x = (SELECT x FROM t) AND y = (SELECT y FROM t))
           OR (SELECT x FROM t) =1;
    
  • Multicolumn subqueries:

    => SELECT * FROM t WHERE (x,y) = (SELECT x,y FROM t) OR x > 5;
    
  • Constant/NULL on lefthand side of subquery:

    => SELECT * FROM t WHERE x > 5 OR 5 = (SELECT x FROM t);
    

See also

6.1.2 - In place of an expression

Subqueries that return a single value (unlike a list of values returned by IN subqueries) can be used just about anywhere an expression is allowed in SQL.

Subqueries that return a single value (unlike a list of values returned by IN subqueries) can be used just about anywhere an expression is allowed in SQL. It can be a column name, a constant, a function, a scalar subquery, or a combination of column names, constants, and functions connected by operators or subqueries.

For example:

=> SELECT c1 FROM t1 WHERE c1 = ANY (SELECT c1 FROM t2) ORDER BY c1;
=> SELECT c1 FROM t1 WHERE COALESCE((t1.c1 > ANY (SELECT c1 FROM t2)), TRUE);
=> SELECT c1 FROM t1 GROUP BY c1 HAVING
     COALESCE((t1.c1 <> ALL (SELECT c1 FROM t2)), TRUE);

Multi-column expressions are also supported:

=> SELECT c1 FROM t1 WHERE (t1.c1, t1.c2) = ALL (SELECT c1, c2 FROM t2);
=> SELECT c1 FROM t1 WHERE (t1.c1, t1.c2) <> ANY (SELECT c1, c2 FROM t2);

Vertica returns an error on queries where more than one row would be returned by any subquery used as an expression:

=> SELECT c1 FROM t1 WHERE c1 = (SELECT c1 FROM t2) ORDER BY c1;
   ERROR:  more than one row returned by a subquery used as an expression

See also

6.1.3 - Comparison operators

Vertica supports Boolean subquery expressions in the WHERE clause with any of the following operators:.

Vertica supports Boolean subquery expressions in the WHERE clause with any of the following operators:

> < >= <= = <> <=>

WHERE clause subqueries filter results and take the following form:

SELECT <column, ...> FROM <table>
WHERE <condition> (SELECT <column, ...> FROM <table> WHERE <condition>);

These conditions are available for all data types where comparison makes sense. All Comparison operators are binary operators that return values of TRUE, FALSE, or UNKNOWN (NULL).

Expressions that correlate to just one outer table in the outer query block are supported, and these correlated expressions can be comparison operators.

The following subquery scenarios are supported:

SELECT * FROM T1 WHERE T1.x =  (SELECT MAX(c1) FROM T2);
SELECT * FROM T1 WHERE T1.x >= (SELECT MAX(c1) FROM T2 WHERE T1.y = T2.c2);
SELECT * FROM T1 WHERE T1.x <= (SELECT MAX(c1) FROM T2 WHERE T1.y = T2.c2);

See also

Subquery restrictions

6.1.4 - LIKE pattern matching

Vertica supports LIKE pattern-matching conditions in subqueries and take the following form:.

Vertica supports LIKE pattern-matching conditions in subqueries and take the following form:

string-expression [ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB } string-expression

The following command searches for customers whose company name starts with "Ev" and returns the total count:

=> SELECT COUNT(*) FROM customer_dimension WHERE customer_name LIKE
      (SELECT 'Ev%' FROM customer_dimension LIMIT 1);
 count
-------
   153
(1 row)

Vertica also supports single-row subqueries as the pattern argument for LIKEB and ILIKEB predicates; for example:

=> SELECT * FROM t1 WHERE t1.x LIKEB (SELECT t2.x FROM t2);

The following symbols are substitutes for the LIKE keywords:

~~    LIKE
~#    LIKEB
~~*   ILIKE
~#*   ILIKEB
!~~   NOT LIKE
!~#   NOT LIKEB
!~~*  NOT ILIKE
!~#*  NOT IILIKEB

See LIKE predicate for additional examples.

6.1.5 - ANY and ALL

You typically use comparison operators (=, >, < , etc.) only on subqueries that return one row.

You typically use comparison operators (=, >, < , etc.) only on subqueries that return one row. With ANY and ALL operators, you can make comparisons on subqueries that return multiple rows.

These subqueries take the following form:

expression comparison-operator { ANY | ALL } (subquery)

ANY and ALL evaluate whether any or all of the values returned by a subquery match the left-hand expression.

Equivalent operators

You can use following operators instead of ANY or ALL:

This operator... Is equivalent to:
SOME ANY
IN = ANY
NOT IN <> ALL

Example data

Examples below use the following tables and data:

CREATE TABLE t1 (c1 int, c2 VARCHAR(8)); CREATE TABLE t2 (c1 int, c2 VARCHAR(8));
=> SELECT * FROM t1 ORDER BY c1;
 c1 | c2
----+-----
  1 | cab
  1 | abc
  2 | fed
  2 | def
  3 | ihg
  3 | ghi
  4 | jkl
  5 | mno
(8 rows)
=> SELECT * FROM t2 ORDER BY c1;
 c1 | c2
----+-----
  1 | abc
  2 | fed
  3 | jkl
  3 | stu
  3 | zzz
(5 rows)

ANY subqueries

Subqueries that use the ANY keyword return true when any value retrieved in the subquery matches the value of the left-hand expression.

Examples

An ANY subquery within an expression:

=> SELECT c1, c2 FROM t1 WHERE COALESCE((t1.c1 > ANY (SELECT c1 FROM t2)));
 c1 | c2
----+-----
  2 | fed
  2 | def
  3 | ihg
  3 | ghi
  4 | jkl
  5 | mno
(6 rows)

ANY noncorrelated subqueries without aggregates:

=> SELECT c1 FROM t1 WHERE c1 = ANY (SELECT c1 FROM t2) ORDER BY c1;
 c1
----
  1
  1
  2
  2
  3
  3
(6 rows)

ANY noncorrelated subqueries with aggregates:


=> SELECT c1, c2 FROM t1 WHERE c1 <> ANY (SELECT MAX(c1) FROM t2) ORDER BY c1;
 c1 | c2
----+-----
  1 | cab
  1 | abc
  2 | fed
  2 | def
  4 | jkl
  5 | mno
(6 rows)

=> SELECT c1 FROM t1 GROUP BY c1 HAVING c1 <> ANY (SELECT MAX(c1) FROM t2) ORDER BY c1;
 c1
----
  1
  2
  4
  5
(4 rows)

ANY noncorrelated subqueries with aggregates and a GROUP BY clause:


=> SELECT c1, c2 FROM t1 WHERE c1 <> ANY (SELECT MAX(c1) FROM t2 GROUP BY c2) ORDER BY c1;
 c1 | c2
----+-----
  1 | cab
  1 | abc
  2 | fed
  2 | def
  3 | ihg
  3 | ghi
  4 | jkl
  5 | mno
(8 rows)

ANY noncorrelated subqueries with a GROUP BY clause:

=> SELECT c1, c2 FROM t1 WHERE c1 <=> ANY (SELECT c1 FROM t2 GROUP BY c1) ORDER BY c1;
 c1 | c2
----+-----
  1 | cab
  1 | abc
  2 | fed
  2 | def
  3 | ihg
  3 | ghi
(6 rows)

ANY correlated subqueries with no aggregates or GROUP BY clause:

=> SELECT c1, c2 FROM t1 WHERE c1 >= ANY (SELECT c1 FROM t2 WHERE t2.c2 = t1.c2) ORDER BY c1;
 c1 | c2
----+-----
  1 | abc
  2 | fed
  4 | jkl
(3 rows)

ALL subqueries

A subquery that uses the ALL keyword returns true when all values retrieved by the subquery match the left-hand expression, otherwise it returns false.

Examples

ALL noncorrelated subqueries without aggregates:

=> SELECT c1, c2 FROM t1 WHERE c1 >= ALL (SELECT c1 FROM t2) ORDER BY c1;
 c1 | c2
----+-----
  3 | ihg
  3 | ghi
  4 | jkl
  5 | mno
(4 rows)

ALL noncorrelated subqueries with aggregates:

=> SELECT c1, c2 FROM t1 WHERE c1 = ALL (SELECT MAX(c1) FROM t2) ORDER BY c1;
 c1 | c2
----+-----
  3 | ihg
  3 | ghi
(2 rows)

=> SELECT c1 FROM t1 GROUP BY c1 HAVING c1 <> ALL (SELECT MAX(c1) FROM t2) ORDER BY c1;
 c1
----
  1
  2
  4
  5
(4 rows)

ALL noncorrelated subqueries with aggregates and a GROUP BY clause:


=> SELECT c1, c2 FROM t1 WHERE c1 <= ALL (SELECT MAX(c1) FROM t2 GROUP BY c2) ORDER BY c1;
 c1 | c2
----+-----
  1 | cab
  1 | abc
(2 rows)

ALL noncorrelated subqueries with a GROUP BY clause:

=> SELECT c1, c2 FROM t1 WHERE c1 <> ALL (SELECT c1 FROM t2 GROUP BY c1) ORDER BY c1;
 c1 | c2
----+-----
  4 | jkl
  5 | mno
(2 rows)

NULL handling

Vertica supports multicolumn <> ALL subqueries where the columns are not marked NOT NULL. If any column contains a NULL value, Vertica returns a run-time error.

Vertica does not support = ANY subqueries that are nested within another expression if any column values are NULL.

See also

Subquery restrictions

6.1.6 - EXISTS and NOT EXISTS

The EXISTS predicate is one of the most common predicates used to build conditions that use noncorrelated and correlated subqueries.

The EXISTS predicate is one of the most common predicates used to build conditions that use noncorrelated and correlated subqueries. Use EXISTS to identify the existence of a relationship without regard for the quantity. For example, EXISTS returns true if the subquery returns any rows, and [NOT] EXISTS returns true if the subquery returns no rows.

[NOT] EXISTS subqueries take the following form:

expression [ NOT ] EXISTS ( subquery )

The EXISTS condition is considered to be met if the subquery returns at least one row. Since the result depends only on whether any records are returned, and not on the contents of those records, the output list of the subquery is normally uninteresting. A common coding convention is to write all EXISTS tests as follows:

EXISTS (SELECT 1 WHERE ...)

In the above fragment, SELECT 1 returns the value 1 for every record in the query. If the query returns, for example, five records, it returns 5 ones. The system doesn't care about the real values in those records; it just wants to know if a row is returned.

Alternatively, a subquery’s select list that uses EXISTS might consist of the asterisk (*). You do not need to specify column names, because the query tests for the existence or nonexistence of records that meet the conditions specified in the subquery.

EXISTS (SELECT * WHERE ...)

Notes

  • If EXISTS (subquery) returns at least 1 row, the result is TRUE.

  • If EXISTS (subquery) returns no rows, the result is FALSE.

  • If NOT EXISTS (subquery) returns at least 1 row, the result is FALSE.

  • If NOT EXISTS (subquery) returns no rows, the result is TRUE.

Examples

The following query retrieves the list of all the customers who purchased anything from any of the stores amounting to more than 550 dollars:

=> SELECT customer_key, customer_name, customer_state
   FROM public.customer_dimension WHERE EXISTS
     (SELECT 1 FROM store.store_sales_fact
      WHERE customer_key = public.customer_dimension.customer_key
      AND sales_dollar_amount > 550)
   AND customer_state = 'MA' ORDER BY customer_key;
 customer_key |   customer_name    | customer_state
--------------+--------------------+----------------
        14818 | William X. Nielson | MA
        18705 | James J. Goldberg  | MA
        30231 | Sarah N. McCabe    | MA
        48353 | Mark L. Brown      | MA
(4 rows)

Whether you use EXISTS or IN subqueries depends on which predicates you select in outer and inner query blocks. For example, to get a list of all the orders placed by all stores on January 2, 2003 for vendors with records in the vendor table:

=> SELECT store_key, order_number, date_ordered
   FROM store.store_orders_fact WHERE EXISTS
     (SELECT 1 FROM public.vendor_dimension
      WHERE public.vendor_dimension.vendor_key = store.store_orders_fact.vendor_key)
   AND date_ordered = '2012-01-02';
 store_key | order_number | date_ordered
-----------+--------------+--------------
        37 |         2559 | 2012-01-02
        16 |          552 | 2012-01-02
        35 |         1156 | 2012-01-02
        13 |         3885 | 2012-01-02
        25 |          554 | 2012-01-02
        21 |         2687 | 2012-01-02
        49 |         3251 | 2012-01-02
        19 |         2922 | 2012-01-02
        26 |         1329 | 2012-01-02
        40 |         1183 | 2012-01-02
(10 rows)

The above query looks for existence of the vendor and date ordered. To return a particular value, rather than simple existence, the query looks for orders placed by the vendor who got the best deal on January 4, 2004:

=> SELECT store_key, order_number, date_ordered
   FROM store.store_orders_fact ord, public.vendor_dimension vd
   WHERE ord.vendor_key = vd.vendor_key AND vd.deal_size IN
      (SELECT MAX(deal_size) FROM public.vendor_dimension)
    AND date_ordered = '2013-01-04';
 store_key | order_number | date_ordered
-----------+--------------+--------------
       166 |        36008 | 2013-01-04
       113 |        66017 | 2013-01-04
       198 |        75716 | 2013-01-04
        27 |       150241 | 2013-01-04
       148 |       182207 | 2013-01-04
         9 |       188567 | 2013-01-04
        45 |       202416 | 2013-01-04
        24 |       250295 | 2013-01-04
       121 |       251417 | 2013-01-04
(9 rows)

See also

6.1.7 - IN and NOT IN

While you cannot equate a single value to a set of values, you can check to see if a single value is found within that set of values.

While you cannot equate a single value to a set of values, you can check to see if a single value is found within that set of values. Use the IN clause for multiple-record, single-column subqueries. After the subquery returns results introduced by IN or NOT IN, the outer query uses them to return the final result.

[NOT] IN subqueries take the following form:

{ expression [ NOT ] IN ( subquery )| expression [ NOT ] IN ( expression ) }

There is no limit to the number of parameters passed to the IN clause of the SELECT statement; for example:

=> SELECT * FROM tablename WHERE column IN (a, b, c, d, e, ...);

Vertica also supports queries where two or more outer expressions refer to different inner expressions:

=> SELECT * FROM A WHERE (A.x,A.x) IN (SELECT B.x, B.y FROM B);

Examples

The following query uses the VMart schema to illustrate the use of outer expressions referring to different inner expressions:

=> SELECT product_description, product_price FROM product_dimension
   WHERE (product_dimension.product_key, product_dimension.product_key) IN
      (SELECT store.store_orders_fact.order_number,
         store.store_orders_fact.quantity_ordered
       FROM store.store_orders_fact);
 product_description         | product_price
-----------------------------+---------------
 Brand #72 box of candy      |           326
 Brand #71 vanilla ice cream |           270
(2 rows)

To find all products supplied by stores in MA, first create the inner query and run it to ensure that it works as desired. The following query returns all stores located in MA:

=> SELECT store_key FROM store.store_dimension WHERE store_state = 'MA';
 store_key
-----------
        13
        31
(2 rows)

Then create the outer or main query that specifies all distinct products that were sold in stores located in MA. This statement combines the inner and outer queries using the IN predicate:

=> SELECT DISTINCT s.product_key, p.product_description
   FROM store.store_sales_fact s, public.product_dimension p
   WHERE s.product_key = p.product_key
       AND s.product_version = p.product_version
       AND s.store_key IN
         (SELECT store_key
          FROM store.store_dimension
          WHERE store_state = 'MA')
   ORDER BY s.product_key;
 product_key |          product_description
-------------+---------------------------------------
           1 | Brand #1 white bread
           1 | Brand #4 vegetable soup
           3 | Brand #9 wheelchair
           5 | Brand #15 cheddar cheese
           5 | Brand #19 bleach
           7 | Brand #22 canned green beans
           7 | Brand #23 canned tomatoes
           8 | Brand #24 champagne
           8 | Brand #25 chicken nuggets
          11 | Brand #32 sausage
         ...   ...
(281 rows)

When using NOT IN, the subquery returns a list of zero or more values in the outer query where the comparison column does not match any of the values returned from the subquery. Using the previous example, NOT IN returns all the products that are not supplied from MA.

Notes

Vertica supports multicolumn NOT IN subqueries in which the columns are not marked NOT NULL. If one of the columns is found to contain a NULL value during query execution, Vertica returns a run-time error.

Similarly, IN subqueries nested within another expression are not supported if any of the column values are NULL. For example, if in the following statement column x from either table contained a NULL value, Vertica returns a run-time error:

=> SELECT * FROM t1 WHERE (x IN (SELECT x FROM t2)) IS FALSE;
   ERROR: NULL value found in a column used by a subquery

See also

6.2 - Subqueries in the SELECT list

Subqueries can occur in the select list of the containing query.

Subqueries can occur in the select list of the containing query. The results from the following statement are ordered by the first column (customer_name). You could also write ORDER BY 2 and specify that the results be ordered by the select-list subquery.

=> SELECT c.customer_name, (SELECT AVG(annual_income) FROM customer_dimension
    WHERE deal_size = c.deal_size) AVG_SAL_DEAL FROM customer_dimension c
     ORDER BY 1;
 customer_name | AVG_SAL_DEAL
---------------+--------------
 Goldstar      |       603429
 Metatech      |       628086
 Metadata      |       666728
 Foodstar      |       695962
 Verihope      |       715683
 Veridata      |       868252
 Bettercare    |       879156
 Foodgen       |       958954
 Virtacom      |       991551
 Inicorp       |      1098835
...

Notes

  • Scalar subqueries in the select-list return a single row/column value. These subqueries use Boolean comparison operators: =, >, <, <>, <=, >=.

    If the query is correlated, it returns NULL if the correlation results in 0 rows. If the query returns more than one row, the query errors out at run time and Vertica displays an error message that the scalar subquery must only return 1 row.

  • Subquery expressions such as [NOT] IN, [NOT] EXISTS, ANY/SOME, or ALL always return a single Boolean value that evaluates to TRUE, FALSE, or UNKNOWN; the subquery itself can have many rows. Most of these queries can be correlated or noncorrelated.

  • Subqueries in the ORDER BY and GROUP BY clauses are supported; for example, the following statement says to order by the first column, which is the select-list subquery:

    => SELECT (SELECT MAX(x) FROM t2 WHERE y=t1.b) FROM t1 ORDER BY 1;
    

See also

6.3 - WITH clauses

WITH clauses are concomitant queries within a larger, primary query.

WITH clauses are concomitant queries within a larger, primary query. Vertica can evaluate WITH clauses in two ways:

  • Inline expansion (default): Vertica evaluates each WITH clause every time it is referenced by the primary query.

  • Materialization: Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.

See WITH clause for details on syntax options and requirements.

6.3.1 - Inline expansion of WITH clause

By default, Vertica uses inline expansion to evaluate WITH clauses.

By default, Vertica uses inline expansion to evaluate WITH clauses. Vertica evaluates each WITH clause every time it is referenced by the primary query. Inline expansion often works best if the query does not reference the same WITH clause multiple times, or if some local optimizations are possible after inline expansion.

Example

The following example shows a WITH clause that is a good candidate for inline expansion. The WITH clause is used in a query that obtains order information for all 2018 orders shipped between December 01-07:


-- Enable materialization
ALTER SESSION SET PARAMETER WithClauseMaterialization=1;

-- Begin WITH
WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */
  store_orders_fact_new AS(
    SELECT * FROM store.store_orders_fact WHERE date_shipped between '2018-12-01' and '2018-12-07')
-- End WITH
-- Begin primary query
SELECT store_key, product_key, product_version, SUM(quantity_ordered*unit_price) AS total_price
FROM store_orders_fact_new
GROUP BY store_key, product_key, product_version
ORDER BY total_price DESC;

 store_key | product_key | product_version | total_price
-----------+-------------+-----------------+-------------
       135 |       14815 |               2 |       30000
       154 |       19202 |               1 |       29106
       232 |        1855 |               2 |       29008
        20 |        4804 |               3 |       28500
        11 |       16741 |               3 |       28200
       169 |       12374 |               1 |       28120
        50 |        9395 |               5 |       27538
        34 |        8888 |               4 |       27100
       142 |       10331 |               2 |       27027
       106 |       18932 |               1 |       26864
       190 |        8229 |               1 |       26460
       198 |        8545 |               3 |       26287
        38 |       17426 |               1 |       26280
         5 |       10095 |               1 |       26224
        41 |        2342 |               1 |       25920
        87 |        5574 |               1 |       25443
       219 |       15271 |               1 |       25146
        60 |       14223 |               1 |       25026
        97 |       16324 |               2 |       24864
       234 |       17681 |               1 |       24795
       195 |       16532 |               1 |       24794
        83 |        9597 |               2 |       24661
       149 |        7164 |               5 |       24518
       142 |       11022 |               4 |       24400
       202 |       12712 |               1 |       24380
        13 |       18154 |               1 |       24273
         7 |        3793 |               3 |       24250
...

Vertica processes the query as follows:

  1. Expands the WITH clause reference to store_orders_fact_new within the primary query.

  2. After expanding the WITH clause, evaluates the primary query.

6.3.2 - Materialization of WITH clause

When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.

When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires. Vertica drops the temporary table after primary query execution completes.

Materialization can facilitate better performance when WITH clauses are complex—for example, when the WITH clauses contain JOIN and GROUP BY clauses, and are referenced multiple times in the primary query.

If materialization is enabled, WITH statements perform an auto-commit of the user transaction. This occurs even when using EXPLAIN with the WITH statement.

Enabling materialization

WITH materialization is set by configuration parameter WithClauseMaterialization, by default set to 0 (disabled). You can enable and disable materialization by setting WithClauseMaterialization at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively:

  • Database:

    => ALTER DATABASE db-spec SET PARAMETER WithClauseMaterialization={ 0 | 1 };
    => ALTER DATABASE db-spec CLEAR PARAMETER WithClauseMaterialization;
    
  • Session: Parameter setting remains in effect until you explicitly clear it, or the session ends.

    => ALTER SESSION SET PARAMETER WithClauseMaterialization={ 0 | 1 };
    => ALTER SESSION CLEAR PARAMETER WithClauseMaterialization;
    

You can also enable WITH materialization for individual queries with the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. Materialization is automatically cleared when the query returns. For example:


=> WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ revenue AS (
      SELECT vendor_key, SUM(total_order_cost) AS total_revenue
      FROM store.store_orders_fact
      GROUP BY vendor_key ORDER BY 1)
     ...

EE5 temp relation support for materialized WITH clause

By default, when WITH clause queries are reused, Vertica saves those WITH clause query outputs in EE5 temp relations. However, this option can be changed. EE5 temp relation support for WITH materialization is set by configuration parameter EnableWITHTempRelReuseLimit, which can be set in the following ways:

  • 0: Disables this feature.

  • 1: Force-saves all WITH clause queries into EE5 temp relations, whether or not they are reused.

  • 2 (default): Enables this feature only when queries are reused.

EnableWITHTempRelReuseLimit can be set at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively.

Example

The following example shows a WITH clause that is a good candidate for materialization. The query obtains data for the vendor who has the highest combined order cost for all orders:

-- Enable materialization
=> ALTER SESSION SET PARAMETER WithClauseMaterialization=1;

-- Define WITH clause
=> WITH revenue AS (
      SELECT vendor_key, SUM(total_order_cost) AS total_revenue
      FROM store.store_orders_fact
      GROUP BY vendor_key ORDER BY 1)
-- End WITH clause

-- Primary query
=> SELECT vendor_name, vendor_address, vendor_city, total_revenue
FROM vendor_dimension v, revenue r
WHERE v.vendor_key = r.vendor_key AND total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
   vendor_name    | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
 Frozen Suppliers | 471 Mission St | Peoria      |      49877044
(1 row)

Vertica processes this query as follows:

  1. WITH clause revenue evaluates its SELECT statement from table store.store_orders_fact.

  2. Results of the revenue clause are stored in a local temporary table.

  3. Whenever the revenue clause statement is referenced, the results stored in the table are used.

  4. The temporary table is dropped when query execution is complete.

6.3.3 - WITH clause recursion

For example:.

A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query. Recursive queries are useful when working with self-referential data—hierarchies such as manager-subordinate relationships, or tree-structured data such as taxonomies.

The configuration parameter WithClauseRecursionLimit—by default set to 8—sets the maximum depth of recursion. You can set this parameter at database and session scopes with ALTER DATABASE and ALTER SESSION, respectively. Recursion continues until it reaches the configured maximum depth, or until the last iteration returns with no data.

You specify a recursive WITH clause as follows:

WITH [ /*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/ ] RECURSIVE
   cte-identifier [ ( column-aliases ) ] AS (
     non-recursive-term
     UNION [ ALL ]
     recursive-term
   )

Non-recursive and recursive terms are separated by UNION or UNION ALL:

  • The non-recursive-term query sets its result set in cte-identifier, which is subject to recursion in recursive-term.

  • The UNION statement's recursive-term recursively iterates over its own output. When recursion is complete, the results of all iterations are compiled and set in cte-identifier.

For example:


=> ALTER SESSION SET PARAMETER WithClauseRecursionLimit=4; -- maximum recursion depth = 4
=> WITH RECURSIVE nums (n) AS (
   SELECT 1 -- non-recursive (base) term
   UNION ALL
     SELECT n+1 FROM nums -- recursive term
  )
SELECT n FROM nums; -- primary query

This simple query executes as follows:

  1. Executes the WITH RECURSIVE clause:

    • Evaluates the non-recursive term SELECT 1, and places the result set—1—in nums.

    • Iterates over the UNION ALL query (SELECT n+1) until the number of iterations is greater than the configuration parameter WithClauseRecursionLimit.

    • Combines the results of all UNION queries and sets the result set in nums, and then exits to the primary query.

  2. Executes the primary query SELECT n FROM nums:

    
     n
    ---
     1
     2
     3
     4
     5
    (5 rows)
    

In this case , WITH RECURSIVE clause exits after four iterations as per WithClauseRecursionLimit. If you restore WithClauseRecursionLimit to its default value of 8, then the clause exits after eight iterations:


=> ALTER SESSION CLEAR PARAMETER WithClauseRecursionLimit;
=> WITH RECURSIVE nums (n) AS (
   SELECT 1
   UNION ALL
     SELECT n+1 FROM nums
  )
SELECT n FROM nums;
 n
---
 1
 2
 3
 4
 5
 6
 7
 8
 9
(9 rows)

Restrictions

The following restrictions apply:

  • The SELECT list of a non-recursive term cannot include the wildcard * (asterisk) or the function MATCH_COLUMNS.

  • A recursive term can reference the target CTE only once.

  • Recursive reference cannot appear within an outer join.

  • Recursive reference cannot appear within a subquery.

  • WITH clauses do not support UNION options ORDER BY, LIMIT, and OFFSET.

Examples

A small software company maintains the following data on employees and their managers:

=> SELECT * FROM personnel.employees ORDER BY emp_id;
 emp_id |   fname   |   lname   | section_id |    section_name     |  section_leader  | leader_id
--------+-----------+-----------+------------+---------------------+------------------+-----------
      0 | Stephen   | Mulligan  |          0 |                     |                  |
      1 | Michael   | North     |        201 | Development         | Zoe Black        |         3
      2 | Megan     | Berry     |        202 | QA                  | Richard Chan     |        18
      3 | Zoe       | Black     |        101 | Product Development | Renuka Patil     |        24
      4 | Tim       | James     |        203 | IT                  | Ebuka Udechukwu  |        17
      5 | Bella     | Tucker    |        201 | Development         | Zoe Black        |         3
      6 | Alexandra | Climo     |        202 | QA                  | Richard Chan     |        18
      7 | Leonard   | Gray      |        203 | IT                  | Ebuka Udechukwu  |        17
      8 | Carolyn   | Henderson |        201 | Development         | Zoe Black        |         3
      9 | Ryan      | Henderson |        201 | Development         | Zoe Black        |         3
     10 | Frank     | Tucker    |        205 | Sales               | Benjamin Glover  |        29
     11 | Nathan    | Ferguson  |        102 | Sales Marketing     | Eric Redfield    |        28
     12 | Kevin     | Rampling  |        101 | Product Development | Renuka Patil     |        24
     13 | Tuy Kim   | Duong     |        201 | Development         | Zoe Black        |         3
     14 | Dwipendra | Sing      |        204 | Tech Support        | Sarah Feldman    |        26
     15 | Dylan     | Wijman    |        206 | Documentation       | Kevin Rampling   |        12
     16 | Tamar     | Sasson    |        207 | Marketing           | Nathan Ferguson  |        11
     17 | Ebuka     | Udechukwu |        101 | Product Development | Renuka Patil     |        24
     18 | Richard   | Chan      |        101 | Product Development | Renuka Patil     |        24
     19 | Maria     | del Rio   |        201 | Development         | Zoe Black        |         3
     20 | Hua       | Song      |        204 | Tech Support        | Sarah Feldman    |        26
     21 | Carmen    | Lopez     |        204 | Tech Support        | Sarah Feldman    |        26
     22 | Edgar     | Mejia     |        206 | Documentation       | Kevin Rampling   |        12
     23 | Riad      | Salim     |        201 | Development         | Zoe Black        |         3
     24 | Renuka    | Patil     |        100 | Executive Office    | Stephen Mulligan |         0
     25 | Rina      | Dsouza    |        202 | QA                  | Richard Chan     |        18
     26 | Sarah     | Feldman   |        101 | Product Development | Renuka Patil     |        24
     27 | Max       | Mills     |        102 | Sales Marketing     | Eric Redfield    |        28
     28 | Eric      | Redfield  |        100 | Executive Office    | Stephen Mulligan |         0
     29 | Benjamin  | Glover    |        102 | Sales Marketing     | Eric Redfield    |        28
     30 | Dominic   | King      |        205 | Sales               | Benjamin Glover  |        29
     32 | Ryan      | Metcalfe  |        206 | Documentation       | Kevin Rampling   |        12
     33 | Piers     | Paige     |        201 | Development         | Zoe Black        |         3
     34 | Nicola    | Kelly     |        207 | Marketing           | Nathan Ferguson  |        11
(34 rows)

You can query this data for employee-manager relationships through WITH RECURSIVE. For example, the following query's WITH RECURSIVE clause gets employee-manager relationships for employee Eric Redfield, including all employees who report directly and indirectly to him:

WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
 AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
      FROM personnel.employees WHERE fname||' '||lname = 'Eric Redfield'
 UNION
    SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
      JOIN managers m ON m.employeeID = e.leader_id)
 SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;

The WITH RECURSIVE clause defines the CTE managers, and then executes in two phases:

  1. The non-recursive term populates managers with data that it queries from personnel.employees.

  2. The recursive term's UNION query iterates over its own output until, on the fourth cycle, it finds no more data. The results of all iterations are then compiled and set in managers, and the WITH CLAUSE exits to the primary query.

The primary query returns three levels of data from managers—one for each recursive iteration:

Similarly, the following query iterates over the same data to get all employee-manager relationships for employee Richard Chan, who is one level lower in the company chain of command:

WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
 AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
      FROM personnel.employees WHERE fname||' '||lname = 'Richard Chan'
 UNION
    SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
      JOIN managers m ON m.employeeID = e.leader_id)
 SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;

The WITH RECURSIVE clause executes as before, except this time it finds no more data after two iterations and exits. Accordingly, the primary query returns two levels of data from managers:

WITH RECURSIVE materialization

By default, materialization is disabled. In this case, Vertica rewrites the WITH RECURSIVE query into subqueries, as many as necessary for the required level of recursion.

If recursion is very deep, the high number of query rewrites is liable to incur considerable overhead that adversely affects performance and exhausts system resources. In this case, consider enabling materialization, either with the configuration parameter WithClauseMaterialization, or the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. In either case, intermediate result sets from all recursion levels are written to local temporary tables. When recursion is complete, the intermediate results in all temporary tables are compiled and passed on to the primary query.

6.4 - Noncorrelated and correlated subqueries

Subqueries can be categorized into two types:.

Subqueries can be categorized into two types:

  • A noncorrelated (simple) subquery obtains its results independently of its containing (outer) statement.

  • A correlated subquery requires values from its outer query in order to execute.

Noncorrelated subqueries

A noncorrelated subquery executes independently of the outer query. The subquery executes first, and then passes its results to the outer query, For example:

=> SELECT name, street, city, state FROM addresses WHERE state IN (SELECT state FROM states);

Vertica executes this query as follows:

  1. Executes the subquery SELECT state FROM states (in bold).

  2. Passes the subquery results to the outer query.

A query's WHERE and HAVING clauses can specify noncorrelated subqueries if the subquery resolves to a single row, as shown below:

In WHERE clause

=> SELECT COUNT(*) FROM SubQ1 WHERE SubQ1.a = (SELECT y from SubQ2);

In HAVING clause

=> SELECT COUNT(*) FROM SubQ1 GROUP BY SubQ1.a HAVING SubQ1.a = (SubQ1.a & (SELECT y from SubQ2)

Correlated subqueries

A correlated subquery typically obtains values from its outer query before it executes. When the subquery returns, it passes its results to the outer query.

In the following example, the subquery needs values from the addresses.state column in the outer query:

=> SELECT name, street, city, state FROM addresses
     WHERE EXISTS (SELECT * FROM states WHERE states.state = addresses.state);

Vertica executes this query as follows:

  1. The query extracts and evaluates each addresses.state value in the outer subquery records.

  2. Then the query—using the EXISTS predicate—checks the addresses in the inner (correlated) subquery.

  3. Because it uses the EXISTS predicate, the query stops processing when it finds the first match.

When Vertica executes this query, it translates the full query into a JOIN WITH SIPS.

6.5 - Flattening FROM clause subqueries

FROM clause subqueries are always evaluated before their containing query.

FROM clause subqueries are always evaluated before their containing query. In some cases, the optimizer flattens FROM clause subqueries so the query can execute more efficiently.

For example, in order to create a query plan for the following statement, the Vertica query optimizer evaluates all records in table t1 before it evaluates the records in table t0:

=> SELECT * FROM (SELECT a, MAX(a) AS max FROM (SELECT * FROM t1) AS t0 GROUP BY a);

Given the previous query, the optimizer can internally flatten it as follows:

=> SELECT * FROM (SELECT a, MAX(a) FROM t1 GROUP BY a) AS t0;

Both queries return the same results, but the flattened query runs more quickly.

Flattening views

When a query's FROM clause specifies a view, the optimizer expands the view by replacing it with the query that the view encapsulates. If the view contains subqueries that are eligible for flattening, the optimizer produces a query plan that flattens those subqueries.

Flattening restrictions

The optimizer cannot create a flattened query plan if a subquery or view contains one of the following elements:

  • Aggregate function

  • Analytic function

  • Outer join (left, right or full)

  • GROUP BY, ORDER BY, or HAVING clause

  • DISTINCT keyword

  • LIMIT or OFFSET clause

  • UNION, EXCEPT, or INTERSECT clause

  • EXISTS subquery

Examples

If a predicate applies to a view or subquery, the flattening operation can allow for optimizations by evaluating the predicates before the flattening takes place. Two examples follow.

View flattening

In this example, view v1 is defined as follows:

=> CREATE VIEW v1 AS SELECT * FROM a;

The following query specifies this view:

=> SELECT * FROM v1 JOIN b ON x=y WHERE x > 10;

Without flattening, the optimizer evaluates the query as follows:

  1. Evalutes the subquery.

  2. Applies the predicate WHERE x > 10.

In contrast, the optimizer can create a flattened query plan by applying the predicate before evaluating the subquery. This reduces the optimizer's work because it returns only the records WHERE x > 10 to the containing query.

Vertica internally transforms the previous query as follows:

=> SELECT * FROM (SELECT * FROM a) AS t1 JOIN b ON x=y WHERE x > 10;

The optimizer then flattens the query:

=> SELECT * FROM a JOIN b ON x=y WHERE x > 10;

Subquery flattening

The following example shows how Vertica transforms FROM clause subqueries within a WHERE clause IN subquery. Given the following query:

=> SELECT * FROM a
   WHERE b IN (SELECT b FROM (SELECT * FROM t2)) AS D WHERE x=1;

The optimizer flattens it as follows:

=> SELECT * FROM a
   WHERE b IN (SELECT b FROM t2) AS D WHERE x=1;

See also

Subquery restrictions

6.6 - Subqueries in UPDATE and DELETE statements

You can nest subqueries within UPDATE and DELETE statements.

You can nest subqueries within UPDATE and DELETE statements.

UPDATE subqueries

You can update records in one table according to values in others, by nesting a subquery within an UPDATE statement. The example below illustrates this through a couple of noncorrelated subqueries. You can reproduce this example with the following tables:

=> CREATE TABLE addresses(cust_id INTEGER, address VARCHAR(2000));
CREATE TABLE
dbadmin=> INSERT INTO addresses VALUES(20,'Lincoln Street'),(30,'Booth Hill Road'),(30,'Beach Avenue'),(40,'Mt. Vernon Street'),(50,'Hillside Avenue');
 OUTPUT
--------
      5
(1 row)

=> CREATE TABLE new_addresses(new_cust_id integer, new_address Boolean DEFAULT 'T');
CREATE TABLE
dbadmin=> INSERT INTO new_addresses VALUES (20),(30),(80);
 OUTPUT
--------
      3
(1 row)

=> INSERT INTO new_addresses VALUES (60,'F');
 OUTPUT
--------
      1

=> COMMIT;
COMMIT

Queries on these tables return the following results:

=> SELECT * FROM addresses;
 cust_id |      address
---------+-------------------
      20 | Lincoln Street
      30 | Beach Avenue
      30 | Booth Hill Road
      40 | Mt. Vernon Street
      50 | Hillside Avenue
(5 rows)

=> SELECT * FROM new_addresses;
 new_cust_id | new_address
-------------+-------------
          20 | t
          30 | t
          80 | t
          60 | f
(4 rows)
  1. The following UPDATE statement uses a noncorrelated subquery to join new_addresses and addresses records on customer IDs. UPDATE sets the value 'New Address' in the joined addresses records. The statement output indicates that three rows were updated:

    => UPDATE addresses SET address='New Address'
       WHERE cust_id IN (SELECT new_cust_id FROM new_addresses WHERE new_address='T');
     OUTPUT
    --------
    3
    (1 row)
    
  2. Query the addresses table to see the changes for matching customer ID 20 and 30. Addresses for customer ID 40 and 50 are not updated:

    => SELECT * FROM addresses;
     cust_id |      address
    ---------+-------------------
          40 | Mt. Vernon Street
          50 | Hillside Avenue
          20 | New Address
          30 | New Address
          30 | New Address
    (5 rows)
    
    =>COMMIT;
    COMMIT
    

DELETE subqueries

You can delete records in one table based according to values in others by nesting a subquery within a DELETE statement.

For example, you want to remove records from new_addresses that were used earlier to update records in addresses. The following DELETE statement uses a noncorrelated subquery to join new_addresses and addresses records on customer IDs. It then deletes the joined records from table new_addresses:

=> DELETE FROM new_addresses
    WHERE new_cust_id IN (SELECT cust_id FROM addresses WHERE address='New Address');
 OUTPUT
--------
      2
(1 row)
=> COMMIT;
COMMIT

Querying new_addresses confirms that the records were deleted:

=> SELECT * FROM new_addresses;
 new_cust_id | new_address
-------------+-------------
          60 | f
          80 | t
(2 rows)

6.7 - Subquery examples

This topic illustrates some of the subqueries you can write.

This topic illustrates some of the subqueries you can write. The examples use the VMart example database.

Single-row subqueries

Single-row subqueries are used with single-row comparison operators (=, >=, <=, <>, and <=>) and return exactly one row.

For example, the following query retrieves the name and hire date of the oldest employee in the Vmart database:

=> SELECT employee_key, employee_first_name, employee_last_name, hire_date
   FROM employee_dimension
   WHERE hire_date = (SELECT MIN(hire_date) FROM employee_dimension);
 employee_key | employee_first_name | employee_last_name | hire_date
--------------+---------------------+--------------------+------------
         2292 | Mary                | Bauer              | 1956-01-11
(1 row)

Multiple-row subqueries

Multiple-row subqueries return multiple records.

For example, the following IN clause subquery returns the names of the employees making the highest salary in each of the six regions:

=> SELECT employee_first_name, employee_last_name, annual_salary, employee_region
    FROM employee_dimension WHERE annual_salary IN
     (SELECT MAX(annual_salary) FROM employee_dimension GROUP BY employee_region)
   ORDER BY annual_salary DESC;
 employee_first_name | employee_last_name | annual_salary |  employee_region
---------------------+--------------------+---------------+-------------------
 Alexandra           | Sanchez            |        992363 | West
 Mark                | Vogel              |        983634 | South
 Tiffany             | Vu                 |        977716 | SouthWest
 Barbara             | Lewis              |        957949 | MidWest
 Sally               | Gauthier           |        927335 | East
 Wendy               | Nielson            |        777037 | NorthWest
(6 rows)

Multicolumn subqueries

Multicolumn subqueries return one or more columns. Sometimes a subquery's result set is evaluated in the containing query in column-to-column and row-to-row comparisons.

You can substitute some multicolumn subqueries with a join, with the reverse being true as well. For example, the following two queries ask for the sales transactions of all products sold online to customers located in Massachusetts and return the same result set. The only difference is the first query is written as a join and the second is written as a subquery.

Join query: Subquery:
=> SELECT *
   FROM online_sales.online_sales_fact
   INNER JOIN public.customer_dimension
   USING (customer_key)
   WHERE customer_state = 'MA';
=> SELECT *
   FROM online_sales.online_sales_fact
   WHERE customer_key IN
     (SELECT customer_key
      FROM public.customer_dimension
      WHERE customer_state = 'MA');

The following query returns all employees in each region whose salary is above the average:

=> SELECT e.employee_first_name, e.employee_last_name, e.annual_salary,
      e.employee_region, s.average
   FROM employee_dimension e,
     (SELECT employee_region, AVG(annual_salary) AS average
      FROM employee_dimension GROUP BY employee_region) AS s
   WHERE  e.employee_region = s.employee_region AND e.annual_salary > s.average
   ORDER BY annual_salary DESC;
 employee_first_name | employee_last_name | annual_salary | employee_region |     average
---------------------+--------------------+---------------+-----------------+------------------
 Doug                | Overstreet         |        995533 | East            |  61192.786013986
 Matt                | Gauthier           |        988807 | South           | 57337.8638902996
 Lauren              | Nguyen             |        968625 | West            | 56848.4274914089
 Jack                | Campbell           |        963914 | West            | 56848.4274914089
 William             | Martin             |        943477 | NorthWest       | 58928.2276119403
 Luigi               | Campbell           |        939255 | MidWest         | 59614.9170454545
 Sarah               | Brown              |        901619 | South           | 57337.8638902996
 Craig               | Goldberg           |        895836 | East            |  61192.786013986
 Sam                 | Vu                 |        889841 | MidWest         | 59614.9170454545
 Luigi               | Sanchez            |        885078 | MidWest         | 59614.9170454545
 Michael             | Weaver             |        882685 | South           | 57337.8638902996
 Doug                | Pavlov             |        881443 | SouthWest       | 57187.2510548523
 Ruth                | McNulty            |        874897 | East            |  61192.786013986
 Luigi               | Dobisz             |        868213 | West            | 56848.4274914089
 Laura               | Lang               |        865829 | East            |  61192.786013986
 ...

You can also use the EXCEPT, INTERSECT, and UNION [ALL] keywords in FROM, WHERE, and HAVING clauses.

The following subquery returns information about all Connecticut-based customers who bought items through either stores or online sales channel and whose purchases amounted to more than 500 dollars:

=> SELECT DISTINCT customer_key, customer_name FROM public.customer_dimension
   WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact
      WHERE sales_dollar_amount > 500
      UNION ALL
      SELECT customer_key FROM online_sales.online_sales_fact
      WHERE sales_dollar_amount > 500)
   AND customer_state = 'CT';
 customer_key |  customer_name
--------------+------------------
          200 | Carla Y. Kramer
          733 | Mary Z. Vogel
          931 | Lauren X. Roy
         1533 | James C. Vu
         2948 | Infocare
         4909 | Matt Z. Winkler
         5311 | John Z. Goldberg
         5520 | Laura M. Martin
         5623 | Daniel R. Kramer
         6759 | Daniel Q. Nguyen
 ...

HAVING clause subqueries

A HAVING clause is used in conjunction with the GROUP BY clause to filter the select-list records that a GROUP BY returns. HAVING clause subqueries must use Boolean comparison operators: =, >, <, <>, <=, >= and take the following form:

SELECT <column, ...>
FROM <table>
GROUP BY <expression>
HAVING <expression>
  (SELECT <column, ...>
   FROM <table>
   HAVING <expression>);

For example, the following statement uses the VMart database and returns the number of customers who purchased lowfat products. Note that the GROUP BY clause is required because the query uses an aggregate (COUNT).

=> SELECT s.product_key, COUNT(s.customer_key) FROM store.store_sales_fact s
   GROUP BY s.product_key HAVING s.product_key IN
     (SELECT product_key FROM product_dimension WHERE diet_type = 'Low Fat');

The subquery first returns the product keys for all low-fat products, and the outer query then counts the total number of customers who purchased those products.

 product_key | count
 -------------+-------
          15 |     2
          41 |     1
          66 |     1
         106 |     1
         118 |     1
         169 |     1
         181 |     2
         184 |     2
         186 |     2
         211 |     1
         229 |     1
         267 |     1
         289 |     1
         334 |     2
         336 |     1
(15 rows)

6.8 - Subquery restrictions

The following restrictions apply to Vertica subqueries:.

The following restrictions apply to Vertica subqueries:

  • Subqueries are not allowed in the defining query of a CREATE PROJECTION statement.

  • Subqueries can be used in the SELECT list, but GROUP BY or aggregate functions are not allowed in the query if the subquery is not part of the GROUP BY clause in the containing query. For example, the following two statement returns an error message:

    => SELECT y, (SELECT MAX(a) FROM t1) FROM t2 GROUP BY y;
       ERROR:  subqueries in the SELECT or ORDER BY are not supported if the
       subquery is not part of the GROUP BY
    => SELECT MAX(y), (SELECT MAX(a) FROM t1) FROM t2;
       ERROR:  subqueries in the SELECT or ORDER BY are not supported if the
       query has aggregates and the subquery is not part of the GROUP BY
    
  • Subqueries are supported within UPDATE statements with the following exceptions:

    • You cannot use SET column = {expression} to specify a subquery.

    • The table specified in the UPDATE list cannot also appear in the FROM clause (no self joins).

  • FROM clause subqueries require an alias but tables do not. If the table has no alias, the query must refer its columns as table-name.column-name. However, column names that are unique among all tables in the query do not need to be qualified by their table name.

  • If the ORDER BY clause is inside a FROM clause subquery, rather than in the containing query, the query is liable to return unexpected sort results. This occurs because Vertica data comes from multiple nodes, so sort order cannot be guaranteed unless the outer query block specifies an ORDER BY clause. This behavior complies with the SQL standard, but it might differ from other databases.

  • Multicolumn subqueries cannot use the <, >, <=, >= comparison operators. They can use <>, !=, and = operators.

  • WHERE and HAVING clause subqueries must use Boolean comparison operators: =, >, <, <>, <=, >=. Those subqueries can be noncorrelated and correlated.

  • [NOT] IN and ANY subqueries nested in another expression are not supported if any of the column values are NULL. In the following statement, for example, if column x from either table t1 or t2 contains a NULL value, Vertica returns a run-time error:

    => SELECT * FROM t1 WHERE (x IN (SELECT x FROM t2)) IS FALSE;
    ERROR:  NULL value found in a column used by a subquery
    
  • Vertica returns an error message during subquery run time on scalar subqueries that return more than one row.

  • Aggregates and GROUP BY clauses are allowed in subqueries, as long as those subqueries are not correlated.

  • Correlated expressions under ALL and [NOT] IN are not supported.

  • Correlated expressions under OR are not supported.

  • Multiple correlations are allowed only for subqueries that are joined with an equality (=) predicate. However, IN/NOT IN, EXISTS/NOT EXISTS predicates within correlated subqueries are not allowed:

    => SELECT t2.x, t2.y, t2.z FROM t2 WHERE t2.z NOT IN
           (SELECT t1.z FROM t1 WHERE t1.x = t2.x);
       ERROR: Correlated subquery with NOT IN is not supported
    
  • Up to one level of correlated subqueries is allowed in the WHERE clause if the subquery references columns in the immediate outer query block. For example, the following query is not supported because the t2.x = t3.x subquery can only refer to table t1 in the outer query, making it a correlated expression because t3.x is two levels out:

    => SELECT t3.x, t3.y, t3.z FROM t3 WHERE t3.z IN (
         SELECT t1.z FROM t1 WHERE EXISTS (
            SELECT 'x' FROM t2 WHERE t2.x = t3.x) AND t1.x = t3.x);
    ERROR:  More than one level correlated subqueries are not supported
    

    The query is supported if it is rewritten as follows:

    => SELECT t3.x, t3.y, t3.z FROM t3 WHERE t3.z IN
           (SELECT t1.z FROM t1 WHERE EXISTS
             (SELECT 'x' FROM t2 WHERE t2.x = t1.x)
       AND t1.x = t3.x);
    

7 - Joins

Queries can combine records from multiple tables, or multiple instances of the same table.

Queries can combine records from multiple tables, or multiple instances of the same table. A query that combines records from one or more tables is called a join. Joins are allowed in SELECT statements and subqueries.

Supported join types

Vertica supports the following join types:

  • Inner (including natural, cross) joins

  • Left, right, and full outer joins

  • Optimizations for equality and range joins predicates

Vertica does not support nested loop joins.

Join algorithms

Vertica's query optimizer implements joins with either the hash join or merge join algorithm. For details, see Hash joins versus merge joins.

7.1 - Join syntax

Vertica supports the ANSI SQL-92 standard for joining tables, as follows:.

Vertica supports the ANSI SQL-92 standard for joining tables, as follows:

table-reference [join-type] JOIN table-reference [ ON join-predicate ]

where join-type can be one of the following:

For example:

=> SELECT * FROM T1 INNER JOIN T2 ON T1.id = T2.id;

Alternative syntax options

Vertica also supports two older join syntax conventions:

Join specified by WHERE clause join predicate

INNER JOIN is equivalent to a query that specifies its join predicate in a WHERE clause. For example, this example and the previous one return equivalent results. They both specify an inner join between tables T1 and T2 on columns T1.id and T2.id, respectively.

=> SELECT * FROM T1, T2 WHERE T1.id = T2.id;

JOIN USING clause

You can join two tables on identically named columns with a JOIN USING clause. For example:

=> SELECT * FROM T1 JOIN T2 USING(id);

By default, a join that is specified by JOIN USING is always an inner join.

Benefits of SQL-92 join syntax

Vertica recommends that you use SQL-92 join syntax for several reasons:

  • SQL-92 outer join syntax is portable across databases; the older syntax was not consistent between databases.

  • SQL-92 syntax provides greater control over whether predicates are evaluated during or after outer joins. This was also not consistent between databases when using the older syntax.

  • SQL-92 syntax eliminates ambiguity in the order of evaluating the joins, in cases where more than two tables are joined with outer joins.

7.2 - Join conditions vs. filter conditions

If you do not use the SQL-92 syntax, join conditions (predicates that are evaluated during the join) are difficult to distinguish from filter conditions (predicates that are evaluated after the join), and in some cases cannot be expressed at all.

If you do not use the SQL-92 syntax, join conditions (predicates that are evaluated during the join) are difficult to distinguish from filter conditions (predicates that are evaluated after the join), and in some cases cannot be expressed at all. With SQL-92, join conditions and filter conditions are separated into two different clauses, the ON clause and the WHERE clause, respectively, making queries easier to understand.

  • The ON clause contains relational operators (for example, <, <=, >, >=, <>, =, <=>) or other predicates that specify which records from the left and right input relations to combine, such as by matching foreign keys to primary keys. ON can be used with inner, left outer, right outer, and full outer joins. Cross joins and union joins do not use an ON clause.

    Inner joins return all pairings of rows from the left and right relations for which the ON clause evaluates to TRUE. In a left join, all rows from the left relation in the join are present in the result; any row of the left relation that does not match any rows in the right relation is still present in the result but with nulls in any columns taken from the right relation. Similarly, a right join preserves all rows from the right relation, and a full join retains all rows from both relations.

  • The WHERE clause is evaluated after the join is performed. It filters records returned by the FROM clause, eliminating any records that do not satisfy the WHERE clause condition.

Vertica automatically converts outer joins to inner joins in cases where it is correct to do so, allowing the optimizer to choose among a wider set of query plans and leading to better performance.

7.3 - Inner joins

An inner join combines records from two tables based on a join predicate and requires that each record in the first table has a matching record in the second table.

An inner join combines records from two tables based on a join predicate and requires that each record in the first table has a matching record in the second table. Thus, inner joins return only records from both joined tables that satisfy the join condition. Records that contain no matches are excluded from the result set.

Inner joins take the following form:

SELECT column-list FROM left-join-table
  [INNER] JOIN right-join-table ON <span class="code-variable"><span class="code-variable"><a href="/en/sql-reference/language-elements/predicates/join-predicate/#">join-predicate</a></span></span>

If you omit the INNER keyword, Vertica assumes an inner join. Inner joins are commutative and associative. You can specify tables in any order without changing the results.

Example

The following example specifies an inner join between tables store.store_dimension and public.employee_dimension whose records have matching values in columns store_region and employee_region, respectively:

=> SELECT s.store_region, SUM(e.vacation_days) TotalVacationDays
   FROM public.employee_dimension e
   JOIN store.store_dimension s ON s.store_region=e.employee_region
   GROUP BY s.store_region ORDER BY TotalVacationDays;

This join can also be expressed as follows:

=> SELECT s.store_region, SUM(e.vacation_days) TotalVacationDays
    FROM public.employee_dimension e, store.store_dimension s
    WHERE s.store_region=e.employee_region
    GROUP BY s.store_region ORDER BY TotalVacationDays;

Both queries return the same result set:

 store_region | TotalVacationDays
--------------+-------------------
 NorthWest    |             23280
 SouthWest    |            367250
 MidWest      |            925938
 South        |           1280468
 East         |           1952854
 West         |           2849976
(6 rows)

If the join's inner table store.store_dimension has any rows with store_region values that do not match employee_region values in table public.employee_dimension, those rows are excluded from the result set. To include that row, you can specify an outer join.

7.3.1 - Equi-joins and non equi-joins

Vertica supports any arbitrary join expression with both matching and non-matching column values.

Vertica supports any arbitrary join expression with both matching and non-matching column values. For example:

SELECT * FROM fact JOIN dim ON fact.x = dim.x;
SELECT * FROM fact JOIN dim ON fact.x > dim.y;
SELECT * FROM fact JOIN dim ON fact.x <= dim.y;
SELECT * FROM fact JOIN dim ON fact.x <> dim.y;
SELECT * FROM fact JOIN dim ON fact.x <=> dim.y;

Equi-joins are based on equality (matching column values). This equality is indicated with an equal sign (=), which functions as the comparison operator in the ON clause using SQL-92 syntax or the WHERE clause using older join syntax.

The first example below uses SQL-92 syntax and the ON clause to join the online sales table with the call center table using the call center key; the query then returns the sale date key that equals the value 156:

=> SELECT sale_date_key, cc_open_date FROM online_sales.online_sales_fact
   INNER JOIN online_sales.call_center_dimension
   ON (online_sales.online_sales_fact.call_center_key =
    online_sales.call_center_dimension.call_center_key
   AND sale_date_key = 156);
 sale_date_key | cc_open_date
---------------+--------------
           156 | 2005-08-12
(1 row)

The second example uses older join syntax and the WHERE clause to join the same tables to get the same results:

=> SELECT sale_date_key, cc_open_date
    FROM online_sales.online_sales_fact, online_sales.call_center_dimension
   WHERE online_sales.online_sales_fact.call_center_key =
      online_sales.call_center_dimension.call_center_key
   AND sale_date_key = 156;
 sale_date_key | cc_open_date
---------------+--------------
           156 | 2005-08-12
(1 row)

Vertica also permits tables with compound (multiple-column) primary and foreign keys. For example, to create a pair of tables with multi-column keys:

=> CREATE TABLE dimension(pk1 INTEGER NOT NULL, pk2 INTEGER NOT NULL);=> ALTER TABLE dimension ADD PRIMARY KEY (pk1, pk2);
=> CREATE TABLE fact (fk1 INTEGER NOT NULL, fk2 INTEGER NOT NULL);
=> ALTER TABLE fact ADD FOREIGN KEY (fk1, fk2) REFERENCES dimension (pk1, pk2);

To join tables using compound keys, you must connect two join predicates with a Boolean AND operator. For example:

=> SELECT * FROM fact f JOIN dimension d ON f.fk1 = d.pk1 AND f.fk2 = d.pk2;

You can write queries with expressions that contain the <=> operator for NULL=NULL joins.

=> SELECT * FROM fact JOIN dim ON fact.x <=> dim.y;

The <=> operator performs an equality comparison like the = operator, but it returns true, instead of NULL, if both operands are NULL, and false, instead of NULL, if one operand is NULL.

=> SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 t        | t        | f
(1 row)

Compare the <=> operator to the = operator:

=> SELECT 1 = 1, NULL = NULL, 1 = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 t        |          |
(1 row)

When composing joins, it helps to know in advance which columns contain null values. An employee's hire date, for example, would not be a good choice because it is unlikely hire date would be omitted. An hourly rate column, however, might work if some employees are paid hourly and some are salaried. If you are unsure about the value of columns in a given table and want to check, type the command:

=> SELECT COUNT(*) FROM tablename WHERE columnname IS NULL;

7.3.2 - Natural joins

A natural join is just a join with an implicit join predicate.

A natural join is just a join with an implicit join predicate. Natural joins can be inner, left outer, right outer, or full outer joins and take the following form:

SELECT column-list FROM left-join-table
NATURAL [ INNER | LEFT OUTER | RIGHT OUTER | FULL OUTER ] JOIN right-join-table

Natural joins are, by default, natural inner joins; however, there can also be natural left/right/full outer joins. The primary difference between an inner and natural join is that inner joins have an explicit join condition, whereas the natural join’s conditions are formed by matching all pairs of columns in the tables that have the same name and compatible data types, making natural joins equi-joins because join condition are equal between common columns. (If the data types are incompatible, Vertica returns an error.)

The following query is a simple natural join between tables T1 and T2 when the T2 column val is greater than 5:

=> SELECT * FROM T1 NATURAL JOIN T2 WHERE T2.val > 5;

The store_sales_fact table and the product_dimension table have two columns that share the same name and data type: product_key and product_version. The following example creates a natural join between those two tables at their shared columns:

=> SELECT product_description, sales_quantity FROM store.store_sales_fact
   NATURAL JOIN public.product_dimension;

The following three queries return the same result expressed as a basic query, an inner join, and a natural join. at the table expressions are equivalent only if the common attribute in the store_sales_fact table and the store_dimension table is store_key. If both tables have a column named store_key, then the natural join would also have a store_sales_fact.store_key = store_dimension.store_key join condition. Since the results are the same in all three instances, they are shown in the first (basic) query only:

=> SELECT store_name FROM store.store_sales_fact, store.store_dimension
   WHERE store.store_sales_fact.store_key = store.store_dimension.store_key
   AND store.store_dimension.store_state = 'MA' ORDER BY store_name;
 store_name
------------
 Store11
 Store128
 Store178
 Store66
 Store8
 Store90
(6 rows)

The query written as an inner join:

=> SELECT store_name FROM store.store_sales_fact
   INNER JOIN store.store_dimension
   ON (store.store_sales_fact.store_key = store.store_dimension.store_key)
   WHERE store.store_dimension.store_state = 'MA' ORDER BY store_name;

In the case of the natural join, the join predicate appears implicitly by comparing all of the columns in both tables that are joined by the same column name. The result set contains only one column representing the pair of equally-named columns.

=> SELECT store_name FROM store.store_sales_fact
   NATURAL JOIN store.store_dimension
   WHERE store.store_dimension.store_state = 'MA' ORDER BY store_name;

7.3.3 - Cross joins

Cross joins are the simplest joins to write, but they are not usually the fastest to run because they consist of all possible combinations of two tables’ records.

Cross joins are the simplest joins to write, but they are not usually the fastest to run because they consist of all possible combinations of two tables’ records. Cross joins contain no join condition and return what is known as a Cartesian product, where the number of rows in the result set is equal to the number of rows in the first table multiplied by the number of rows in the second table.

The following query returns all possible combinations from the promotion table and the store sales table:

=> SELECT * FROM promotion_dimension CROSS JOIN store.store_sales_fact;

Because this example returns over 600 million records, many cross join results can be extremely large and difficult to manage. Cross joins can be useful, however, such as when you want to return a single-row result set.

Implicit versus explicit joins

Vertica recommends that you do not write implicit cross joins (comma-separated tables in the FROM clause). These queries can imply accidental omission of a join predicate.

The following query implicitly cross joins tables promotion_dimension and** store.store_sales_fact:

=> SELECT * FROM promotion_dimension, store.store_sales_fact;

It is better practice to express this cross join explicitly, as follows:

=> SELECT * FROM promotion_dimension CROSS JOIN store.store_sales_fact;

Examples

The following example creates two small tables and their superprojections and then runs a cross join on the tables:

=> CREATE TABLE employee(employee_id INT, employee_fname VARCHAR(50));
=> CREATE TABLE department(dept_id INT, dept_name VARCHAR(50));
=> INSERT INTO employee VALUES (1, 'Andrew');
=> INSERT INTO employee VALUES (2, 'Priya');
=> INSERT INTO employee VALUES (3, 'Michelle');
=> INSERT INTO department VALUES (1, 'Engineering');
=> INSERT INTO department VALUES (2, 'QA');
=> SELECT * FROM employee CROSS JOIN department;

In the result set, the cross join retrieves records from the first table and then creates a new row for every row in the 2nd table. It then does the same for the next record in the first table, and so on.

 employee_id | employee_name | dept_id | dept_name
 -------------+---------------+---------+-----------
           1 | Andrew        |       1 |  Engineering
           2 | Priya         |       1 |  Engineering
           3 | Michelle      |       1 |  Engineering
           1 | Andrew        |       2 |  QA
           2 | Priya         |       2 |  QA
           3 | Michelle      |       2 |  QA
(6 rows)

7.4 - Outer joins

Outer joins extend the functionality of inner joins by letting you preserve rows of one or both tables that do not have matching rows in the non-preserved table.

Outer joins extend the functionality of inner joins by letting you preserve rows of one or both tables that do not have matching rows in the non-preserved table. Outer joins take the following form:

SELECT column-list FROM left-join-table
[ LEFT | RIGHT | FULL ] OUTER JOIN right-join-table ON <span class="code-variable"><span class="code-variable"><a href="/en/sql-reference/language-elements/predicates/join-predicate/#">join-predicate</a></span></span>

Left outer joins

A left outer join returns a complete set of records from the left-joined (preserved) table T1, with matched records, where available, in the right-joined (non-preserved) table T2. Where Vertica finds no match, it extends the right side column (T2) with null values.

=> SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.x = T2.x;

To exclude the non-matched values from T2, write the same left outer join, but filter out the records you don't want from the right side by using a WHERE clause:

=> SELECT * FROM T1 LEFT OUTER JOIN T2
   ON T1.x = T2.x WHERE T2.x IS NOT NULL;

The following example uses a left outer join to enrich telephone call detail records with an incomplete numbers dimension. It then filters out results that are known not to be from Massachusetts:

=> SELECT COUNT(*) FROM calls LEFT OUTER JOIN numbers
   ON calls.to_phone = numbers.phone WHERE NVL(numbers.state, '') <> 'MA';

Right outer joins

A right outer join returns a complete set of records from the right-joined (preserved) table, as well as matched values from the left-joined (non-preserved) table. If Vertica finds no matching records from the left-joined table (T1), NULL values appears in the T1 column for any records with no matching values in T1. A right join is, therefore, similar to a left join, except that the treatment of the tables is reversed.

=> SELECT * FROM T1 RIGHT OUTER JOIN T2 ON T1.x = T2.x;

The above query is equivalent to the following query, where T1 RIGHT OUTER JOIN T2 = T2 LEFT OUTER JOIN T1.

=> SELECT * FROM T2 LEFT OUTER JOIN T1 ON T2.x = T1.x;

The following example identifies customers who have not placed an order:

=> SELECT customers.customer_id FROM orders RIGHT OUTER JOIN customers
   ON orders.customer_id = customers.customer_id
   GROUP BY customers.customer_id HAVING COUNT(orders.customer_id) = 0;

Full outer joins

A full outer join returns results for both left and right outer joins. The joined table contains all records from both tables, including nulls (missing matches) from either side of the join. This is useful if you want to see, for example, each employee who is assigned to a particular department and each department that has an employee, but you also want to see all the employees who are not assigned to a particular department, as well as any department that has no employees:

=> SELECT employee_last_name, hire_date FROM  employee_dimension emp
   FULL OUTER JOIN department dept ON emp.employee_key = dept.department_key;

Notes

Vertica also supports joins where the outer (preserved) table or subquery is replicated on more than one node and the inner (non-preserved) table or subquery is segmented across more than one node. For example, in the following query, the fact table, which is almost always segmented, appears on the non-preserved side of the join, and it is allowed:

=> SELECT sales_dollar_amount, transaction_type, customer_name
    FROM store.store_sales_fact f RIGHT JOIN customer_dimension d
   ON f.customer_key = d.customer_key;
 sales_dollar_amount | transaction_type | customer_name
---------------------+------------------+---------------
                 252 | purchase         | Inistar
                 363 | purchase         | Inistar
                 510 | purchase         | Inistar
                -276 | return           | Foodcorp
                 252 | purchase         | Foodcorp
                 195 | purchase         | Foodcorp
                 290 | purchase         | Foodcorp
                 222 | purchase         | Foodcorp
                     |                  | Foodgen
                     |                  | Goldcare
(10 rows)

7.5 - Controlling join inputs

By default, the optimizer uses its own internal logic to determine whether to join one table to another as an inner or outer input.

By default, the optimizer uses its own internal logic to determine whether to join one table to another as an inner or outer input. Occasionally, the optimizer might choose the larger table as the inner input to a join. Doing so can incur performance and concurrency issues.

If the configuration parameter configuration parameter EnableForceOuter is set to 1, you can control join inputs for specific tables through ALTER TABLE..FORCE OUTER. The FORCE OUTER option modifies a table's force_outer setting in the system table TABLES. When implementing a join, Vertica compares the force_outer settings of the participating tables:

  • If table settings are unequal, Vertica uses them to set the join inputs:

    • A table with a low force_outer setting relative to other tables is joined to them as an inner input.

    • A table with a high force_outer setting relative to other tables is joined to them as an outer input.

  • If all table settings are equal, Vertica ignores them and assembles the join on its own.

The force_outer column is initially set to 5 for all newly-defined tables. You can use ALTER TABLE..FORCE OUTER to reset force_outer to a value equal to or greater than 0. For example, you might change the force_outer settings of tables abc and xyz to 3 and 8, respectively:

=> ALTER TABLE abc FORCE OUTER 3;
=> ALTER TABLE xyz FORCE OUTER 8;

Given these settings, the optimizer joins abc as the inner input to any table with a force_outer value greater than 3. The optimizer joins xyz as the outer input to any table with a force_outer value less than 8.

Projection inheritance

When you query a projection directly, it inherits the force_outer setting of its anchor table. The query then uses this setting when joined to another projection.

Enabling forced join inputs

The configuration parameter EnableForceOuter determines whether Vertica uses a table's force_outer value to implement a join. By default, this parameter is set to 0, and forced join inputs are disabled. You can enable forced join inputs at session and database scopes, through ALTER SESSION and ALTER DATABASE, respectively:

=> ALTER SESSION SET EnableForceOuter = { 0 | 1 };
=> ALTER DATABASE db-name SET EnableForceOuter =  { 0 | 1 };

If EnableForceOuter is set to 0, ALTER TABLE..FORCE OUTER statements return with this warning:

Viewing forced join inputs

EXPLAIN-generated query plans indicate whether the configuration parameter EnableForceOuter is on. A join query might include tables whose force_outer settings are less or greater than the default value of 5. In this case, the query plan includes a Force outer level field for the relevant join inputs.

For example, the following query joins tables store.store_sales and public.products, where both tables have the same force_outer setting (5). EnableForceOuter is on, as indicated in the generated query plan:

=> EXPLAIN SELECT s.store_key, p.product_description, s.sales_quantity, s.sale_date
   FROM store.store_sales s JOIN public.products p ON s.product_key=p.product_key
   WHERE s.sale_date='2014-12-01' ORDER BY s.store_key, s.sale_date;

 EnableForceOuter is on
 Access Path:
 +-SORT [Cost: 7K, Rows: 100K (NO STATISTICS)] (PATH ID: 1)
 |  Order: sales.store_key ASC, sales.sale_date ASC
 |  Execute on: All Nodes
 | +---> JOIN HASH [Cost: 5K, Rows: 100K (NO STATISTICS)] (PATH ID: 2) Outer (BROADCAST)(LOCAL ROUND ROBIN)
 | |      Join Cond: (sales.product_key = products.product_key)
 | |      Execute on: All Nodes
 | | +-- Outer -> STORAGE ACCESS for sales [Cost: 2K, Rows: 100K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_b0
 | | |      Materialize: sales.sale_date, sales.store_key, sales.product_key, sales.sales_quantity
 | | |      Filter: (sales.sale_date = '2014-12-01'::date)
 | | |      Execute on: All Nodes
 | | +-- Inner -> STORAGE ACCESS for products [Cost: 177, Rows: 60K (NO STATISTICS)] (PATH ID: 4)
 | | |      Projection: public.products_b0
 | | |      Materialize: products.product_key, products.product_description
 | | |      Execute on: All Nodes

The following ALTER TABLE statement resets the force_outer setting of public.products to 1:

=> ALTER TABLE public.products FORCE OUTER 1;
ALTER TABLE

The regenerated query plan for the same join now includes a Force outer level field and specifies public.products as the inner input:

=> EXPLAIN SELECT s.store_key, p.product_description, s.sales_quantity, s.sale_date
   FROM store.store_sales s JOIN public.products p ON s.product_key=p.product_key
   WHERE s.sale_date='2014-12-01' ORDER BY s.store_key, s.sale_date;

 EnableForceOuter is on
 Access Path:
 +-SORT [Cost: 7K, Rows: 100K (NO STATISTICS)] (PATH ID: 1)
 |  Order: sales.store_key ASC, sales.sale_date ASC
 |  Execute on: All Nodes
 | +---> JOIN HASH [Cost: 5K, Rows: 100K (NO STATISTICS)] (PATH ID: 2) Outer (BROADCAST)(LOCAL ROUND ROBIN)
 | |      Join Cond: (sales.product_key = products.product_key)
 | |      Execute on: All Nodes
 | | +-- Outer -> STORAGE ACCESS for sales [Cost: 2K, Rows: 100K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_b0
 | | |      Materialize: sales.sale_date, sales.store_key, sales.product_key, sales.sales_quantity
 | | |      Filter: (sales.sale_date = '2014-12-01'::date)
 | | |      Execute on: All Nodes
 | | +-- Inner -> STORAGE ACCESS for products [Cost: 177, Rows: 60K (NO STATISTICS)] (PATH ID: 4)
 | | |      Projection: public.products_b0
 | | |      Force outer level: 1
 | | |      Materialize: products.product_key, products.product_description
 | | |      Execute on: All Nodes

If you change the force_outer setting of public.products to 8, Vertica creates a different query plan that specifies public.products as the outer input:

=> ALTER TABLE public.products FORCE OUTER 8;
ALTER TABLE

=> EXPLAIN SELECT s.store_key, p.product_description, s.sales_quantity, s.sale_date
   FROM store.store_sales s JOIN public.products p ON s.product_key=p.product_key
   WHERE s.sale_date='2014-12-01' ORDER BY s.store_key, s.sale_date;


EnableForceOuter is on
 Access Path:
 +-SORT [Cost: 7K, Rows: 100K (NO STATISTICS)] (PATH ID: 1)
 |  Order: sales.store_key ASC, sales.sale_date ASC
 |  Execute on: All Nodes
 | +---> JOIN HASH [Cost: 5K, Rows: 100K (NO STATISTICS)] (PATH ID: 2) Inner (BROADCAST)
 | |      Join Cond: (sales.product_key = products.product_key)
 | |      Materialize at Output: products.product_description
 | |      Execute on: All Nodes
 | | +-- Outer -> STORAGE ACCESS for products [Cost: 20, Rows: 60K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: public.products_b0
 | | |      Force outer level: 8
 | | |      Materialize: products.product_key
 | | |      Execute on: All Nodes
 | | |      Runtime Filter: (SIP1(HashJoin): products.product_key)
 | | +-- Inner -> STORAGE ACCESS for sales [Cost: 2K, Rows: 100K (NO STATISTICS)] (PATH ID: 4)
 | | |      Projection: store.store_sales_b0
 | | |      Materialize: sales.sale_date, sales.store_key, sales.product_key, sales.sales_quantity
 | | |      Filter: (sales.sale_date = '2014-12-01'::date)
 | | |      Execute on: All Nodes

Restrictions

Vertica ignores force_outer settings when it performs the following operations:

  • Outer joins: Vertica generally respects OUTER JOIN clauses regardless of the force_outer settings of the joined tables.

  • MERGE statement joins.

  • Queries that include the SYNTACTIC_JOIN hint.

  • Half-join queries such as SEMI JOIN.

  • Joins to subqueries, where the subquery is always processed as having a force_outer setting of 5 regardless of the force_outer settings of the tables that are joined in the subquery. This setting determines a subquery's designation as inner or outer input relative to other join inputs. If two subqueries are joined, the optimizer determines which one is the inner input, and which one the outer.

7.6 - Range joins

Vertica provides performance optimizations for <, <=, >, >=, and BETWEEN predicates in join ON clauses.

Vertica provides performance optimizations for <, <=, >, >=, and BETWEEN predicates in join ON clauses. These optimizations are particularly useful when a column from one table is restricted to be in a range specified by two columns of another table.

Key ranges

Multiple, consecutive key values can map to the same dimension values. Consider, for example, a table of IPv4 addresses and their owners. Because large subnets (ranges) of IP addresses can belong to the same owner, this dimension can be represented as:

=> CREATE TABLE ip_owners(
      ip_start INTEGER,
      ip_end INTEGER,
      owner_id INTEGER);
=> CREATE TABLE clicks(
      ip_owners INTEGER,
      dest_ip INTEGER);

A query that associates a click stream with its destination can use a join similar to the following, which takes advantage of range optimization:

=> SELECT owner_id, COUNT(*) FROM clicks JOIN ip_owners
   ON clicks.dest_ip BETWEEN ip_start AND ip_end
   GROUP BY owner_id;

Requirements

Operators <, <=, >, >=, or BETWEEN must appear as top-level conjunctive predicates for range join optimization to be effective, as shown in the following examples:

`BETWEEN` as the only predicate:

```
=> SELECT COUNT(*) FROM fact JOIN dim
    ON fact.point BETWEEN dim.start AND dim.end;

```

Comparison operators as top-level predicates (within `AND`):

```
=> SELECT COUNT(*) FROM fact JOIN dim
    ON fact.point > dim.start AND fact.point < dim.end;

```

`BETWEEN` as a top-level predicate (within `AND`):

```
=> SELECT COUNT(*) FROM fact JOIN dim
   ON (fact.point BETWEEN dim.start AND dim.end) AND fact.c <> dim.c;

```

Query not optimized because `OR` is top-level predicate (disjunctive):

```
=> SELECT COUNT(*) FROM fact JOIN dim
   ON (fact.point BETWEEN dim.start AND dim.end) OR dim.end IS NULL;

```

Notes

  • Expressions are optimized in range join queries in many cases.

  • If range columns can have NULL values indicating that they are open-ended, it is possible to use range join optimizations by replacing nulls with very large or very small values:

    => SELECT COUNT(*) FROM fact JOIN dim
       ON fact.point BETWEEN NVL(dim.start, -1) AND NVL(dim.end, 1000000000000);
    
  • If there is more than one set of ranging predicates in the same ON clause, the order in which the predicates are specified might impact the effectiveness of the optimization:

    => SELECT COUNT(*) FROM fact JOIN dim ON fact.point1 BETWEEN dim.start1 AND dim.end1
       AND fact.point2 BETWEEN dim.start2 AND dim.end2;
    

    The optimizer chooses the first range to optimize, so write your queries so that the range you most want optimized appears first in the statement.

  • The use of the range join optimization is not directly affected by any characteristics of the physical schema; no schema tuning is required to benefit from the optimization.

  • The range join optimization can be applied to joins without any other predicates, and to HASH or MERGE joins.

  • To determine if an optimization is in use, search for RANGE in the EXPLAIN plan.

7.7 - Event series joins

An join is a Vertica SQL extension that enables the analysis of two series when their measurement intervals don’t align precisely, such as with mismatched timestamps.

An event series join is a Vertica SQL extension that enables the analysis of two series when their measurement intervals don’t align precisely, such as with mismatched timestamps. You can compare values from the two series directly, rather than having to normalize the series to the same measurement interval.

Event series joins are an extension of Outer joins, but instead of padding the non-preserved side with NULL values when there is no match, the event series join pads the non-preserved side values that it interpolates from the previous value.

The difference in how you write a regular join versus an event series join is the INTERPOLATE predicate, which is used in the ON clause. For example, the following two statements show the differences, which are shown in greater detail in Writing event series joins.

Regular full outer join Event series join
SELECT * FROM hTicks h FULL OUTER JOIN aTicks a
ON (h.time = a.time);
SELECT * FROM hTicks h FULL OUTER JOIN aTicks a 
ON (h.time INTERPOLATE PREVIOUS VALUE a.time);

Similar to regular joins, an event series join has inner and outer join modes, which are described in the topics that follow.

For full syntax, including notes and restrictions, see INTERPOLATE

7.7.1 - Sample schema for event series joins examples

If you don't plan to run the queries and just want to look at the examples, you can skip this topic and move straight to Writing Event Series Joins.

If you don't plan to run the queries and just want to look at the examples, you can skip this topic and move straight to Writing event series joins.

Schema of hTicks and aTicks tables

The examples that follow use the following hTicks and aTicks tables schemas:

CREATE TABLE hTicks (
   stock VARCHAR(20),
   time TIME,
   price NUMERIC(8,2)
);
CREATE TABLE aTicks (
   stock VARCHAR(20),
   time TIME,
   price NUMERIC(8,2)
);

Although TIMESTAMP is more commonly used for the event series column, the examples in this topic use TIME to keep the output simple.

INSERT INTO hTicks VALUES ('HPQ', '12:00', 50.00);
INSERT INTO hTicks VALUES ('HPQ', '12:01', 51.00);
INSERT INTO hTicks VALUES ('HPQ', '12:05', 51.00);
INSERT INTO hTicks VALUES ('HPQ', '12:06', 52.00);
INSERT INTO aTicks VALUES ('ACME', '12:00', 340.00);
INSERT INTO aTicks VALUES ('ACME', '12:03', 340.10);
INSERT INTO aTicks VALUES ('ACME', '12:05', 340.20);
INSERT INTO aTicks VALUES ('ACME', '12:05', 333.80);
COMMIT;

Output of the two tables:

hTicks aTicks

=> SELECT * FROM hTicks;

There are no entry records between 12:02–12:04:

 stock |   time   | price
-------+----------+-------
 HPQ   | 12:00:00 | 50.00
 HPQ   | 12:01:00 | 51.00
 HPQ   | 12:05:00 | 51.00
 HPQ   | 12:06:00 | 52.00
(4 rows)

=> SELECT * FROM aTicks;

There are no entry records at 12:01, 12:02 and at 12:04:

 stock |   time   | price
-------+----------+--------
 ACME  | 12:00:00 | 340.00
 ACME  | 12:03:00 | 340.10
 ACME  | 12:05:00 | 340.20
 ACME  | 12:05:00 | 333.80
(4 rows)

Example query showing gaps

A full outer join shows the gaps in the timestamps:

=> SELECT * FROM hTicks h FULL OUTER JOIN aTicks a ON h.time = a.time;
 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:01:00 | 51.00 |       |          |
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
 HPQ   | 12:06:00 | 52.00 |       |          |
       |          |       | ACME  | 12:03:00 | 340.10
(6 rows)

Schema of bid and asks tables

The examples that follow use the following hTicks and aTicks tables.

CREATE TABLE bid(stock VARCHAR(20), time TIME, price NUMERIC(8,2));
CREATE TABLE ask(stock VARCHAR(20), time TIME, price NUMERIC(8,2));
INSERT INTO bid VALUES ('HPQ', '12:00', 100.10);
INSERT INTO bid VALUES ('HPQ', '12:01', 100.00);
INSERT INTO bid VALUES ('ACME', '12:00', 80.00);
INSERT INTO bid VALUES ('ACME', '12:03', 79.80);
INSERT INTO bid VALUES ('ACME', '12:05', 79.90);
INSERT INTO ask VALUES ('HPQ', '12:01', 101.00);
INSERT INTO ask VALUES ('ACME', '12:00', 80.00);
INSERT INTO ask VALUES ('ACME', '12:02', 75.00);
COMMIT;

Output of the two tables:

bid ask

=> SELECT * FROM bid;

There are no entry records for stocks HPQ and ACME at 12:02 and at 12:04:

 stock |   time   | price
-------+----------+--------
 HPQ   | 12:00:00 | 100.10
 HPQ   | 12:01:00 | 100.00
 ACME  | 12:00:00 |  80.00
 ACME  | 12:03:00 |  79.80
 ACME  | 12:05:00 |  79.90
(5 rows)

=> SELECT * FROM ask;

There are no entry records for stock HPQ at 12:00 and none for ACMEat 12:01:

 stock |   time   | price
-------+----------+--------
 HPQ   | 12:01:00 | 101.00
 ACME  | 12:00:00 |  80.00
 ACME  | 12:02:00 |  75.00
(3 rows)

Example query showing gaps

A full outer join shows the gaps in the timestamps:

=> SELECT * FROM bid b FULL OUTER JOIN ask a ON b.time = a.time;
 stock |   time   | price  | stock |   time   | price
-------+----------+--------+-------+----------+--------
 HPQ   | 12:00:00 | 100.10 | ACME  | 12:00:00 |  80.00
 HPQ   | 12:01:00 | 100.00 | HPQ   | 12:01:00 | 101.00
 ACME  | 12:00:00 |  80.00 | ACME  | 12:00:00 |  80.00
 ACME  | 12:03:00 |  79.80 |       |          |
 ACME  | 12:05:00 |  79.90 |       |          |
       |          |        | ACME  | 12:02:00 |  75.00
(6 rows)

7.7.2 - Writing event series joins

The examples in this topic contains mismatches between timestamps—just as you'd find in real life situations; for example, there could be a period of inactivity on stocks where no trade occurs, which can present challenges when you want to compare two stocks whose timestamps don't match.

The examples in this topic contains mismatches between timestamps—just as you'd find in real life situations; for example, there could be a period of inactivity on stocks where no trade occurs, which can present challenges when you want to compare two stocks whose timestamps don't match.

The hTicks and aTicks tables

As described in the example ticks schema, tables, hTicks is missing input rows for 12:02, 12:03, and 12:04, and aTicks is missing inputs at 12:01, 12:02, and 12:04.

hTicks aTicks
=> SELECT * FROM hTicks;
 stock |   time   | price
-------+----------+-------
 HPQ   | 12:00:00 | 50.00
 HPQ   | 12:01:00 | 51.00
 HPQ   | 12:05:00 | 51.00
 HPQ   | 12:06:00 | 52.00
(4 rows)
=> SELECT * FROM aTicks;
 stock |   time   | price
-------+----------+--------
 ACME  | 12:00:00 | 340.00
 ACME  | 12:03:00 | 340.10
 ACME  | 12:05:00 | 340.20
 ACME  | 12:05:00 | 333.80
(4 rows)

Querying event series data with full outer joins

Using a traditional full outer join, this query finds a match between tables hTicks and aTicks at 12:00 and 12:05 and pads the missing data points with NULL values.

=> SELECT * FROM hTicks h FULL OUTER JOIN aTicks a ON (h.time = a.time);
 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:01:00 | 51.00 |       |          |
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
 HPQ   | 12:06:00 | 52.00 |       |          |
       |          |       | ACME  | 12:03:00 | 340.10
(6 rows)

To replace the gaps with interpolated values for those missing data points, use the INTERPOLATE predicate to create an event series join. The join condition is restricted to the ON clause, which evaluates the equality predicate on the timestamp columns from the two input tables. In other words, for each row in outer table hTicks, the ON clause predicates are evaluated for each combination of each row in the inner table aTicks.

Simply rewrite the full outer join query to use the INTERPOLATE predicate with the required PREVIOUS VALUE keywords. Note that a full outer join on event series data is the most common scenario for event series data, where you keep all rows from both tables

=> SELECT * FROM hTicks h FULL OUTER JOIN aTicks a
   ON (h.time INTERPOLATE PREVIOUS VALUE a.time);

Vertica interpolates the missing values (which appear as NULL in the full outer join) using that table's previous value:

![event series join](/images/hticks1.png)

If you review the regular full outer join output, you can see that both tables have a match in the time column at 12:00 and 12:05, but at 12:01, there is no entry record for ACME. So the operation interpolates a value for ACME (ACME,12:00,340) based on the previous value in the aTicks table.

Querying event series data with left outer joins

You can also use left and right outer joins. You might, for example, decide you want to preserve only hTicks values. So you'd write a left outer join:

=> SELECT * FROM hTicks h LEFT OUTER JOIN aTicks a
   ON (h.time INTERPOLATE PREVIOUS VALUE a.time);
 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:01:00 | 51.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
 HPQ   | 12:06:00 | 52.00 | ACME  | 12:05:00 | 340.20
(5 rows)

Here's what the same data looks like using a traditional left outer join:

=> SELECT * FROM hTicks h LEFT OUTER JOIN aTicks a ON h.time = a.time;
 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:01:00 | 51.00 |       |          |
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
 HPQ   | 12:06:00 | 52.00 |       |          |
(5 rows)

Note that a right outer join has the same behavior with the preserved table reversed.

Querying event series data with inner joins

Note that INNER event series joins behave the same way as normal ANSI SQL-99 joins, where all gaps are omitted. Thus, there is nothing to interpolate, and the following two queries are equivalent and return the same result set:

A regular inner join:

=> SELECT * FROM HTicks h JOIN aTicks a
    ON (h.time INTERPOLATE PREVIOUS VALUE a.time);
 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
(3 rows)

An event series inner join:

=> SELECT * FROM hTicks h INNER JOIN aTicks a ON (h.time = a.time);
 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
(3 rows)

The bid and ask tables

Using the example schema for the bid and ask tables, write a full outer join to interpolate the missing data points:

=> SELECT * FROM bid b FULL OUTER JOIN ask a
    ON (b.stock = a.stock AND b.time INTERPOLATE PREVIOUS VALUE a.time);

In the below output, the first row for stock HPQ shows nulls because there is no entry record for HPQ before 12:01.

 stock |   time   | price  | stock |   time   | price
-------+----------+--------+-------+----------+--------
 ACME  | 12:00:00 |  80.00 | ACME  | 12:00:00 |  80.00
 ACME  | 12:00:00 |  80.00 | ACME  | 12:02:00 |  75.00
 ACME  | 12:03:00 |  79.80 | ACME  | 12:02:00 |  75.00
 ACME  | 12:05:00 |  79.90 | ACME  | 12:02:00 |  75.00
 HPQ   | 12:00:00 | 100.10 |       |          |
 HPQ   | 12:01:00 | 100.00 | HPQ   | 12:01:00 | 101.00
(6 rows)

Note also that the same row (ACME,12:02,75) from the ask table appears three times. The first appearance is because no matching rows are present in the bid table for the row in ask, so Vertica interpolates the missing value using the ACME value at 12:02 (75.00). The second appearance occurs because the row in bid (ACME,12:05,79.9) has no matches in ask. The row from ask that contains (ACME,12:02,75) is the closest row; thus, it is used to interpolate the values.

If you write a regular full outer join, you can see where the mismatched timestamps occur:

=> SELECT * FROM bid b FULL OUTER JOIN ask a ON (b.time = a.time);
 stock |   time   | price  | stock |   time   | price
-------+----------+--------+-------+----------+--------
 ACME  | 12:00:00 |  80.00 | ACME  | 12:00:00 |  80.00
 ACME  | 12:03:00 |  79.80 |       |          |
 ACME  | 12:05:00 |  79.90 |       |          |
 HPQ   | 12:00:00 | 100.10 | ACME  | 12:00:00 |  80.00
 HPQ   | 12:01:00 | 100.00 | HPQ   | 12:01:00 | 101.00
       |          |        | ACME  | 12:02:00 |  75.00
(6 rows)