This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

SELECT

Returns a result set from one or more data sources—tables, views, joined tables, and named subqueries.

Returns a result set from one or more data sources—tables, views, joined tables, and named subqueries.

Syntax

[ AT epoch ] [ WITH-clause ] SELECT [ ALL | DISTINCT ]
    { * | { MATCH_COLUMNS('pattern') | expression [ [AS] alias ] }[,...] }
    [ into-table-clause ]
    [ from-clause ]
    [ where-clause ]
    [ time-series-clause ]
    [ group-by-clause[,...] ]
    [ having-clause[,...] ]
    [ match-clause ]
    [ union-clause ]
    [ intersect-clause ]
    [ except-clause ]
    [ order-by-clause [ offset-clause ]]
    [ limit-clause ]
    [ FOR UPDATE [ OF table-name[,...] ] ]

Parameters

AT epoch
Returns data from the specified epoch, where epoch is one of the following:
  • EPOCH LATEST: Return data up to but not including the current epoch. The result set includes data from the latest committed DML transaction.

  • EPOCH integer: Return data up to and including the integer-specified epoch.

  • TIME 'timestamp': Return data from the timestamp-specified epoch.

See Epochs for additional information about how Vertica uses epochs.

For details, see Historical queries.

ALL | DISTINCT
  • ALL (default): Retains duplicate rows in result set or group.

  • DISTINCT: Removes duplicate rows from the result set or group.

The ALL or DISTINCT qualifier must immediately follow the SELECT keyword. Only one instance of this keyword can appear in the select list.
*
Lists all columns in the queried tables.
MATCH_COLUMNS('pattern')
Returns all columns in the queried tables that match pattern.
expression [[AS] alias]
An expression that typically resolves to column data from the queried tables—for example, names of columns that are specified in the FROM clause; also:

You can optionally assign a temporary alias to each column expression and reference that alias elsewhere in the SELECT statement—for example, in the query predicate or ORDER BY clause. Vertica uses the alias as the column heading in query output.

FOR UPDATE
Specifies to obtain an X lock on all tables specified in the query, most often used from READ COMMITTED isolation.

FOR UPDATE requires update/delete permissions on the queried tables and cannot be issued from a read-only transaction.

Privileges

Non-superusers:

  • USAGE on the schema

  • SELECT on the table or view

Examples

When multiple clients run transactions as in the following example query, deadlocks can occur if FOR UPDATE is not used. Two transactions acquire an S lock, and when both attempt to upgrade to an X lock, they encounter deadlocks:

=> SELECT balance FROM accounts WHERE account_id=3476 FOR UPDATE;
    ...
=> UPDATE accounts SET balance = balance+10 WHERE account_id=3476;
=> COMMIT;

See also

1 - EXCEPT clause

Combines two or more SELECT queries.

Combines two or more SELECT queries. EXCEPT returns distinct results of the left-hand query that are not also found in the right-hand query.

Syntax

SELECT
    EXCEPT except-query[...]
    [ ORDER BY { column-name  | ordinal-number } [ ASC | DESC ] [,...] ]
    [ LIMIT { integer | ALL } ]
    [ OFFSET integer ]

Notes

  • Use the EXCEPT clause to filter out specific results from a SELECT statement. The EXCEPT query operates on the results of two or more SELECT queries. It returns only those rows in the left-hand query that are not also present in the right-hand query.

  • Vertica evaluates multiple EXCEPT clauses in the same SELECT query from left to right, unless parentheses indicate otherwise.

  • You cannot use the ALL keyword with an EXCEPT query.

  • The results of each SELECT statement must be union compatible. Each statement must return the same number of columns, and the corresponding columns must have compatible data types. For example, you cannot use the EXCEPT clause on a column of type INTEGER and a column of type VARCHAR. If statements do not meet these criteria, Vertica returns an error.

  • You can use EXCEPT in FROM, WHERE, and HAVING clauses.

  • You can order the results of an EXCEPT operation by including an ORDER BY operation in the statement. When you write the ORDER BY list, specify the column names from the leftmost SELECT statement, or specify integers that indicate the position of the columns by which to sort.

  • The rightmost ORDER BY, LIMIT, or OFFSET clauses in an EXCEPT query do not need to be enclosed in parentheses, because the rightmost query specifies that Vertica perform the operation on the results of the EXCEPT operation. Any ORDER BY, LIMIT, or OFFSET clauses contained in SELECT queries that appear earlier in the EXCEPT query must be enclosed in parentheses.

  • Vertica supports EXCEPT noncorrelated subquery predicates. For example:

    => SELECT * FROM T1
       WHERE T1.x IN
          (SELECT MAX(c1) FROM T2
           EXCEPT
              SELECT MAX(cc1) FROM T3
           EXCEPT 
              SELECT MAX(d1) FROM T4);
    

Examples

Consider the following three tables:

Company_A

  Id  | emp_lname |      dept      | sales
------+-----------+----------------+-------
 1234 | Stephen   | auto parts     |  1000
 5678 | Alice     | auto parts     |  2500
 9012 | Katherine | floral         |   500
 3214 | Smithson  | sporting goods |  1500
(4 rows)

Company_B

  Id  | emp_lname |    dept     | sales
------+-----------+-------------+-------
 4321 | Marvin    | home goods  |   250
 8765 | Bob       | electronics | 20000
 9012 | Katherine | home goods  |   500
 3214 | Smithson  | home goods  |  1500
(4 rows)

Company_C

  Id  | emp_lname |      dept      | sales
------+-----------+----------------+-------
 3214 | Smithson  | sporting goods |  1500
 5432 | Madison   | sporting goods |   400
 7865 | Cleveland | outdoor        |  1500
 1234 | Stephen   | floral         |  1000
(4 rows)

The following query returns the IDs and last names of employees that exist in Company_A, but not in Company_B:

=> SELECT id, emp_lname FROM Company_A
   EXCEPT
   SELECT id, emp_lname FROM Company_B;
  id  | emp_lname
------+-----------
 1234 | Stephen
 5678 | Alice
(2 rows)

The following query sorts the results of the previous query by employee last name:

=> SELECT id, emp_lname FROM Company_A
   EXCEPT
   SELECT id, emp_lname FROM Company_B
   ORDER BY emp_lname ASC;
  id  | emp_lname
------+-----------
 5678 | Alice
 1234 | Stephen
(2 rows)

If you order by the column position, the query returns the same results:

=> SELECT id, emp_lname FROM Company_A
   EXCEPT
   SELECT id, emp_lname FROM Company_B
   ORDER BY 2 ASC;
  id  | emp_lname
------+-----------
 5678 | Alice
 1234 | Stephen
(2 rows)

The following query returns the IDs and last names of employees that exist in Company_A, but not in Company_B or Company_C:

=> SELECT id, emp_lname FROM Company_A
   EXCEPT
   SELECT id, emp_lname FROM Company_B
   EXCEPT
   SELECT id, emp_lname FROM Company_C;
  id  | emp_lname
------+-----------
 5678 | Alice
(1 row)

The following query shows the results of mismatched data types:

=> SELECT id, emp_lname FROM Company_A
   EXCEPT
   SELECT emp_lname, id FROM Company_B;
ERROR 3429:  For 'EXCEPT', types int and varchar are inconsistent
DETAIL:  Columns: id and emp_lname

Using the VMart example database, the following query returns information about all Connecticut-based customers who bought items through stores and whose purchases amounted to more than $500, except for those customers who paid cash:

=> SELECT customer_key, customer_name FROM public.customer_dimension
      WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact
         WHERE sales_dollar_amount > 500
         EXCEPT
         SELECT customer_key FROM store.store_sales_fact
         WHERE tender_type = 'Cash')
      AND customer_state = 'CT';
 customer_key |    customer_name
--------------+----------------------
        15084 | Doug V. Lampert
        21730 | Juanita F. Peterson
        24412 | Mary U. Garnett
        25840 | Ben Z. Taylor
        29940 | Brian B. Dobisz
        32225 | Ruth T. McNulty
        33127 | Darlene Y. Rodriguez
        40000 | Steve L. Lewis
        44383 | Amy G. Jones
        46495 | Kevin H. Taylor
(10 rows)

See also

2 - FROM clause

A comma-separated list of data sources to query.

A comma-separated list of data sources to query.

Syntax

FROM dataset[,...] [ TABLESAMPLE(percent) ]

Parameters

dataset``
A set of data to query, one of the following:
`TABLESAMPLE(percent)`
Specifies to return a random sampling of records, where percent specifies the approximate sampling size. The percent value must be between 0 and 100, exclusive, and can include decimal values. The number of records returned is not guaranteed to be the exact percentage specified.

All rows of the data have equal opportunities to be selected. Vertica performs sampling before applying other query filters.

Examples

Count all records in customer_dimension table:

=> SELECT COUNT(*) FROM customer_dimension;
 COUNT
-------
 50000
(1 row)

Return a small sampling of rows in table customer_dimension:

=> SELECT customer_name, customer_state FROM customer_dimension TABLESAMPLE(0.5) WHERE customer_state='IL';
    customer_name    | customer_state
---------------------+----------------
 Amy Y. McNulty      | IL
 Daniel C. Nguyen    | IL
 Midori O. Greenwood | IL
 Meghan U. Lampert   | IL
 Tiffany Y. Lang     | IL
 Laura S. King       | IL
 Steve T. Nguyen     | IL
 Craig S. Webber     | IL
 Luigi A. Lewis      | IL
 Mark W. Williams    | IL
(10 rows)

2.1 - Joined-table

Specifies how to join tables.

Specifies how to join tables.

Syntax

table-reference [ join-type ] JOIN table-reference [ TABLESAMPLE(percent) ] [ ON join-predicate ]

Arguments

table-reference
A table name, optionally qualified.
join-type
One of the following:
TABLESAMPLE(percent)
Use simple random sampling to return an approximate percentage of records. The percentage value must be greater than 0 and less than 100. All rows in the total potential return set are equally eligible to be included in the sampling. Vertica performs this sampling before other filters in the query are applied. The number of records returned is not guaranteed to be exactly percent.

The TABLESAMPLE option is valid only with user-defined tables and Data Collector (DC) tables. Views and system tables are not supported.

ON join-predicate
Specifies the columns to join on. Invalid for NATURAL and CROSS joins, required for all other join types.

Alternative JOIN syntax options

Vertica supports two older join syntax conventions:

  • Table joins specified by join predicate in a WHERE clause

  • Table joins specified by a USING clause

For details, see Join Syntax.

Examples

The following SELECT statement qualifies its JOIN clause with the TABLESAMPLE option:

=> SELECT user_id.id, user_name.name FROM user_name TABLESAMPLE(50)
     JOIN user_id TABLESAMPLE(50) ON user_name.id = user_id.id;
  id  |  name
------+--------
  489 | Markus
 2234 | Cato
  763 | Pompey
(3 rows)

2.2 - Table-reference

A temporary name used for references to table.

Syntax

[[database.]schema.]table[ [AS] alias]

Parameters

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

table
A table in the logical schema.
[AS] alias
A temporary name used for references to table.

3 - GROUP BY clause

Use the GROUP BY clause with aggregate functions in a SELECT statement to collect data across multiple records.

Use the GROUP BY clause with aggregate functions in a SELECT statement to collect data across multiple records. Vertica groups the results into one or more sets of rows that match an expression.

The GROUP BY clause without aggregates is similar to using SELECT DISTINCT.

ROLLUP is an extension to the GROUP BY clause. ROLLUP performs subtotal aggregations.

Syntax

GROUP BY [/*+GBYTYPE(algorithm)*/] { expression | aggregate-expression }[,...]

Arguments

/*+GBYTYPE(algorithm)*/
Specifies which algorithm has precedence for implementing this GROUP BY clause, over the algorithm the Vertica query optimizer might otherwise choose. You can set algorithm to one of the following values:
  • HASH: GROUPBY HASH algorithm

  • PIPE: GROUPBY PIPELINED algorithm

For more information about both algorithms, see GROUP BY implementation options.

expression
Any expression, including constants and column references in the tables specified in the FROM clause. For example:
column,... column, (expression)
aggregate-expression
An ordered list of columns, expressions, CUBE, GROUPING SETS, or ROLLUP aggregates.

You can include CUBE and ROLLUP aggregates within a GROUPING SETS aggregate. CUBE and ROLLUP aggregates can result in a large amount of output. In that case, use GROUPING SETS to return only certain results.

You cannot include any aggregates within a CUBE or ROLLUP expression.

You can append multiple GROUPING SETS, CUBE, or ROLLUP aggregates in the same query. For example:

  
GROUP BY a,b,c,d, ROLLUP(a,b)
GROUP BY a,b,c,d, CUBE((a,b),c,d)
GROUP BY a,b,c,d, CUBE(a,b), ROLLUP (c,d)
GROUP BY ROLLUP(a), CUBE(b), GROUPING SETS(c)
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),CUBE(a,b))
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),())

Usage considerations

  • expression cannot include aggregate functions. However, you can use the GROUP BY clause with CUBE, GROUPING SETS, and ROLLUP to return summary values for each group.

  • When you create a GROUP BY clause, you must include all non-aggregated columns that appear in the SELECT list.

  • If the GROUP BY clause includes a WHERE clause, Vertica ignores all rows that do not satisfy the WHERE clause.

Examples

This example shows how to use the WHERE clause with GROUP BY. In this case, the example retrieves all employees whose last name begins with S, and ignores all rows that do not meet this criteria. The GROUP BY clause uses the ILIKE function to retrieve only last names beginning with S. The aggregate function SUM computes the total vacation days for each group.

=> SELECT employee_last_name, SUM(vacation_days)
   FROM employee_dimension
   WHERE employee_last_name ILIKE 'S%'
   GROUP BY employee_last_name;
 employee_last_name | SUM
--------------------+------
 Sanchez            | 2892
 Smith              | 2672
 Stein              | 2660
(3 rows)

The GROUP BY clause in the following example groups results by vendor region, and vendor region's biggest deal:


=> SELECT vendor_region, MAX(deal_size) AS "Biggest Deal"
   FROM vendor_dimension
   GROUP BY vendor_region;
 vendor_region | Biggest Deal
---------------+--------------
 East          |       990889
 MidWest       |       699163
 NorthWest     |        76101
 South         |       854136
 SouthWest     |       609807
 West          |       964005
(6 rows)

The following query modifies the previous one with a HAVING clause, which specifies to return only groups whose maximum deal size exceeds $900,000:

=> SELECT vendor_region, MAX(deal_size) as "Biggest Deal"
   FROM vendor_dimension
   GROUP BY vendor_region
   HAVING MAX(deal_size) > 900000;
 vendor_region | Biggest Deal
---------------+--------------
 East          |       990889
 West          |       964005
(2 rows)

You can use the GROUP BY clause with one-dimensional arrays of scalar types. In the following example, grants is an ARRAY[VARCHAR] and grant_values is an ARRAY[INT].

=> SELECT department, grants, SUM(apply_sum(grant_values))
   FROM employees
   GROUP BY grants, department;
 department |          grants          |  SUM
------------+--------------------------+--------
 Physics    | ["US-7376","DARPA-1567"] | 235000
 Astronomy  | ["US-7376","DARPA-1567"] |   9000
 Physics    | ["US-7376"]              |  30000
(3 rows)

The GROUP BY clause without aggregates is similar to using SELECT DISTINCT. For example, the following two queries return the same results:

=> SELECT DISTINCT household_id FROM customer_dimension;
=> SELECT household_id FROM customer_dimension GROUP BY household_id;

See also

3.1 - CUBE aggregate

Automatically performs all possible aggregations of the specified columns, as an extension to the GROUP BY clause.

Automatically performs all possible aggregations of the specified columns, as an extension to the GROUP BY clause.

You can use the ROLLUP clause with three grouping functions:

Syntax

GROUP BY group-expression[,...]

Parameters

group-expression``
One or both of the following:
  • An expression that is not an aggregate or a grouping function that includes constants and column references in FROM-specified tables. For example:

    column1, (column2+1), column3+column4
    
  • A multilevel expression, one of the following:

    • ROLLUP

    • CUBE

    • GROUPING SETS

Restrictions

  • GROUP BY CUBE does not order data. If you want to sort data, use the ORDER BY clause. The ORDER BY clause must come after the GROUP BY clause.
  • You can use CUBE inside a GROUPING SETS expression, but not inside a ROLLUP expression or another CUBE expression.

Levels of CUBE aggregation

If n is the number of grouping columns, CUBE creates 2n levels of aggregations. For example:

CUBE (A, B, C) creates all possible groupings, resulting in eight groups:

  • (A, B, C)
  • (A, B)
  • (A, C)
  • (B, C)
  • (A)
  • (B)
  • (C)
  • ()

If you increase the number of CUBE columns, the number of CUBE groupings increases exponentially. The CUBE query may be resource intensive and produce combinations that are not of interest. In that case, consider using theGROUPING SETS aggregate, which allows you to choose specific groupings.

Examples

Using CUBE to return all groupings

Suppose you have a table that contains information about family expenses for books and electricity:

=> SELECT * FROM expenses ORDER BY Category, Year;
 Year |  Category   | Amount
------+-------------+--------
2005  | Books       |  39.98
2007  | Books       |  29.99
2008  | Books       |  29.99
2005  | Electricity | 109.99
2006  | Electricity | 109.99
2007  | Electricity | 229.98

To aggregate the data by both Category and Year using the CUBE aggregate:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY CUBE(Category, Year) ORDER BY 1, 2, GROUPING_ID();
 Category    | Year |  SUM
-------------+------+--------
 Books       | 2005 |  39.98
 Books       | 2007 |  29.99
 Books       | 2008 |  29.99
 Books       |      |  99.96
 Electricity | 2005 | 109.99
 Electricity | 2006 | 109.99
 Electricity | 2007 | 229.98
 Electricity |      | 449.96
             | 2005 | 149.97
             | 2006 | 109.99
             | 2007 | 259.97
             | 2008 |  29.99
             |      | 549.92

The results include subtotals for each category and year, and a grand total ($549.92).

Using CUBE with the HAVING clause

This example shows how you can restrict the GROUP BY results, use the HAVING clause with the CUBE aggregate. This query returns only the category totals and the full total:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY CUBE(Category,Year) HAVING GROUPING(Year)=1;
 Category    | Year |  SUM
-------------+------+--------
 Books       |      |  99.96
 Electricity |      | 449.96
             |      | 549.92

The next query returns only the aggregations for the two categories for each year. The GROUPING ID function specifies to omit the grand total ($549.92):

=> SELECT Category, Year, SUM (Amount) FROM expenses
   GROUP BY CUBE(Category,Year) HAVING GROUPING_ID(Category,Year)<2
   ORDER BY 1, 2, GROUPING_ID();
 Category   | Year |  SUM
------------+------+--------
 Books      | 2005 |  39.98
 Books      | 2007 |  29.99
 Books      | 2008 |  29.99
 Books      |      |  99.96
 Electrical | 2005 | 109.99
 Electrical | 2006 | 109.99
 Electrical | 2007 | 229.98
 Electrical |      | 449.96

See also

3.2 - GROUPING SETS aggregate

The GROUPING SETS aggregate is an extension to the GROUP BY clause that automatically performs subtotal aggregations on groupings that you specify.

The GROUPING SETS aggregate is an extension to the GROUP BY clause that automatically performs subtotal aggregations on groupings that you specify.

You can use the GROUPING SETS clause with three grouping functions:

To sort data, use the ORDER BY clause. The ORDER BY clause must follow the GROUP BY clause.

Syntax

GROUP BY group-expression[,...]

Parameters

group-expression``
One or both of the following:
  • An expression that is not an aggregate or a grouping function that includes constants and column references in FROM-specified tables. For example:

    column1, (column2+1), column3+column4

  • A multilevel expression, one of the following:

    • ROLLUP

    • CUBE

    • GROUPING SETS

Defining the groupings

GROUPING SETS allows you to specify exactly which groupings you want in the results. You can also concatenate the groupings as follows:

The following example clauses result in the groupings shown.

This clause... Defines groupings...
...GROUP BY GROUPING SETS(A,B,C,D)... (A), (B), (C), (D)
...GROUP BY GROUPING SETS((A),(B),(C),(D))... (A), (B), (C), (D)
...GROUP BY GROUPING SETS((A,B,C,D))... (A, B, C, D)
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(C,D)... (A, C), (B, C), (A, D), (B, C)
...GROUP BY GROUPING SETS((A,B)),GROUPING SETS(C,D)... (A, B, C), (A, B, D)
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(ROLLUP(C,D))... (A,B), (A,B,C), (A,B,C,D)
...GROUP BY A,B,C,GROUPING SETS(ROLLUP(C, D))...

(A, B, C, D), (A, B, C), (A, B, C)

The clause contains two groups (A, B, C). In the HAVING clause, use the GROUP_ID function as a predicate, to eliminate the second grouping.

Example: selecting groupings

This example shows how to select only those groupings you want. Suppose you want to aggregate on columns only, and you do not need the grand total. The first query omits the total. In the second query, you add () to the GROUPING SETS list to get the total. Use the ORDER BY clause to sort the results by grouping:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY GROUPING SETS((Category, Year), (Year))
   ORDER BY 1, 2, GROUPING_ID();
  Category  | Year |  SUM
------------+------+--------
 Books      | 2005 |  39.98
 Books      | 2007 |  29.99
 Books      | 2008 |  29.99
 Electrical | 2005 | 109.99
 Electrical | 2006 | 109.99
 Electrical | 2007 | 229.98
            | 2005 | 149.97
            | 2006 | 109.99
            | 2007 | 259.97
            | 2008 |  29.99
=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY GROUPING SETS((Category, Year), (Year), ())
   ORDER BY 1, 2, GROUPING_ID();
  Category  | Year |  SUM
------------+------+--------
 Books      | 2005 |  39.98
 Books      | 2007 |  29.99
 Books      | 2008 |  29.99
 Electrical | 2005 | 109.99
 Electrical | 2006 | 109.99
 Electrical | 2007 | 229.98
            | 2005 | 149.97
            | 2006 | 109.99
            | 2007 | 259.97
            | 2008 |  29.99
            |      | 549.92

See also

3.3 - ROLLUP aggregate

Automatically performs subtotal aggregations as an extension to the GROUP BY clause.

Automatically performs subtotal aggregations as an extension to the GROUP BY clause. ROLLUP performs these aggregations across multiple dimensions, at different levels, within a single SQL query.

You can use the ROLLUP clause with three grouping functions:

Syntax

ROLLUP grouping-expression[,...]

Parameters

group-expression
One or both of the following:
  • An expression that is not an aggregate or a grouping function that includes constants and column references in FROM-specified tables. For example:

    column1, (column2+1), column3+column4

  • A multilevel expression, one of the following:

    • ROLLUP

    • CUBE

    • GROUPING SETS

Restrictions

GROUP BY ROLLUP does not sort results. To sort data, an ORDER BY clause must follow the GROUP BY clause.

Levels of aggregation

If n is the number of grouping columns, ROLLUP creates n+1 levels of subtotals and grand total. Because ROLLUP removes the right-most column at each step, specify column order carefully.

Suppose that ROLLUP(A, B, C) creates four groups:

  • (A, B, C)

  • (A, B)

  • (A)

  • ()

Because ROLLUP removes the right-most column at each step, there are no groups for (A, C) and (B, C).

If you enclose two or more columns in parentheses, GROUP BY treats them as a single entity. For example:

  • ROLLUP(A, B, C) creates four groups:

    
    (A, B, C)
    (A, B)
    (A)
    ()
    
  • ROLLUP((A, B), C) treats (A, B) as a single entity and creates three groups:

    (A, B, C)
    (A, B)
    ()
    

Example: aggregating the full data set

The following example shows how to use the GROUP BY clause to determine family expenses for electricity and books over several years. The SUM aggregate function computes the total amount of money spent in each category per year.

Suppose you have a table that contains information about family expenses for books and electricity:

=> SELECT * FROM expenses ORDER BY Category, Year;
 Year |  Category   | Amount
------+-------------+--------
2005  | Books       |  39.98
2007  | Books       |  29.99
2008  | Books       |  29.99
2005  | Electricity | 109.99
2006  | Electricity | 109.99
2007  | Electricity | 229.98

For the expenses table, ROLLUP computes the subtotals in each category between 2005–2007:

  • Books: $99.96

  • Electricity: $449.96

  • Grand total: $549.92.

Use the ORDER BY clause to sort the results:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY ROLLUP(Category, Year) ORDER BY 1,2, GROUPING_ID();
 Category    | Year |  SUM
-------------+------+--------
 Books       | 2005 |  39.98
 Books       | 2007 |  29.99
 Books       | 2008 |  29.99
 Books       |      |  99.96
 Electricity | 2005 | 109.99
 Electricity | 2006 | 109.99
 Electricity | 2007 | 229.98
 Electricity |      | 449.96
             |      | 549.92

Example: using ROLLUP with the HAVING clause

This example shows how to use the HAVING clause with ROLLUP to restrict the GROUP BY results. The following query produces only those ROLLUP categories where year is subtotaled, based on the expression in the GROUPING function:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY ROLLUP(Category,Year) HAVING GROUPING(Year)=1
   ORDER BY 1, 2, GROUPING_ID();
 Category    | Year |  SUM
-------------+------+--------
 Books       |      |  99.96
 Electricity |      | 449.96
             |      | 549.92

The next example rolls up on (Category, Year), but not on the full results. The GROUPING_ID function specifies to aggregate less than three levels:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY ROLLUP(Category,Year) HAVING GROUPING_ID(Category,Year)<3
   ORDER BY 1, 2, GROUPING_ID();
 Category    | Year |  SUM
-------------+------+--------
 Books       | 2005 |  39.98
 Books       | 2007 |  29.99
 Books       | 2008 |  29.99
 Books       |      |  99.96
 Electricity | 2005 | 109.99
 Electricity | 2006 | 109.99
 Electricity | 2007 | 229.98
 Electricity |      | 449.96

See also

4 - HAVING clause

Filters the results of a GROUP BY clause.

Filters the results of a GROUP BY clause. Semantically, the HAVING clause occurs after the GROUP BY operation. It was added to the SQL standard because a WHERE clause cannot specify aggregate functions.

Syntax

HAVING condition[,...]

Parameters

condition
Unambiguously references a grouping column, unless the reference appears in an aggregate function.

Examples

The following example returns the employees with salaries greater than $800,000:


=> SELECT employee_last_name, MAX(annual_salary) as highest_salary FROM employee_dimension
     GROUP BY employee_last_name HAVING MAX(annual_salary) > 800000 ORDER BY highest_salary DESC;
 employee_last_name | highest_salary
--------------------+----------------
 Sanchez            |         992363
 Vogel              |         983634
 Vu                 |         977716
 Lewis              |         957949
 Taylor             |         953373
 King               |         937765
 Gauthier           |         927335
 Garnett            |         903104
 Bauer              |         901181
 Jones              |         885395
 Rodriguez          |         861647
 Young              |         846657
 Greenwood          |         837543
 Overstreet         |         831317
 Garcia             |         811231
(15 rows)

5 - INTERSECT clause

Calculates the intersection of the results of two or more SELECT queries.

Calculates the intersection of the results of two or more SELECT queries. INTERSECT returns distinct values by both the query on the left and right sides of the INTERSECT operand.

Syntax

select-stmt
    INTERSECT query[...]
    [ order-by-clause  [ offset-clause ]]
    [ limit-clause ]

Notes

  • Use the INTERSECT clause to return all elements that are common to the results of all the SELECT queries. The INTERSECT query operates on the results of two or more SELECT queries. INTERSECT returns only the rows that are returned by all the specified queries.

  • You cannot use the ALL keyword with an INTERSECT query.

  • The results of each SELECT query must be union compatible; they must return the same number of columns, and the corresponding columns must have compatible data types. For example, you cannot use the INTERSECT clause on a column of type INTEGER and a column of type VARCHAR. If the SELECT queries do not meet these criteria, Vertica returns an error.

  • Order the results of an INTERSECT operation by using an ORDER BY clause. In the ORDER BY list, specify the column names from the leftmost SELECT statement or specify integers that indicate the position of the columns by which to sort.

  • You can use INTERSECT in FROM, WHERE, and HAVING clauses.

  • The rightmost ORDER BY, LIMIT, or OFFSET clauses in an INTERSECT query do not need to be enclosed in parentheses because the rightmost query specifies that Vertica perform the operation on the results of the INTERSECT operation. Any ORDER BY, LIMIT, or OFFSET clauses contained in SELECT queries that appear earlier in the INTERSECT query must be enclosed in parentheses.

  • The order by column names is from the first select.

  • Vertica supports INTERSECT noncorrelated subquery predicates. For example:

    => SELECT * FROM T1
       WHERE T1.x IN
          (SELECT MAX(c1) FROM T2
           INTERSECT 
              SELECT MAX(cc1) FROM T3
           INTERSECT 
              SELECT MAX(d1) FROM T4);
    

Examples

Consider the following three tables:

Company_A

id       emp_lname     dept          sales
------+------------+----------------+-------
1234  | Stephen    | auto parts     | 1000
5678  | Alice      | auto parts     | 2500
9012  | Katherine  | floral         |  500
3214  | Smithson   | sporting goods | 1500

Company_B

id       emp_lname     dept        sales
------+------------+-------------+-------
4321  | Marvin     | home goods  |   250
9012  | Katherine  | home goods  |   500
8765  | Bob        | electronics | 20000
3214  | Smithson   | home goods  |  1500

Company_C

  id  | emp_lname |      dept      | sales
------+-----------+----------------+-------
 3214 | Smithson  | sporting goods |  1500
 5432 | Madison   | sporting goods |   400
 7865 | Cleveland | outdoor        |  1500
 1234 | Stephen   | floral         |  1000

The following query returns the IDs and last names of employees that exist in both Company_A and Company_B:

=> SELECT id, emp_lname FROM Company_A
   INTERSECT
   SELECT id, emp_lname FROM Company_B;
 id   | emp_lname
------+-----------
 3214 | Smithson
 9012 | Katherine
(2 rows)

The following query returns the same two employees in descending order of sales:

=> SELECT id, emp_lname, sales FROM Company_A
   INTERSECT
   SELECT id, emp_lname, sales FROM Company_B
   ORDER BY sales DESC;
  id  | emp_lname | sales
------+-----------+-------
 3214 | Smithson  |  1500
 9012 | Katherine |   500
(2 rows)

The following query returns the employee who works for both companies whose sales in Company_B are greater than 1000:

=> SELECT id, emp_lname, sales FROM Company_A
   INTERSECT
   (SELECT id, emp_lname, sales FROM company_B WHERE sales > 1000)
   ORDER BY sales DESC;
  id  | emp_lname | sales
------+-----------+-------
 3214 | Smithson  |  1500
(1 row)

In the following query returns the ID and last name of the employee who works for all three companies:

=> SELECT id, emp_lname FROM Company_A
    INTERSECT
   SELECT id, emp_lname FROM Company_B
   INTERSECT
   SELECT id, emp_lname FROM Company_C;
  id  | emp_lname
------+-----------
 3214 | Smithson
(1 row)

The following query shows the results of a mismatched data types; these two queries are not union compatible:

=> SELECT id, emp_lname FROM Company_A
    INTERSECT
   SELECT emp_lname, id FROM Company_B;
ERROR 3429:  For 'INTERSECT', types int and varchar are inconsistent
DETAIL:  Columns: id and emp_lname

Using the VMart example database, the following query returns information about all Connecticut-based customers who bought items online and whose purchase amounts were between $400 and $500:

=> SELECT customer_key, customer_name from public.customer_dimension
       WHERE customer_key IN (SELECT customer_key
         FROM online_sales.online_sales_fact
         WHERE sales_dollar_amount > 400
         INTERSECT
         SELECT customer_key FROM online_sales.online_sales_fact
         WHERE sales_dollar_amount > 500)
      AND customer_state = 'CT' ORDER BY customer_key;
 customer_key |     customer_name
--------------+------------------------
           39 | Sarah S. Winkler
           44 | Meghan H. Overstreet
           70 | Jack X. Cleveland
          103 | Alexandra I. Vu
          110 | Matt . Farmer
          173 | Mary R. Reyes
          188 | Steve G. Williams
          233 | Theodore V. McNulty
          250 | Marcus E. Williams
          294 | Samantha V. Young
          313 | Meghan P. Pavlov
          375 | Sally N. Vu
          384 | Emily R. Smith
          387 | Emily L. Garcia
...

The previous query and the next one are equivalent, and return the same results:

=> SELECT customer_key,customer_name FROM public.customer_dimension
       WHERE customer_key IN (SELECT customer_key
      FROM online_sales.online_sales_fact
         WHERE sales_dollar_amount > 400
         AND sales_dollar_amount < 500)
   AND customer_state = 'CT' ORDER BY customer_key;

See also

6 - INTO TABLE clause

Creates a table from a query result set.

Creates a table from a query result set.

Syntax

Permanent table:

INTO [TABLE] [[database.]schema.]table

Temporary table:

INTO [scope] TEMP[ORARY] [TABLE] [[database.]schema.]table
   [ ON COMMIT { DELETE | PRESERVE } ROWS ]

Parameters

scope
Specifies visibility of a temporary table definition:
  • GLOBAL (default): The table definition is visible to all sessions, and persists until you explicitly drop the table.

  • LOCAL: The table definition is visible only to the session in which it is created, and is dropped when the session ends.

Regardless of this setting, retention of temporary table data is set by the keywords ON COMMIT DELETE ROWS and ON COMMIT PRESERVE ROWS (see below).

For more information, see Creating temporary tables.

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

table
The name of the table to create.
ON COMMIT { DELETE | PRESERVE } ROWS
Specifies whether data is transaction- or session-scoped:
  • DELETE (default) marks the temporary table for transaction-scoped data. Vertica removes all table data after each commit.

  • PRESERVE marks the temporary table for session-scoped data, which is preserved beyond the lifetime of a single transaction. Vertica removes all table data when the session ends.

Examples

The following SELECT statement has an INTO TABLE clause that creates table newTable from customer_dimension:

=> SELECT * INTO TABLE newTable FROM customer_dimension;

The following SELECT statement creates temporary table newTempTable. By default, temporary tables are created at a global scope, so its definition is visible to other sessions and persists until it is explicitly dropped. No customer_dimension data is copied into the new table, and Vertica issues a warning accordingly:

=> SELECT * INTO TEMP TABLE newTempTable FROM customer_dimension;
WARNING 4102:  No rows are inserted into table "public"."newTempTable" because
  ON COMMIT DELETE ROWS is the default for create temporary table
HINT:  Use "ON COMMIT PRESERVE ROWS" to preserve the data in temporary table
CREATE TABLE

The following SELECT statement creates local temporary table newTempTableLocal. This table is visible only to the session in which it was created, and is automatically dropped when the session ends. The INTO TABLE clause includes ON COMMIT PRESERVE ROWS, so Vertica copies all selection data into the new table:

=> SELECT * INTO LOCAL TEMP TABLE newTempTableLocal ON COMMIT PRESERVE ROWS
     FROM customer_dimension;
CREATE TABLE

7 - LIMIT clause

Specifies the maximum number of result set rows to return, either from the entire result set, or from windows of a partitioned result set.

Specifies the maximum number of result set rows to return, either from the entire result set, or from windows of a partitioned result set.

Syntax

Applied to entire result set:

   LIMIT { num-rows | ALL }

Applied to windows of a partitioned result set:

   LIMIT num-rows OVER ( PARTITION BY column-expr-x, ORDER BY column-expr-y [ASC | DESC]

Parameters

num-rows
The maximum number of rows to return.
ALL
Returns all rows, valid only when LIMIT is applied to the entire result set.
OVER()
Specifies how to partition and sort input data with respect to the current row. The input data is the result set that the query returns after it evaluates FROM, WHERE, GROUP BY, and HAVING clauses.

For details, see Using LIMIT with Window Partitioning below.

Limiting returned rows

LIMIT specifies to return only top-k rows from the queried dataset. Row precedence is determined by the query's ORDER BY clause.

For example, the following query returns the first 10 rows of data in table customer_dimension, as ordered by columns store_region and number_of_employees:

=> SELECT store_region, store_city||', '||store_state location, store_name, number_of_employees
     FROM store.store_dimension WHERE number_of_employees <= 12 ORDER BY store_region, number_of_employees LIMIT 10;
 store_region |    location    | store_name | number_of_employees
--------------+----------------+------------+---------------------
 East         | Stamford, CT   | Store219   |                  12
 East         | New Haven, CT  | Store66    |                  12
 East         | New York, NY   | Store122   |                  12
 MidWest      | South Bend, IN | Store134   |                  10
 MidWest      | Evansville, IN | Store30    |                  11
 MidWest      | Green Bay, WI  | Store27    |                  12
 South        | Mesquite, TX   | Store124   |                  10
 South        | Cape Coral, FL | Store18    |                  11
 South        | Beaumont, TX   | Store226   |                  11
 South        | Houston, TX    | Store33    |                  11
(10 rows)

Using LIMIT with window partitioning

You can use LIMIT to apply window partitioning on query results, and limit the number of rows that are returned in each window:

SELECT ... FROM dataset LIMIT num-rows OVER ( PARTITION BY column-expr-x, ORDER BY column-expr-y [ASC | DESC] )

where querying dataset returns num-rows rows in each column-expr-x partition with the highest or lowest values of column-expr-y.

For example, the following statement queries table store.store_dimension and specifies window partitioning on the result set. LIMIT is set to 2, so each window partition can display no more than two rows. The OVER clause specifies to partition the result set by store_region, where each partition window displays for one region the two stores with the smallest number of employees:

=> SELECT store_region, store_city||', '||store_state location, store_name, number_of_employees FROM store.store_dimension
     LIMIT 2 OVER (PARTITION BY store_region ORDER BY number_of_employees ASC);
 store_region |      location       | store_name | number_of_employees
--------------+---------------------+------------+---------------------
 West         | Norwalk, CA         | Store43    |                  10
 West         | Lancaster, CA       | Store95    |                  11
 East         | Stamford, CT        | Store219   |                  12
 East         | New York, NY        | Store122   |                  12
 SouthWest    | North Las Vegas, NV | Store170   |                  10
 SouthWest    | Phoenix, AZ         | Store228   |                  11
 NorthWest    | Bellevue, WA        | Store200   |                  19
 NorthWest    | Portland, OR        | Store39    |                  22
 MidWest      | South Bend, IN      | Store134   |                  10
 MidWest      | Evansville, IN      | Store30    |                  11
 South        | Mesquite, TX        | Store124   |                  10
 South        | Beaumont, TX        | Store226   |                  11
(12 rows)

8 - MATCH clause

A SQL extension that lets you screen large amounts of historical data in search of event patterns, the MATCH clause provides subclasses for analytic partitioning and ordering and matches rows from the result table based on a pattern you define.

A SQL extension that lets you screen large amounts of historical data in search of event patterns, the MATCH clause provides subclasses for analytic partitioning and ordering and matches rows from the result table based on a pattern you define.

You specify a pattern as a regular expression, which is composed of event types defined in the DEFINE subclause, where each event corresponds to a row in the input table. Then you can search for the pattern within a sequence of input events. Pattern matching returns the contiguous sequence of rows that conforms to PATTERN subclause. For example, pattern P (A B* C) consist of three event types: A, B, and C. When Vertica finds a match in the input table, the associated pattern instance must be an event of type A followed by 0 or more events of type B, and an event of type C.

Pattern matching is particularly useful for clickstream analysis where you might want to identify users' actions based on their Web browsing behavior (page clicks). For details, see Event series pattern matching.

Syntax

MATCH ( [ PARTITION BY table-column ] ORDER BY table-column
    DEFINE event-name AS boolean-expr [,...]
    PATTERN pattern-name AS ( regexp)
    [ rows-match-clause ] )

Arguments

PARTITION BY
Defines the window data scope in which the pattern, defined in the PATTERN subclause, is matched. The partition clause partitions the data by matched patterns defined in the PATTERN subclause. For each partition, data is sorted by the ORDER BY clause. If the partition clause is omitted, the entire data set is considered a single partition.
ORDER BY
Defines the window data scope in which the pattern, defined in the PATTERN subclause, is matched. For each partition, the order clause specifies how the input data is ordered for pattern matching.
DEFINE
Defines the boolean expressions that make up the event types in the regular expressions. For example:
DEFINE
 Entry    AS RefURL  NOT ILIKE '%website2.com%' AND PageURL ILIKE
                               '%website2.com%',
 Onsite   AS PageURL ILIKE     '%website2.com%' AND Action='V',
  
 Purchase AS PageURL ILIKE     '%website2.com%' AND Action='P'
  

The DEFINE subclause accepts a maximum of 52 events. See Event series pattern matching for examples.

event-name
Name of the event to evaluate for each row—in the earlier example, Entry, Onsite, Purchase.
boolean-expr
Expression that returns true or false. boolean_expr can include Logical operators and relational (comparison) operators. For example:
Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
PATTERN pattern-name
Name of the pattern defined in the PATTERN subclause; for example, P is the pattern name defined below:
 PATTERN P AS (...)

A PATTERN is a search pattern that is comprised of a name and a regular expression.

regexp
A regular expression comprised of event types defined in the DEFINE subclause and one or more quantifiers below. When Vertica evaluates the MATCH clause, the regular expression identifies the rows that meet the expression criteria.
  • *: Match 0 or more times

  • *?: Match 0 or more times, not greedily

  • +: Match 1 or more times

  • +?: Match 1 or more times, not greedily

  • ?: Match 0 or 1 time

  • ??: Match 0 or 1 time, not greedily

  • *+: Match 0 or more times, possessive

  • ++: Match 1 or more times, possessive

  • ?+: Match 0 or 1 time, possessive

  • |: Alternation. Matches expression before or after the vertical bar. Similar to a Boolean OR.

rows-match-clause
Specifies how to resolve more than one event evaluating to true for a single row, one of the following:
  • ROWS MATCH ALL EVENTS: If more than one event evaluates to true for a single row, Vertica returns this error :

    ERROR: pattern events must be mutually exclusive
    HINT:  try using ROWS MATCH FIRST EVENT
    
  • ROWS MATCH FIRST EVENT: If more than one event evaluates to true for a given row, Vertica uses the first event in the SQL statement for that row.

Pattern semantic evaluation

  • The semantic evaluating ordering of the SQL clauses is: FROM -> WHERE -> PATTERN MATCH -> SELECT.

  • Data is partitioned as specified in the PARTITION BY clause. If the partition clause is omitted, the entire data set is considered a single partition.

  • For each partition, the order clause specifies how the input data is ordered for pattern matching.

  • Events are evaluated for each row. A row could have 0, 1, or N events evaluate to true. If more than one event evaluates to true for the same row, Vertica returns a run-time error unless you specify ROWS MATCH FIRST EVENT. If you specify ROWS MATCH FIRST EVENT and more than one event evaluates to TRUE for a single row, Vertica chooses the event that was defined first in the SQL statement to be the event it uses for the row.

  • Vertica performs pattern matching by finding the contiguous sequence of rows that conforms to the pattern defined in the PATTERN subclause.

For each match, Vertica outputs the rows that contribute to the match. Rows not part of the match (do not satisfy one or more predicates) are not output.

  • Vertica reports only non-overlapping matches. If an overlap occurs, Vertica chooses the first match found in the input stream. After finding the match, Vertica looks for the next match, starting at the end of the previous match.

  • Vertica reports the longest possible match, not a subset of a match. For example, consider pattern: AB with input: AAAB. Because A uses the greedy regular expression quantifier (), Vertica reports all A inputs (AAAB), not AAB, AB, or B.

Notes and restrictions

  • DISTINCT and GROUP BY/HAVING clauses are not allowed in pattern match queries.

  • The following expressions are not allowed in the DEFINE subclause:

    • Subqueries, such as DEFINE X AS c IN ELECT c FROM table

    • Analytic functions, such as DEFINE X AS c < LEA1) OVER (ORDER BY 1)

    • Aggregate functions, such as DEFINE X AS c < MA1)

  • You cannot use the same pattern name to define a different event; for example, the following is not allowed for X:

    DEFINE   X AS c1 <  3
      X AS c1  >= 3
    
  • Used with MATCH clause, Vertica MATCH clause functions provide additional data about the patterns it finds. For example, you can use the functions to return values representing the name of the event that matched the input row, the sequential number of the match, or a partition-wide unique identifier for the instance of the pattern that matched.

Examples

For examples, see Event series pattern matching.

See also

8.1 - Event series pattern matching

The SQL MATCH clause syntax lets you screen large amounts of historical data in search of event patterns.

The SQL MATCH clause syntax lets you screen large amounts of historical data in search of event patterns. You specify a pattern as a regular expression and can then search for the pattern within a sequence of input events. MATCH provides subclauses for analytic data partitioning and ordering, and the pattern matching occurs on a contiguous set of rows.

Pattern matching is particularly useful for clickstream analysis where you might want to identify users' actions based on their Web browsing behavior (page clicks). A typical online clickstream funnel is:

Company home page -> product home page -> search -> results -> purchase online

Using this clickstream funnel, you can search for a match on the user's sequence of web clicks and identify that user:

  • Landed on the company home page

  • Navigated to the product page

  • Ran a search

  • Clicked a link from the search results

  • Made a purchase

Clickstream funnel schema

The examples in this topic use this clickstream funnel and the following clickstream_log table schema:

=> CREATE TABLE clickstream_log (
  uid INT,             --user ID
  sid INT,             --browsing session ID, produced by previous sessionization computation
  ts TIME,             --timestamp that occurred during the user's page visit
  refURL VARCHAR(20),  --URL of the page referencing PageURL
  pageURL VARCHAR(20), --URL of the page being visited
  action CHAR(1)       --action the user took after visiting the page ('P' = Purchase, 'V' = View)
);

INSERT INTO clickstream_log VALUES (1,100,'12:00','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:01','website2.com/home','website2.com/floby', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:02','website2.com/floby','website2.com/shamwow', 'V');
INSERT INTO clickstream_log values (1,100,'12:03','website2.com/shamwow','website2.com/buy', 'P');
INSERT INTO clickstream_log values (2,100,'12:10','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log values (2,100,'12:11','website2.com/home','website2.com/forks', 'V');
INSERT INTO clickstream_log values (2,100,'12:13','website2.com/forks','website2.com/buy', 'P');
COMMIT;

Here's the clickstream_log table's output:

=> SELECT * FROM clickstream_log;
 uid | sid |    ts    |        refURL        |       pageURL        | action
-----+-----+----------+----------------------+----------------------+--------
   1 | 100 | 12:00:00 | website1.com         | website2.com/home    | V
   1 | 100 | 12:01:00 | website2.com/home    | website2.com/floby   | V
   1 | 100 | 12:02:00 | website2.com/floby   | website2.com/shamwow | V
   1 | 100 | 12:03:00 | website2.com/shamwow | website2.com/buy     | P
   2 | 100 | 12:10:00 | website1.com         | website2.com/home    | V
   2 | 100 | 12:11:00 | website2.com/home    | website2.com/forks   | V
   2 | 100 | 12:13:00 | website2.com/forks   | website2.com/buy     | P
(7 rows)

Examples

This example includes the Vertica MATCH clause functions to analyze users' browsing history over website2.com. It identifies patterns where the user performed the following tasks:

  • Landed on website2.com from another web site (Entry)

  • Browsed to any number of other pages (Onsite)

  • Made a purchase (Purchase)

In the following statement, pattern P (Entry Onsite* Purchase) consist of three event types: Entry, Onsite, and Purchase. When Vertica finds a match in the input table, the associated pattern instance must be an event of type Entry followed by 0 or more events of type Onsite, and an event of type Purchase

=> SELECT uid,
       sid,
       ts,
       refurl,
       pageurl,
       action,
       event_name(),
       pattern_id(),
       match_id()
FROM clickstream_log
MATCH
  (PARTITION BY uid, sid ORDER BY ts
   DEFINE
     Entry    AS RefURL  NOT ILIKE '%website2.com%' AND PageURL ILIKE '%website2.com%',
     Onsite   AS PageURL ILIKE     '%website2.com%' AND Action='V',
     Purchase AS PageURL ILIKE     '%website2.com%' AND Action = 'P'
   PATTERN
     P AS (Entry Onsite* Purchase)
   ROWS MATCH FIRST EVENT);

In the output below, the first four rows represent the pattern for user 1's browsing activity, while the following three rows show user 2's browsing habits.

 uid | sid |    ts    |        refurl        |       pageurl        | action | event_name | pattern_id | match_id
-----+-----+----------+----------------------+----------------------+--------+------------+------------+----------
   1 | 100 | 12:00:00 | website1.com         | website2.com/home    | V      | Entry      |          1 |        1
   1 | 100 | 12:01:00 | website2.com/home    | website2.com/floby   | V      | Onsite     |          1 |        2
   1 | 100 | 12:02:00 | website2.com/floby   | website2.com/shamwow | V      | Onsite     |          1 |        3
   1 | 100 | 12:03:00 | website2.com/shamwow | website2.com/buy     | P      | Purchase   |          1 |        4
   2 | 100 | 12:10:00 | website1.com         | website2.com/home    | V      | Entry      |          1 |        1
   2 | 100 | 12:11:00 | website2.com/home    | website2.com/forks   | V      | Onsite     |          1 |        2
   2 | 100 | 12:13:00 | website2.com/forks   | website2.com/buy     | P      | Purchase   |          1 |        3
(7 rows)

See also

9 - MINUS clause

MINUS is an alias for EXCEPT.

MINUS is an alias for EXCEPT.

10 - OFFSET clause

Omits a specified number of rows from the beginning of the result set.

Omits a specified number of rows from the beginning of the result set.

Syntax

OFFSET rows

Parameters

start-row
Specifies the first row to include in the result set. All preceding rows are omitted.

Dependencies

  • Use an ORDER BY clause with OFFSET. Otherwise, the query returns an undefined subset of the result set.

  • OFFSET must follow the ORDER BY clause in a SELECT statement or UNION clause.

  • When a SELECT statement or UNION clause specifies both LIMIT and OFFSET, Vertica first processes the OFFSET statement, and then applies the LIMIT statement to the remaining rows.

Examples

The following query returns 14 rows from the customer_dimension table:

=> SELECT customer_name, customer_gender FROM customer_dimension
   WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name;
    customer_name     | customer_gender
----------------------+-----------------
 Amy X. Lang          | Female
 Anna H. Li           | Female
 Brian O. Weaver      | Male
 Craig O. Pavlov      | Male
 Doug Z. Goldberg     | Male
 Harold S. Jones      | Male
 Jack E. Perkins      | Male
 Joseph W. Overstreet | Male
 Kevin . Campbell     | Male
 Raja Y. Wilson       | Male
 Samantha O. Brown    | Female
 Steve H. Gauthier    | Male
 William . Nielson    | Male
 William Z. Roy       | Male
(14 rows)

If you modify the previous query to specify an offset of 8 (OFFSET 8), Vertica skips the first eight rows of the previous result set. The query returns the following results:

=> SELECT customer_name, customer_gender FROM customer_dimension
   WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name OFFSET 8;
   customer_name   | customer_gender
-------------------+-----------------
 Kevin . Campbell  | Male
 Raja Y. Wilson    | Male
 Samantha O. Brown | Female
 Steve H. Gauthier | Male
 William . Nielson | Male
 William Z. Roy    | Male
(6 rows)

11 - ORDER BY clause

Sorts a query result set on one or more columns or column expressions.

Sorts a query result set on one or more columns or column expressions. Vertica uses the current locale and collation sequence to compare and sort string values.

Syntax

ORDER BY expression [ ASC | DESC ] [,...]

Parameters

expression
One of the following:
  • Name or ordinal number of a SELECT list item. The ordinal number refers to the position of the result column, counting from the left beginning at one. Use them to order by a column whose name is not unique. Ordinal numbers are invalid for an ORDER BY clause of an analytic function's OVER clause.

  • Arbitrary expression formed from columns that do not appear in the SELECT list

  • CASE expression.

ASC | DESC
Specifies whether to sort values in ascending or descending order. NULL values are either first or last in the sort order, depending on data type:
  • INTEGER, INT, DATE/TIME: NULL has the smallest value.

  • FLOAT, BOOLEAN, CHAR, VARCHAR, ARRAY, SET: NULL has the largest value

Examples

The follow example returns all the city and deal size for customer Metamedia, sorted by deal size in descending order.

=> SELECT customer_city, deal_siz FROM customer_dimension WHERE customer_name = 'Metamedia'
   ORDER BY deal_size DESC;
  customer_city   | deal_size
------------------+-----------
 El Monte         |   4479561
 Athens           |   3815416
 Ventura          |   3792937
 Peoria           |   3227765
 Arvada           |   2671849
 Coral Springs    |   2643674
 Fontana          |   2374465
 Rancho Cucamonga |   2214002
 Wichita Falls    |   2117962
 Beaumont         |   1898295
 Arvada           |   1321897
 Waco             |   1026854
 Joliet           |    945404
 Hartford         |    445795
(14 rows)

The following example uses a transform function. It returns an error because the ORDER BY column is not in the window partition.

=> CREATE TABLE t(geom geometry(200), geog geography(200));
=> SELECT PolygonPoint(geom) OVER(PARTITION BY geom)
   AS SEL_0 FROM t ORDER BY geog;
ERROR 2521: Cannot specify anything other than user defined transforms and partitioning expressions in the ORDER BY list

The following example, using the same table, corrects this error.

=> SELECT PolygonPoint(geom) OVER(PARTITION BY geom)
   AS SEL_0 FROM t ORDER BY geom;

The following example uses an array in the ORDER BY clause.

=> SELECT * FROM employees
   ORDER BY grant_values;
 id | department |          grants          |  grant_values
----+------------+--------------------------+----------------
 36 | Astronomy  | ["US-7376","DARPA-1567"] | [5000,4000]
 36 | Physics    | ["US-7376","DARPA-1567"] | [10000,25000]
 33 | Physics    | ["US-7376"]              | [30000]
 42 | Physics    | ["US-7376","DARPA-1567"] | [65000,135000]
(4 rows)

12 - TIMESERIES clause

Provides gap-filling and interpolation (GFI) computation, an important component of time series analytics computation.

Provides gap-filling and interpolation (GFI) computation, an important component of time series analytics computation. See Time series analytics for details and examples.

Syntax

TIMESERIES slice-time AS 'length-and-time-unit-expr' OVER (
  [ PARTITION BY column-expr[,...] ] ORDER BY time-expr ) [ ORDER BY table-column[,...] ]

Parameters

slice-time
A time column produced by the TIMESERIES clause, which stores the time slice start times generated from gap filling.

Note: This parameter is an alias, so you can use any name that an alias would take.

length-and-time-unit-expr
An INTERVAL DAY TO SECOND literal that specifies the length of time unit of time slice computation. For example:

`TIMESERIES slice_time AS '3 seconds' ...

OVER()
Specifies partitioning and ordering for the function. OVER() also specifies that the time series function operates on a query result set—that is, the rows that are returned after the FROM, WHERE, GROUP BY, and HAVING clauses are evaluated.
PARTITION BY (column-expr`[,...] )`
Partitions the data by the specified column expressions. Gap filling and interpolation is performed on each partition separately.
ORDER BY time-expr
Sorts the data by the TIMESTAMP expression time-expr, which computes the time information of the time series data.

Notes

If the window-partition-clause is not specified in TIMESERIES OVER(), for each defined time slice, exactly one output record is produced; otherwise, one output record is produced per partition per time slice. Interpolation is computed there.

Given a query block that contains a TIMESERIES clause, the following are the semantic phases of execution (after evaluating the FROM and the optional WHERE clauses):

  1. Compute time-expression.

  2. Perform the same computation as the TIME_SLICE() function on each input record based on the result of time-exp and 'length-and-time-unit-expr'.

    1. Perform gap filling to generate time slices missing from the input.

    2. Name the result of this computation as slice_time, which represents the generated "time series" column (alias) after gap filling.

  3. Partition the data by expression, slice-time. For each partition, do step 4.

  4. Sort the data by time-expr. Interpolation is computed here.

There is semantic overlap between the TIMESERIES clause and the TIME_SLICE function with the following key differences:

  • TIMESERIES only supports the interval qualifier DAY TO SECOND; it does not allow YEAR TO MONTH.

  • Unlike TIME_SLICE, the time slice length and time unit expressed in *length-and-time-unit-expr *must be constants so gaps in the time slices are well-defined.

  • TIMESERIES performs gap filling; the TIME_SLICE function does not.

  • TIME_SLICE can return the start or end time of a time slice, depending on the value of its fourth input parameter (start-or-end). TIMESERIES, on the other hand, always returns the start time of each time slice. To output the end time of each time slice, write a SELECT statement like the following:

    => SELECT slice_time + <slice_length>;
    

Restrictions

  • When the TIMESERIES clause occurs in a SQL query block, only the following clauses can be used in the same query block:

    • SELECT

    • FROM

    • WHERE

    • ORDER BY

    GROUP BY and HAVING clauses are not allowed. If a GROUP BY operation is needed before or after gap-filling and interpolation (GFI), use a subquery and place the GROUP BY In the outer query. For example:

    => SELECT symbol, AVG(first_bid) as avg_bid FROM (
            SELECT symbol, slice_time, TS_FIRST_VALUE(bid1) AS first_bid
            FROM Tickstore
            WHERE symbol IN ('MSFT', 'IBM')
            TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts)
            ) AS resultOfGFI
    GROUP BY symbol;
    
  • When the TIMESERIES clause is present in the SQL query block, the SELECT list can include only the following:

    For example, the following two queries return a syntax error because bid1 is not a PARTITION BY or GROUP BY column:

    => SELECT bid, symbol, TS_FIRST_VALUE(bid) FROM Tickstore
       TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts);
       ERROR:  column "Tickstore.bid" must appear in the PARTITION BY list of Timeseries clause or be used in a Timeseries Output function
    => SELECT bid, symbol, AVG(bid) FROM Tickstore
       GROUP BY symbol;
       ERROR:  column "Tickstore.bid" must appear in the GROUP BY clause or be used in an aggregate function
    

Examples

For examples, see Gap filling and interpolation (GFI).

See also

13 - UNION clause

Combines the results of multiple SELECT statements.

Combines the results of multiple SELECT statements. You can include UNION in FROM, WHERE, and HAVING clauses.

Syntax

select { UNION [ ALL | DISTINCT ] select }[...]
    [ order-by-clause  [ offset-clause ]]
    [ limit-clause ]

Arguments

select
A SELECT statement that returns one or more rows, depending on whether you specify keywords DISTINCT or ALL.

The first SELECT statement can include the LABEL hint. Vertica ignores LABEL hints in subsequent SELECT statements.

Each SELECT statement can specify its own ORDER BY, LIMIT, and OFFSET clauses. A SELECT statement with one or more of these clauses must be enclosed by parentheses. See also: ORDER BY, LIMIT, and OFFSET Clauses in UNION.

DISTINCT, ALL
How to return duplicate rows:
  • DISTINCT (default) returns only unique rows.

  • ALL concatenates all rows, including duplicates. For best performance, use UNION ALL.

UNION ALL supports columns of complex types; UNION DISTINCT does not.

Requirements

  • Each row of the UNION result set must be in the result set of at least one of its SELECT statements.

  • Each SELECT statement must specify the same number of columns.

  • Data types of corresponding SELECT statement columns must be compatible, otherwise Vertica returns an error.

ORDER BY, LIMIT, and OFFSET clauses in UNION

A UNION statement can specify its own ORDER BY, LIMIT, and OFFSET clauses, as in the following example:

=> SELECT id, emp_name FROM company_a UNION ALL SELECT id, emp_name FROM company_b ORDER BY emp_name LIMIT 2;
  id  | emp_name
------+----------
 5678 | Alice
 8765 | Bob
(2 rows)

Each SELECT statement in a UNION clause can specify its own ORDER BY, LIMIT, and OFFSET clauses. Vertica processes the SELECT statement clauses before it processes the UNION clauses. In the following example, Vertica processes the individual queries and then concatenates the two result sets:

=> (SELECT id, emp_name FROM company_a ORDER BY emp_name LIMIT 2)
   UNION ALL
   (SELECT id, emp_name FROM company_b ORDER BY emp_name LIMIT 2);
  id  | emp_name
------+-----------
 5678 | Alice
 9012 | Katherine
 8765 | Bob
 9012 | Katherine
(4 rows)

The following requirements and restrictions determine how Vertica processes a UNION clause that contains ORDER BY, LIMIT, and OFFSET clauses:

  • A UNION's ORDER BY clause must specify columns from the first (leftmost) SELECT statement.

  • ORDER BY must precede LIMIT and OFFSET.

  • When a SELECT or UNION statement specifies both LIMIT and OFFSET, Vertica first processes the OFFSET statement, and then applies the LIMIT statement to the remaining rows.

UNION in non-correlated subqueries

Vertica supports UNION in noncorrelated subquery predicates. For example:

=> SELECT DISTINCT customer_key, customer_name FROM public.customer_dimension
   WHERE customer_key IN
     (SELECT customer_key FROM store.store_sales_fact WHERE sales_dollar_amount > 500
      UNION ALL
      SELECT customer_key FROM online_sales.online_sales_fact WHERE sales_dollar_amount > 500)
   AND customer_state = 'CT';
 customer_key |     customer_name
--------------+------------------------
         7021 | Luigi T. Dobisz
         1971 | Betty V. Dobisz
        46284 | Ben C. Gauthier
        33885 | Tanya Y. Taylor
         5449 | Sarah O. Robinson
        29059 | Sally Z. Fortin
        11200 | Foodhope
        15582 | John J. McNulty
        24638 | Alexandra F. Jones
 ...

UNION ALL with complex types

You can use UNION ALL with complex types. Consider a table with the following definition:

=> CREATE TABLE restaurants(
    name VARCHAR, cuisine VARCHAR,
    locations ARRAY[ROW(city VARCHAR(50), state VARCHAR(2)),50],
    menu ARRAY[ROW(item VARCHAR(50), price FLOAT),100] );

Suppose you are in a new city looking for a place to eat. The database has information about the following restaurants:

=> SELECT name, cuisine FROM restaurants
   WHERE CONTAINS(locations,ROW('Pittsburgh', 'PA'));
       name        | cuisine
-------------------+----------
 Bakersfield Tacos | Mexican
 Bob's pizzeria    | Italian
 Succulent Steaks  | American
 Sushi House       | Asian
 Villa Milano      | Italian
(5 rows)

Suppose you are hungry for Italian food. If you cannot have Italian, you want something inexpensive. The following query uses two SELECT clauses from the same table, one finding menu items for Italian restaurants and one finding menu items under $10:

=> WITH menu_entries AS
    (SELECT name, cuisine,
            EXPLODE(menu USING PARAMETERS skip_partitioning=true) AS (idx, menu_entry)
     FROM restaurants WHERE CONTAINS(locations,ROW('Pittsburgh', 'PA')))
   SELECT name, cuisine, menu_entry FROM menu_entries WHERE cuisine = 'Italian'
   UNION ALL
   SELECT name, cuisine, menu_entry FROM menu_entries WHERE menu_entry.price <= 10;
       name        | cuisine |                 menu_entry
-------------------+---------+--------------------------------------------
 Bob's pizzeria    | Italian | {"item":"cheese pizza","price":8.25}
 Bob's pizzeria    | Italian | {"item":"spinach pizza","price":10.5}
 Villa Milano      | Italian | {"item":"pasta carbonara","price":24.99}
 Villa Milano      | Italian | {"item":"eggplant parmesan","price":23.49}
 Villa Milano      | Italian | {"item":"herbed salmon","price":28.99}
 Bakersfield Tacos | Mexican | {"item":"veggie taco","price":9.95}
 Bob's pizzeria    | Italian | {"item":"cheese pizza","price":8.25}
(7 rows)

You cannot use LIMIT OVER with UNION ALL if the selected columns are of complex types. In this case, the statement returns an error like "Multi-value expressions are not supported in this context". You can still use LIMIT OVER in a single SELECT statement by using parentheses to make the scoping explicit.

Examples

The examples that follow use these two tables:

=> SELECT * FROM company_a;
  ID    emp_name       dept       sales
------+------------+-------------+-------
1234  | Stephen    | auto parts  | 1000
5678  | Alice      | auto parts  | 2500
9012  | Katherine  | floral      |  500

=> SELECT * FROM company_b;
  ID    emp_name       dept       sales
------+------------+-------------+-------
4321  | Marvin     | home goods  |   250
9012  | Katherine  | home goods  |   500
8765  | Bob        | electronics | 20000

The following query finds all employee IDs and names from the two tables. The UNION statement uses DISTINCT to combine unique IDs and last names of employees. Katherine works for both companies, so she appears only once in the result set. DISTINCT is the default and can be omitted:

=> SELECT id, emp_name FROM company_a
   UNION DISTINCT SELECT id, emp_name FROM company_b ORDER BY id;
  id  | emp_name
------+-----------
 1234 | Stephen
 4321 | Marvin
 5678 | Alice
 8765 | Bob
 9012 | Katherine
(5 rows)

If the UNION statement instead uses ALL, the query returns two records for Katherine:

=> SELECT id, emp_name FROM company_a
   UNION ALL SELECT id, emp_name FROM company_b ORDER BY id;
  id  | emp_name
------+-----------
 1234 | Stephen
 5678 | Alice
 9012 | Katherine
 4321 | Marvin
 9012 | Katherine
 8765 | Bob
(6 rows)

The following query returns the top two salespeople in each company. Each SELECT statement specifies its own ORDER BY and LIMIT clauses, and the UNION statement concatenates the result sets as returned by each query:

=> (SELECT id, emp_name, sales FROM company_a ORDER BY sales DESC LIMIT 2)
   UNION ALL
   (SELECT id, emp_name, sales FROM company_b ORDER BY sales DESC LIMIT 2);
  id  |  emp_name | sales
------+-----------+-------
 8765 | Bob       | 20000
 5678 | Alice     |  2500
 1234 | Stephen   |  1000
 9012 | Katherine |   500
(4 rows)

The following query returns all employees in both companies with an overall ordering. The ORDER BY clause is part of the UNION statement:

=> SELECT id, emp_name, sales FROM company_a
   UNION
   SELECT id, emp_name, sales FROM company_b
   ORDER BY sales;
  id  |  emp_name | sales
------+-----------+-------
 4321 | Marvin    |   250
 9012 | Katherine |   500
 1234 | Stephen   |  1000
 5678 | Alice     |  2500
 8765 | Bob       | 20000
(5 rows)

The following query groups total sales by department within each company. Each SELECT statement has its own GROUP BY clause. UNION combines the aggregate results from each query:

=> (SELECT 'Company A' as company, dept, SUM(sales) FROM company_a
    GROUP BY dept)
    UNION
   (SELECT 'Company B' as company, dept, SUM(sales) FROM company_b
    GROUP BY dept)
    ORDER BY 1;
 company   |    dept     |  sum
-----------+-------------+-------
 Company A | auto parts  |  3500
 Company A | floral      |   500
 Company B | electronics | 20000
 Company B | home goods  |   750
(4 rows)

See also

14 - WHERE clause

Specifies which rows to include in a query's result set.

Specifies which rows to include in a query's result set.

Syntax

WHERE boolean-expression [ subquery ]...

Arguments

boolean-expression
An expression that returns true or false. The result set only includes rows that evaluate to true. The expression can include boolean operators and the following predicate elements:

Use parentheses to group expressions, predicates, and boolean operators. For example:

... WHERE NOT (A=1 AND B=2) OR C=3;

Examples

The following example returns the names of all customers in the Eastern region whose name starts with the string Amer:

=> SELECT DISTINCT customer_name
   FROM customer_dimension
   WHERE customer_region = 'East'
   AND customer_name ILIKE 'Amer%';
 customer_name
---------------
 Americare
 Americom
 Americore
 Americorp
 Ameridata
 Amerigen
 Amerihope
 Amerimedia
 Amerishop
 Ameristar
 Ameritech
(11 rows)

15 - WITH clause

A WITH clause defines one or more named common table expressions (CTEs), where each CTE encapsulates a result set that can be referenced by another CTE in the same WITH clause, or by the primary query.

A WITH clause defines one or more named common table expressions (CTEs), where each CTE encapsulates a result set that can be referenced by another CTE in the same WITH clause, or by the primary query. Vertica can evaluate WITH clauses in two ways:

  • Inline expansion (default): Vertica evaluates each WITH clause every time it is referenced by the primary query.
  • Materialization: Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.

In both cases, WITH clauses can help simplify complicated queries and avoid statement repetition.

Syntax

WITH [ /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ ] [ RECURSIVE ] {
   cte-identifier [ ( column-aliases ) ] AS (
   [ subordinate-WITH-clause ]
   query-expression )
} [,...]

Arguments

/*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/
Enables materialization of all queries in the current WITH clause. Otherwise, materialization is set by configuration parameter WithClauseMaterialization, by default set to 0 (disabled). If WithClauseMaterialization is disabled, materialization is automatically cleared when the primary query of the WITH clause returns. For details, see Materialization of WITH clause.
RECURSIVE
Specifies to iterate over the WITH clause's own result set, through repeated execution of an embedded UNION or UNION ALL statement. For details, see WITH clause recursion.
cte-identifier
Identifies a common table expression (CTE) within a WITH clause. This identifier is available to CTEs of the same WITH clause, and of parent and child WITH clauses (if any). CTE identifiers of the outermost (primary) WITH clause are also available to the primary query.

All CTE identifiers of the same WITH clause must be unique. For example, the following WITH clause defines two CTEs, so they require unique identifiers: regional_sales and top_regions:

  
WITH
-- query sale amounts for each region
   regional_sales AS (SELECT ... ),
   top_regions AS ( SELECT ... )
   )
column-aliases
A comma-delimited list of result set column aliases. The list of aliases must map to all column expressions in the CTE query. If omitted, result set columns can only be referenced by the names used in the query.

In the following example, the revenue CTE specifies two column aliases: vkey and total_revenue. These map to column vendor_key and aggregate expression SUM(total_order_cost), respectively. The primary query references these aliases:

  
WITH revenue ( vkey, total_revenue ) AS (
   SELECT vendor_key, SUM(total_order_cost)
   FROM store.store_orders_fact
   GROUP BY vendor_key ORDER BY vendor_key)
  
SELECT v.vendor_name, v.vendor_address, v.vendor_city, r.total_revenue
FROM vendor_dimension v JOIN revenue r ON v.vendor_key = r.vkey
WHERE r.total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
subordinate-WITH-clause
A WITH clause that is nested within the current one. CTEs of this WITH clause can only reference CTEs of the same clause, and of parent and child WITH clauses.
query-expression
The query of a given CTE.

Restrictions

WITH clauses only support SELECT and INSERT statements. They do not support UPDATE or DELETE statements.

Examples

Single WITH clause with single CTE

The following SQL defines a WITH clause with one CTE, revenue, which aggregates data in table store.store_orders_fact. The primary query references the WITH clause result set twice: in its JOIN clause and predicate:

-- define WITH clause
WITH revenue ( vkey, total_revenue ) AS (
      SELECT vendor_key, SUM(total_order_cost)
      FROM store.store_orders_fact
      GROUP BY vendor_key ORDER BY 1)
-- End WITH clause

-- primary query
SELECT v.vendor_name, v.vendor_address, v.vendor_city, r.total_revenue
FROM vendor_dimension v JOIN revenue r ON v.vendor_key = r.vkey
WHERE r.total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
   vendor_name    | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
 Frozen Suppliers | 471 Mission St | Peoria      |      49877044
(1 row)

Single WITH clause and multiple CTEs

In the following example, the WITH clause contains two CTEs:

  • regional_sales totals sales for each region

  • top_regions uses the result set from regional_sales to identify the three regions with the highest sales:

The primary query aggregates sales by region and departments in the top_regions result set:


WITH
-- query sale amounts for each region
   regional_sales (region, total_sales) AS (
        SELECT sd.store_region, SUM(of.total_order_cost) AS total_sales
        FROM store.store_dimension sd JOIN store.store_orders_fact of ON sd.store_key = of.store_key
        GROUP BY store_region ),
-- query previous result set
   top_regions AS (
        SELECT region, total_sales
        FROM regional_sales ORDER BY total_sales DESC LIMIT 3
     )

-- primary query
-- aggregate sales in top_regions result set
SELECT sd.store_region AS region, pd.department_description AS department, SUM(of.total_order_cost) AS product_sales
FROM store.store_orders_fact of
JOIN store.store_dimension sd ON sd.store_key = of.store_key
JOIN public.product_dimension pd ON of.product_key = pd.product_key
WHERE sd.store_region IN (SELECT region FROM top_regions)
GROUP BY ROLLUP (region, department) ORDER BY region, product_sales DESC, GROUPING_ID();

 region  |            department            | product_sales
---------+----------------------------------+---------------
 East    |                                  |    1716917786
 East    | Meat                             |     189837962
 East    | Produce                          |     170607880
 East    | Photography                      |     162271618
 East    | Frozen Goods                     |     141077867
 East    | Gifts                            |     137604397
 East    | Bakery                           |     136497842
 East    | Liquor                           |     130410463
 East    | Canned Goods                     |     128683257
 East    | Cleaning supplies                |     118996326
 East    | Dairy                            |     118866901
 East    | Seafood                          |     109986665
 East    | Medical                          |     100404891
 East    | Pharmacy                         |      71671717
 MidWest |                                  |    1287550770
 MidWest | Meat                             |     141446607
 MidWest | Produce                          |     125156100
 MidWest | Photography                      |     122666753
 MidWest | Frozen Goods                     |     105893534
 MidWest | Gifts                            |     103088595
 MidWest | Bakery                           |     102844467
 MidWest | Canned Goods                     |      97647270
 MidWest | Liquor                           |      97306898
 MidWest | Cleaning supplies                |      90775242
 MidWest | Dairy                            |      89065443
 MidWest | Seafood                          |      82541528
 MidWest | Medical                          |      76674814
 MidWest | Pharmacy                         |      52443519
 West    |                                  |    2159765937
 West    | Meat                             |     235841506
 West    | Produce                          |     215277204
 West    | Photography                      |     205949467
 West    | Frozen Goods                     |     178311593
 West    | Bakery                           |     172824555
 West    | Gifts                            |     172134780
 West    | Liquor                           |     164798022
 West    | Canned Goods                     |     163330813
 West    | Cleaning supplies                |     148776443
 West    | Dairy                            |     145244575
 West    | Seafood                          |     139464407
 West    | Medical                          |     126184049
 West    | Pharmacy                         |      91628523
         |                                  |    5164234493
(43 rows)

INSERT statement that includes WITH clause

The following SQL uses a WITH clause to insert data from a JOIN query into table total_store_sales:

CREATE TABLE total_store_sales (store_key int, region VARCHAR(20), store_sales numeric (12,2));

INSERT INTO total_store_sales
WITH store_sales AS (
        SELECT sd.store_key, sd.store_region::VARCHAR(20), SUM (of.total_order_cost)
        FROM store.store_dimension sd JOIN store.store_orders_fact of ON sd.store_key = of.store_key
        GROUP BY sd.store_region, sd.store_key ORDER BY sd.store_region, sd.store_key)
SELECT * FROM store_sales;

=> SELECT * FROM total_store_sales ORDER BY region, store_key;
 store_key |  region   | store_sales
-----------+-----------+-------------
         2 | East      | 47668303.00
         6 | East      | 48136354.00
        12 | East      | 46673113.00
        22 | East      | 48711211.00
        24 | East      | 48603836.00
        31 | East      | 46836469.00
        36 | East      | 48461449.00
        37 | East      | 48018279.00
        41 | East      | 48713084.00
        44 | East      | 47808362.00
        49 | East      | 46990023.00
        50 | East      | 47643329.00
         9 | MidWest   | 46851087.00
        15 | MidWest   | 48787354.00
        27 | MidWest   | 48497620.00
        29 | MidWest   | 47639234.00
        30 | MidWest   | 49013483.00
        38 | MidWest   | 48856012.00
        42 | MidWest   | 47297912.00
        45 | MidWest   | 48544521.00
        46 | MidWest   | 48887255.00
         4 | NorthWest | 47580215.00
        39 | NorthWest | 47136892.00
        47 | NorthWest | 48477574.00
         8 | South     | 48131455.00
        13 | South     | 47605422.00
        17 | South     | 46054367.00
...
(50 rows)

15.1 - Inline expansion of WITH clause

By default, Vertica uses inline expansion to evaluate WITH clauses.

By default, Vertica uses inline expansion to evaluate WITH clauses. Vertica evaluates each WITH clause every time it is referenced by the primary query. Inline expansion often works best if the query does not reference the same WITH clause multiple times, or if some local optimizations are possible after inline expansion.

Example

The following example shows a WITH clause that is a good candidate for inline expansion. The WITH clause is used in a query that obtains order information for all 2007 orders shipped between December 01-07:


-- Begin WITH
WITH store_orders_fact_new AS(
   SELECT * FROM store.store_orders_fact WHERE date_shipped between '2007-12-01' and '2007-12-07')
-- End WITH
-- Begin primary query
SELECT store_key, product_key, product_version, SUM(quantity_ordered*unit_price) AS total_price
FROM store_orders_fact_new
GROUP BY store_key, product_key, product_version
ORDER BY total_price DESC;

 store_key | product_key | product_version | total_price
-----------+-------------+-----------------+-------------
       232 |        1855 |               2 |       29008
       125 |        8500 |               4 |       28812
       139 |        3707 |               2 |       28812
       212 |        3203 |               1 |       28000
       236 |        8023 |               4 |       27548
       123 |       10598 |               2 |       27146
        34 |        8888 |               4 |       27100
       203 |        2243 |               1 |       27027
       117 |       13932 |               2 |       27000
        84 |         768 |               1 |       26936
       123 |        1038 |               1 |       26885
       106 |       18932 |               1 |       26864
        93 |       10395 |               3 |       26790
       162 |       13073 |               1 |       26754
        15 |        3679 |               1 |       26675
        52 |        5957 |               5 |       26656
       190 |        8114 |               3 |       26611
         5 |        7772 |               1 |       26588
       139 |        6953 |               3 |       26572
       202 |       14735 |               1 |       26404
       133 |        2740 |               1 |       26312
       198 |        8545 |               3 |       26287
       221 |        7582 |               2 |       26280
       127 |        9468 |               3 |       26224
        63 |        8115 |               4 |       25960
       171 |        2088 |               1 |       25650
       250 |       11210 |               3 |       25608
...

Vertica processes the query as follows:

  1. Expands the WITH clause reference to store_orders_fact_new within the primary query.

  2. After expanding the WITH clause, evaluates the primary query.

15.2 - Materialization of WITH clause

When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires.

When materialization is enabled, Vertica evaluates each WITH clause once, stores results in a temporary table, and references this table as often as the query requires. Vertica drops the temporary table after primary query execution completes.

Materialization can facilitate better performance when WITH clauses are complex—for example, when the WITH clauses contain JOIN and GROUP BY clauses, and are referenced multiple times in the primary query.

If materialization is enabled, WITH statements perform an auto-commit of the user transaction. This occurs even when using EXPLAIN with the WITH statement.

Enabling WITH clause materialization

WITH materialization is set by configuration parameter WithClauseMaterialization, by default set to 0 (disabled). You can enable and disable materialization by setting WithClauseMaterialization at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively:

  • Database:

    => ALTER DATABASE db-spec SET PARAMETER WithClauseMaterialization={ 0 | 1 };
    => ALTER DATABASE db-spec CLEAR PARAMETER WithClauseMaterialization;
    
  • Session: Parameter setting remains in effect until you explicitly clear it, or the session ends.

    => ALTER SESSION SET PARAMETER WithClauseMaterialization={ 0 | 1 };
    => ALTER SESSION CLEAR PARAMETER WithClauseMaterialization;
    

You can also enable WITH materialization for individual queries with the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. Materialization is automatically cleared when the query returns. For example:


=> WITH /*+ENABLE_WITH_CLAUSE_MATERIALIZATION */ revenue AS (
      SELECT vendor_key, SUM(total_order_cost) AS total_revenue
      FROM store.store_orders_fact
      GROUP BY vendor_key ORDER BY 1)
     ...

Processing WITH clauses using EE5 temp relations

By default, when WITH clause queries are reused, Vertica saves those WITH clause query outputs in EE5 temp relations. However, this option can be changed. EE5 temp relation support is set by configuration parameter EnableWITHTempRelReuseLimit, which can be set in the following ways:

  • 0: Disables this feature.

  • 1: Force-saves all WITH clause queries into EE5 temp relations, whether or not they are reused.

  • 2 (default): Saves only reused WITH clause queries into EE5 temp relations.

  • 3 or more: Saves WITH clause queries into EE5 temp relations only when they are used at least this number of times.

EnableWITHTempRelReuseLimit can be set at database and session levels, with ALTER DATABASE and ALTER SESSION, respectively. When WithClauseMaterialization is set to 1, that setting overrides any EnableWITHTempRelReuseLimit settings.

Note that for WITH queries with complex types, temp relations are disabled.

Example

The following example shows a WITH clause that is a good candidate for materialization. The query obtains data for the vendor who has the highest combined order cost for all orders:

-- Enable materialization
=> ALTER SESSION SET PARAMETER WithClauseMaterialization=1;

-- Define WITH clause
=> WITH revenue AS (
      SELECT vendor_key, SUM(total_order_cost) AS total_revenue
      FROM store.store_orders_fact
      GROUP BY vendor_key ORDER BY 1)
-- End WITH clause

-- Primary query
=> SELECT vendor_name, vendor_address, vendor_city, total_revenue
FROM vendor_dimension v, revenue r
WHERE v.vendor_key = r.vendor_key AND total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
   vendor_name    | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
 Frozen Suppliers | 471 Mission St | Peoria      |      49877044
(1 row)

Vertica processes this query as follows:

  1. WITH clause revenue evaluates its SELECT statement from table store.store_orders_fact.

  2. Results of the revenue clause are stored in a local temporary table.

  3. Whenever the revenue clause statement is referenced, the results stored in the table are used.

  4. The temporary table is dropped when query execution is complete.

15.3 - WITH clause recursion

A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query.

A WITH clause that includes the RECURSIVE option iterates over its own output through repeated execution of a UNION or UNION ALL query. Recursive queries are useful when working with self-referential data—hierarchies such as manager-subordinate relationships, or tree-structured data such as taxonomies.

The configuration parameter WithClauseRecursionLimit—by default set to 8—sets the maximum depth of recursion. You can set this parameter at database and session scopes with ALTER DATABASE and ALTER SESSION, respectively. Recursion continues until it reaches the configured maximum depth, or until the last iteration returns with no data.

You specify a recursive WITH clause as follows:

WITH [ /*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/ ] RECURSIVE
   cte-identifier [ ( column-aliases ) ] AS (
     non-recursive-term
     UNION [ ALL ]
     recursive-term
   )

Non-recursive and recursive terms are separated by UNION or UNION ALL:

  • The non-recursive-term query sets its result set in cte-identifier, which is subject to recursion in recursive-term.

  • The UNION statement's recursive-term recursively iterates over its own output. When recursion is complete, the results of all iterations are compiled and set in cte-identifier.

For example:


=> ALTER SESSION SET PARAMETER WithClauseRecursionLimit=4; -- maximum recursion depth = 4
=> WITH RECURSIVE nums (n) AS (
   SELECT 1 -- non-recursive (base) term
   UNION ALL
     SELECT n+1 FROM nums -- recursive term
  )
SELECT n FROM nums; -- primary query

This simple query executes as follows:

  1. Executes the WITH RECURSIVE clause:

    • Evaluates the non-recursive term SELECT 1, and places the result set—1—in nums.

    • Iterates over the UNION ALL query (SELECT n+1) until the number of iterations is greater than the configuration parameter WithClauseRecursionLimit.

    • Combines the results of all UNION queries and sets the result set in nums, and then exits to the primary query.

  2. Executes the primary query SELECT n FROM nums:

    
     n
    ---
     1
     2
     3
     4
     5
    (5 rows)
    

In this case , WITH RECURSIVE clause exits after four iterations as per WithClauseRecursionLimit. If you restore WithClauseRecursionLimit to its default value of 8, then the clause exits after eight iterations:


=> ALTER SESSION CLEAR PARAMETER WithClauseRecursionLimit;
=> WITH RECURSIVE nums (n) AS (
   SELECT 1
   UNION ALL
     SELECT n+1 FROM nums
  )
SELECT n FROM nums;
 n
---
 1
 2
 3
 4
 5
 6
 7
 8
 9
(9 rows)

Restrictions

The following restrictions apply:

  • The SELECT list of a non-recursive term cannot include the wildcard * (asterisk) or the function MATCH_COLUMNS.

  • A recursive term can reference the target CTE only once.

  • Recursive reference cannot appear within an outer join.

  • Recursive reference cannot appear within a subquery.

  • WITH clauses do not support UNION options ORDER BY, LIMIT, and OFFSET.

Examples

A small software company maintains the following data on employees and their managers:

=> SELECT * FROM personnel.employees ORDER BY emp_id;
 emp_id |   fname   |   lname   | section_id |    section_name     |  section_leader  | leader_id
--------+-----------+-----------+------------+---------------------+------------------+-----------
      0 | Stephen   | Mulligan  |          0 |                     |                  |
      1 | Michael   | North     |        201 | Development         | Zoe Black        |         3
      2 | Megan     | Berry     |        202 | QA                  | Richard Chan     |        18
      3 | Zoe       | Black     |        101 | Product Development | Renuka Patil     |        24
      4 | Tim       | James     |        203 | IT                  | Ebuka Udechukwu  |        17
      5 | Bella     | Tucker    |        201 | Development         | Zoe Black        |         3
      6 | Alexandra | Climo     |        202 | QA                  | Richard Chan     |        18
      7 | Leonard   | Gray      |        203 | IT                  | Ebuka Udechukwu  |        17
      8 | Carolyn   | Henderson |        201 | Development         | Zoe Black        |         3
      9 | Ryan      | Henderson |        201 | Development         | Zoe Black        |         3
     10 | Frank     | Tucker    |        205 | Sales               | Benjamin Glover  |        29
     11 | Nathan    | Ferguson  |        102 | Sales Marketing     | Eric Redfield    |        28
     12 | Kevin     | Rampling  |        101 | Product Development | Renuka Patil     |        24
     13 | Tuy Kim   | Duong     |        201 | Development         | Zoe Black        |         3
     14 | Dwipendra | Sing      |        204 | Tech Support        | Sarah Feldman    |        26
     15 | Dylan     | Wijman    |        206 | Documentation       | Kevin Rampling   |        12
     16 | Tamar     | Sasson    |        207 | Marketing           | Nathan Ferguson  |        11
     17 | Ebuka     | Udechukwu |        101 | Product Development | Renuka Patil     |        24
     18 | Richard   | Chan      |        101 | Product Development | Renuka Patil     |        24
     19 | Maria     | del Rio   |        201 | Development         | Zoe Black        |         3
     20 | Hua       | Song      |        204 | Tech Support        | Sarah Feldman    |        26
     21 | Carmen    | Lopez     |        204 | Tech Support        | Sarah Feldman    |        26
     22 | Edgar     | Mejia     |        206 | Documentation       | Kevin Rampling   |        12
     23 | Riad      | Salim     |        201 | Development         | Zoe Black        |         3
     24 | Renuka    | Patil     |        100 | Executive Office    | Stephen Mulligan |         0
     25 | Rina      | Dsouza    |        202 | QA                  | Richard Chan     |        18
     26 | Sarah     | Feldman   |        101 | Product Development | Renuka Patil     |        24
     27 | Max       | Mills     |        102 | Sales Marketing     | Eric Redfield    |        28
     28 | Eric      | Redfield  |        100 | Executive Office    | Stephen Mulligan |         0
     29 | Benjamin  | Glover    |        102 | Sales Marketing     | Eric Redfield    |        28
     30 | Dominic   | King      |        205 | Sales               | Benjamin Glover  |        29
     32 | Ryan      | Metcalfe  |        206 | Documentation       | Kevin Rampling   |        12
     33 | Piers     | Paige     |        201 | Development         | Zoe Black        |         3
     34 | Nicola    | Kelly     |        207 | Marketing           | Nathan Ferguson  |        11
(34 rows)

You can query this data for employee-manager relationships through WITH RECURSIVE. For example, the following query's WITH RECURSIVE clause gets employee-manager relationships for employee Eric Redfield, including all employees who report directly and indirectly to him:

WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
 AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
      FROM personnel.employees WHERE fname||' '||lname = 'Eric Redfield'
 UNION
    SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
      JOIN managers m ON m.employeeID = e.leader_id)
 SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;

The WITH RECURSIVE clause defines the CTE managers, and then executes in two phases:

  1. The non-recursive term populates managers with data that it queries from personnel.employees.

  2. The recursive term's UNION query iterates over its own output until, on the fourth cycle, it finds no more data. The results of all iterations are then compiled and set in managers, and the WITH CLAUSE exits to the primary query.

The primary query returns three levels of data from managers—one for each recursive iteration:

Similarly, the following query iterates over the same data to get all employee-manager relationships for employee Richard Chan, who is one level lower in the company chain of command:

WITH RECURSIVE managers (employeeID, employeeName, sectionID, section, lead, leadID)
 AS (SELECT emp_id, fname||' '||lname, section_id, section_name, section_leader, leader_id
      FROM personnel.employees WHERE fname||' '||lname = 'Richard Chan'
 UNION
    SELECT emp_id, fname||' '||lname AS employee_name, section_id, section_name, section_leader, leader_id FROM personnel.employees e
      JOIN managers m ON m.employeeID = e.leader_id)
 SELECT employeeID, employeeName, lead AS 'Reports to', section, leadID from managers ORDER BY sectionID, employeeName;

The WITH RECURSIVE clause executes as before, except this time it finds no more data after two iterations and exits. Accordingly, the primary query returns two levels of data from managers:

WITH RECURSIVE materialization

By default, materialization is disabled. In this case, Vertica rewrites the WITH RECURSIVE query into subqueries, as many as necessary for the required level of recursion.

If recursion is very deep, the high number of query rewrites is liable to incur considerable overhead that adversely affects performance and exhausts system resources. In this case, consider enabling materialization, either with the configuration parameter WithClauseMaterialization, or the hint ENABLE_WITH_CLAUSE_MATERIALIZATION. In either case, intermediate result sets from all recursion levels are written to local temporary tables. When recursion is complete, the intermediate results in all temporary tables are compiled and passed on to the primary query.