Analytic functions

The following sections describe how to optimize SQL-99 analytic functions that Vertica supports.

1: Empty OVER clauses
2: NULL sort order
3: Runtime sorting of NULL values in analytic functions

The following sections describe how to optimize SQL-99 analytic functions that Vertica supports.

1 - Empty OVER clauses

The OVER() clause does not require a windowing clause.

The OVER() clause does not require a windowing clause. If your query uses an analytic function like SUM(x) and you specify an empty OVER() clause, the analytic function is used as a reporting function, where the entire input is treated as a single partition; the aggregate returns the same aggregated value for each row of the result set. The query executes on a single node, potentially resulting in poor performance.

If you add a PARTITION BY clause to the OVER() clause, the query executes on multiple nodes, improving its performance.

2 - NULL sort order

By default, projection column values are stored in ascending order, but placement of NULL values depends on a column's data type.

NULL placement differences with ORDER BY clauses

The analytic OVER( window-order-clause) and the SQL ORDER BY clause have slightly different semantics:

OVER(ORDER BY ...)

The analytic window order clause uses the ASC or DESC sort order to determine NULLS FIRST or NULLS LAST placement for analytic function results. NULL values are placed as follows:

ASC, NULLS LAST — NULL values appear at the end of the sorted result.
DESC, NULLS FIRST — NULL values appear at the beginning of the sorted result.

(SQL) ORDER BY

The SQL and Vertica ORDER BY clauses produce different results. The SQL ORDER BY clause specifies only ascending or descending sort order. The Vertica ORDER BY clause determines NULL placement based on the column data type:

NUMERIC, INTEGER, DATE, TIME, TIMESTAMP, and INTERVAL columns: NULLS FIRST (NULL values appear at the beginning of a sorted projection.)
FLOAT, STRING, and BOOLEAN columns: NULLS LAST (NULL values appear at the end of a sorted projection.)

NULL sort options

If you do not care about NULL placement in queries that involve analytic computations, or if you know that columns do not contain any NULL values, specify NULLS AUTO—irrespective of data type. Vertica chooses the placement that gives the fastest performance, as in the following query. Otherwise, specify NULLS FIRST or NULLS LAST.

=> SELECT x, RANK() OVER (ORDER BY x NULLS AUTO) FROM t;

You can carefully formulate queries so Vertica can avoid sorting the data and increase query performance, as illustrated by the following example. Vertica sorts inputs from table t on column x, as specified in the OVER(ORDER BY) clause, and then evaluates RANK():

=> CREATE TABLE t (
    x FLOAT,
    y FLOAT );
=> CREATE PROJECTION t_p (x, y) AS SELECT * FROM t
   ORDER BY x, y UNSEGMENTED ALL NODES;
=> SELECT x, RANK() OVER (ORDER BY x) FROM t;

In the preceding SELECT statement, Vertica eliminates the ORDER BY clause and executes the query quickly because column x is a FLOAT data type. As a result, the projection sort order matches the analytic default ordering (ASC + NULLS LAST). Vertica can also avoid having to sort the data when the underlying projection is already sorted.

However, if column x is an INTEGER data type, Vertica must sort the data because the projection sort order for INTEGER data types (ASC + NULLS FIRST) does not match the default analytic ordering (ASC + NULLS LAST). To help Vertica eliminate the sort, specify the placement of NULLs to match the default ordering:

=> SELECT x, RANK() OVER (ORDER BY x NULLS FIRST) FROM t;

If column x is a STRING, the following query eliminates the sort:

=> SELECT x, RANK() OVER (ORDER BY x NULLS LAST) FROM t;

If you omit NULLS LAST in the preceding query, Ver eliminates the sort because ASC + NULLS LAST is the default sort specification for both the analytic ORDER BY clause and for string-related columns in Vertica.

3 - Runtime sorting of NULL values in analytic functions

By carefully writing queries or creating your design (or both), you can help the Vertica query optimizer skip sorting all columns in a table when performing an analytic function, which can improve query performance.

To minimize Vertica's need to sort projections during query execution, redefine the employee table and specify that NULL values are not allowed in the sort fields:

=> DROP TABLE employee CASCADE;
=> CREATE TABLE employee
   (empno INT,
    deptno INT NOT NULL,
    sal INT NOT NULL);
CREATE TABLE
=> CREATE PROJECTION employee_p AS
   SELECT * FROM employee
   ORDER BY deptno, sal;
CREATE PROJECTION
=> INSERT INTO employee VALUES(101,10,50000);
=> INSERT INTO employee VALUES(103,10,43000);
=> INSERT INTO employee VALUES(104,10,45000);
=> INSERT INTO employee VALUES(105,20,97000);
=> INSERT INTO employee VALUES(108,20,33000);
=> INSERT INTO employee VALUES(109,20,51000);
=> COMMIT;
COMMIT

=> SELECT * FROM employee;
 empno | deptno |  sal
-------+--------+-------
   101 |     10 | 50000
   103 |     10 | 43000
   104 |     10 | 45000
   105 |     20 | 97000
   108 |     20 | 33000
   109 |     20 | 51000
(6 rows)
=> SELECT deptno, sal, empno, RANK() OVER
     (PARTITION BY deptno ORDER BY sal)
   FROM employee;
 deptno |  sal  | empno | ?column?
--------+-------+-------+----------
     10 | 43000 |   103 |        1
     10 | 45000 |   104 |        2
     10 | 50000 |   101 |        3
     20 | 33000 |   108 |        1
     20 | 51000 |   109 |        2
     20 | 97000 |   105 |        3
(6 rows)

Tip

If you do not care about NULL placement in queries that involve analytic computations, or if you know that columns contain no NULL values, specify NULLS AUTO in your queries. Vertica attempts to choose the placement that gives the fastest performance. Otherwise, specify NULLS FIRST or NULLS LAST.