The following sections describe how to optimize SQL-99 analytic functions that Vertica supports.
This is the multi-page printable view of this section. Click here to print.
Analytic functions
1 - Empty OVER clauses
The OVER()
clause does not require a windowing clause. If your query uses an analytic function like SUM(x)
and you specify an empty OVER()
clause, the analytic function is used as a reporting function, where the entire input is treated as a single partition; the aggregate returns the same aggregated value for each row of the result set. The query executes on a single node, potentially resulting in poor performance.
If you add a PARTITION BY
clause to the OVER()
clause, the query executes on multiple nodes, improving its performance.
2 - NULL sort order
By default, projection column values are stored in ascending order, but placement of NULL values depends on a column's data type.
NULL placement differences with ORDER BY clauses
The analytic OVER(
window-order-clause
)
and the SQL ORDER BY
clause have slightly different semantics:
OVER(ORDER BY ...)
The analytic window order clause uses the ASC
or DESC
sort order to determine NULLS FIRST
or NULLS LAST
placement for analytic function results. NULL values are placed as follows:
-
ASC
,NULLS LAST
— NULL values appear at the end of the sorted result. -
DESC
,NULLS FIRST
— NULL values appear at the beginning of the sorted result.
(SQL) ORDER BY
The SQL and Vertica ORDER BY
clauses produce different results. The SQL ORDER BY
clause specifies only ascending or descending sort order. The Vertica ORDER BY
clause determines NULL placement based on the column data type:
-
NUMERIC, INTEGER, DATE, TIME, TIMESTAMP, and INTERVAL columns:
NULLS FIRST
(NULL values appear at the beginning of a sorted projection.) -
FLOAT, STRING, and BOOLEAN columns:
NULLS LAST
(NULL values appear at the end of a sorted projection.)
NULL sort options
If you do not care about NULL placement in queries that involve analytic computations, or if you know that columns do not contain any NULL values, specify NULLS AUTO
—irrespective of data type. Vertica chooses the placement that gives the fastest performance, as in the following query. Otherwise, specify NULLS FIRST
or NULLS LAST
.
=> SELECT x, RANK() OVER (ORDER BY x NULLS AUTO) FROM t;
You can carefully formulate queries so Vertica can avoid sorting the data and increase query performance, as illustrated by the following example. Vertica sorts inputs from table t
on column x
, as specified in the OVER(ORDER BY)
clause, and then evaluates RANK()
:
=> CREATE TABLE t (
x FLOAT,
y FLOAT );
=> CREATE PROJECTION t_p (x, y) AS SELECT * FROM t
ORDER BY x, y UNSEGMENTED ALL NODES;
=> SELECT x, RANK() OVER (ORDER BY x) FROM t;
In the preceding SELECT
statement, Vertica eliminates the ORDER BY
clause and executes the query quickly because column x
is a FLOAT data type. As a result, the projection sort order matches the analytic default ordering (ASC + NULLS LAST
). Vertica can also avoid having to sort the data when the underlying projection is already sorted.
However, if column x
is an INTEGER data type, Vertica must sort the data because the projection sort order for INTEGER data types (ASC + NULLS FIRST
) does not match the default analytic ordering (ASC + NULLS LAST
). To help Vertica eliminate the sort, specify the placement of NULLs to match the default ordering:
=> SELECT x, RANK() OVER (ORDER BY x NULLS FIRST) FROM t;
If column x
is a STRING, the following query eliminates the sort:
=> SELECT x, RANK() OVER (ORDER BY x NULLS LAST) FROM t;
If you omit NULLS LAST
in the preceding query, Ver eliminates the sort because ASC + NULLS LAST
is the default sort specification for both the analytic ORDER BY
clause and for string-related columns in Vertica.
See also
3 - Runtime sorting of NULL values in analytic functions
By carefully writing queries or creating your design (or both), you can help the Vertica query optimizer skip sorting all columns in a table when performing an analytic function, which can improve query performance.
To minimize Vertica's need to sort projections during query execution, redefine the employee
table and specify that NULL values are not allowed in the sort fields:
=> DROP TABLE employee CASCADE;
=> CREATE TABLE employee
(empno INT,
deptno INT NOT NULL,
sal INT NOT NULL);
CREATE TABLE
=> CREATE PROJECTION employee_p AS
SELECT * FROM employee
ORDER BY deptno, sal;
CREATE PROJECTION
=> INSERT INTO employee VALUES(101,10,50000);
=> INSERT INTO employee VALUES(103,10,43000);
=> INSERT INTO employee VALUES(104,10,45000);
=> INSERT INTO employee VALUES(105,20,97000);
=> INSERT INTO employee VALUES(108,20,33000);
=> INSERT INTO employee VALUES(109,20,51000);
=> COMMIT;
COMMIT
=> SELECT * FROM employee;
empno | deptno | sal
-------+--------+-------
101 | 10 | 50000
103 | 10 | 43000
104 | 10 | 45000
105 | 20 | 97000
108 | 20 | 33000
109 | 20 | 51000
(6 rows)
=> SELECT deptno, sal, empno, RANK() OVER
(PARTITION BY deptno ORDER BY sal)
FROM employee;
deptno | sal | empno | ?column?
--------+-------+-------+----------
10 | 43000 | 103 | 1
10 | 45000 | 104 | 2
10 | 50000 | 101 | 3
20 | 33000 | 108 | 1
20 | 51000 | 109 | 2
20 | 97000 | 105 | 3
(6 rows)
Tip
If you do not care about NULL placement in queries that involve analytic computations, or if you know that columns contain no NULL values, specifyNULLS AUTO
in your queries.
Vertica attempts to choose the placement that gives the fastest performance. Otherwise, specify NULLS FIRST
or NULLS LAST
.