NULL sort order

By default, projection column values are stored in ascending order, but placement of NULL values depends on a column's data type.

By default, projection column values are stored in ascending order, but placement of NULL values depends on a column's data type.

NULL placement differences with ORDER BY clauses

The analytic OVER(window-order-clause) and the SQL ORDER BY clause have slightly different semantics:

OVER(ORDER BY ...)

The analytic window order clause uses the ASC or DESC sort order to determine NULLS FIRST or NULLS LAST placement for analytic function results. NULL values are placed as follows:

  • ASC, NULLS LAST — NULL values appear at the end of the sorted result.

  • DESC, NULLS FIRST — NULL values appear at the beginning of the sorted result.

(SQL) ORDER BY

The SQL and Vertica ORDER BY clauses produce different results. The SQL ORDER BY clause specifies only ascending or descending sort order. The Vertica ORDER BY clause determines NULL placement based on the column data type:

  • NUMERIC, INTEGER, DATE, TIME, TIMESTAMP, and INTERVAL columns: NULLS FIRST (NULL values appear at the beginning of a sorted projection.)

  • FLOAT, STRING, and BOOLEAN columns: NULLS LAST (NULL values appear at the end of a sorted projection.)

NULL sort options

If you do not care about NULL placement in queries that involve analytic computations, or if you know that columns do not contain any NULL values, specify NULLS AUTO—irrespective of data type. Vertica chooses the placement that gives the fastest performance, as in the following query. Otherwise, specify NULLS FIRST or NULLS LAST.

=> SELECT x, RANK() OVER (ORDER BY x NULLS AUTO) FROM t;

You can carefully formulate queries so Vertica can avoid sorting the data and increase query performance, as illustrated by the following example. Vertica sorts inputs from table t on column x, as specified in the OVER(ORDER BY) clause, and then evaluates RANK():

=> CREATE TABLE t (
    x FLOAT,
    y FLOAT );
=> CREATE PROJECTION t_p (x, y) AS SELECT * FROM t
   ORDER BY x, y UNSEGMENTED ALL NODES;
=> SELECT x, RANK() OVER (ORDER BY x) FROM t;

In the preceding SELECT statement, Vertica eliminates the ORDER BY clause and executes the query quickly because column x is a FLOAT data type. As a result, the projection sort order matches the analytic default ordering (ASC + NULLS LAST). Vertica can also avoid having to sort the data when the underlying projection is already sorted.

However, if column x is an INTEGER data type, Vertica must sort the data because the projection sort order for INTEGER data types (ASC + NULLS FIRST) does not match the default analytic ordering (ASC + NULLS LAST). To help Vertica eliminate the sort, specify the placement of NULLs to match the default ordering:

=> SELECT x, RANK() OVER (ORDER BY x NULLS FIRST) FROM t;

If column x is a STRING, the following query eliminates the sort:

=> SELECT x, RANK() OVER (ORDER BY x NULLS LAST) FROM t;

If you omit NULLS LAST in the preceding query, Ver eliminates the sort because ASC + NULLS LAST is the default sort specification for both the analytic ORDER BY clause and for string-related columns in Vertica.

See also