This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Analytic functions

All analytic functions in this section with an aggregate counterpart are appended with [Analytics] in the heading to avoid confusion between the two function types.

Vertica analytics are SQL functions based on the ANSI 99 standard. These functions handle complex analysis and reporting tasks—for example:

  • Rank the longest-standing customers in a particular state.

  • Calculate the moving average of retail volume over a specified time.

  • Find the highest score among all students in the same grade.

  • Compare the current sales bonus that salespersons received against their previous bonus.

Analytic functions return aggregate results but they do not group the result set. They return the group value multiple times, once per record. You can sort group values, or partitions, using a window ORDER BY clause, but the order affects only the function result set, not the entire query result set.

Syntax

General

analytic-function(arguments) OVER(
  [ window-partition-clause ]
  [ window-order-clause [ window-frame-clause ] ]
)

With named window

analytic-function(arguments) OVER(
  [  named-window [ window-frame-clause ] ]
)

Parameters

analytic-function(arguments)
A Vertica analytic function and its arguments.
OVER
Specifies how to partition, sort, and window frame function input with respect to the current row. The input data is the result set that the query returns after it evaluates FROM, WHERE, GROUP BY, and HAVING clauses.

An empty OVER clause provides the best performance for single threaded queries on a single node.

window-partition-clause
Groups input rows according to one or more columns or expressions.

If you omit this clause, no grouping occurs and the analytic function processes all input rows as a single partition.

window-order-clause
Optionally specifies how to sort rows that are supplied to the analytic function. If the OVER clause also includes a partition clause, rows are sorted within each partition.
window-frame-clause
Only valid for some analytic functions, specifies as input a set of rows relative to the row that is currently being evaluated by the analytic function. After the function processes that row and its window, Vertica advances the current row and adjusts the window boundaries accordingly.
named-window
The name of a window that you define in the same query with a window-name-clause. This definition encapsulates window partitioning and sorting. Named windows are useful when the query invokes multiple analytic functions with similar OVER clauses.

A window name clause cannot specify a window frame clause. However, you can qualify the named window in an OVER clause with a window frame clause.

Requirements

The following requirements apply to analytic functions:

  • All require an OVER clause. Each function has its own OVER clause requirements. For example, you can supply an empty OVER clause for some analytic aggregate functions such as SUM. For other functions, window frame and order clauses might be required, or might be invalid.

  • Analytic functions can be invoked only in a query's SELECT and ORDER BY clauses.

  • Analytic functions cannot be nested. For example, the following query is not allowed:

    => SELECT MEDIAN(RANK() OVER(ORDER BY sal) OVER()).
    
  • WHERE, GROUP BY and HAVING operators are technically not part of the analytic function. However, they determine input to that function.

See also

1 - ARGMAX [analytic]

This function is patterned after the mathematical function argmax(f(x)), which returns the value of x that maximizes f(x).

This function is patterned after the mathematical function argmax(f(x)), which returns the value of x that maximizes f(x). Similarly, ARGMAX takes two arguments target and arg, where both are columns or column expressions in the queried dataset. ARGMAX finds the row with the largest non-null value in target and returns the value of arg in that row. If multiple rows contain the largest target value, ARGMAX returns arg from the first row that it finds.

Behavior type

Immutable

Syntax

ARGMAX ( target, arg )  OVER ( [ PARTITION BY expression[,...] ] [ window-order-clause ] )

Arguments

target, arg
Columns in the queried dataset.
OVER()
Specifies the following window clauses:
  • PARTITION BY expression: Groups (partitions) input rows according to the values in expression, which resolves to one or more columns in the queried dataset. If you omit this clause, ARGMAX processes all input rows as a single partition.

  • window-order-clause: Specifies how to sort input rows. If the OVER clause also includes a partition clause, rows are sorted separately within each partition.

For details, see Analytic Functions.

Examples

Create and populate table service_info, which contains information on various services, their respective development groups, and their userbase. A NULL in the users column indicates that the service has not been released, and so it cannot have users.

=> CREATE TABLE service_info(dev_group VARCHAR(10), product_name VARCHAR(30), users INT);
=> COPY t FROM stdin NULL AS 'null';
>> iris|chat|48193
>> aspen|trading|3000
>> orchid|cloud|990322
>> iris|video call| 10203
>> daffodil|streaming|44123
>> hydrangea|password manager|null
>> hydrangea|totp|1837363
>> daffodil|clip share|3000
>> hydrangea|e2e sms|null
>> rose|crypto|null
>> iris|forum|48193
>> \.

ARGMAX returns the value in the product_name column that maximizes the value in the users column. In this case, ARGMAX returns totp, which indicates that the totp service has the largest user base:


=> SELECT dev_group, product_name, users, ARGMAX(users, product_name) OVER (ORDER BY dev_group ASC) FROM service_info;
 dev_group |   product_name   |  users  | ARGMAX
-----------+------------------+---------+--------
 aspen     | trading          |    3000 | totp
 daffodil  | clip share       |    3000 | totp
 daffodil  | streaming        |   44123 | totp
 hydrangea | e2e sms          |         | totp
 hydrangea | password manager |         | totp
 hydrangea | totp             | 1837363 | totp
 iris      | chat             |   48193 | totp
 iris      | forum            |   48193 | totp
 iris      | video call       |   10203 | totp
 orchid    | cloud            |  990322 | totp
 rose      | crypto           |         | totp
(11 rows)

The next query partitions the data on dev_group to identify the most popular service created by each development group. ARGMAX returns NULL if the partition's users column contains only NULL values and breaks ties using the first value in product_name from the top of the partition.


=> SELECT dev_group, product_name, users, ARGMAX(users, product_name) OVER (PARTITION BY dev_group ORDER BY product_name ASC) FROM service_info;
 dev_group |   product_name   |  users  |  ARGMAX
-----------+------------------+---------+-----------
 iris      | chat             |   48193 | chat
 iris      | forum            |   48193 | chat
 iris      | video call       |   10203 | chat
 orchid    | cloud            |  990322 | cloud
 aspen     | trading          |    3000 | trading
 daffodil  | clip share       |    3000 | streaming
 daffodil  | streaming        |   44123 | streaming
 rose      | crypto           |         |
 hydrangea | e2e sms          |         | totp
 hydrangea | password manager |         | totp
 hydrangea | totp             | 1837363 | totp
(11 rows)

See also

ARGMIN [analytic]

2 - ARGMIN [analytic]

This function is patterned after the mathematical function argmin(f(x)), which returns the value of x that minimizes f(x).

This function is patterned after the mathematical function argmin(f(x)), which returns the value of x that minimizes f(x). Similarly, ARGMIN takes two arguments target and arg, where both are columns or column expressions in the queried dataset. ARGMIN finds the row with the smallest non-null value in target and returns the value of arg in that row. If multiple rows contain the smallest target value, ARGMIN returns arg from the first row that it finds.

Behavior type

Immutable

Syntax

ARGMIN ( target, arg )  OVER ( [ PARTITION BY expression[,...] ] [ window-order-clause ] )

Arguments

target, arg
Columns in the queried dataset.
OVER()
Specifies the following window clauses:
  • PARTITION BY expression: Groups (partitions) input rows according to the values in expression, which resolves to one or more columns in the queried dataset. If you omit this clause, ARGMIN processes all input rows as a single partition.

  • window-order-clause: Specifies how to sort input rows. If the OVER clause also includes a partition clause, rows are sorted separately within each partition.

For details, see Analytic Functions.

Examples

Create and populate table service_info, which contains information on various services, their respective development groups, and their userbase. A NULL in the users column indicates that the service has not been released, and so it cannot have users.

=> CREATE TABLE service_info(dev_group VARCHAR(10), product_name VARCHAR(30), users INT);
=> COPY t FROM stdin NULL AS 'null';
>> iris|chat|48193
>> aspen|trading|3000
>> orchid|cloud|990322
>> iris|video call| 10203
>> daffodil|streaming|44123
>> hydrangea|password manager|null
>> hydrangea|totp|1837363
>> daffodil|clip share|3000
>> hydrangea|e2e sms|null
>> rose|crypto|null
>> iris|forum|48193
>> \.

ARGMIN returns the value in the product_name column that minimizes the value in the users column. In this case, ARGMIN returns totp, which indicates that the totp service has the smallest user base:


=> SELECT dev_group, product_name, users, ARGMIN(users, product_name) OVER (ORDER BY dev_group ASC) FROM service_info;
 dev_group |   product_name   |  users  | ARGMIN
-----------+------------------+---------+---------
 aspen     | trading          |    3000 | trading
 daffodil  | clip share       |    3000 | trading
 daffodil  | streaming        |   44123 | trading
 hydrangea | e2e sms          |         | trading
 hydrangea | password manager |         | trading
 hydrangea | totp             | 1837363 | trading
 iris      | chat             |   48193 | trading
 iris      | forum            |   48193 | trading
 iris      | video call       |   10203 | trading
 orchid    | cloud            |  990322 | trading
 rose      | crypto           |         | trading
(11 rows)

The next query partitions the data on dev_group to identify the least popular service created by each development group. ARGMIN returns NULL if the partition's users column contains only NULL values and breaks ties using the first value in product_name from the top of the partition.


=> SELECT dev_group, product_name, users, ARGMIN(users, product_name) OVER (PARTITION BY dev_group ORDER BY product_name ASC) FROM service_info;
 dev_group |   product_name   |  users  |   ARGMIN
-----------+------------------+---------+------------
 iris      | chat             |   48193 | video call
 iris      | forum            |   48193 | video call
 iris      | video call       |   10203 | video call
 orchid    | cloud            |  990322 | cloud
 aspen     | trading          |    3000 | trading
 daffodil  | clip share       |    3000 | clip share
 daffodil  | streaming        |   44123 | clip share
 rose      | crypto           |         |
 hydrangea | e2e sms          |         | totp
 hydrangea | password manager |         | totp
 hydrangea | totp             | 1837363 | totp
(11 rows)

See also

ARGMAX [analytic]

3 - AVG [analytic]

Computes an average of an expression in a group within a.

Computes an average of an expression in a group within a window. AVG returns the same data type as the expression's numeric data type.

The AVG analytic function differs from the AVG aggregate function, which computes the average of an expression over a group of rows.

Behavior type

Immutable

Syntax

AVG ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Any data that can be implicitly converted to a numeric data type.
OVER()
See Analytic Functions.

Overflow handling

By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.

Examples

The following query finds the sales for that calendar month and returns a running/cumulative average (sometimes called a moving average) using the default window of RANGE UNBOUNDED PRECEDING AND CURRENT ROW:

=> SELECT calendar_month_number_in_year Mo, SUM(product_price) Sales,
   AVG(SUM(product_price)) OVER (ORDER BY calendar_month_number_in_year)::INTEGER Average
   FROM product_dimension pd, date_dimension dm, inventory_fact if
   WHERE dm.date_key = if.date_key AND pd.product_key = if.product_key GROUP BY Mo;
 Mo |  Sales   | Average
----+----------+----------
  1 | 23869547 | 23869547
  2 | 19604661 | 21737104
  3 | 22877913 | 22117374
  4 | 22901263 | 22313346
  5 | 23670676 | 22584812
  6 | 22507600 | 22571943
  7 | 21514089 | 22420821
  8 | 24860684 | 22725804
  9 | 21687795 | 22610470
 10 | 23648921 | 22714315
 11 | 21115910 | 22569005
 12 | 24708317 | 22747281
(12 rows)

To return a moving average that is not a running (cumulative) average, the window can specify ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING:

=> SELECT calendar_month_number_in_year Mo, SUM(product_price) Sales,
   AVG(SUM(product_price)) OVER (ORDER BY calendar_month_number_in_year
     ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)::INTEGER Average
   FROM product_dimension pd, date_dimension dm, inventory_fact if
   WHERE dm.date_key = if.date_key AND pd.product_key = if.product_key GROUP BY Mo;
 Mo |  Sales   | Average
----+----------+----------
  1 | 23869547 | 22117374
  2 | 19604661 | 22313346
  3 | 22877913 | 22584812
  4 | 22901263 | 22312423
  5 | 23670676 | 22694308
  6 | 22507600 | 23090862
  7 | 21514089 | 22848169
  8 | 24860684 | 22843818
  9 | 21687795 | 22565480
 10 | 23648921 | 23204325
 11 | 21115910 | 22790236
 12 | 24708317 | 23157716
(12 rows)

See also

4 - BOOL_AND [analytic]

Returns the Boolean value of an expression within a.

Returns the Boolean value of an expression within a window. If all input values are true, BOOL_AND returns t. Otherwise, it returns f.

Behavior type

Immutable

Syntax

BOOL_AND ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
A Boolean data type or any non-Boolean data type that can be implicitly converted to a Boolean data type. The function returns a Boolean value.
OVER()
See Analytic Functions.

Examples

The following example illustrates how you can use the BOOL_AND, BOOL_OR, and BOOL_XOR analytic functions. The sample table, employee, includes a column for type of employee and years paid.

=> CREATE TABLE employee(emptype VARCHAR, yearspaid VARCHAR);
CREATE TABLE

Insert sample data into the table to show years paid. In more than one case, an employee could be paid more than once within one year.

=> INSERT INTO employee
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2015'
UNION ALL
SELECT 'contractor3', '2014'
UNION ALL
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2014'
UNION ALL
SELECT 'contractor3', '2015'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor5', '2015'
UNION ALL
SELECT 'contractor5', '2016';
 OUTPUT
--------
     10
(1 row)

Query the table. The result shows employees that were paid twice in 2014 (BOOL_AND), once or twice in 2014 (BOOL_OR), and specifically not more than once in 2014 (BOOL_XOR).

=> SELECT DISTINCT emptype,
BOOL_AND(yearspaid='2014') OVER (PARTITION BY emptype) AS paidtwicein2014,
BOOL_OR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidonceortwicein2014,
BOOL_XOR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidjustoncein2014
FROM employee;


   emptype   | paidtwicein2014 | paidonceortwicein2014 | paidjustoncein2014
-------------+-----------------+-----------------------+--------------------
 contractor1 | t               | t                     | f
 contractor2 | f               | t                     | t
 contractor3 | f               | t                     | t
 contractor4 | t               | t                     | f
 contractor5 | f               | f                     | f
(5 rows)

See also

5 - BOOL_OR [analytic]

Returns the Boolean value of an expression within a.

Returns the Boolean value of an expression within a window. If at least one input value is true, BOOL_OR returns t. Otherwise, it returns f.

Behavior type

Immutable

Syntax

BOOL_OR ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
A Boolean data type or any non-Boolean data type that can be implicitly converted to a Boolean data type. The function returns a Boolean value.
OVER()
See Analytic Functions.

Examples

The following example illustrates how you can use the BOOL_AND, BOOL_OR, and BOOL_XOR analytic functions. The sample table, employee, includes a column for type of employee and years paid.

=> CREATE TABLE employee(emptype VARCHAR, yearspaid VARCHAR);
CREATE TABLE

Insert sample data into the table to show years paid. In more than one case, an employee could be paid more than once within one year.

=> INSERT INTO employee
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2015'
UNION ALL
SELECT 'contractor3', '2014'
UNION ALL
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2014'
UNION ALL
SELECT 'contractor3', '2015'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor5', '2015'
UNION ALL
SELECT 'contractor5', '2016';
 OUTPUT
--------
     10
(1 row)

Query the table. The result shows employees that were paid twice in 2014 (BOOL_AND), once or twice in 2014 (BOOL_OR), and specifically not more than once in 2014 (BOOL_XOR).

=> SELECT DISTINCT emptype,
BOOL_AND(yearspaid='2014') OVER (PARTITION BY emptype) AS paidtwicein2014,
BOOL_OR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidonceortwicein2014,
BOOL_XOR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidjustoncein2014
FROM employee;


   emptype   | paidtwicein2014 | paidonceortwicein2014 | paidjustoncein2014
-------------+-----------------+-----------------------+--------------------
 contractor1 | t               | t                     | f
 contractor2 | f               | t                     | t
 contractor3 | f               | t                     | t
 contractor4 | t               | t                     | f
 contractor5 | f               | f                     | f
(5 rows)

See also

6 - BOOL_XOR [analytic]

Returns the Boolean value of an expression within a.

Returns the Boolean value of an expression within a window. If only one input value is true, BOOL_XOR returns t. Otherwise, it returns f.

Behavior type

Immutable

Syntax

BOOL_XOR ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
A Boolean data type or any non-Boolean data type that can be implicitly converted to a Boolean data type. The function returns a Boolean value.
OVER()
See Analytic Functions.

Examples

The following example illustrates how you can use the BOOL_AND, BOOL_OR, and BOOL_XOR analytic functions. The sample table, employee, includes a column for type of employee and years paid.

=> CREATE TABLE employee(emptype VARCHAR, yearspaid VARCHAR);
CREATE TABLE

Insert sample data into the table to show years paid. In more than one case, an employee could be paid more than once within one year.

=> INSERT INTO employee
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2015'
UNION ALL
SELECT 'contractor3', '2014'
UNION ALL
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2014'
UNION ALL
SELECT 'contractor3', '2015'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor5', '2015'
UNION ALL
SELECT 'contractor5', '2016';
 OUTPUT
--------
     10
(1 row)

Query the table. The result shows employees that were paid twice in 2014 (BOOL_AND), once or twice in 2014 (BOOL_OR), and specifically not more than once in 2014 (BOOL_XOR).

=> SELECT DISTINCT emptype,
BOOL_AND(yearspaid='2014') OVER (PARTITION BY emptype) AS paidtwicein2014,
BOOL_OR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidonceortwicein2014,
BOOL_XOR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidjustoncein2014
FROM employee;


   emptype   | paidtwicein2014 | paidonceortwicein2014 | paidjustoncein2014
-------------+-----------------+-----------------------+--------------------
 contractor1 | t               | t                     | f
 contractor2 | f               | t                     | t
 contractor3 | f               | t                     | t
 contractor4 | t               | t                     | f
 contractor5 | f               | f                     | f
(5 rows)

See also

7 - CONDITIONAL_CHANGE_EVENT [analytic]

Assigns an event window number to each row, starting from 0, and increments by 1 when the result of evaluating the argument expression on the current row differs from that on the previous row.

Assigns an event window number to each row, starting from 0, and increments by 1 when the result of evaluating the argument expression on the current row differs from that on the previous row.

Behavior type

Immutable

Syntax

CONDITIONAL_CHANGE_EVENT ( expression ) OVER (
    [ window-partition-clause ]
    window-order-clause )

Parameters

expression
SQL scalar expression that is evaluated on an input record. The result of *expression *can be of any data type.
OVER()
See Analytic Functions.

Notes

The analytic window-order-clause is required but the window-partition-clause is optional.

Examples

=> SELECT CONDITIONAL_CHANGE_EVENT(bid)
          OVER (PARTITION BY symbol ORDER BY ts) AS cce
   FROM TickStore;

The system returns an error when no ORDER BY clause is present:

=> SELECT CONDITIONAL_CHANGE_EVENT(bid)
          OVER (PARTITION BY symbol) AS cce
   FROM TickStore;

ERROR:  conditional_change_event must contain an
ORDER BY clause within its analytic clause

For more examples, see Event-based windows.

See also

8 - CONDITIONAL_TRUE_EVENT [analytic]

Assigns an event window number to each row, starting from 0, and increments the number by 1 when the result of the boolean argument expression evaluates true.

Assigns an event window number to each row, starting from 0, and increments the number by 1 when the result of the boolean argument expression evaluates true. For example, given a sequence of values for column a, as follows:

( 1, 2, 3, 4, 5, 6 )

CONDITIONAL_TRUE_EVENT(a > 3) returns 0, 0, 0, 1, 2, 3.

Behavior type

Immutable

Syntax

CONDITIONAL_TRUE_EVENT ( boolean-expression ) OVER (
    [ window-partition-clause ]
    window-order-clause )

Parameters

boolean-expression
SQL scalar expression that is evaluated on an input record, type BOOLEAN.
OVER()
See Analytic functions.

Notes

The analytic window-order-clause is required but the window-partition-clause is optional.

Examples

> SELECT CONDITIONAL_TRUE_EVENT(bid > 10.6)
     OVER(PARTITION BY bid ORDER BY ts) AS cte
   FROM Tickstore;

The system returns an error if the ORDER BY clause is omitted:

> SELECT CONDITIONAL_TRUE_EVENT(bid > 10.6)
      OVER(PARTITION BY bid) AS cte
   FROM Tickstore;

ERROR:  conditional_true_event must contain an ORDER BY
clause within its analytic clause

For more examples, see Event-based windows.

See also

9 - COUNT [analytic]

Counts occurrences within a group within a.

Counts occurrences within a group within a window. If you specify * or some non-null constant, COUNT() counts all rows.

Behavior type

Immutable

Syntax

COUNT ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Returns the number of rows in each group for which the expression is not null. Can be any expression resulting in BIGINT.
OVER()
See Analytic Functions.

Examples

Using the schema defined in Window framing, the following COUNT function omits window order and window frame clauses; otherwise Vertica would treat it as a window aggregate. Think of the window of reporting aggregates as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

=> SELECT deptno, sal, empno, COUNT(sal)
     OVER (PARTITION BY deptno) AS count FROM emp;

 deptno | sal | empno | count
--------+-----+-------+-------
     10 | 101 |     1 |     2
     10 | 104 |     4 |     2
     20 | 110 |    10 |     6
     20 | 110 |     9 |     6
     20 | 109 |     7 |     6
     20 | 109 |     6 |     6
     20 | 109 |     8 |     6
     20 | 109 |    11 |     6
     30 | 105 |     5 |     3
     30 | 103 |     3 |     3
     30 | 102 |     2 |     3

Using ORDER BY sal creates a moving window query with default window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

=> SELECT deptno, sal, empno, COUNT(sal)
     OVER (PARTITION BY deptno ORDER BY sal) AS count
   FROM emp;

 deptno | sal | empno | count
--------+-----+-------+-------
     10 | 101 |     1 |     1
     10 | 104 |     4 |     2
     20 | 100 |    11 |     1
     20 | 109 |     7 |     4
     20 | 109 |     6 |     4
     20 | 109 |     8 |     4
     20 | 110 |    10 |     6
     20 | 110 |     9 |     6
     30 | 102 |     2 |     1
     30 | 103 |     3 |     2
     30 | 105 |     5 |     3

Using the VMart schema, the following query finds the number of employees who make less than or equivalent to the hourly rate of the current employee. The query returns a running/cumulative average (sometimes called a moving average) using the default window of RANGE UNBOUNDED PRECEDING AND CURRENT ROW:

=> SELECT employee_last_name AS "last_name", hourly_rate, COUNT(*)
   OVER (ORDER BY hourly_rate) AS moving_count from employee_dimension;

 last_name  | hourly_rate | moving_count
------------+-------------+--------------
 Gauthier   |           6 |            4
 Taylor     |           6 |            4
 Jefferson  |           6 |            4
 Nielson    |           6 |            4
 McNulty    |        6.01 |           11
 Robinson   |        6.01 |           11
 Dobisz     |        6.01 |           11
 Williams   |        6.01 |           11
 Kramer     |        6.01 |           11
 Miller     |        6.01 |           11
 Wilson     |        6.01 |           11
 Vogel      |        6.02 |           14
 Moore      |        6.02 |           14
 Vogel      |        6.02 |           14
 Carcetti   |        6.03 |           19
...

To return a moving average that is not also a running (cumulative) average, the window should specify ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING:

=> SELECT employee_last_name AS "last_name", hourly_rate, COUNT(*)
      OVER (ORDER BY hourly_rate ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
   AS moving_count from employee_dimension;

See also

10 - CUME_DIST [analytic]

Calculates the cumulative distribution, or relative rank, of the current row with regard to other rows in the same partition within a .

Calculates the cumulative distribution, or relative rank, of the current row with regard to other rows in the same partition within a window.

CUME_DIST() returns a number greater then 0 and less then or equal to 1, where the number represents the relative position of the specified row within a group of n rows. For a row x (assuming ASC ordering), the CUME_DIST of x is the number of rows with values lower than or equal to the value of x, divided by the number of rows in the partition. For example, in a group of three rows, the cumulative distribution values returned would be 1/3, 2/3, and 3/3.

Behavior type

Immutable

Syntax

CUME_DIST ( ) OVER (
    [ window-partition-clause ]
    window-order-clause  )

Parameters

OVER()
See Analytic Functions.

Examples

The following example returns the cumulative distribution of sales for different transaction types within each month of the first quarter.

=> SELECT calendar_month_name AS month, tender_type, SUM(sales_quantity),
       CUME_DIST()
   OVER (PARTITION BY calendar_month_name ORDER BY SUM(sales_quantity)) AS
CUME_DIST
   FROM store.store_sales_fact JOIN date_dimension
   USING(date_key) WHERE calendar_month_name IN ('January','February','March')
   AND tender_type NOT LIKE 'Other'
   GROUP BY calendar_month_name, tender_type;


  month   | tender_type |  SUM   | CUME_DIST
----------+-------------+--------+-----------
 March    | Credit      | 469858 |      0.25
 March    | Cash        | 470449 |       0.5
 March    | Check       | 473033 |      0.75
 March    | Debit       | 475103 |         1
 January  | Cash        | 441730 |      0.25
 January  | Debit       | 443922 |       0.5
 January  | Check       | 446297 |      0.75
 January  | Credit      | 450994 |         1
 February | Check       | 425665 |      0.25
 February | Debit       | 426726 |       0.5
 February | Credit      | 430010 |      0.75
 February | Cash        | 430767 |         1
(12 rows)

See also

11 - DENSE_RANK [analytic]

Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY clause.

Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY clause. A DENSE_RANK function returns a sequence of ranking numbers without any gaps.

DENSE_RANK executes as follows:

  1. Sorts partition rows as specified by the ORDER BY clause.

  2. Compares the ORDER BY values of the preceding row and current row and ranks the current row as follows:

    • If ORDER BY values are the same, the current row gets the same ranking as the preceding row.

    • If the ORDER BY values are different, DENSE_RANK increments or decrements the current row's ranking by 1, depending whether sort order is ascending or descending.

DENSE_RANK always changes the ranking by 1, so no gaps appear in the ranking sequence. The largest rank value is the number of unique ORDER BY values returned by the query.

Behavior type

Immutable

Syntax

DENSE_RANK() OVER (
    [ window-partition-clause ]
    window-order-clause  )

Parameters

OVER()
See Analytic Functions.

See Analytic Functions

Compared with RANK

RANK leaves gaps in the ranking sequence, while DENSE_RANK does not. The example below compares the behavior of the two functions.

Examples

The following query invokes RANK and DENSE_RANK to rank customers by annual income. The two functions return different rankings, as follows:

  • If annual_salary contains duplicate values, RANK() inserts duplicate rankings and then skips one or more values—for example, from 4 to 6 and 7 to 9.

  • In the parallel column Dense Rank, DENSE_RANK() also inserts duplicate rankings, but leaves no gaps in the rankings sequence:

=>  SELECT employee_region region, employee_key, annual_salary,
     RANK() OVER (PARTITION BY employee_region ORDER BY annual_salary) Rank,
     DENSE_RANK() OVER (PARTITION BY employee_region ORDER BY annual_salary) "Dense Rank"
     FROM employee_dimension;
              region              | employee_key | annual_salary | Rank | Dense Rank
----------------------------------+--------------+---------------+------+------------
 West                             |         5248 |          1200 |    1 |          1
 West                             |         6880 |          1204 |    2 |          2
 West                             |         5700 |          1214 |    3 |          3
 West                             |         9857 |          1218 |    4 |          4
 West                             |         6014 |          1218 |    4 |          4
 West                             |         9221 |          1220 |    6 |          5
 West                             |         7646 |          1222 |    7 |          6
 West                             |         6621 |          1222 |    7 |          6
 West                             |         6488 |          1224 |    9 |          7
 West                             |         7659 |          1226 |   10 |          8
 West                             |         7432 |          1226 |   10 |          8
 West                             |         9905 |          1226 |   10 |          8
 West                             |         9021 |          1228 |   13 |          9
 ...
 West                             |           56 |        963104 | 2794 |       2152
 West                             |          100 |        992363 | 2795 |       2153
 East                             |         8353 |          1200 |    1 |          1
 East                             |         9743 |          1202 |    2 |          2
 East                             |         9975 |          1202 |    2 |          2
 East                             |         9205 |          1204 |    4 |          3
 East                             |         8894 |          1206 |    5 |          4
 East                             |         7740 |          1206 |    5 |          4
 East                             |         7324 |          1208 |    7 |          5
 East                             |         6505 |          1208 |    7 |          5
 East                             |         5404 |          1208 |    7 |          5
 East                             |         5010 |          1208 |    7 |          5
 East                             |         9114 |          1212 |   11 |          6
 ...

See also

SQL analytics

12 - EXPONENTIAL_MOVING_AVERAGE [analytic]

Calculates the exponential moving average (EMA) of expression E with smoothing factor X.

Calculates the exponential moving average (EMA) of expression E with smoothing factor X. An EMA differs from a simple moving average in that it provides a more stable picture of changes to data over time.

The EMA is calculated by adding the previous EMA value to the current data point scaled by the smoothing factor, as in the following formula:

EMA=EMA0 + (X * (E-EMA0))

where:

  • E is the current data point

  • EMA0 is the previous row's EMA value.

  • X is the smoothing factor.

This function also works at the row level. For example, EMA assumes the data in a given column is sampled at uniform intervals. If the users' data points are sampled at non-uniform intervals, they should run the time series gap filling and interpolation (GFI) operations before EMA()

Behavior type

Immutable

Syntax

EXPONENTIAL_MOVING_AVERAGE ( E, X ) OVER (
    [ window-partition-clause ]
    window-order-clause  )

Parameters

E
The value whose average is calculated over a set of rows. Can be INTEGER, FLOAT or NUMERIC type and must be a constant.
X
A positive FLOAT value between 0 and 1 that is used as the smoothing factor.
OVER()
See Analytic Functions.

Examples

The following example uses time series gap filling and interpolation (GFI) first in a subquery, and then performs an EXPONENTIAL_MOVING_AVERAGE operation on the subquery result.

Create a simple four-column table:

=> CREATE TABLE ticker(
     time TIMESTAMP,
     symbol VARCHAR(8),
     bid1 FLOAT,
     bid2 FLOAT );

Insert some data, including nulls, so GFI can do its interpolation and gap filling:

=> INSERT INTO ticker VALUES ('2009-07-12 03:00:00', 'ABC', 60.45, 60.44);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:01', 'ABC', 60.49, 65.12);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:02', 'ABC', 57.78, 59.25);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:03', 'ABC', null, 65.12);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:04', 'ABC', 67.88, null);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:00', 'XYZ', 47.55, 40.15);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:01', 'XYZ', 44.35, 46.78);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:02', 'XYZ', 71.56, 75.78);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:03', 'XYZ', 85.55, 70.21);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:04', 'XYZ', 45.55, 58.65);
=> COMMIT;

Query the table that you just created to you can see the output:

=> SELECT * FROM ticker;
        time         | symbol | bid1  | bid2
---------------------+--------+-------+-------
 2009-07-12 03:00:00 | ABC    | 60.45 | 60.44
 2009-07-12 03:00:01 | ABC    | 60.49 | 65.12
 2009-07-12 03:00:02 | ABC    | 57.78 | 59.25
 2009-07-12 03:00:03 | ABC    |       | 65.12
 2009-07-12 03:00:04 | ABC    | 67.88 |
 2009-07-12 03:00:00 | XYZ    | 47.55 | 40.15
 2009-07-12 03:00:01 | XYZ    | 44.35 | 46.78
 2009-07-12 03:00:02 | XYZ    | 71.56 | 75.78
 2009-07-12 03:00:03 | XYZ    | 85.55 | 70.21
 2009-07-12 03:00:04 | XYZ    | 45.55 | 58.65
(10 rows)

The following query processes the first and last values that belong to each 2-second time slice in table trades' column a. The query then calculates the exponential moving average of expression fv and lv with a smoothing factor of 50%:

=> SELECT symbol, slice_time, fv, lv,
     EXPONENTIAL_MOVING_AVERAGE(fv, 0.5)
       OVER (PARTITION BY symbol ORDER BY slice_time) AS ema_first,
   EXPONENTIAL_MOVING_AVERAGE(lv, 0.5)
       OVER (PARTITION BY symbol ORDER BY slice_time) AS ema_last
   FROM (
     SELECT symbol, slice_time,
        TS_FIRST_VALUE(bid1 IGNORE NULLS) as fv,
        TS_LAST_VALUE(bid2 IGNORE NULLS) AS lv
      FROM ticker TIMESERIES slice_time AS '2 seconds'
      OVER (PARTITION BY symbol ORDER BY time) ) AS sq;


 symbol |     slice_time      |  fv   |  lv   | ema_first | ema_last
--------+---------------------+-------+-------+-----------+----------
 ABC    | 2009-07-12 03:00:00 | 60.45 | 65.12 |     60.45 |    65.12
 ABC    | 2009-07-12 03:00:02 | 57.78 | 65.12 |    59.115 |    65.12
 ABC    | 2009-07-12 03:00:04 | 67.88 | 65.12 |   63.4975 |    65.12
 XYZ    | 2009-07-12 03:00:00 | 47.55 | 46.78 |     47.55 |    46.78
 XYZ    | 2009-07-12 03:00:02 | 71.56 | 70.21 |    59.555 |   58.495
 XYZ    | 2009-07-12 03:00:04 | 45.55 | 58.65 |   52.5525 |  58.5725
(6 rows)

See also

13 - FIRST_VALUE [analytic]

Lets you select the first value of a table or partition (determined by the window-order-clause) without having to use a self join.

Lets you select the first value of a table or partition (determined by the window-order-clause) without having to use a self join. This function is useful when you want to use the first value as a baseline in calculations.

Use FIRST_VALUE() with the window-order-clause to produce deterministic results. If no window is specified for the current row, the default window is UNBOUNDED PRECEDING AND CURRENT ROW.

Behavior type

Immutable

Syntax

FIRST_VALUE ( expression [ IGNORE NULLS ] ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Expression to evaluate—or example, a constant, column, nonanalytic function, function expression, or expressions involving any of these.
IGNORE NULLS
Specifies to return the first non-null value in the set, or NULL if all values are NULL. If you omit this option and the first value in the set is null, the function returns NULL.
OVER()
See Analytic Functions.

Examples

The following query asks for the first value in the partitioned day of week, and illustrates the potential nondeterministic nature of FIRST_VALUE():

=> SELECT calendar_year, date_key, day_of_week, full_date_description,
   FIRST_VALUE(full_date_description)
     OVER(PARTITION BY calendar_month_number_in_year ORDER BY day_of_week)
     AS "first_value"
   FROM date_dimension
   WHERE calendar_year=2003 AND calendar_month_number_in_year=1;

The first value returned is January 31, 2003; however, the next time the same query is run, the first value might be January 24 or January 3, or the 10th or 17th. This is because the analytic ORDER BY column day_of_week returns rows that contain ties (multiple Fridays). These repeated values make the ORDER BY evaluation result nondeterministic, because rows that contain ties can be ordered in any way, and any one of those rows qualifies as being the first value of day_of_week.

 calendar_year | date_key | day_of_week | full_date_description |    first_value
 --------------+----------+-------------+-----------------------+------------------
          2003 |       31 | Friday      | January 31, 2003      | January 31, 2003
          2003 |       24 | Friday      | January 24, 2003      | January 31, 2003
          2003 |        3 | Friday      | January 3, 2003       | January 31, 2003
          2003 |       10 | Friday      | January 10, 2003      | January 31, 2003
          2003 |       17 | Friday      | January 17, 2003      | January 31, 2003
          2003 |        6 | Monday      | January 6, 2003       | January 31, 2003
          2003 |       27 | Monday      | January 27, 2003      | January 31, 2003
          2003 |       13 | Monday      | January 13, 2003      | January 31, 2003
          2003 |       20 | Monday      | January 20, 2003      | January 31, 2003
          2003 |       11 | Saturday    | January 11, 2003      | January 31, 2003
          2003 |       18 | Saturday    | January 18, 2003      | January 31, 2003
          2003 |       25 | Saturday    | January 25, 2003      | January 31, 2003
          2003 |        4 | Saturday    | January 4, 2003       | January 31, 2003
          2003 |       12 | Sunday      | January 12, 2003      | January 31, 2003
          2003 |       26 | Sunday      | January 26, 2003      | January 31, 2003
          2003 |        5 | Sunday      | January 5, 2003       | January 31, 2003
          2003 |       19 | Sunday      | January 19, 2003      | January 31, 2003
          2003 |       23 | Thursday    | January 23, 2003      | January 31, 2003
          2003 |        2 | Thursday    | January 2, 2003       | January 31, 2003
          2003 |        9 | Thursday    | January 9, 2003       | January 31, 2003
          2003 |       16 | Thursday    | January 16, 2003      | January 31, 2003
          2003 |       30 | Thursday    | January 30, 2003      | January 31, 2003
          2003 |       21 | Tuesday     | January 21, 2003      | January 31, 2003
          2003 |       14 | Tuesday     | January 14, 2003      | January 31, 2003
          2003 |        7 | Tuesday     | January 7, 2003       | January 31, 2003
          2003 |       28 | Tuesday     | January 28, 2003      | January 31, 2003
          2003 |       22 | Wednesday   | January 22, 2003      | January 31, 2003
          2003 |       29 | Wednesday   | January 29, 2003      | January 31, 2003
          2003 |       15 | Wednesday   | January 15, 2003      | January 31, 2003
          2003 |        1 | Wednesday   | January 1, 2003       | January 31, 2003
          2003 |        8 | Wednesday   | January 8, 2003       | January 31, 2003
(31 rows)

To return deterministic results, modify the query so that it performs its analytic ORDER BY operations on a unique field, such as date_key:

=> SELECT calendar_year, date_key, day_of_week, full_date_description,
   FIRST_VALUE(full_date_description) OVER
     (PARTITION BY calendar_month_number_in_year ORDER BY date_key) AS "first_value"
   FROM date_dimension WHERE calendar_year=2003;

FIRST_VALUE() returns a first value of January 1 for the January partition and the first value of February 1 for the February partition. Also, the full_date_description column contains no ties:

 calendar_year | date_key | day_of_week | full_date_description | first_value
---------------+----------+-------------+-----------------------+------------
          2003 |        1 | Wednesday   | January 1, 2003       | January 1, 2003
          2003 |        2 | Thursday    | January 2, 2003       | January 1, 2003
          2003 |        3 | Friday      | January 3, 2003       | January 1, 2003
          2003 |        4 | Saturday    | January 4, 2003       | January 1, 2003
          2003 |        5 | Sunday      | January 5, 2003       | January 1, 2003
          2003 |        6 | Monday      | January 6, 2003       | January 1, 2003
          2003 |        7 | Tuesday     | January 7, 2003       | January 1, 2003
          2003 |        8 | Wednesday   | January 8, 2003       | January 1, 2003
          2003 |        9 | Thursday    | January 9, 2003       | January 1, 2003
          2003 |       10 | Friday      | January 10, 2003      | January 1, 2003
          2003 |       11 | Saturday    | January 11, 2003      | January 1, 2003
          2003 |       12 | Sunday      | January 12, 2003      | January 1, 2003
          2003 |       13 | Monday      | January 13, 2003      | January 1, 2003
          2003 |       14 | Tuesday     | January 14, 2003      | January 1, 2003
          2003 |       15 | Wednesday   | January 15, 2003      | January 1, 2003
          2003 |       16 | Thursday    | January 16, 2003      | January 1, 2003
          2003 |       17 | Friday      | January 17, 2003      | January 1, 2003
          2003 |       18 | Saturday    | January 18, 2003      | January 1, 2003
          2003 |       19 | Sunday      | January 19, 2003      | January 1, 2003
          2003 |       20 | Monday      | January 20, 2003      | January 1, 2003
          2003 |       21 | Tuesday     | January 21, 2003      | January 1, 2003
          2003 |       22 | Wednesday   | January 22, 2003      | January 1, 2003
          2003 |       23 | Thursday    | January 23, 2003      | January 1, 2003
          2003 |       24 | Friday      | January 24, 2003      | January 1, 2003
          2003 |       25 | Saturday    | January 25, 2003      | January 1, 2003
          2003 |       26 | Sunday      | January 26, 2003      | January 1, 2003
          2003 |       27 | Monday      | January 27, 2003      | January 1, 2003
          2003 |       28 | Tuesday     | January 28, 2003      | January 1, 2003
          2003 |       29 | Wednesday   | January 29, 2003      | January 1, 2003
          2003 |       30 | Thursday    | January 30, 2003      | January 1, 2003
          2003 |       31 | Friday      | January 31, 2003      | January 1, 2003
          2003 |       32 | Saturday    | February 1, 2003      | February 1, 2003
          2003 |       33 | Sunday      | February 2, 2003      | February 1,2003
      ...
(365 rows)

See also

14 - LAG [analytic]

Returns the value of the input expression at the given offset before the current row within a.

Returns the value of the input expression at the given offset before the current row within a window. This function lets you access more than one row in a table at the same time. This is useful for comparing values when the relative positions of rows can be reliably known. It also lets you avoid the more costly self join, which enhances query processing speed.

For information on getting the rows that follow, see LEAD.

Behavior type

Immutable

Syntax

LAG ( expression[, offset ] [, default ] ) OVER (
    [ window-partition-clause ]
    window-order-clause )

Parameters

expression
The expression to evaluate—for example, a constant, column, non-analytic function, function expression, or expressions involving any of these.
offset
Indicates how great is the lag. The default value is 1 (the previous row). This parameter must evaluate to a constant positive integer.
default
The value returned if offset falls outside the bounds of the table or partition. This value must be a constant value or an expression that can be evaluated to a constant; its data type is coercible to that of the first argument.

Examples

This example sums the current balance by date in a table and also sums the previous balance from the last day. Given the inputs that follow, the data satisfies the following conditions:

  • For each some_id, there is exactly 1 row for each date represented by month_date.

  • For each some_id, the set of dates is consecutive; that is, if there is a row for February 24 and a row for February 26, there would also be a row for February 25.

  • Each some_id has the same set of dates.

    => CREATE TABLE balances (
           month_date DATE,
           current_bal INT,
           some_id INT);
    => INSERT INTO balances values ('2009-02-24', 10, 1);
    => INSERT INTO balances values ('2009-02-25', 10, 1);
    => INSERT INTO balances values ('2009-02-26', 10, 1);
    => INSERT INTO balances values ('2009-02-24', 20, 2);
    => INSERT INTO balances values ('2009-02-25', 20, 2);
    => INSERT INTO balances values ('2009-02-26', 20, 2);
    => INSERT INTO balances values ('2009-02-24', 30, 3);
    => INSERT INTO balances values ('2009-02-25', 20, 3);
    => INSERT INTO balances values ('2009-02-26', 30, 3);
    

Now run LAG to sum the current balance for each date and sum the previous balance from the last day:

=> SELECT month_date,
     SUM(current_bal) as current_bal_sum,
     SUM(previous_bal) as previous_bal_sum FROM
       (SELECT month_date, current_bal,
     LAG(current_bal, 1, 0) OVER
       (PARTITION BY some_id ORDER BY month_date)
     AS previous_bal FROM balances) AS subQ
     GROUP BY month_date ORDER BY month_date;
month_date  | current_bal_sum | previous_bal_sum
------------+-----------------+------------------
 2009-02-24 |              60 |                0
 2009-02-25 |              50 |               60
 2009-02-26 |              60 |               50
(3 rows)

Using the same example data, the following query would not be allowed because LAG is nested inside an aggregate function:

=> SELECT month_date,
    SUM(current_bal) as current_bal_sum,
   SUM(LAG(current_bal, 1, 0) OVER
      (PARTITION BY some_id ORDER BY month_date)) AS previous_bal_sum
   FROM some_table GROUP BY month_date ORDER BY month_date;

The following example uses the VMart database. LAG first returns the annual income from the previous row, and then it calculates the difference between the income in the current row from the income in the previous row:

=> SELECT occupation, customer_key, customer_name, annual_income,
   LAG(annual_income, 1, 0) OVER (PARTITION BY occupation
   ORDER BY annual_income) AS prev_income, annual_income -
   LAG(annual_income, 1, 0) OVER (PARTITION BY occupation
   ORDER BY annual_income) AS difference
   FROM customer_dimension ORDER BY occupation, customer_key LIMIT 20;
 occupation | customer_key |    customer_name     | annual_income | prev_income | difference
------------+--------------+----------------------+---------------+-------------+------------
 Accountant |           15 | Midori V. Peterson   |        692610 |      692535 |         75
 Accountant |           43 | Midori S. Rodriguez  |        282359 |      280976 |       1383
 Accountant |           93 | Robert P. Campbell   |        471722 |      471355 |        367
 Accountant |          102 | Sam T. McNulty       |        901636 |      901561 |         75
 Accountant |          134 | Martha B. Overstreet |        705146 |      704335 |        811
 Accountant |          165 | James C. Kramer      |        376841 |      376474 |        367
 Accountant |          225 | Ben W. Farmer        |         70574 |       70449 |        125
 Accountant |          270 | Jessica S. Lang      |        684204 |      682274 |       1930
 Accountant |          273 | Mark X. Lampert      |        723294 |      722737 |        557
 Accountant |          295 | Sharon K. Gauthier   |         29033 |       28412 |        621
 Accountant |          338 | Anna S. Jackson      |        816858 |      815557 |       1301
 Accountant |          377 | William I. Jones     |        915149 |      914872 |        277
 Accountant |          438 | Joanna A. McCabe     |        147396 |      144482 |       2914
 Accountant |          452 | Kim P. Brown         |        126023 |      124797 |       1226
 Accountant |          467 | Meghan K. Carcetti   |        810528 |      810284 |        244
 Accountant |          478 | Tanya E. Greenwood   |        639649 |      639029 |        620
 Accountant |          511 | Midori P. Vogel      |        187246 |      185539 |       1707
 Accountant |          525 | Alexander K. Moore   |        677433 |      677050 |        383
 Accountant |          550 | Sam P. Reyes         |        735691 |      735355 |        336
 Accountant |          577 | Robert U. Vu         |        616101 |      615439 |        662
(20 rows)

The next example uses LEAD and LAG to return the third row after the salary in the current row and fifth salary before the salary in the current row:

=> SELECT hire_date, employee_key, employee_last_name,
   LEAD(hire_date, 1) OVER (ORDER BY hire_date) AS "next_hired" ,
   LAG(hire_date, 1) OVER (ORDER BY hire_date) AS "last_hired"
   FROM employee_dimension ORDER BY hire_date, employee_key;
 hire_date  | employee_key | employee_last_name | next_hired | last_hired
------------+--------------+--------------------+------------+------------
 1956-04-11 |         2694 | Farmer             | 1956-05-12 |
 1956-05-12 |         5486 | Winkler            | 1956-09-18 | 1956-04-11
 1956-09-18 |         5525 | McCabe             | 1957-01-15 | 1956-05-12
 1957-01-15 |          560 | Greenwood          | 1957-02-06 | 1956-09-18
 1957-02-06 |         9781 | Bauer              | 1957-05-25 | 1957-01-15
 1957-05-25 |         9506 | Webber             | 1957-07-04 | 1957-02-06
 1957-07-04 |         6723 | Kramer             | 1957-07-07 | 1957-05-25
 1957-07-07 |         5827 | Garnett            | 1957-11-11 | 1957-07-04
 1957-11-11 |          373 | Reyes              | 1957-11-21 | 1957-07-07
 1957-11-21 |         3874 | Martin             | 1958-02-06 | 1957-11-11
(10 rows)

See also

  • [LEAD](/en/sql-reference/functions/analytic-functions/lead-analytic/)

  • SQL analytics

15 - LAST_VALUE [analytic]

Lets you select the last value of a table or partition (determined by the window-order-clause) without having to use a self join.

Lets you select the last value of a table or partition (determined by the window-order-clause) without having to use a self join. LAST_VALUE takes the last record from the partition after the window order clause. The function then computes the expression against the last record, and returns the results. This function is useful when you want to use the last value as a baseline in calculations.

Use LAST_VALUE() with the window-order-clause to produce deterministic results. If no window is specified for the current row, the default window is UNBOUNDED PRECEDING AND CURRENT ROW.

Behavior type

Immutable

Syntax

LAST_VALUE ( expression [ IGNORE NULLS ] ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Expression to evaluate—for example, a constant, column, nonanalytic function, function expression, or expressions involving any of these.
IGNORE NULLS
Specifies to return the last non-null value in the set, or NULL if all values are NULL. If you omit this option and the last value in the set is null, the function returns NULL.
OVER()
See Analytic Functions.

Examples

Using the schema defined in Window framing in Analyzing Data, the following query does not show the highest salary value by department; instead it shows the highest salary value by department by salary.

=> SELECT deptno, sal, empno, LAST_VALUE(sal)
       OVER (PARTITION BY deptno ORDER BY sal) AS lv
   FROM emp;
 deptno | sal | empno |    lv
--------+-----+-------+--------
     10 | 101 |     1 |     101
     10 | 104 |     4 |     104
     20 | 100 |    11 |     100
     20 | 109 |     7 |     109
     20 | 109 |     6 |     109
     20 | 109 |     8 |     109
     20 | 110 |    10 |     110
     20 | 110 |     9 |     110
     30 | 102 |     2 |     102
     30 | 103 |     3 |     103
     30 | 105 |     5 |     105

If you include the window frame clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, LAST_VALUE() returns the highest salary by department, an accurate representation of the information:

=> SELECT deptno, sal, empno, LAST_VALUE(sal)
       OVER (PARTITION BY deptno ORDER BY sal
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv
   FROM emp;
 deptno | sal | empno |    lv
--------+-----+-------+--------
     10 | 101 |     1 |     104
     10 | 104 |     4 |     104
     20 | 100 |    11 |     110
     20 | 109 |     7 |     110
     20 | 109 |     6 |     110
     20 | 109 |     8 |     110
     20 | 110 |    10 |     110
     20 | 110 |     9 |     110
     30 | 102 |     2 |     105
     30 | 103 |     3 |     105
     30 | 105 |     5 |     105

For more examples, see FIRST_VALUE().

See also

16 - LEAD [analytic]

Returns values from the row after the current row within a , letting you access more than one row in a table at the same time.

Returns values from the row after the current row within a window, letting you access more than one row in a table at the same time. This is useful for comparing values when the relative positions of rows can be reliably known. It also lets you avoid the more costly self join, which enhances query processing speed.

Behavior type

Immutable

Syntax

LEAD ( expression[, offset ] [, default ] ) OVER (
    [ window-partition-clause ]
    window-order-clause )

Parameters

expression
The expression to evaluate—for example, a constant, column, non-analytic function, function expression, or expressions involving any of these.
offset
Is an optional parameter that defaults to 1 (the next row). This parameter must evaluate to a constant positive integer.
default
The value returned if offset falls outside the bounds of the table or partition. This value must be a constant value or an expression that can be evaluated to a constant; its data type is coercible to that of the first argument.

Examples

LEAD finds the hire date of the employee hired just after the current row:


=> SELECT employee_region, hire_date, employee_key, employee_last_name,
   LEAD(hire_date, 1) OVER (PARTITION BY employee_region ORDER BY hire_date) AS "next_hired"
   FROM employee_dimension ORDER BY employee_region, hire_date, employee_key;
  employee_region  | hire_date  | employee_key | employee_last_name | next_hired
-------------------+------------+--------------+--------------------+------------
 East              | 1956-04-08 |         9218 | Harris             | 1957-02-06
 East              | 1957-02-06 |         7799 | Stein              | 1957-05-25
 East              | 1957-05-25 |         3687 | Farmer             | 1957-06-26
 East              | 1957-06-26 |         9474 | Bauer              | 1957-08-18
 East              | 1957-08-18 |          570 | Jefferson          | 1957-08-24
 East              | 1957-08-24 |         4363 | Wilson             | 1958-02-17
 East              | 1958-02-17 |         6457 | McCabe             | 1958-06-26
 East              | 1958-06-26 |         6196 | Li                 | 1958-07-16
 East              | 1958-07-16 |         7749 | Harris             | 1958-09-18
 East              | 1958-09-18 |         9678 | Sanchez            | 1958-11-10
(10 rows)

The next example uses LEAD and LAG to return the third row after the salary in the current row and fifth salary before the salary in the current row.

=> SELECT hire_date, employee_key, employee_last_name,
   LEAD(hire_date, 1) OVER (ORDER BY hire_date) AS "next_hired" ,
   LAG(hire_date, 1) OVER (ORDER BY hire_date) AS "last_hired"
   FROM employee_dimension ORDER BY hire_date, employee_key;
 hire_date  | employee_key | employee_last_name | next_hired | last_hired
------------+--------------+--------------------+------------+------------
 1956-04-11 |         2694 | Farmer             | 1956-05-12 |
 1956-05-12 |         5486 | Winkler            | 1956-09-18 | 1956-04-11
 1956-09-18 |         5525 | McCabe             | 1957-01-15 | 1956-05-12
 1957-01-15 |          560 | Greenwood          | 1957-02-06 | 1956-09-18
 1957-02-06 |         9781 | Bauer              | 1957-05-25 | 1957-01-15
 1957-05-25 |         9506 | Webber             | 1957-07-04 | 1957-02-06
 1957-07-04 |         6723 | Kramer             | 1957-07-07 | 1957-05-25
 1957-07-07 |         5827 | Garnett            | 1957-11-11 | 1957-07-04
 1957-11-11 |          373 | Reyes              | 1957-11-21 | 1957-07-07
 1957-11-21 |         3874 | Martin             | 1958-02-06 | 1957-11-11
(10 rows)

The following example returns employee name and salary, along with the next highest and lowest salaries.

=> SELECT employee_last_name, annual_salary,
       NVL(LEAD(annual_salary) OVER (ORDER BY annual_salary),
         MIN(annual_salary) OVER()) "Next Highest",
       NVL(LAG(annual_salary) OVER (ORDER BY annual_salary),
         MAX(annual_salary)  OVER()) "Next Lowest"
   FROM employee_dimension;
 employee_last_name | annual_salary | Next Highest | Next Lowest
--------------------+---------------+--------------+-------------
 Nielson            |          1200 |         1200 |      995533
 Lewis              |          1200 |         1200 |        1200
 Harris             |          1200 |         1202 |        1200
 Robinson           |          1202 |         1202 |        1200
 Garnett            |          1202 |         1202 |        1202
 Weaver             |          1202 |         1202 |        1202
 Nielson            |          1202 |         1202 |        1202
 McNulty            |          1202 |         1204 |        1202
 Farmer             |          1204 |         1204 |        1202
 Martin             |          1204 |         1204 |        1204
(10 rows)

The next example returns, for each assistant director in the employees table, the hire date of the director hired just after the director on the current row. For example, Jackson was hired on 2016-12-28, and the next director hired was Bauer:

=> SELECT employee_last_name, hire_date,
       LEAD(hire_date, 1) OVER (ORDER BY hire_date DESC) as "NextHired"
   FROM employee_dimension WHERE job_title = 'Assistant Director';
 employee_last_name | hire_date  | NextHired
--------------------+------------+------------
 Jackson            | 2016-12-28 | 2016-12-26
 Bauer              | 2016-12-26 | 2016-12-11
 Miller             | 2016-12-11 | 2016-12-07
 Fortin             | 2016-12-07 | 2016-11-27
 Harris             | 2016-11-27 | 2016-11-15
 Goldberg           | 2016-11-15 |
(5 rows)

See also

17 - MAX [analytic]

Returns the maximum value of an expression within a.

Returns the maximum value of an expression within a window. The return value has the same type as the expression data type.

The analytic functions MIN() and MAX() can operate with Boolean values. The MAX() function acts upon a Boolean data type or a value that can be implicitly converted to a Boolean value. If at least one input value is true, MAX() returns t (true). Otherwise, it returns f (false). In the same scenario, the MIN() function returns t (true) if all input values are true. Otherwise, it returns f.

Behavior type

Immutable

Syntax


MAX ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Any expression for which the maximum value is calculated, typically a column reference.
OVER()
See Analytic Functions.

Examples

The following query computes the deviation between the employees' annual salary and the maximum annual salary in Massachusetts:

=> SELECT employee_state, annual_salary,
     MAX(annual_salary)
       OVER(PARTITION BY employee_state ORDER BY employee_key) max,
          annual_salary- MAX(annual_salary)
       OVER(PARTITION BY employee_state ORDER BY employee_key) diff
   FROM employee_dimension
   WHERE employee_state = 'MA';
 employee_state | annual_salary |  max   |  diff
----------------+---------------+--------+---------
 MA             |          1918 | 995533 | -993615
 MA             |          2058 | 995533 | -993475
 MA             |          2586 | 995533 | -992947
 MA             |          2500 | 995533 | -993033
 MA             |          1318 | 995533 | -994215
 MA             |          2072 | 995533 | -993461
 MA             |          2656 | 995533 | -992877
 MA             |          2148 | 995533 | -993385
 MA             |          2366 | 995533 | -993167
 MA             |          2664 | 995533 | -992869
(10 rows)

The following example shows you the difference between the MIN and MAX analytic functions when you use them with a Boolean value. The sample creates a table with two columns, adds two rows of data, and shows sample output for MIN and MAX.

CREATE TABLE min_max_functions (emp VARCHAR, torf BOOL);

INSERT INTO min_max_functions VALUES ('emp1', 1);
INSERT INTO min_max_functions VALUES ('emp1', 0);

SELECT DISTINCT emp,
min(torf) OVER (PARTITION BY emp) AS worksasbooleanand,
Max(torf) OVER (PARTITION BY emp) AS worksasbooleanor
FROM min_max_functions;

 emp  | worksasbooleanand | worksasbooleanor
------+-------------------+------------------
 emp1 | f                 | t
(1 row)

See also

18 - MEDIAN [analytic]

For each row, returns the median value of a value set within each partition.

For each row, returns the median value of a value set within each partition. MEDIAN determines the argument with the highest numeric precedence, implicitly converts the remaining arguments to that data type, and returns that data type.

MEDIAN is an alias of PERCENTILE_CONT [analytic] with an argument of 0.5 (50%).

Behavior type

Immutable

Syntax

MEDIAN ( expression ) OVER ( [ window-partition-clause ] )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the middle value or an interpolated value that would be the middle value once the values are sorted. Null values are ignored in the calculation.
OVER()
If the OVER clause specifies window-partition-clause, MEDIAN groups input rows according to one or more columns or expressions. If this clause is omitted, no grouping occurs and MEDIAN processes all input rows as a single partition.

Examples

See Calculating a median value

See also

19 - MIN [analytic]

Returns the minimum value of an expression within a.

Returns the minimum value of an expression within a window. The return value has the same type as the expression data type.

The analytic functions MIN() and MAX() can operate with Boolean values. The MAX() function acts upon a Boolean data type or a value that can be implicitly converted to a Boolean value. If at least one input value is true, MAX() returns t (true). Otherwise, it returns f (false). In the same scenario, the MIN() function returns t (true) if all input values are true. Otherwise, it returns f.

Behavior type

Immutable

Syntax


MIN ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Any expression for which the minimum value is calculated, typically a column reference.
OVER()
See Analytic Functions.

Examples

The following example shows how you can query to determine the deviation between the employees' annual salary and the minimum annual salary in Massachusetts:

=> SELECT employee_state, annual_salary,
      MIN(annual_salary)
      OVER(PARTITION BY employee_state ORDER BY employee_key) min,
        annual_salary- MIN(annual_salary)
      OVER(PARTITION BY employee_state ORDER BY employee_key) diff
   FROM employee_dimension
   WHERE employee_state = 'MA';
 employee_state | annual_salary | min  | diff
----------------+---------------+------+------
 MA             |          1918 | 1204 |  714
 MA             |          2058 | 1204 |  854
 MA             |          2586 | 1204 | 1382
 MA             |          2500 | 1204 | 1296
 MA             |          1318 | 1204 |  114
 MA             |          2072 | 1204 |  868
 MA             |          2656 | 1204 | 1452
 MA             |          2148 | 1204 |  944
 MA             |          2366 | 1204 | 1162
 MA             |          2664 | 1204 | 1460
(10 rows)

The following example shows you the difference between the MIN and MAX analytic functions when you use them with a Boolean value. The sample creates a table with two columns, adds two rows of data, and shows sample output for MIN and MAX.

CREATE TABLE min_max_functions (emp VARCHAR, torf BOOL);

INSERT INTO min_max_functions VALUES ('emp1', 1);
INSERT INTO min_max_functions VALUES ('emp1', 0);

SELECT DISTINCT emp,
min(torf) OVER (PARTITION BY emp) AS worksasbooleanand,
Max(torf) OVER (PARTITION BY emp) AS worksasbooleanor
FROM min_max_functions;

 emp  | worksasbooleanand | worksasbooleanor
------+-------------------+------------------
 emp1 | f                 | t
(1 row)

See also

20 - NTH_VALUE [analytic]

Returns the value evaluated at the row that is the nth row of the window (counting from 1).

Returns the value evaluated at the row that is the *n*th row of the window (counting from 1). If the specified row does not exist, NTH_VALUE returns NULL.

Behavior type

Immutable

Syntax

NTH_VALUE ( expression, row-number [ IGNORE NULLS ] ) OVER (
    [ window-frame-clause ]
    [ window-order-clause ])

Parameters

expression
Expression to evaluate. The expression can be a constant, column name, nonanalytic function, function expression, or expressions that include any of these.
row-number
Specifies the row to evaluate, where row-number evaluates to an integer ≥ 1.
IGNORE NULLS
Specifies to return the first non-NULL value in the set, or NULL if all values are NULL.
OVER()
See Analytic Functions.

Examples

In the following example, for each tuple (current row) in table t1, the window frame clause defines the window as follows:

ORDER BY b ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

For each window, n for *n*th value is a+1. a is the value of column a in the tuple.

NTH_VALUE returns the result of the expression b+1, where b is the value of column b in the *n*th row, which is the a+1 row within the window.

=> SELECT * FROM t1 ORDER BY a;
 a | b
---+----
 1 | 10
 2 | 20
 2 | 21
 3 | 30
 4 | 40
 5 | 50
 6 | 60
(7 rows)

=> SELECT NTH_VALUE(b+1, a+1) OVER
     (ORDER BY b ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) FROM t1;
 ?column?
----------


       22
       31



(7 rows)

21 - NTILE [analytic]

Equally divides an ordered data set (partition) into a {value} number of subsets within a , where the subsets are numbered 1 through the value in parameter constant-value.

Equally divides an ordered data set (partition) into a {value} number of subsets within a window, where the subsets are numbered 1 through the value in parameter constant-value. For example, if constant-value= 4 and the partition contains 20 rows, NTILE divides the partition rows into four equal subsets of five rows. NTILE assigns each row to a subset by giving row a number from 1 to 4. The rows in the first subset are assigned 1, the next five are assigned 2, and so on.

If the number of partition rows is not evenly divisible by the number of subsets, the rows are distributed so no subset is more than one row larger than any other subset, and the lowest subsets have extra rows. For example, if constant-value= 4 and the number of rows = 21, the first subset has six rows, the second subset has five rows, and so on.

If the number of subsets is greater than the number of rows, then a number of subsets equal to the number of rows is filled, and the remaining subsets are empty.

Behavior type

Immutable

Syntax

NTILE ( constant-value ) OVER (
    [ window-partition-clause ]
    window-order-clause )

Parameters

constant-value
Specifies the number of subsets , where constant-value must resolve to a positive constant for each partition.
OVER()
See Analytic Functions.

Examples

The following query assigns each month's sales total into one of four subsets:

=> SELECT calendar_month_name AS MONTH, SUM(sales_quantity),
      NTILE(4) OVER (ORDER BY SUM(sales_quantity)) AS NTILE
   FROM store.store_sales_fact JOIN date_dimension
   USING(date_key)
   GROUP BY calendar_month_name
   ORDER BY NTILE;
   MONTH   |   SUM   | NTILE
-----------+---------+-------
 November  | 2040726 |     1
 June      | 2088528 |     1
 February  | 2134708 |     1
 April     | 2181767 |     2
 January   | 2229220 |     2
 October   | 2316363 |     2
 September | 2323914 |     3
 March     | 2354409 |     3
 August    | 2387017 |     3
 July      | 2417239 |     4
 May       | 2492182 |     4
 December  | 2531842 |     4
(12 rows)

See also

22 - PERCENT_RANK [analytic]

Calculates the relative rank of a row for a given row in a group within a by dividing that row’s rank less 1 by the number of rows in the partition, also less 1.

Calculates the relative rank of a row for a given row in a group within a window by dividing that row’s rank less 1 by the number of rows in the partition, also less 1. PERCENT_RANK always returns values from 0 to 1 inclusive. The first row in any set has a PERCENT_RANK of 0. The return value is NUMBER.

( rank - 1 ) / ( [ rows ] - 1 )

In the preceding formula, rank is the rank position of a row in the group and rows is the total number of rows in the partition defined by the OVER() clause.

Behavior type

Immutable

Syntax

PERCENT_RANK ( ) OVER (
    [ window-partition-clause ]
    window-order-clause  )

Parameters

OVER()
See Analytic Functions

Examples

The following example finds the percent rank of gross profit for different states within each month of the first quarter:

=> SELECT calendar_month_name AS MONTH, store_state,
      SUM(gross_profit_dollar_amount),
      PERCENT_RANK() OVER (PARTITION BY calendar_month_name
      ORDER BY SUM(gross_profit_dollar_amount)) AS PERCENT_RANK
   FROM store.store_sales_fact JOIN date_dimension
   USING(date_key)
   JOIN store.store_dimension
   USING (store_key)
   WHERE calendar_month_name IN ('January','February','March')
   AND store_state IN ('OR','IA','DC','NV','WI')
   GROUP BY calendar_month_name, store_state
   ORDER BY calendar_month_name, PERCENT_RANK;
  MONTH   | store_state |  SUM   | PERCENT_RANK
----------+-------------+--------+--------------
 February | IA          | 418490 |            0
 February | OR          | 460588 |         0.25
 February | DC          | 616553 |          0.5
 February | WI          | 619204 |         0.75
 February | NV          | 838039 |            1
 January  | OR          | 446528 |            0
 January  | IA          | 474501 |         0.25
 January  | DC          | 628496 |          0.5
 January  | WI          | 679382 |         0.75
 January  | NV          | 871824 |            1
 March    | IA          | 460282 |            0
 March    | OR          | 481935 |         0.25
 March    | DC          | 716063 |          0.5
 March    | WI          | 771575 |         0.75
 March    | NV          | 970878 |            1
(15 rows)

The following example calculates, for each employee, the percent rank of the employee's salary by their job title:

=> SELECT job_title, employee_last_name, annual_salary,
       PERCENT_RANK()
      OVER (PARTITION BY job_title ORDER BY annual_salary DESC) AS percent_rank
   FROM employee_dimension
   ORDER BY percent_rank, annual_salary;
     job_title      | employee_last_name | annual_salary |    percent_rank
--------------------+--------------------+---------------+---------------------
 Cashier            | Fortin             |          3196 |                   0
 Delivery Person    | Garnett            |          3196 |                   0
 Cashier            | Vogel              |          3196 |                   0
 Customer Service   | Sanchez            |          3198 |                   0
 Shelf Stocker      | Jones              |          3198 |                   0
 Custodian          | Li                 |          3198 |                   0
 Customer Service   | Kramer             |          3198 |                   0
 Greeter            | McNulty            |          3198 |                   0
 Greeter            | Greenwood          |          3198 |                   0
 Shift Manager      | Miller             |         99817 |                   0
 Advertising        | Vu                 |         99853 |                   0
 Branch Manager     | Jackson            |         99858 |                   0
 Marketing          | Taylor             |         99928 |                   0
 Assistant Director | King               |         99973 |                   0
 Sales              | Kramer             |         99973 |                   0
 Head of PR         | Goldberg           |        199067 |                   0
 Regional Manager   | Gauthier           |        199744 |                   0
 Director of HR     | Moore              |        199896 |                   0
 Head of Marketing  | Overstreet         |        199955 |                   0
 VP of Advertising  | Meyer              |        199975 |                   0
 VP of Sales        | Sanchez            |        199992 |                   0
 Founder            | Gauthier           |        927335 |                   0
 CEO                | Taylor             |        953373 |                   0
 Investor           | Garnett            |        963104 |                   0
 Co-Founder         | Vu                 |        977716 |                   0
 CFO                | Vogel              |        983634 |                   0
 President          | Sanchez            |        992363 |                   0
 Delivery Person    | Li                 |          3194 | 0.00114155251141553
 Delivery Person    | Robinson           |          3194 | 0.00114155251141553
 Custodian          | McCabe             |          3192 | 0.00126582278481013
 Shelf Stocker      | Moore              |          3196 | 0.00128040973111396
 Branch Manager     | Moore              |         99716 | 0.00186567164179104
...

See also

23 - PERCENTILE_CONT [analytic]

An inverse distribution function where, for each row, PERCENTILE_CONT returns the value that would fall into the specified percentile among a set of values in each partition within a.

An inverse distribution function where, for each row, PERCENTILE_CONT returns the value that would fall into the specified percentile among a set of values in each partition within a window. For example, if the argument to the function is 0.5, the result of the function is the median of the data set (50th percentile). PERCENTILE_CONT assumes a continuous distribution data model. NULL values are ignored.

PERCENTILE_CONT computes the percentile by first computing the row number where the percentile row would exist. For example:

row-number = 1 + percentile-value * (num-partition-rows -1)

If row-number is a whole number (within an error of 0.00001), the percentile is the value of row row-number.

Otherwise, Vertica interpolates the percentile value between the value of the CEILING(row-number) row and the value of the FLOOR(row-number) row. In other words, the percentile is calculated as follows:

  ( CEILING( row-number) - row-number ) * ( value of FLOOR(row-number) row )
+ ( row-number - FLOOR(row-number) ) * ( value of CEILING(row-number) row)

Behavior type

Immutable

Syntax

PERCENTILE_CONT ( percentile ) WITHIN GROUP ( ORDER BY expression [ ASC | DESC ] ) OVER ( [ window-partition-clause ] )

Parameters

percentile
Percentile value, a FLOAT constant that ranges from 0 to 1 (inclusive).
WITHIN GROUP (ORDER BY expression)
Specifies how to sort data within each group. ORDER BY takes only one column/expression that must be INTEGER, FLOAT, INTERVAL, or NUMERIC data type. NULL values are discarded.

The WITHIN GROUP(ORDER BY) clause does not guarantee the order of the SQL result. To order the final result , use the SQL ORDER BY clause set.

ASC | DESC
Specifies the ordering sequence as ascending (default) or descending.

Specifying ASC or DESC in the WITHIN GROUP clause affects results as long as the percentile is not 0.5.

OVER()
See Analytic Functions

Examples

This query computes the median annual income per group for the first 300 customers in Wisconsin and the District of Columbia.

=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income)
      OVER (PARTITION BY customer_state) AS PERCENTILE_CONT
   FROM customer_dimension WHERE customer_state IN ('DC','WI') AND customer_key < 300
   ORDER BY customer_state, customer_key;
 customer_state | customer_key | annual_income | PERCENTILE_CONT
----------------+--------------+---------------+-----------------
 DC             |           52 |        168312 |        483266.5
 DC             |          118 |        798221 |        483266.5
 WI             |           62 |        283043 |          377691
 WI             |          139 |        472339 |          377691
(4 rows)

This query computes the median annual income per group for all customers in Wisconsin and the District of Columbia.

=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income)
      OVER (PARTITION BY customer_state) AS PERCENTILE_CONT
   FROM customer_dimension WHERE customer_state IN ('DC','WI') ORDER BY customer_state, customer_key;
 customer_state | customer_key | annual_income | PERCENTILE_CONT
----------------+--------------+---------------+-----------------
 DC             |           52 |        168312 |        483266.5
 DC             |          118 |        798221 |        483266.5
 DC             |          622 |        220782 |          555088
 DC             |          951 |        178453 |          555088
 DC             |          972 |        961582 |          555088
 DC             |         1286 |        760445 |          555088
 DC             |         1434 |         44836 |          555088
 ...

 WI             |           62 |        283043 |          377691
 WI             |          139 |        472339 |          377691
 WI             |          359 |         42242 |          517717
 WI             |          364 |        867543 |          517717
 WI             |          403 |        509031 |          517717
 WI             |          455 |         32000 |          517717
 WI             |          485 |        373129 |          517717
 ...

(1353 rows)

See also

24 - PERCENTILE_DISC [analytic]

An inverse distribution function where, for each row, PERCENTILE_DISC returns the value that would fall into the specified percentile among a set of values in each partition within a.

An inverse distribution function where, for each row, PERCENTILE_DISC returns the value that would fall into the specified percentile among a set of values in each partition within a window. PERCENTILE_DISC() assumes a discrete distribution data model. NULL values are ignored.

PERCENTILE_DISC examines the cumulative distribution values in each group until it finds one that is greater than or equal to the specified percentile. Vertica computes the percentile where, for each row, PERCENTILE_DISC outputs the first value of the WITHIN GROUP(ORDER BY) column whose CUME_DIST (cumulative distribution) value is >= the argument FLOAT value—for example, 0.4:

PERCENTILE_DIST(0.4) WITHIN GROUP (ORDER BY salary) OVER(PARTITION BY deptno)...

Given the following query:

SELECT CUME_DIST() OVER(ORDER BY salary) FROM table-name;

The smallest CUME_DIST value that is greater than 0.4 is also the PERCENTILE_DISC.

Behavior type

Immutable

Syntax

PERCENTILE_DISC ( percentile ) WITHIN GROUP (
    ORDER BY expression [ ASC | DESC ] ) OVER (
    [ window-partition-clause ] )

Parameters

percentile
Percentile value, a FLOAT constant that ranges from 0 to 1 (inclusive).
WITHIN GROUP(ORDER BY expression)
Specifies how to sort data within each group. ORDER BY takes only one column/expression that must be INTEGER, FLOAT, INTERVAL, or NUMERIC data type. NULL values are discarded.

The WITHIN GROUP(ORDER BY) clause does not guarantee the order of the SQL result. To order the final result , use the SQL ORDER BY clause set.

ASC | DESC
Specifies the ordering sequence as ascending (default) or descending.
OVER()
See Analytic Functions

Examples

This query computes the 20th percentile annual income by group for first 300 customers in Wisconsin and the District of Columbia.

=> SELECT customer_state, customer_key, annual_income,
      PERCENTILE_DISC(.2) WITHIN GROUP(ORDER BY annual_income)
      OVER (PARTITION BY customer_state) AS PERCENTILE_DISC
   FROM customer_dimension
   WHERE customer_state IN ('DC','WI')
   AND customer_key < 300
   ORDER BY customer_state, customer_key;
 customer_state | customer_key | annual_income | PERCENTILE_DISC
----------------+--------------+---------------+-----------------
 DC             |          104 |        658383 |          417092
 DC             |          168 |        417092 |          417092
 DC             |          245 |        670205 |          417092
 WI             |          106 |        227279 |          227279
 WI             |          127 |        703889 |          227279
 WI             |          209 |        458607 |          227279
(6 rows)

See also

25 - RANK [analytic]

Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY clause.

Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY clause.

RANK executes as follows:

  1. Sorts partition rows as specified by the ORDER BY clause.

  2. Compares the ORDER BY values of the preceding row and current row and ranks the current row as follows:

    • If ORDER BY values are the same, the current row gets the same ranking as the preceding row.

    • If the ORDER BY values are different, DENSE_RANK increments or decrements the current row's ranking by 1, plus the number of consecutive duplicate values in the rows that precede it.

The largest rank value is the equal to the total number of rows returned by the query.

Behavior type

Immutable

Syntax

RANK() OVER (
    [ window-partition-clause ]
    window-order-clause )

Parameters

OVER()
See Analytic Functions

Compared with DENSE_RANK

RANK can leave gaps in the ranking sequence, while DENSE_RANK does not.

Examples

The following query ranks by state all company customers that have been customers since 2007. In rows where the customer_since dates are the same, RANK assigns the rows equal ranking. When the customer_since date changes, RANK skips one or more rankings—for example, within CA, from 12 to 14, and from 17 to 19.

=> SELECT customer_state, customer_name, customer_since,
    RANK() OVER (PARTITION BY customer_state ORDER BY customer_since) AS rank
    FROM customer_dimension WHERE customer_type='Company' AND customer_since > '01/01/2007'
    ORDER BY customer_state;
  customer_state | customer_name | customer_since | rank
----------------+---------------+----------------+------
 AZ             | Foodshop      | 2007-01-20     |    1
 AZ             | Goldstar      | 2007-08-11     |    2
 CA             | Metahope      | 2007-01-05     |    1
 CA             | Foodgen       | 2007-02-05     |    2
 CA             | Infohope      | 2007-02-09     |    3
 CA             | Foodcom       | 2007-02-19     |    4
 CA             | Amerihope     | 2007-02-22     |    5
 CA             | Infostar      | 2007-03-05     |    6
 CA             | Intracare     | 2007-03-14     |    7
 CA             | Infocare      | 2007-04-07     |    8
 ...
 CO             | Goldtech      | 2007-02-19     |    1
 CT             | Foodmedia     | 2007-02-11     |    1
 CT             | Metatech      | 2007-02-20     |    2
 CT             | Infocorp      | 2007-04-10     |    3
 ...

See also

SQL analytics

26 - ROW_NUMBER [analytic]

Assigns a sequence of unique numbers to each row in a partition, starting with 1.

Assigns a sequence of unique numbers to each row in a window partition, starting with 1. ROW_NUMBER and RANK are generally interchangeable, with the following differences:

  • ROW_NUMBER assigns a unique ordinal number to each row in the ordered set, starting with 1.

  • ROW_NUMBER() is a Vertica extension, while RANK conforms to the SQL-99 standard.

Behavior type

Immutable

Syntax

ROW_NUMBER () OVER (
    [ window-partition-clause ]
    [ window-order-clause ] )

Parameters

OVER()
See Analytic Functions

Examples

The following ROW_NUMBER query partitions customers in the VMart table customer_dimension by customer_region. Within each partition, the function ranks those customers in order of seniority, as specified by its window order clause:

=> SELECT * FROM
    (SELECT ROW_NUMBER() OVER (PARTITION BY customer_region ORDER BY customer_since) AS most_senior,
     customer_region, customer_name, customer_since FROM public.customer_dimension WHERE customer_type = 'Individual') sq
   WHERE most_senior <= 5;
 most_senior | customer_region |    customer_name     | customer_since
-------------+-----------------+----------------------+----------------
           1 | West            | Jack Y. Perkins      | 1965-01-01
           2 | West            | Linda Q. Winkler     | 1965-01-02
           3 | West            | Marcus K. Li         | 1965-01-03
           4 | West            | Carla R. Jones       | 1965-01-07
           5 | West            | Seth P. Young        | 1965-01-09
           1 | East            | Kim O. Vu            | 1965-01-01
           2 | East            | Alexandra L. Weaver  | 1965-01-02
           3 | East            | Steve L. Webber      | 1965-01-04
           4 | East            | Thom Y. Li           | 1965-01-05
           5 | East            | Martha B. Farmer     | 1965-01-07
           1 | SouthWest       | Martha V. Gauthier   | 1965-01-01
           2 | SouthWest       | Jessica U. Goldberg  | 1965-01-07
           3 | SouthWest       | Robert O. Stein      | 1965-01-07
           4 | SouthWest       | Emily I. McCabe      | 1965-01-18
           5 | SouthWest       | Jack E. Miller       | 1965-01-25
           1 | NorthWest       | Julie O. Greenwood   | 1965-01-08
           2 | NorthWest       | Amy X. McNulty       | 1965-01-25
           3 | NorthWest       | Kevin S. Carcetti    | 1965-02-09
           4 | NorthWest       | Sam K. Carcetti      | 1965-03-16
           5 | NorthWest       | Alexandra X. Winkler | 1965-04-05
           1 | MidWest         | Michael Y. Meyer     | 1965-01-01
           2 | MidWest         | Joanna W. Bauer      | 1965-01-06
           3 | MidWest         | Amy E. Harris        | 1965-01-08
           4 | MidWest         | Julie W. McCabe      | 1965-01-09
           5 | MidWest         | William . Peterson   | 1965-01-09
           1 | South           | Dean . Martin        | 1965-01-01
           2 | South           | Ruth U. Williams     | 1965-01-02
           3 | South           | Steve Y. Farmer      | 1965-01-03
           4 | South           | Mark V. King         | 1965-01-08
           5 | South           | Lucas Y. Young       | 1965-01-10
(30 rows)

See also

27 - STDDEV [analytic]

Computes the statistical sample standard deviation of the current row with respect to the group within a.

Computes the statistical sample standard deviation of the current row with respect to the group within a window. STDDEV_SAMP returns the same value as the square root of the variance defined for the VAR_SAMP function:

STDDEV( expression ) = SQRT(VAR_SAMP( expression ))

When VAR_SAMP returns NULL, this function returns NULL.

Behavior type

Immutable

Syntax

STDDEV ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
See Analytic Functions

Examples

The following example returns the standard deviations of salaries in the employee dimension table by job title Assistant Director:

=> SELECT employee_last_name, annual_salary,
       STDDEV(annual_salary) OVER (ORDER BY hire_date) as "stddev"
   FROM employee_dimension
   WHERE job_title =  'Assistant Director';
 employee_last_name | annual_salary |      stddev
--------------------+---------------+------------------
 Bauer              |         85003 |              NaN
 Reyes              |         91051 | 4276.58181261624
 Overstreet         |         53296 | 20278.6923394976
 Gauthier           |         97216 | 19543.7184537642
 Jones              |         82320 | 16928.0764028285
 Fortin             |         56166 | 18400.2738421652
 Carcetti           |         71135 | 16968.9453554483
 Weaver             |         74419 | 15729.0709901852
 Stein              |         85689 | 15040.5909495309
 McNulty            |         69423 | 14401.1524291943
 Webber             |         99091 | 15256.3160166536
 Meyer              |         74774 | 14588.6126417355
 Garnett            |         82169 | 14008.7223268494
 Roy                |         76974 | 13466.1270356647
 Dobisz             |         83486 | 13040.4887828347
 Martin             |         99702 | 13637.6804131055
 Martin             |         73589 | 13299.2838158566
 ...

See also

28 - STDDEV_POP [analytic]

Evaluates the statistical population standard deviation for each member of the group.

Computes the statistical population standard deviation and returns the square root of the population variance within a window. The STDDEV_POP() return value is the same as the square root of the VAR_POP() function:

STDDEV_POP( expression ) = SQRT(VAR_POP( expression ))

When VAR_POP returns null, STDDEV_POP returns null.

Behavior type

Immutable

Syntax

STDDEV_POP ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
See Analytic Functions.

Examples

The following example returns the population standard deviations of salaries in the employee dimension table by job title Assistant Director:

=> SELECT employee_last_name, annual_salary,
       STDDEV_POP(annual_salary) OVER (ORDER BY hire_date) as "stddev_pop"
   FROM employee_dimension WHERE job_title =  'Assistant Director';
 employee_last_name | annual_salary |    stddev_pop
--------------------+---------------+------------------
 Goldberg           |         61859 |                0
 Miller             |         79582 |           8861.5
 Goldberg           |         74236 | 7422.74712548456
 Campbell           |         66426 | 6850.22125098891
 Moore              |         66630 | 6322.08223926257
 Nguyen             |         53530 | 8356.55480080699
 Harris             |         74115 | 8122.72288970008
 Lang               |         59981 | 8053.54776538731
 Farmer             |         60597 | 7858.70140687825
 Nguyen             |         78941 | 8360.63150784682

See also

29 - STDDEV_SAMP [analytic]

Computes the statistical sample standard deviation of the current row with respect to the group within a.

Computes the statistical sample standard deviation of the current row with respect to the group within a window. STDDEV_SAM's return value is the same as the square root of the variance defined for the VAR_SAMP function:

STDDEV( expression ) = SQRT(VAR_SAMP( expression ))

When VAR_SAMP returns NULL, STDDEV_SAMP returns NULL.

Behavior type

Immutable

Syntax

STDDEV_SAMP ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument..
OVER()
See Analytic Functions

Examples

The following example returns the sample standard deviations of salaries in the employee dimension table by job title Assistant Director:

=> SELECT employee_last_name, annual_salary,
      STDDEV(annual_salary) OVER (ORDER BY hire_date) as "stddev_samp"
      FROM employee_dimension WHERE job_title =  'Assistant Director';
 employee_last_name | annual_salary |   stddev_samp
--------------------+---------------+------------------
 Bauer              |         85003 |              NaN
 Reyes              |         91051 | 4276.58181261624
 Overstreet         |         53296 | 20278.6923394976
 Gauthier           |         97216 | 19543.7184537642
 Jones              |         82320 | 16928.0764028285
 Fortin             |         56166 | 18400.2738421652
 Carcetti           |         71135 | 16968.9453554483
 Weaver             |         74419 | 15729.0709901852
 Stein              |         85689 | 15040.5909495309
 McNulty            |         69423 | 14401.1524291943
 Webber             |         99091 | 15256.3160166536
 Meyer              |         74774 | 14588.6126417355
 Garnett            |         82169 | 14008.7223268494
 Roy                |         76974 | 13466.1270356647
 Dobisz             |         83486 | 13040.4887828347
 ...

See also

30 - SUM [analytic]

Computes the sum of an expression over a group of rows within a.

Computes the sum of an expression over a group of rows within a window. It returns a DOUBLE PRECISION value for a floating-point expression. Otherwise, the return value is the same as the expression data type.

Behavior type

Immutable

Syntax

SUM ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
See Analytic Functions

Overflow handling

If you encounter data overflow when using SUM, use SUM_FLOAT which converts data to a floating point.

By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.

Examples

The following query returns the cumulative sum all of the returns made to stores in January:

=> SELECT calendar_month_name AS month, transaction_type, sales_quantity,
     SUM(sales_quantity)
     OVER (PARTITION BY calendar_month_name ORDER BY date_dimension.date_key) AS SUM
     FROM store.store_sales_fact JOIN date_dimension
     USING(date_key) WHERE calendar_month_name IN ('January')
     AND transaction_type= 'return';
  month  | transaction_type | sales_quantity | SUM
---------+------------------+----------------+------
 January | return           |              7 |  651
 January | return           |              3 |  651
 January | return           |              7 |  651
 January | return           |              7 |  651
 January | return           |              7 |  651
 January | return           |              3 |  651
 January | return           |              7 |  651
 January | return           |              5 |  651
 January | return           |              1 |  651
 January | return           |              6 |  651
 January | return           |              6 |  651
 January | return           |              3 |  651
 January | return           |              9 |  651
 January | return           |              7 |  651
 January | return           |              6 |  651
 January | return           |              8 |  651
 January | return           |              7 |  651
 January | return           |              2 |  651
 January | return           |              4 |  651
 January | return           |              5 |  651
 January | return           |              7 |  651
 January | return           |              8 |  651
 January | return           |              4 |  651
 January | return           |             10 |  651
 January | return           |              6 |  651
 ...

See also

31 - VAR_POP [analytic]

Returns the statistical population variance of a non-null set of numbers (nulls are ignored) in a group within a.

Returns the statistical population variance of a non-null set of numbers (nulls are ignored) in a group within a window. Results are calculated by the sum of squares of the difference of expression from the mean of expression, divided by the number of rows remaining:

(SUM( expression * expression ) - SUM( expression  ) * SUM(  expression ) /   COUNT( expression )) / COUNT( expression )

Behavior type

Immutable

Syntax

VAR_POP ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument
OVER()
See Analytic Functions

Examples

The following example calculates the cumulative population in the store orders fact table of sales in January 2007:

=> SELECT date_ordered,
      VAR_POP(SUM(total_order_cost))
      OVER (ORDER BY date_ordered) "var_pop"
   FROM store.store_orders_fact s
   WHERE date_ordered BETWEEN '2007-01-01' AND '2007-01-31'
   GROUP BY s.date_ordered;
 date_ordered |     var_pop
--------------+------------------
 2007-01-01   |                0
 2007-01-02   |         89870400
 2007-01-03   |       3470302472
 2007-01-04   |  4466755450.6875
 2007-01-05   | 3816904780.80078
 2007-01-06   |   25438212385.25
 2007-01-07   | 22168747513.1016
 2007-01-08   | 23445191012.7344
 2007-01-09   | 39292879603.1113
 2007-01-10   | 48080574326.9609
(10 rows)

See also

32 - VAR_SAMP [analytic]

Returns the sample variance of a non-NULL set of numbers (NULL values in the set are ignored) for each row of the group within a.

Returns the sample variance of a non-NULL set of numbers (NULL values in the set are ignored) for each row of the group within a window. Results are calculated as follows:

(SUM( expression * expression ) - SUM( expression ) * SUM( expression ) / COUNT( expression ) )
/ (COUNT( expression ) - 1 )

This function and VARIANCE differ in one way: given an input set of one element, VARIANCE returns 0 and VAR_SAMP returns NULL.

Behavior type

Immutable

Syntax

VAR_SAMP ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument
OVER()
See Analytic Functions

Null handling

  • VAR_SAMP returns the sample variance of a set of numbers after it discards the NULL values in the set.

  • If the function is applied to an empty set, then it returns NULL.

Examples

The following example calculates the sample variance in the store orders fact table of sales in December 2007:

=> SELECT date_ordered,
      VAR_SAMP(SUM(total_order_cost))
      OVER (ORDER BY date_ordered) "var_samp"
   FROM store.store_orders_fact s
   WHERE date_ordered BETWEEN '2007-12-01' AND '2007-12-31'
   GROUP BY s.date_ordered;
 date_ordered |     var_samp
--------------+------------------
 2007-12-01   |              NaN
 2007-12-02   |      90642601088
 2007-12-03   | 48030548449.3359
 2007-12-04   | 32740062504.2461
 2007-12-05   | 32100319112.6992
 2007-12-06   |  26274166814.668
 2007-12-07   | 23017490251.9062
 2007-12-08   | 21099374085.1406
 2007-12-09   | 27462205977.9453
 2007-12-10   | 26288687564.1758
(10 rows)

See also

33 - VARIANCE [analytic]

Returns the sample variance of a non-NULL set of numbers (NULL values in the set are ignored) for each row of the group within a.

Returns the sample variance of a non-NULL set of numbers (NULL values in the set are ignored) for each row of the group within a window. Results are calculated as follows:

( SUM( expression * expression ) - SUM( expression ) * SUM( expression ) / COUNT( expression )) / (COUNT( expression ) - 1 )

VARIANCE returns the variance of expression, which is calculated as follows:

  • 0 if the number of rows in expression = 1

  • VAR_SAMP if the number of rows in expression > 1

Behavior type

Immutable

Syntax

VAR_SAMP ( expression ) OVER (
    [ window-partition-clause ]
    [ window-order-clause ]
    [ window-frame-clause ] )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
See Analytic Functions

Examples

The following example calculates the cumulative variance in the store orders fact table of sales in December 2007:

=> SELECT date_ordered,
      VARIANCE(SUM(total_order_cost))
      OVER (ORDER BY date_ordered) "variance"
   FROM store.store_orders_fact s
   WHERE date_ordered BETWEEN '2007-12-01' AND '2007-12-31'
   GROUP BY s.date_ordered;
 date_ordered |     variance
--------------+------------------
 2007-12-01   |              NaN
 2007-12-02   |       2259129762
 2007-12-03   | 1809012182.33301
 2007-12-04   |   35138165568.25
 2007-12-05   | 26644110029.3003
 2007-12-06   |      25943125234
 2007-12-07   | 23178202223.9048
 2007-12-08   | 21940268901.1431
 2007-12-09   | 21487676799.6108
 2007-12-10   | 21521358853.4331
(10 rows)

See also