This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Analytic functions
All analytic functions in this section with an aggregate counterpart are appended with [Analytics] in the heading to avoid confusion between the two function types.
Note
All analytic functions in this section with an aggregate counterpart are appended with [Analytics] in the heading to avoid confusion between the two function types.
Vertica analytics are SQL functions based on the ANSI 99 standard. These functions handle complex analysis and reporting tasks—for example:
-
Rank the longest-standing customers in a particular state.
-
Calculate the moving average of retail volume over a specified time.
-
Find the highest score among all students in the same grade.
-
Compare the current sales bonus that salespersons received against their previous bonus.
Analytic functions return aggregate results but they do not group the result set. They return the group value multiple times, once per record. You can sort group values, or partitions, using a window ORDER BY
clause, but the order affects only the function result set, not the entire query result set.
Syntax
General
analytic-function(arguments) OVER(
[ window-partition-clause ]
[ window-order-clause [ window-frame-clause ] ]
)
With named window
analytic-function(arguments) OVER(
[ named-window [ window-frame-clause ] ]
)
Parameters
analytic-function
(
arguments
)
- A Vertica analytic function and its arguments.
OVER
- Specifies how to partition, sort, and window frame function input with respect to the current row. The input data is the result set that the query returns after it evaluates
FROM
, WHERE
, GROUP BY
, and HAVING
clauses.
An empty OVER
clause provides the best performance for single threaded queries on a single node.
- window-partition-clause
- Groups input rows according to one or more columns or expressions.
If you omit this clause, no grouping occurs and the analytic function processes all input rows as a single partition.
- window-order-clause
- Optionally specifies how to sort rows that are supplied to the analytic function. If the
OVER
clause also includes a partition clause, rows are sorted within each partition.
-
window-frame-clause
- Only valid for some analytic functions, specifies as input a set of rows relative to the row that is currently being evaluated by the analytic function. After the function processes that row and its window, Vertica advances the current row and adjusts the window boundaries accordingly.
named-window
- The name of a window that you define in the same query with a window-name-clause. This definition encapsulates window partitioning and sorting. Named windows are useful when the query invokes multiple analytic functions with similar
OVER
clauses.
A window name clause cannot specify a window frame clause. However, you can qualify the named window in an OVER
clause with a window frame clause.
Requirements
The following requirements apply to analytic functions:
-
All require an OVER
clause. Each function has its own OVER
clause requirements. For example, you can supply an empty OVER
clause for some analytic aggregate functions such as
SUM
. For other functions, window frame and order clauses might be required, or might be invalid.
-
Analytic functions can be invoked only in a query's SELECT
and ORDER BY
clauses.
-
Analytic functions cannot be nested. For example, the following query is not allowed:
=> SELECT MEDIAN(RANK() OVER(ORDER BY sal) OVER()).
-
WHERE
, GROUP BY
and HAVING
operators are technically not part of the analytic function. However, they determine input to that function.
See also
1 - ARGMAX [analytic]
This function is patterned after the mathematical function argmax(f(x)), which returns the value of x that maximizes f(x).
This function is patterned after the mathematical function argmax(
f
(
x
))
, which returns the value of x
that maximizes f
(
x
)
. Similarly, ARGMAX takes two arguments target
and arg
, where both are columns or column expressions in the queried dataset. ARGMAX finds the row with the largest non-null value in target
and returns the value of arg
in that row. If multiple rows contain the largest target
value, ARGMAX returns arg
from the first row that it finds.
Behavior type
Immutable
Syntax
ARGMAX ( target, arg ) OVER ( [ PARTITION BY expression[,...] ] [ window-order-clause ] )
Arguments
target
, arg
- Columns in the queried dataset.
OVER()
- Specifies the following window clauses:
-
PARTITION BY
expression
: Groups (partitions) input rows according to the values in expression
, which resolves to one or more columns in the queried dataset. If you omit this clause, ARGMAX processes all input rows as a single partition.
-
window-order-clause: Specifies how to sort input rows. If the OVER clause also includes a partition clause, rows are sorted separately within each partition.
Important
To ensure consistent results when multiple rows contain the largest target
value, include a window order clause that sorts on arg
.
For details, see Analytic Functions.
Examples
Create and populate table service_info
, which contains information on various services, their respective development groups, and their userbase. A NULL in the users
column indicates that the service has not been released, and so it cannot have users.
=> CREATE TABLE service_info(dev_group VARCHAR(10), product_name VARCHAR(30), users INT);
=> COPY t FROM stdin NULL AS 'null';
>> iris|chat|48193
>> aspen|trading|3000
>> orchid|cloud|990322
>> iris|video call| 10203
>> daffodil|streaming|44123
>> hydrangea|password manager|null
>> hydrangea|totp|1837363
>> daffodil|clip share|3000
>> hydrangea|e2e sms|null
>> rose|crypto|null
>> iris|forum|48193
>> \.
ARGMAX returns the value in the product_name
column that maximizes the value in the users
column. In this case, ARGMAX returns totp
, which indicates that the totp
service has the largest user base:
=> SELECT dev_group, product_name, users, ARGMAX(users, product_name) OVER (ORDER BY dev_group ASC) FROM service_info;
dev_group | product_name | users | ARGMAX
-----------+------------------+---------+--------
aspen | trading | 3000 | totp
daffodil | clip share | 3000 | totp
daffodil | streaming | 44123 | totp
hydrangea | e2e sms | | totp
hydrangea | password manager | | totp
hydrangea | totp | 1837363 | totp
iris | chat | 48193 | totp
iris | forum | 48193 | totp
iris | video call | 10203 | totp
orchid | cloud | 990322 | totp
rose | crypto | | totp
(11 rows)
The next query partitions the data on dev_group
to identify the most popular service created by each development group. ARGMAX returns NULL if the partition's users
column contains only NULL values and breaks ties using the first value in product_name
from the top of the partition.
=> SELECT dev_group, product_name, users, ARGMAX(users, product_name) OVER (PARTITION BY dev_group ORDER BY product_name ASC) FROM service_info;
dev_group | product_name | users | ARGMAX
-----------+------------------+---------+-----------
iris | chat | 48193 | chat
iris | forum | 48193 | chat
iris | video call | 10203 | chat
orchid | cloud | 990322 | cloud
aspen | trading | 3000 | trading
daffodil | clip share | 3000 | streaming
daffodil | streaming | 44123 | streaming
rose | crypto | |
hydrangea | e2e sms | | totp
hydrangea | password manager | | totp
hydrangea | totp | 1837363 | totp
(11 rows)
See also
ARGMIN [analytic]
2 - ARGMIN [analytic]
This function is patterned after the mathematical function argmin(f(x)), which returns the value of x that minimizes f(x).
This function is patterned after the mathematical function argmin(
f
(
x
))
, which returns the value of x
that minimizes f
(
x
)
. Similarly, ARGMIN takes two arguments target
and arg
, where both are columns or column expressions in the queried dataset. ARGMIN finds the row with the smallest non-null value in target
and returns the value of arg
in that row. If multiple rows contain the smallest target
value, ARGMIN returns arg
from the first row that it finds.
Behavior type
Immutable
Syntax
ARGMIN ( target, arg ) OVER ( [ PARTITION BY expression[,...] ] [ window-order-clause ] )
Arguments
target
, arg
- Columns in the queried dataset.
OVER()
- Specifies the following window clauses:
-
PARTITION BY
expression
: Groups (partitions) input rows according to the values in expression
, which resolves to one or more columns in the queried dataset. If you omit this clause, ARGMIN processes all input rows as a single partition.
-
window-order-clause: Specifies how to sort input rows. If the OVER
clause also includes a partition clause, rows are sorted separately within each partition.
Important
To ensure consistent results when multiple rows contain the smallest target
value, include a window order clause that sorts on arg
.
For details, see Analytic Functions.
Examples
Create and populate table service_info
, which contains information on various services, their respective development groups, and their userbase. A NULL in the users
column indicates that the service has not been released, and so it cannot have users.
=> CREATE TABLE service_info(dev_group VARCHAR(10), product_name VARCHAR(30), users INT);
=> COPY t FROM stdin NULL AS 'null';
>> iris|chat|48193
>> aspen|trading|3000
>> orchid|cloud|990322
>> iris|video call| 10203
>> daffodil|streaming|44123
>> hydrangea|password manager|null
>> hydrangea|totp|1837363
>> daffodil|clip share|3000
>> hydrangea|e2e sms|null
>> rose|crypto|null
>> iris|forum|48193
>> \.
ARGMIN returns the value in the product_name
column that minimizes the value in the users
column. In this case, ARGMIN returns totp
, which indicates that the totp
service has the smallest user base:
=> SELECT dev_group, product_name, users, ARGMIN(users, product_name) OVER (ORDER BY dev_group ASC) FROM service_info;
dev_group | product_name | users | ARGMIN
-----------+------------------+---------+---------
aspen | trading | 3000 | trading
daffodil | clip share | 3000 | trading
daffodil | streaming | 44123 | trading
hydrangea | e2e sms | | trading
hydrangea | password manager | | trading
hydrangea | totp | 1837363 | trading
iris | chat | 48193 | trading
iris | forum | 48193 | trading
iris | video call | 10203 | trading
orchid | cloud | 990322 | trading
rose | crypto | | trading
(11 rows)
The next query partitions the data on dev_group
to identify the least popular service created by each development group. ARGMIN returns NULL if the partition's users
column contains only NULL values and breaks ties using the first value in product_name
from the top of the partition.
=> SELECT dev_group, product_name, users, ARGMIN(users, product_name) OVER (PARTITION BY dev_group ORDER BY product_name ASC) FROM service_info;
dev_group | product_name | users | ARGMIN
-----------+------------------+---------+------------
iris | chat | 48193 | video call
iris | forum | 48193 | video call
iris | video call | 10203 | video call
orchid | cloud | 990322 | cloud
aspen | trading | 3000 | trading
daffodil | clip share | 3000 | clip share
daffodil | streaming | 44123 | clip share
rose | crypto | |
hydrangea | e2e sms | | totp
hydrangea | password manager | | totp
hydrangea | totp | 1837363 | totp
(11 rows)
See also
ARGMAX [analytic]
3 - AVG [analytic]
Computes an average of an expression in a group within a.
Computes an average of an expression in a group within a window. AVG
returns the same data type as the expression's numeric data type.
The AVG
analytic function differs from the
AVG
aggregate function, which computes the average of an expression over a group of rows.
Behavior type
Immutable
Syntax
AVG ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Any data that can be implicitly converted to a numeric data type.
OVER()
- See Analytic Functions.
Overflow handling
By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.
Examples
The following query finds the sales for that calendar month and returns a running/cumulative average (sometimes called a moving average) using the default window of RANGE UNBOUNDED PRECEDING AND CURRENT ROW
:
=> SELECT calendar_month_number_in_year Mo, SUM(product_price) Sales,
AVG(SUM(product_price)) OVER (ORDER BY calendar_month_number_in_year)::INTEGER Average
FROM product_dimension pd, date_dimension dm, inventory_fact if
WHERE dm.date_key = if.date_key AND pd.product_key = if.product_key GROUP BY Mo;
Mo | Sales | Average
----+----------+----------
1 | 23869547 | 23869547
2 | 19604661 | 21737104
3 | 22877913 | 22117374
4 | 22901263 | 22313346
5 | 23670676 | 22584812
6 | 22507600 | 22571943
7 | 21514089 | 22420821
8 | 24860684 | 22725804
9 | 21687795 | 22610470
10 | 23648921 | 22714315
11 | 21115910 | 22569005
12 | 24708317 | 22747281
(12 rows)
To return a moving average that is not a running (cumulative) average, the window can specify ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
:
=> SELECT calendar_month_number_in_year Mo, SUM(product_price) Sales,
AVG(SUM(product_price)) OVER (ORDER BY calendar_month_number_in_year
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)::INTEGER Average
FROM product_dimension pd, date_dimension dm, inventory_fact if
WHERE dm.date_key = if.date_key AND pd.product_key = if.product_key GROUP BY Mo;
Mo | Sales | Average
----+----------+----------
1 | 23869547 | 22117374
2 | 19604661 | 22313346
3 | 22877913 | 22584812
4 | 22901263 | 22312423
5 | 23670676 | 22694308
6 | 22507600 | 23090862
7 | 21514089 | 22848169
8 | 24860684 | 22843818
9 | 21687795 | 22565480
10 | 23648921 | 23204325
11 | 21115910 | 22790236
12 | 24708317 | 23157716
(12 rows)
See also
4 - BOOL_AND [analytic]
Returns the Boolean value of an expression within a.
Returns the Boolean value of an expression within a window. If all input values are true, BOOL_AND
returns t
. Otherwise, it returns f
.
Behavior type
Immutable
Syntax
BOOL_AND ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- A Boolean data type or any non-Boolean data type that can be implicitly converted to a Boolean data type. The function returns a Boolean value.
OVER()
- See Analytic Functions.
Examples
The following example illustrates how you can use the BOOL_AND
, BOOL_OR
, and BOOL_XOR
analytic functions. The sample table, employee, includes a column for type of employee and years paid.
=> CREATE TABLE employee(emptype VARCHAR, yearspaid VARCHAR);
CREATE TABLE
Insert sample data into the table to show years paid. In more than one case, an employee could be paid more than once within one year.
=> INSERT INTO employee
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2015'
UNION ALL
SELECT 'contractor3', '2014'
UNION ALL
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2014'
UNION ALL
SELECT 'contractor3', '2015'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor5', '2015'
UNION ALL
SELECT 'contractor5', '2016';
OUTPUT
--------
10
(1 row)
Query the table. The result shows employees that were paid twice in 2014 (BOOL_AND
), once or twice in 2014 (BOOL_OR
), and specifically not more than once in 2014 (BOOL_XOR
).
=> SELECT DISTINCT emptype,
BOOL_AND(yearspaid='2014') OVER (PARTITION BY emptype) AS paidtwicein2014,
BOOL_OR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidonceortwicein2014,
BOOL_XOR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidjustoncein2014
FROM employee;
emptype | paidtwicein2014 | paidonceortwicein2014 | paidjustoncein2014
-------------+-----------------+-----------------------+--------------------
contractor1 | t | t | f
contractor2 | f | t | t
contractor3 | f | t | t
contractor4 | t | t | f
contractor5 | f | f | f
(5 rows)
See also
5 - BOOL_OR [analytic]
Returns the Boolean value of an expression within a.
Returns the Boolean value of an expression within a window. If at least one input value is true, BOOL_OR
returns t
. Otherwise, it returns f
.
Behavior type
Immutable
Syntax
BOOL_OR ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- A Boolean data type or any non-Boolean data type that can be implicitly converted to a Boolean data type. The function returns a Boolean value.
OVER()
- See Analytic Functions.
Examples
The following example illustrates how you can use the BOOL_AND
, BOOL_OR
, and BOOL_XOR
analytic functions. The sample table, employee, includes a column for type of employee and years paid.
=> CREATE TABLE employee(emptype VARCHAR, yearspaid VARCHAR);
CREATE TABLE
Insert sample data into the table to show years paid. In more than one case, an employee could be paid more than once within one year.
=> INSERT INTO employee
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2015'
UNION ALL
SELECT 'contractor3', '2014'
UNION ALL
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2014'
UNION ALL
SELECT 'contractor3', '2015'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor5', '2015'
UNION ALL
SELECT 'contractor5', '2016';
OUTPUT
--------
10
(1 row)
Query the table. The result shows employees that were paid twice in 2014 (BOOL_AND
), once or twice in 2014 (BOOL_OR
), and specifically not more than once in 2014 (BOOL_XOR
).
=> SELECT DISTINCT emptype,
BOOL_AND(yearspaid='2014') OVER (PARTITION BY emptype) AS paidtwicein2014,
BOOL_OR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidonceortwicein2014,
BOOL_XOR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidjustoncein2014
FROM employee;
emptype | paidtwicein2014 | paidonceortwicein2014 | paidjustoncein2014
-------------+-----------------+-----------------------+--------------------
contractor1 | t | t | f
contractor2 | f | t | t
contractor3 | f | t | t
contractor4 | t | t | f
contractor5 | f | f | f
(5 rows)
See also
6 - BOOL_XOR [analytic]
Returns the Boolean value of an expression within a.
Returns the Boolean value of an expression within a window. If only one input value is true, BOOL_XOR
returns t
. Otherwise, it returns f
.
Behavior type
Immutable
Syntax
BOOL_XOR ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- A Boolean data type or any non-Boolean data type that can be implicitly converted to a Boolean data type. The function returns a Boolean value.
OVER()
- See Analytic Functions.
Examples
The following example illustrates how you can use the BOOL_AND
, BOOL_OR
, and BOOL_XOR
analytic functions. The sample table, employee, includes a column for type of employee and years paid.
=> CREATE TABLE employee(emptype VARCHAR, yearspaid VARCHAR);
CREATE TABLE
Insert sample data into the table to show years paid. In more than one case, an employee could be paid more than once within one year.
=> INSERT INTO employee
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2015'
UNION ALL
SELECT 'contractor3', '2014'
UNION ALL
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2014'
UNION ALL
SELECT 'contractor3', '2015'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor5', '2015'
UNION ALL
SELECT 'contractor5', '2016';
OUTPUT
--------
10
(1 row)
Query the table. The result shows employees that were paid twice in 2014 (BOOL_AND
), once or twice in 2014 (BOOL_OR
), and specifically not more than once in 2014 (BOOL_XOR
).
=> SELECT DISTINCT emptype,
BOOL_AND(yearspaid='2014') OVER (PARTITION BY emptype) AS paidtwicein2014,
BOOL_OR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidonceortwicein2014,
BOOL_XOR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidjustoncein2014
FROM employee;
emptype | paidtwicein2014 | paidonceortwicein2014 | paidjustoncein2014
-------------+-----------------+-----------------------+--------------------
contractor1 | t | t | f
contractor2 | f | t | t
contractor3 | f | t | t
contractor4 | t | t | f
contractor5 | f | f | f
(5 rows)
See also
7 - CONDITIONAL_CHANGE_EVENT [analytic]
Assigns an event window number to each row, starting from 0, and increments by 1 when the result of evaluating the argument expression on the current row differs from that on the previous row.
Assigns an event window number to each row, starting from 0, and increments by 1 when the result of evaluating the argument expression on the current row differs from that on the previous row.
Behavior type
Immutable
Syntax
CONDITIONAL_CHANGE_EVENT ( expression ) OVER (
[ window-partition-clause ]
window-order-clause )
Parameters
expression
- SQL scalar expression that is evaluated on an input record. The result of *
expression
*can be of any data type.
OVER()
- See Analytic Functions.
Notes
The analytic window-order-clause
is required but the window-partition-clause
is optional.
Examples
=> SELECT CONDITIONAL_CHANGE_EVENT(bid)
OVER (PARTITION BY symbol ORDER BY ts) AS cce
FROM TickStore;
The system returns an error when no ORDER BY
clause is present:
=> SELECT CONDITIONAL_CHANGE_EVENT(bid)
OVER (PARTITION BY symbol) AS cce
FROM TickStore;
ERROR: conditional_change_event must contain an
ORDER BY clause within its analytic clause
For more examples, see Event-based windows.
See also
8 - CONDITIONAL_TRUE_EVENT [analytic]
Assigns an event window number to each row, starting from 0, and increments the number by 1 when the result of the boolean argument expression evaluates true.
Assigns an event window number to each row, starting from 0, and increments the number by 1 when the result of the boolean argument expression evaluates true. For example, given a sequence of values for column a, as follows:
( 1, 2, 3, 4, 5, 6 )
CONDITIONAL_TRUE_EVENT(a > 3)
returns 0, 0, 0, 1, 2, 3
.
Behavior type
Immutable
Syntax
CONDITIONAL_TRUE_EVENT ( boolean-expression ) OVER (
[ window-partition-clause ]
window-order-clause )
Parameters
boolean-expression
- SQL scalar expression that is evaluated on an input record, type BOOLEAN.
OVER()
- See Analytic functions.
Notes
The analytic window-order-clause
is required but the window-partition-clause
is optional.
Examples
> SELECT CONDITIONAL_TRUE_EVENT(bid > 10.6)
OVER(PARTITION BY bid ORDER BY ts) AS cte
FROM Tickstore;
The system returns an error if the ORDER BY
clause is omitted:
> SELECT CONDITIONAL_TRUE_EVENT(bid > 10.6)
OVER(PARTITION BY bid) AS cte
FROM Tickstore;
ERROR: conditional_true_event must contain an ORDER BY
clause within its analytic clause
For more examples, see Event-based windows.
See also
9 - COUNT [analytic]
Counts occurrences within a group within a.
Counts occurrences within a group within a window. If you specify * or some non-null constant, COUNT()
counts all rows.
Behavior type
Immutable
Syntax
COUNT ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Returns the number of rows in each group for which the
expression
is not null. Can be any expression resulting in BIGINT.
OVER()
- See Analytic Functions.
Examples
Using the schema defined in Window framing, the following COUNT
function omits window order and window frame clauses; otherwise Vertica would treat it as a window aggregate. Think of the window of reporting aggregates as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
=> SELECT deptno, sal, empno, COUNT(sal)
OVER (PARTITION BY deptno) AS count FROM emp;
deptno | sal | empno | count
--------+-----+-------+-------
10 | 101 | 1 | 2
10 | 104 | 4 | 2
20 | 110 | 10 | 6
20 | 110 | 9 | 6
20 | 109 | 7 | 6
20 | 109 | 6 | 6
20 | 109 | 8 | 6
20 | 109 | 11 | 6
30 | 105 | 5 | 3
30 | 103 | 3 | 3
30 | 102 | 2 | 3
Using ORDER BY sal
creates a moving window query with default window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
=> SELECT deptno, sal, empno, COUNT(sal)
OVER (PARTITION BY deptno ORDER BY sal) AS count
FROM emp;
deptno | sal | empno | count
--------+-----+-------+-------
10 | 101 | 1 | 1
10 | 104 | 4 | 2
20 | 100 | 11 | 1
20 | 109 | 7 | 4
20 | 109 | 6 | 4
20 | 109 | 8 | 4
20 | 110 | 10 | 6
20 | 110 | 9 | 6
30 | 102 | 2 | 1
30 | 103 | 3 | 2
30 | 105 | 5 | 3
Using the VMart schema, the following query finds the number of employees who make less than or equivalent to the hourly rate of the current employee. The query returns a running/cumulative average (sometimes called a moving average) using the default window of RANGE UNBOUNDED PRECEDING AND CURRENT ROW
:
=> SELECT employee_last_name AS "last_name", hourly_rate, COUNT(*)
OVER (ORDER BY hourly_rate) AS moving_count from employee_dimension;
last_name | hourly_rate | moving_count
------------+-------------+--------------
Gauthier | 6 | 4
Taylor | 6 | 4
Jefferson | 6 | 4
Nielson | 6 | 4
McNulty | 6.01 | 11
Robinson | 6.01 | 11
Dobisz | 6.01 | 11
Williams | 6.01 | 11
Kramer | 6.01 | 11
Miller | 6.01 | 11
Wilson | 6.01 | 11
Vogel | 6.02 | 14
Moore | 6.02 | 14
Vogel | 6.02 | 14
Carcetti | 6.03 | 19
...
To return a moving average that is not also a running (cumulative) average, the window should specify ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
:
=> SELECT employee_last_name AS "last_name", hourly_rate, COUNT(*)
OVER (ORDER BY hourly_rate ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
AS moving_count from employee_dimension;
See also
10 - CUME_DIST [analytic]
Calculates the cumulative distribution, or relative rank, of the current row with regard to other rows in the same partition within a .
Calculates the cumulative distribution, or relative rank, of the current row with regard to other rows in the same partition within a window.
CUME_DIST()
returns a number greater then 0 and less then or equal to 1, where the number represents the relative position of the specified row within a group of n
rows. For a row x
(assuming ASC
ordering), the CUME_DIST
of x
is the number of rows with values lower than or equal to the value of x
, divided by the number of rows in the partition. For example, in a group of three rows, the cumulative distribution values returned would be 1/3, 2/3, and 3/3.
Note
Because the result for a given row depends on the number of rows preceding that row in the same partition, you should always specify a window-order-clause
when you call this function.
Behavior type
Immutable
Syntax
CUME_DIST ( ) OVER (
[ window-partition-clause ]
window-order-clause )
Parameters
OVER()
- See Analytic Functions.
Examples
The following example returns the cumulative distribution of sales for different transaction types within each month of the first quarter.
=> SELECT calendar_month_name AS month, tender_type, SUM(sales_quantity),
CUME_DIST()
OVER (PARTITION BY calendar_month_name ORDER BY SUM(sales_quantity)) AS
CUME_DIST
FROM store.store_sales_fact JOIN date_dimension
USING(date_key) WHERE calendar_month_name IN ('January','February','March')
AND tender_type NOT LIKE 'Other'
GROUP BY calendar_month_name, tender_type;
month | tender_type | SUM | CUME_DIST
----------+-------------+--------+-----------
March | Credit | 469858 | 0.25
March | Cash | 470449 | 0.5
March | Check | 473033 | 0.75
March | Debit | 475103 | 1
January | Cash | 441730 | 0.25
January | Debit | 443922 | 0.5
January | Check | 446297 | 0.75
January | Credit | 450994 | 1
February | Check | 425665 | 0.25
February | Debit | 426726 | 0.5
February | Credit | 430010 | 0.75
February | Cash | 430767 | 1
(12 rows)
See also
11 - DENSE_RANK [analytic]
Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY clause.
Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY
clause. A DENSE_RANK
function returns a sequence of ranking numbers without any gaps.
DENSE_RANK
executes as follows:
-
Sorts partition rows as specified by the ORDER BY
clause.
-
Compares the ORDER BY
values of the preceding row and current row and ranks the current row as follows:
-
If ORDER BY
values are the same, the current row gets the same ranking as the preceding row.
Note
Null values are considered equal. For detailed information on how null values are sorted, see
NULL sort order.
-
If the ORDER BY
values are different, DENSE_RANK
increments or decrements the current row's ranking by 1, depending whether sort order is ascending or descending.
DENSE_RANK
always changes the ranking by 1, so no gaps appear in the ranking sequence. The largest rank value is the number of unique ORDER BY
values returned by the query.
Behavior type
Immutable
Syntax
DENSE_RANK() OVER (
[ window-partition-clause ]
window-order-clause )
Parameters
OVER()
- See Analytic Functions.
See Analytic Functions
Compared with RANK
RANK
leaves gaps in the ranking sequence, while DENSE_RANK
does not. The example below compares the behavior of the two functions.
Examples
The following query invokes RANK
and DENSE_RANK
to rank customers by annual income. The two functions return different rankings, as follows:
-
If annual_salary
contains duplicate values, RANK()
inserts duplicate rankings and then skips one or more values—for example, from 4 to 6 and 7 to 9.
-
In the parallel column Dense Rank
, DENSE_RANK()
also inserts duplicate rankings, but leaves no gaps in the rankings sequence:
=> SELECT employee_region region, employee_key, annual_salary,
RANK() OVER (PARTITION BY employee_region ORDER BY annual_salary) Rank,
DENSE_RANK() OVER (PARTITION BY employee_region ORDER BY annual_salary) "Dense Rank"
FROM employee_dimension;
region | employee_key | annual_salary | Rank | Dense Rank
----------------------------------+--------------+---------------+------+------------
West | 5248 | 1200 | 1 | 1
West | 6880 | 1204 | 2 | 2
West | 5700 | 1214 | 3 | 3
West | 9857 | 1218 | 4 | 4
West | 6014 | 1218 | 4 | 4
West | 9221 | 1220 | 6 | 5
West | 7646 | 1222 | 7 | 6
West | 6621 | 1222 | 7 | 6
West | 6488 | 1224 | 9 | 7
West | 7659 | 1226 | 10 | 8
West | 7432 | 1226 | 10 | 8
West | 9905 | 1226 | 10 | 8
West | 9021 | 1228 | 13 | 9
...
West | 56 | 963104 | 2794 | 2152
West | 100 | 992363 | 2795 | 2153
East | 8353 | 1200 | 1 | 1
East | 9743 | 1202 | 2 | 2
East | 9975 | 1202 | 2 | 2
East | 9205 | 1204 | 4 | 3
East | 8894 | 1206 | 5 | 4
East | 7740 | 1206 | 5 | 4
East | 7324 | 1208 | 7 | 5
East | 6505 | 1208 | 7 | 5
East | 5404 | 1208 | 7 | 5
East | 5010 | 1208 | 7 | 5
East | 9114 | 1212 | 11 | 6
...
See also
SQL analytics
12 - EXPONENTIAL_MOVING_AVERAGE [analytic]
Calculates the exponential moving average (EMA) of expression E with smoothing factor X.
Calculates the exponential moving average (EMA) of expression E
with smoothing factor X
. An EMA differs from a simple moving average in that it provides a more stable picture of changes to data over time.
The EMA is calculated by adding the previous EMA value to the current data point scaled by the smoothing factor, as in the following formula:
EMA
=
EMA0
+ (
X
* (
E
-
EMA0
))
where:
-
E
is the current data point
-
EMA0
is the previous row's EMA value.
-
X
is the smoothing factor.
This function also works at the row level. For example, EMA assumes the data in a given column is sampled at uniform intervals. If the users' data points are sampled at non-uniform intervals, they should run the time series gap filling and interpolation (GFI) operations before EMA()
Behavior type
Immutable
Syntax
EXPONENTIAL_MOVING_AVERAGE ( E, X ) OVER (
[ window-partition-clause ]
window-order-clause )
Parameters
E
- The value whose average is calculated over a set of rows. Can be
INTEGER
, FLOAT
or NUMERIC
type and must be a constant.
X
- A positive
FLOAT
value between 0 and 1 that is used as the smoothing factor.
OVER()
- See Analytic Functions.
Examples
The following example uses time series gap filling and interpolation (GFI) first in a subquery, and then performs an EXPONENTIAL_MOVING_AVERAGE
operation on the subquery result.
Create a simple four-column table:
=> CREATE TABLE ticker(
time TIMESTAMP,
symbol VARCHAR(8),
bid1 FLOAT,
bid2 FLOAT );
Insert some data, including nulls, so GFI can do its interpolation and gap filling:
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:00', 'ABC', 60.45, 60.44);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:01', 'ABC', 60.49, 65.12);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:02', 'ABC', 57.78, 59.25);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:03', 'ABC', null, 65.12);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:04', 'ABC', 67.88, null);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:00', 'XYZ', 47.55, 40.15);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:01', 'XYZ', 44.35, 46.78);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:02', 'XYZ', 71.56, 75.78);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:03', 'XYZ', 85.55, 70.21);
=> INSERT INTO ticker VALUES ('2009-07-12 03:00:04', 'XYZ', 45.55, 58.65);
=> COMMIT;
Note
During gap filling and interpolation, Vertica takes the closest non null value on either side of the time slice and uses that value. For example, if you use a linear interpolation scheme and you do not specify
IGNORE NULLS
, and your data has one real value and one null, the result is null. If the value on either side is null, the result is null. See
When Time Series Data Contains Nulls for details.
Query the table that you just created to you can see the output:
=> SELECT * FROM ticker;
time | symbol | bid1 | bid2
---------------------+--------+-------+-------
2009-07-12 03:00:00 | ABC | 60.45 | 60.44
2009-07-12 03:00:01 | ABC | 60.49 | 65.12
2009-07-12 03:00:02 | ABC | 57.78 | 59.25
2009-07-12 03:00:03 | ABC | | 65.12
2009-07-12 03:00:04 | ABC | 67.88 |
2009-07-12 03:00:00 | XYZ | 47.55 | 40.15
2009-07-12 03:00:01 | XYZ | 44.35 | 46.78
2009-07-12 03:00:02 | XYZ | 71.56 | 75.78
2009-07-12 03:00:03 | XYZ | 85.55 | 70.21
2009-07-12 03:00:04 | XYZ | 45.55 | 58.65
(10 rows)
The following query processes the first and last values that belong to each 2-second time slice in table trades
' column a
. The query then calculates the exponential moving average of expression fv and lv with a smoothing factor of 50%:
=> SELECT symbol, slice_time, fv, lv,
EXPONENTIAL_MOVING_AVERAGE(fv, 0.5)
OVER (PARTITION BY symbol ORDER BY slice_time) AS ema_first,
EXPONENTIAL_MOVING_AVERAGE(lv, 0.5)
OVER (PARTITION BY symbol ORDER BY slice_time) AS ema_last
FROM (
SELECT symbol, slice_time,
TS_FIRST_VALUE(bid1 IGNORE NULLS) as fv,
TS_LAST_VALUE(bid2 IGNORE NULLS) AS lv
FROM ticker TIMESERIES slice_time AS '2 seconds'
OVER (PARTITION BY symbol ORDER BY time) ) AS sq;
symbol | slice_time | fv | lv | ema_first | ema_last
--------+---------------------+-------+-------+-----------+----------
ABC | 2009-07-12 03:00:00 | 60.45 | 65.12 | 60.45 | 65.12
ABC | 2009-07-12 03:00:02 | 57.78 | 65.12 | 59.115 | 65.12
ABC | 2009-07-12 03:00:04 | 67.88 | 65.12 | 63.4975 | 65.12
XYZ | 2009-07-12 03:00:00 | 47.55 | 46.78 | 47.55 | 46.78
XYZ | 2009-07-12 03:00:02 | 71.56 | 70.21 | 59.555 | 58.495
XYZ | 2009-07-12 03:00:04 | 45.55 | 58.65 | 52.5525 | 58.5725
(6 rows)
See also
13 - FIRST_VALUE [analytic]
Lets you select the first value of a table or partition (determined by the window-order-clause) without having to use a self join.
Lets you select the first value of a table or partition (determined by the window-order-clause
) without having to use a self join. This function is useful when you want to use the first value as a baseline in calculations.
Use FIRST_VALUE()
with the window-order-clause
to produce deterministic results. If no window is specified for the current row, the default window is UNBOUNDED PRECEDING AND CURRENT ROW
.
Behavior type
Immutable
Syntax
FIRST_VALUE ( expression [ IGNORE NULLS ] ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Expression to evaluate—or example, a constant, column, nonanalytic function, function expression, or expressions involving any of these.
IGNORE NULLS
- Specifies to return the first non-null value in the set, or
NULL
if all values are NULL
. If you omit this option and the first value in the set is null, the function returns NULL
.
OVER()
- See Analytic Functions.
Examples
The following query asks for the first value in the partitioned day of week, and illustrates the potential nondeterministic nature of FIRST_VALUE()
:
=> SELECT calendar_year, date_key, day_of_week, full_date_description,
FIRST_VALUE(full_date_description)
OVER(PARTITION BY calendar_month_number_in_year ORDER BY day_of_week)
AS "first_value"
FROM date_dimension
WHERE calendar_year=2003 AND calendar_month_number_in_year=1;
The first value returned is January 31, 2003; however, the next time the same query is run, the first value might be January 24 or January 3, or the 10th or 17th. This is because the analytic ORDER BY
column day_of_week
returns rows that contain ties (multiple Fridays). These repeated values make the ORDER BY
evaluation result nondeterministic, because rows that contain ties can be ordered in any way, and any one of those rows qualifies as being the first value of day_of_week
.
calendar_year | date_key | day_of_week | full_date_description | first_value
--------------+----------+-------------+-----------------------+------------------
2003 | 31 | Friday | January 31, 2003 | January 31, 2003
2003 | 24 | Friday | January 24, 2003 | January 31, 2003
2003 | 3 | Friday | January 3, 2003 | January 31, 2003
2003 | 10 | Friday | January 10, 2003 | January 31, 2003
2003 | 17 | Friday | January 17, 2003 | January 31, 2003
2003 | 6 | Monday | January 6, 2003 | January 31, 2003
2003 | 27 | Monday | January 27, 2003 | January 31, 2003
2003 | 13 | Monday | January 13, 2003 | January 31, 2003
2003 | 20 | Monday | January 20, 2003 | January 31, 2003
2003 | 11 | Saturday | January 11, 2003 | January 31, 2003
2003 | 18 | Saturday | January 18, 2003 | January 31, 2003
2003 | 25 | Saturday | January 25, 2003 | January 31, 2003
2003 | 4 | Saturday | January 4, 2003 | January 31, 2003
2003 | 12 | Sunday | January 12, 2003 | January 31, 2003
2003 | 26 | Sunday | January 26, 2003 | January 31, 2003
2003 | 5 | Sunday | January 5, 2003 | January 31, 2003
2003 | 19 | Sunday | January 19, 2003 | January 31, 2003
2003 | 23 | Thursday | January 23, 2003 | January 31, 2003
2003 | 2 | Thursday | January 2, 2003 | January 31, 2003
2003 | 9 | Thursday | January 9, 2003 | January 31, 2003
2003 | 16 | Thursday | January 16, 2003 | January 31, 2003
2003 | 30 | Thursday | January 30, 2003 | January 31, 2003
2003 | 21 | Tuesday | January 21, 2003 | January 31, 2003
2003 | 14 | Tuesday | January 14, 2003 | January 31, 2003
2003 | 7 | Tuesday | January 7, 2003 | January 31, 2003
2003 | 28 | Tuesday | January 28, 2003 | January 31, 2003
2003 | 22 | Wednesday | January 22, 2003 | January 31, 2003
2003 | 29 | Wednesday | January 29, 2003 | January 31, 2003
2003 | 15 | Wednesday | January 15, 2003 | January 31, 2003
2003 | 1 | Wednesday | January 1, 2003 | January 31, 2003
2003 | 8 | Wednesday | January 8, 2003 | January 31, 2003
(31 rows)
Note
The day_of_week
results are returned in alphabetical order because of lexical rules. The fact that each day does not appear ordered by the 7-day week cycle (for example, starting with Sunday followed by Monday, Tuesday, and so on) has no affect on results.
To return deterministic results, modify the query so that it performs its analytic ORDER BY
operations on a unique field, such as date_key
:
=> SELECT calendar_year, date_key, day_of_week, full_date_description,
FIRST_VALUE(full_date_description) OVER
(PARTITION BY calendar_month_number_in_year ORDER BY date_key) AS "first_value"
FROM date_dimension WHERE calendar_year=2003;
FIRST_VALUE()
returns a first value of January 1 for the January partition and the first value of February 1 for the February partition. Also, the full_date_description
column contains no ties:
calendar_year | date_key | day_of_week | full_date_description | first_value
---------------+----------+-------------+-----------------------+------------
2003 | 1 | Wednesday | January 1, 2003 | January 1, 2003
2003 | 2 | Thursday | January 2, 2003 | January 1, 2003
2003 | 3 | Friday | January 3, 2003 | January 1, 2003
2003 | 4 | Saturday | January 4, 2003 | January 1, 2003
2003 | 5 | Sunday | January 5, 2003 | January 1, 2003
2003 | 6 | Monday | January 6, 2003 | January 1, 2003
2003 | 7 | Tuesday | January 7, 2003 | January 1, 2003
2003 | 8 | Wednesday | January 8, 2003 | January 1, 2003
2003 | 9 | Thursday | January 9, 2003 | January 1, 2003
2003 | 10 | Friday | January 10, 2003 | January 1, 2003
2003 | 11 | Saturday | January 11, 2003 | January 1, 2003
2003 | 12 | Sunday | January 12, 2003 | January 1, 2003
2003 | 13 | Monday | January 13, 2003 | January 1, 2003
2003 | 14 | Tuesday | January 14, 2003 | January 1, 2003
2003 | 15 | Wednesday | January 15, 2003 | January 1, 2003
2003 | 16 | Thursday | January 16, 2003 | January 1, 2003
2003 | 17 | Friday | January 17, 2003 | January 1, 2003
2003 | 18 | Saturday | January 18, 2003 | January 1, 2003
2003 | 19 | Sunday | January 19, 2003 | January 1, 2003
2003 | 20 | Monday | January 20, 2003 | January 1, 2003
2003 | 21 | Tuesday | January 21, 2003 | January 1, 2003
2003 | 22 | Wednesday | January 22, 2003 | January 1, 2003
2003 | 23 | Thursday | January 23, 2003 | January 1, 2003
2003 | 24 | Friday | January 24, 2003 | January 1, 2003
2003 | 25 | Saturday | January 25, 2003 | January 1, 2003
2003 | 26 | Sunday | January 26, 2003 | January 1, 2003
2003 | 27 | Monday | January 27, 2003 | January 1, 2003
2003 | 28 | Tuesday | January 28, 2003 | January 1, 2003
2003 | 29 | Wednesday | January 29, 2003 | January 1, 2003
2003 | 30 | Thursday | January 30, 2003 | January 1, 2003
2003 | 31 | Friday | January 31, 2003 | January 1, 2003
2003 | 32 | Saturday | February 1, 2003 | February 1, 2003
2003 | 33 | Sunday | February 2, 2003 | February 1,2003
...
(365 rows)
See also
14 - LAG [analytic]
Returns the value of the input expression at the given offset before the current row within a.
Returns the value of the input expression at the given offset before the current row within a window. This function lets you access more than one row in a table at the same time. This is useful for comparing values when the relative positions of rows can be reliably known. It also lets you avoid the more costly self join, which enhances query processing speed.
For information on getting the rows that follow, see LEAD.
Behavior type
Immutable
Syntax
LAG ( expression[, offset ] [, default ] ) OVER (
[ window-partition-clause ]
window-order-clause )
Parameters
expression
- The expression to evaluate—for example, a constant, column, non-analytic function, function expression, or expressions involving any of these.
offset
- Indicates how great is the lag. The default value is 1 (the previous row). This parameter must evaluate to a constant positive integer.
default
- The value returned if
offset
falls outside the bounds of the table or partition. This value must be a constant value or an expression that can be evaluated to a constant; its data type is coercible to that of the first argument.
Examples
This example sums the current balance by date in a table and also sums the previous balance from the last day. Given the inputs that follow, the data satisfies the following conditions:
-
For each some_id
, there is exactly 1 row for each date represented by month_date
.
-
For each some_id
, the set of dates is consecutive; that is, if there is a row for February 24 and a row for February 26, there would also be a row for February 25.
-
Each some_id
has the same set of dates.
=> CREATE TABLE balances (
month_date DATE,
current_bal INT,
some_id INT);
=> INSERT INTO balances values ('2009-02-24', 10, 1);
=> INSERT INTO balances values ('2009-02-25', 10, 1);
=> INSERT INTO balances values ('2009-02-26', 10, 1);
=> INSERT INTO balances values ('2009-02-24', 20, 2);
=> INSERT INTO balances values ('2009-02-25', 20, 2);
=> INSERT INTO balances values ('2009-02-26', 20, 2);
=> INSERT INTO balances values ('2009-02-24', 30, 3);
=> INSERT INTO balances values ('2009-02-25', 20, 3);
=> INSERT INTO balances values ('2009-02-26', 30, 3);
Now run LAG to sum the current balance for each date and sum the previous balance from the last day:
=> SELECT month_date,
SUM(current_bal) as current_bal_sum,
SUM(previous_bal) as previous_bal_sum FROM
(SELECT month_date, current_bal,
LAG(current_bal, 1, 0) OVER
(PARTITION BY some_id ORDER BY month_date)
AS previous_bal FROM balances) AS subQ
GROUP BY month_date ORDER BY month_date;
month_date | current_bal_sum | previous_bal_sum
------------+-----------------+------------------
2009-02-24 | 60 | 0
2009-02-25 | 50 | 60
2009-02-26 | 60 | 50
(3 rows)
Using the same example data, the following query would not be allowed because LAG is nested inside an aggregate function:
=> SELECT month_date,
SUM(current_bal) as current_bal_sum,
SUM(LAG(current_bal, 1, 0) OVER
(PARTITION BY some_id ORDER BY month_date)) AS previous_bal_sum
FROM some_table GROUP BY month_date ORDER BY month_date;
The following example uses the VMart database. LAG first returns the annual income from the previous row, and then it calculates the difference between the income in the current row from the income in the previous row:
=> SELECT occupation, customer_key, customer_name, annual_income,
LAG(annual_income, 1, 0) OVER (PARTITION BY occupation
ORDER BY annual_income) AS prev_income, annual_income -
LAG(annual_income, 1, 0) OVER (PARTITION BY occupation
ORDER BY annual_income) AS difference
FROM customer_dimension ORDER BY occupation, customer_key LIMIT 20;
occupation | customer_key | customer_name | annual_income | prev_income | difference
------------+--------------+----------------------+---------------+-------------+------------
Accountant | 15 | Midori V. Peterson | 692610 | 692535 | 75
Accountant | 43 | Midori S. Rodriguez | 282359 | 280976 | 1383
Accountant | 93 | Robert P. Campbell | 471722 | 471355 | 367
Accountant | 102 | Sam T. McNulty | 901636 | 901561 | 75
Accountant | 134 | Martha B. Overstreet | 705146 | 704335 | 811
Accountant | 165 | James C. Kramer | 376841 | 376474 | 367
Accountant | 225 | Ben W. Farmer | 70574 | 70449 | 125
Accountant | 270 | Jessica S. Lang | 684204 | 682274 | 1930
Accountant | 273 | Mark X. Lampert | 723294 | 722737 | 557
Accountant | 295 | Sharon K. Gauthier | 29033 | 28412 | 621
Accountant | 338 | Anna S. Jackson | 816858 | 815557 | 1301
Accountant | 377 | William I. Jones | 915149 | 914872 | 277
Accountant | 438 | Joanna A. McCabe | 147396 | 144482 | 2914
Accountant | 452 | Kim P. Brown | 126023 | 124797 | 1226
Accountant | 467 | Meghan K. Carcetti | 810528 | 810284 | 244
Accountant | 478 | Tanya E. Greenwood | 639649 | 639029 | 620
Accountant | 511 | Midori P. Vogel | 187246 | 185539 | 1707
Accountant | 525 | Alexander K. Moore | 677433 | 677050 | 383
Accountant | 550 | Sam P. Reyes | 735691 | 735355 | 336
Accountant | 577 | Robert U. Vu | 616101 | 615439 | 662
(20 rows)
The next example uses LEAD and LAG to return the third row after the salary in the current row and fifth salary before the salary in the current row:
=> SELECT hire_date, employee_key, employee_last_name,
LEAD(hire_date, 1) OVER (ORDER BY hire_date) AS "next_hired" ,
LAG(hire_date, 1) OVER (ORDER BY hire_date) AS "last_hired"
FROM employee_dimension ORDER BY hire_date, employee_key;
hire_date | employee_key | employee_last_name | next_hired | last_hired
------------+--------------+--------------------+------------+------------
1956-04-11 | 2694 | Farmer | 1956-05-12 |
1956-05-12 | 5486 | Winkler | 1956-09-18 | 1956-04-11
1956-09-18 | 5525 | McCabe | 1957-01-15 | 1956-05-12
1957-01-15 | 560 | Greenwood | 1957-02-06 | 1956-09-18
1957-02-06 | 9781 | Bauer | 1957-05-25 | 1957-01-15
1957-05-25 | 9506 | Webber | 1957-07-04 | 1957-02-06
1957-07-04 | 6723 | Kramer | 1957-07-07 | 1957-05-25
1957-07-07 | 5827 | Garnett | 1957-11-11 | 1957-07-04
1957-11-11 | 373 | Reyes | 1957-11-21 | 1957-07-07
1957-11-21 | 3874 | Martin | 1958-02-06 | 1957-11-11
(10 rows)
See also
15 - LAST_VALUE [analytic]
Lets you select the last value of a table or partition (determined by the window-order-clause) without having to use a self join.
Lets you select the last value of a table or partition (determined by the window-order-clause
) without having to use a self join. LAST_VALUE
takes the last record from the partition after the window order clause. The function then computes the expression against the last record, and returns the results. This function is useful when you want to use the last value as a baseline in calculations.
Use LAST_VALUE()
with the window-order-clause
to produce deterministic results. If no window is specified for the current row, the default window is UNBOUNDED PRECEDING AND CURRENT ROW
.
Tip
Due to default window semantics,
LAST_VALUE
does not always return the last value of a partition. If you omit
window-frame-clause from the analytic clause,
LAST_VALUE
operates on this default window. Although results can seem non-intuitive by not returning the bottom of the current partition, it returns the bottom of the window, which continues to change along with the current input row being processed. If you want to return the last value of a partition, use
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
. See examples below.
Behavior type
Immutable
Syntax
LAST_VALUE ( expression [ IGNORE NULLS ] ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Expression to evaluate—for example, a constant, column, nonanalytic function, function expression, or expressions involving any of these.
IGNORE NULLS
- Specifies to return the last non-null value in the set, or
NULL
if all values are NULL
. If you omit this option and the last value in the set is null, the function returns NULL
.
OVER()
- See Analytic Functions.
Examples
Using the schema defined in Window framing in Analyzing Data, the following query does not show the highest salary value by department; instead it shows the highest salary value by department by salary.
=> SELECT deptno, sal, empno, LAST_VALUE(sal)
OVER (PARTITION BY deptno ORDER BY sal) AS lv
FROM emp;
deptno | sal | empno | lv
--------+-----+-------+--------
10 | 101 | 1 | 101
10 | 104 | 4 | 104
20 | 100 | 11 | 100
20 | 109 | 7 | 109
20 | 109 | 6 | 109
20 | 109 | 8 | 109
20 | 110 | 10 | 110
20 | 110 | 9 | 110
30 | 102 | 2 | 102
30 | 103 | 3 | 103
30 | 105 | 5 | 105
If you include the window frame clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
, LAST_VALUE()
returns the highest salary by department, an accurate representation of the information:
=> SELECT deptno, sal, empno, LAST_VALUE(sal)
OVER (PARTITION BY deptno ORDER BY sal
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv
FROM emp;
deptno | sal | empno | lv
--------+-----+-------+--------
10 | 101 | 1 | 104
10 | 104 | 4 | 104
20 | 100 | 11 | 110
20 | 109 | 7 | 110
20 | 109 | 6 | 110
20 | 109 | 8 | 110
20 | 110 | 10 | 110
20 | 110 | 9 | 110
30 | 102 | 2 | 105
30 | 103 | 3 | 105
30 | 105 | 5 | 105
For more examples, see FIRST_VALUE().
See also
16 - LEAD [analytic]
Returns values from the row after the current row within a , letting you access more than one row in a table at the same time.
Returns values from the row after the current row within a window, letting you access more than one row in a table at the same time. This is useful for comparing values when the relative positions of rows can be reliably known. It also lets you avoid the more costly self join, which enhances query processing speed.
Behavior type
Immutable
Syntax
LEAD ( expression[, offset ] [, default ] ) OVER (
[ window-partition-clause ]
window-order-clause )
Parameters
expression
- The expression to evaluate—for example, a constant, column, non-analytic function, function expression, or expressions involving any of these.
offset
- Is an optional parameter that defaults to 1 (the next row). This parameter must evaluate to a constant positive integer.
default
- The value returned if
offset
falls outside the bounds of the table or partition. This value must be a constant value or an expression that can be evaluated to a constant; its data type is coercible to that of the first argument.
Examples
LEAD
finds the hire date of the employee hired just after the current row:
=> SELECT employee_region, hire_date, employee_key, employee_last_name,
LEAD(hire_date, 1) OVER (PARTITION BY employee_region ORDER BY hire_date) AS "next_hired"
FROM employee_dimension ORDER BY employee_region, hire_date, employee_key;
employee_region | hire_date | employee_key | employee_last_name | next_hired
-------------------+------------+--------------+--------------------+------------
East | 1956-04-08 | 9218 | Harris | 1957-02-06
East | 1957-02-06 | 7799 | Stein | 1957-05-25
East | 1957-05-25 | 3687 | Farmer | 1957-06-26
East | 1957-06-26 | 9474 | Bauer | 1957-08-18
East | 1957-08-18 | 570 | Jefferson | 1957-08-24
East | 1957-08-24 | 4363 | Wilson | 1958-02-17
East | 1958-02-17 | 6457 | McCabe | 1958-06-26
East | 1958-06-26 | 6196 | Li | 1958-07-16
East | 1958-07-16 | 7749 | Harris | 1958-09-18
East | 1958-09-18 | 9678 | Sanchez | 1958-11-10
(10 rows)
The next example uses LEAD
and LAG
to return the third row after the salary in the current row and fifth salary before the salary in the current row.
=> SELECT hire_date, employee_key, employee_last_name,
LEAD(hire_date, 1) OVER (ORDER BY hire_date) AS "next_hired" ,
LAG(hire_date, 1) OVER (ORDER BY hire_date) AS "last_hired"
FROM employee_dimension ORDER BY hire_date, employee_key;
hire_date | employee_key | employee_last_name | next_hired | last_hired
------------+--------------+--------------------+------------+------------
1956-04-11 | 2694 | Farmer | 1956-05-12 |
1956-05-12 | 5486 | Winkler | 1956-09-18 | 1956-04-11
1956-09-18 | 5525 | McCabe | 1957-01-15 | 1956-05-12
1957-01-15 | 560 | Greenwood | 1957-02-06 | 1956-09-18
1957-02-06 | 9781 | Bauer | 1957-05-25 | 1957-01-15
1957-05-25 | 9506 | Webber | 1957-07-04 | 1957-02-06
1957-07-04 | 6723 | Kramer | 1957-07-07 | 1957-05-25
1957-07-07 | 5827 | Garnett | 1957-11-11 | 1957-07-04
1957-11-11 | 373 | Reyes | 1957-11-21 | 1957-07-07
1957-11-21 | 3874 | Martin | 1958-02-06 | 1957-11-11
(10 rows)
The following example returns employee name and salary, along with the next highest and lowest salaries.
=> SELECT employee_last_name, annual_salary,
NVL(LEAD(annual_salary) OVER (ORDER BY annual_salary),
MIN(annual_salary) OVER()) "Next Highest",
NVL(LAG(annual_salary) OVER (ORDER BY annual_salary),
MAX(annual_salary) OVER()) "Next Lowest"
FROM employee_dimension;
employee_last_name | annual_salary | Next Highest | Next Lowest
--------------------+---------------+--------------+-------------
Nielson | 1200 | 1200 | 995533
Lewis | 1200 | 1200 | 1200
Harris | 1200 | 1202 | 1200
Robinson | 1202 | 1202 | 1200
Garnett | 1202 | 1202 | 1202
Weaver | 1202 | 1202 | 1202
Nielson | 1202 | 1202 | 1202
McNulty | 1202 | 1204 | 1202
Farmer | 1204 | 1204 | 1202
Martin | 1204 | 1204 | 1204
(10 rows)
The next example returns, for each assistant director in the employees table, the hire date of the director hired just after the director on the current row. For example, Jackson was hired on 2016-12-28, and the next director hired was Bauer:
=> SELECT employee_last_name, hire_date,
LEAD(hire_date, 1) OVER (ORDER BY hire_date DESC) as "NextHired"
FROM employee_dimension WHERE job_title = 'Assistant Director';
employee_last_name | hire_date | NextHired
--------------------+------------+------------
Jackson | 2016-12-28 | 2016-12-26
Bauer | 2016-12-26 | 2016-12-11
Miller | 2016-12-11 | 2016-12-07
Fortin | 2016-12-07 | 2016-11-27
Harris | 2016-11-27 | 2016-11-15
Goldberg | 2016-11-15 |
(5 rows)
See also
17 - MAX [analytic]
Returns the maximum value of an expression within a.
Returns the maximum value of an expression within a window. The return value has the same type as the expression data type.
The analytic functions MIN()
and MAX()
can operate with Boolean values. The MAX()
function acts upon a Boolean data type or a value that can be implicitly converted to a Boolean value. If at least one input value is true, MAX()
returns t
(true). Otherwise, it returns f
(false). In the same scenario, the MIN()
function returns t
(true) if all input values are true. Otherwise, it returns f
.
Behavior type
Immutable
Syntax
MAX ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Any expression for which the maximum value is calculated, typically a column reference.
OVER()
- See Analytic Functions.
Examples
The following query computes the deviation between the employees' annual salary and the maximum annual salary in Massachusetts:
=> SELECT employee_state, annual_salary,
MAX(annual_salary)
OVER(PARTITION BY employee_state ORDER BY employee_key) max,
annual_salary- MAX(annual_salary)
OVER(PARTITION BY employee_state ORDER BY employee_key) diff
FROM employee_dimension
WHERE employee_state = 'MA';
employee_state | annual_salary | max | diff
----------------+---------------+--------+---------
MA | 1918 | 995533 | -993615
MA | 2058 | 995533 | -993475
MA | 2586 | 995533 | -992947
MA | 2500 | 995533 | -993033
MA | 1318 | 995533 | -994215
MA | 2072 | 995533 | -993461
MA | 2656 | 995533 | -992877
MA | 2148 | 995533 | -993385
MA | 2366 | 995533 | -993167
MA | 2664 | 995533 | -992869
(10 rows)
The following example shows you the difference between the MIN
and MAX
analytic functions when you use them with a Boolean value. The sample creates a table with two columns, adds two rows of data, and shows sample output for MIN
and MAX
.
CREATE TABLE min_max_functions (emp VARCHAR, torf BOOL);
INSERT INTO min_max_functions VALUES ('emp1', 1);
INSERT INTO min_max_functions VALUES ('emp1', 0);
SELECT DISTINCT emp,
min(torf) OVER (PARTITION BY emp) AS worksasbooleanand,
Max(torf) OVER (PARTITION BY emp) AS worksasbooleanor
FROM min_max_functions;
emp | worksasbooleanand | worksasbooleanor
------+-------------------+------------------
emp1 | f | t
(1 row)
See also
18 - MEDIAN [analytic]
For each row, returns the median value of a value set within each partition.
For each row, returns the median value of a value set within each partition. MEDIAN
determines the argument with the highest numeric precedence, implicitly converts the remaining arguments to that data type, and returns that data type.
MEDIAN
is an alias of
PERCENTILE_CONT [analytic]
with an argument of 0.5 (50%).
Behavior type
Immutable
Syntax
MEDIAN ( expression ) OVER ( [ window-partition-clause ] )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the middle value or an interpolated value that would be the middle value once the values are sorted. Null values are ignored in the calculation.
OVER()
- If the
OVER
clause specifies window-partition-clause
, MEDIAN
groups input rows according to one or more columns or expressions. If this clause is omitted, no grouping occurs and MEDIAN
processes all input rows as a single partition.
Examples
See Calculating a median value
See also
19 - MIN [analytic]
Returns the minimum value of an expression within a.
Returns the minimum value of an expression within a window. The return value has the same type as the expression data type.
The analytic functions MIN()
and MAX()
can operate with Boolean values. The MAX()
function acts upon a Boolean data type or a value that can be implicitly converted to a Boolean value. If at least one input value is true, MAX()
returns t
(true). Otherwise, it returns f
(false). In the same scenario, the MIN()
function returns t
(true) if all input values are true. Otherwise, it returns f
.
Behavior type
Immutable
Syntax
MIN ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Any expression for which the minimum value is calculated, typically a column reference.
OVER()
- See Analytic Functions.
Examples
The following example shows how you can query to determine the deviation between the employees' annual salary and the minimum annual salary in Massachusetts:
=> SELECT employee_state, annual_salary,
MIN(annual_salary)
OVER(PARTITION BY employee_state ORDER BY employee_key) min,
annual_salary- MIN(annual_salary)
OVER(PARTITION BY employee_state ORDER BY employee_key) diff
FROM employee_dimension
WHERE employee_state = 'MA';
employee_state | annual_salary | min | diff
----------------+---------------+------+------
MA | 1918 | 1204 | 714
MA | 2058 | 1204 | 854
MA | 2586 | 1204 | 1382
MA | 2500 | 1204 | 1296
MA | 1318 | 1204 | 114
MA | 2072 | 1204 | 868
MA | 2656 | 1204 | 1452
MA | 2148 | 1204 | 944
MA | 2366 | 1204 | 1162
MA | 2664 | 1204 | 1460
(10 rows)
The following example shows you the difference between the MIN
and MAX
analytic functions when you use them with a Boolean value. The sample creates a table with two columns, adds two rows of data, and shows sample output for MIN
and MAX
.
CREATE TABLE min_max_functions (emp VARCHAR, torf BOOL);
INSERT INTO min_max_functions VALUES ('emp1', 1);
INSERT INTO min_max_functions VALUES ('emp1', 0);
SELECT DISTINCT emp,
min(torf) OVER (PARTITION BY emp) AS worksasbooleanand,
Max(torf) OVER (PARTITION BY emp) AS worksasbooleanor
FROM min_max_functions;
emp | worksasbooleanand | worksasbooleanor
------+-------------------+------------------
emp1 | f | t
(1 row)
See also
20 - NTH_VALUE [analytic]
Returns the value evaluated at the row that is the nth row of the window (counting from 1).
Returns the value evaluated at the row that is the *n
*th row of the window (counting from 1). If the specified row does not exist, NTH_VALUE returns NULL
.
Behavior type
Immutable
Syntax
NTH_VALUE ( expression, row-number [ IGNORE NULLS ] ) OVER (
[ window-frame-clause ]
[ window-order-clause ])
Parameters
expression
- Expression to evaluate. The expression can be a constant, column name, nonanalytic function, function expression, or expressions that include any of these.
row-number
- Specifies the row to evaluate, where
row-number
evaluates to an integer ≥ 1.
IGNORE NULLS
- Specifies to return the first non-
NULL
value in the set, or NULL
if all values are NULL
.
OVER()
- See Analytic Functions.
Examples
In the following example, for each tuple (current row) in table t1
, the window frame clause defines the window as follows:
ORDER BY b ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
For each window, n
for *n
*th value is a+1
. a
is the value of column a
in the tuple.
NTH_VALUE returns the result of the expression b+1
, where b
is the value of column b
in the *n
*th row, which is the a+1
row within the window.
=> SELECT * FROM t1 ORDER BY a;
a | b
---+----
1 | 10
2 | 20
2 | 21
3 | 30
4 | 40
5 | 50
6 | 60
(7 rows)
=> SELECT NTH_VALUE(b+1, a+1) OVER
(ORDER BY b ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) FROM t1;
?column?
----------
22
31
(7 rows)
21 - NTILE [analytic]
Equally divides an ordered data set (partition) into a {value} number of subsets within a , where the subsets are numbered 1 through the value in parameter constant-value.
Equally divides an ordered data set (partition) into a {
value
}
number of subsets within a window, where the subsets are numbered 1 through the value in parameter constant-value
. For example, if constant-value
= 4 and the partition contains 20 rows, NTILE
divides the partition rows into four equal subsets of five rows. NTILE
assigns each row to a subset by giving row a number from 1 to 4. The rows in the first subset are assigned 1, the next five are assigned 2, and so on.
If the number of partition rows is not evenly divisible by the number of subsets, the rows are distributed so no subset is more than one row larger than any other subset, and the lowest subsets have extra rows. For example, if constant-value
= 4 and the number of rows = 21, the first subset has six rows, the second subset has five rows, and so on.
If the number of subsets is greater than the number of rows, then a number of subsets equal to the number of rows is filled, and the remaining subsets are empty.
Behavior type
Immutable
Syntax
NTILE ( constant-value ) OVER (
[ window-partition-clause ]
window-order-clause )
Parameters
constant-value
- Specifies the number of subsets , where
constant-value
must resolve to a positive constant for each partition.
OVER()
- See Analytic Functions.
Examples
The following query assigns each month's sales total into one of four subsets:
=> SELECT calendar_month_name AS MONTH, SUM(sales_quantity),
NTILE(4) OVER (ORDER BY SUM(sales_quantity)) AS NTILE
FROM store.store_sales_fact JOIN date_dimension
USING(date_key)
GROUP BY calendar_month_name
ORDER BY NTILE;
MONTH | SUM | NTILE
-----------+---------+-------
November | 2040726 | 1
June | 2088528 | 1
February | 2134708 | 1
April | 2181767 | 2
January | 2229220 | 2
October | 2316363 | 2
September | 2323914 | 3
March | 2354409 | 3
August | 2387017 | 3
July | 2417239 | 4
May | 2492182 | 4
December | 2531842 | 4
(12 rows)
See also
22 - PERCENT_RANK [analytic]
Calculates the relative rank of a row for a given row in a group within a by dividing that row’s rank less 1 by the number of rows in the partition, also less 1.
Calculates the relative rank of a row for a given row in a group within a window by dividing that row’s rank less 1 by the number of rows in the partition, also less 1. PERCENT_RANK
always returns values from 0 to 1 inclusive. The first row in any set has a PERCENT_RANK
of 0. The return value is NUMBER
.
( rank - 1 ) / ( [ rows ] - 1 )
In the preceding formula, rank
is the rank position of a row in the group and rows
is the total number of rows in the partition defined by the OVER()
clause.
Behavior type
Immutable
Syntax
PERCENT_RANK ( ) OVER (
[ window-partition-clause ]
window-order-clause )
Parameters
OVER()
- See Analytic Functions
Examples
The following example finds the percent rank of gross profit for different states within each month of the first quarter:
=> SELECT calendar_month_name AS MONTH, store_state,
SUM(gross_profit_dollar_amount),
PERCENT_RANK() OVER (PARTITION BY calendar_month_name
ORDER BY SUM(gross_profit_dollar_amount)) AS PERCENT_RANK
FROM store.store_sales_fact JOIN date_dimension
USING(date_key)
JOIN store.store_dimension
USING (store_key)
WHERE calendar_month_name IN ('January','February','March')
AND store_state IN ('OR','IA','DC','NV','WI')
GROUP BY calendar_month_name, store_state
ORDER BY calendar_month_name, PERCENT_RANK;
MONTH | store_state | SUM | PERCENT_RANK
----------+-------------+--------+--------------
February | IA | 418490 | 0
February | OR | 460588 | 0.25
February | DC | 616553 | 0.5
February | WI | 619204 | 0.75
February | NV | 838039 | 1
January | OR | 446528 | 0
January | IA | 474501 | 0.25
January | DC | 628496 | 0.5
January | WI | 679382 | 0.75
January | NV | 871824 | 1
March | IA | 460282 | 0
March | OR | 481935 | 0.25
March | DC | 716063 | 0.5
March | WI | 771575 | 0.75
March | NV | 970878 | 1
(15 rows)
The following example calculates, for each employee, the percent rank of the employee's salary by their job title:
=> SELECT job_title, employee_last_name, annual_salary,
PERCENT_RANK()
OVER (PARTITION BY job_title ORDER BY annual_salary DESC) AS percent_rank
FROM employee_dimension
ORDER BY percent_rank, annual_salary;
job_title | employee_last_name | annual_salary | percent_rank
--------------------+--------------------+---------------+---------------------
Cashier | Fortin | 3196 | 0
Delivery Person | Garnett | 3196 | 0
Cashier | Vogel | 3196 | 0
Customer Service | Sanchez | 3198 | 0
Shelf Stocker | Jones | 3198 | 0
Custodian | Li | 3198 | 0
Customer Service | Kramer | 3198 | 0
Greeter | McNulty | 3198 | 0
Greeter | Greenwood | 3198 | 0
Shift Manager | Miller | 99817 | 0
Advertising | Vu | 99853 | 0
Branch Manager | Jackson | 99858 | 0
Marketing | Taylor | 99928 | 0
Assistant Director | King | 99973 | 0
Sales | Kramer | 99973 | 0
Head of PR | Goldberg | 199067 | 0
Regional Manager | Gauthier | 199744 | 0
Director of HR | Moore | 199896 | 0
Head of Marketing | Overstreet | 199955 | 0
VP of Advertising | Meyer | 199975 | 0
VP of Sales | Sanchez | 199992 | 0
Founder | Gauthier | 927335 | 0
CEO | Taylor | 953373 | 0
Investor | Garnett | 963104 | 0
Co-Founder | Vu | 977716 | 0
CFO | Vogel | 983634 | 0
President | Sanchez | 992363 | 0
Delivery Person | Li | 3194 | 0.00114155251141553
Delivery Person | Robinson | 3194 | 0.00114155251141553
Custodian | McCabe | 3192 | 0.00126582278481013
Shelf Stocker | Moore | 3196 | 0.00128040973111396
Branch Manager | Moore | 99716 | 0.00186567164179104
...
See also
23 - PERCENTILE_CONT [analytic]
An inverse distribution function where, for each row, PERCENTILE_CONT returns the value that would fall into the specified percentile among a set of values in each partition within a.
An inverse distribution function where, for each row, PERCENTILE_CONT returns the value that would fall into the specified percentile among a set of values in each partition within a window. For example, if the argument to the function is 0.5, the result of the function is the median of the data set (50th percentile). PERCENTILE_CONT assumes a continuous distribution data model. NULL values are ignored.
PERCENTILE_CONT computes the percentile by first computing the row number where the percentile row would exist. For example:
row-number = 1 + percentile-value * (num-partition-rows -1)
If row-number
is a whole number (within an error of 0.00001), the percentile is the value of row row-number
.
Otherwise, Vertica interpolates the percentile value between the value of the CEILING(
row-number
)
row and the value of the FLOOR(
row-number
)
row. In other words, the percentile is calculated as follows:
( CEILING( row-number) - row-number ) * ( value of FLOOR(row-number) row )
+ ( row-number - FLOOR(row-number) ) * ( value of CEILING(row-number) row)
Note
If the percentile value is 0.5, PERCENTILE_CONT returns the same result set as the function
MEDIAN.
Behavior type
Immutable
Syntax
PERCENTILE_CONT ( percentile ) WITHIN GROUP ( ORDER BY expression [ ASC | DESC ] ) OVER ( [ window-partition-clause ] )
Parameters
percentile
- Percentile value, a FLOAT constant that ranges from 0 to 1 (inclusive).
WITHIN GROUP (ORDER BY
expression
)
- Specifies how to sort data within each group. ORDER BY takes only one column/expression that must be INTEGER, FLOAT, INTERVAL, or NUMERIC data type. NULL values are discarded.
The WITHIN GROUP(ORDER BY)
clause does not guarantee the order of the SQL result. To order the final result , use the SQL ORDER BY clause set.
ASC | DESC
- Specifies the ordering sequence as ascending (default) or descending.
Specifying ASC or DESC in the WITHIN GROUP
clause affects results as long as the percentile
is not 0.5
.
OVER()
- See Analytic Functions
Examples
This query computes the median annual income per group for the first 300 customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income)
OVER (PARTITION BY customer_state) AS PERCENTILE_CONT
FROM customer_dimension WHERE customer_state IN ('DC','WI') AND customer_key < 300
ORDER BY customer_state, customer_key;
customer_state | customer_key | annual_income | PERCENTILE_CONT
----------------+--------------+---------------+-----------------
DC | 52 | 168312 | 483266.5
DC | 118 | 798221 | 483266.5
WI | 62 | 283043 | 377691
WI | 139 | 472339 | 377691
(4 rows)
This query computes the median annual income per group for all customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income)
OVER (PARTITION BY customer_state) AS PERCENTILE_CONT
FROM customer_dimension WHERE customer_state IN ('DC','WI') ORDER BY customer_state, customer_key;
customer_state | customer_key | annual_income | PERCENTILE_CONT
----------------+--------------+---------------+-----------------
DC | 52 | 168312 | 483266.5
DC | 118 | 798221 | 483266.5
DC | 622 | 220782 | 555088
DC | 951 | 178453 | 555088
DC | 972 | 961582 | 555088
DC | 1286 | 760445 | 555088
DC | 1434 | 44836 | 555088
...
WI | 62 | 283043 | 377691
WI | 139 | 472339 | 377691
WI | 359 | 42242 | 517717
WI | 364 | 867543 | 517717
WI | 403 | 509031 | 517717
WI | 455 | 32000 | 517717
WI | 485 | 373129 | 517717
...
(1353 rows)
See also
24 - PERCENTILE_DISC [analytic]
An inverse distribution function where, for each row, PERCENTILE_DISC returns the value that would fall into the specified percentile among a set of values in each partition within a.
An inverse distribution function where, for each row, PERCENTILE_DISC
returns the value that would fall into the specified percentile among a set of values in each partition within a window. PERCENTILE_DISC()
assumes a discrete distribution data model. NULL
values are ignored.
PERCENTILE_DISC
examines the cumulative distribution values in each group until it finds one that is greater than or equal to the specified percentile. Vertica computes the percentile where, for each row, PERCENTILE_DISC
outputs the first value of the WITHIN GROUP(ORDER BY)
column whose CUME_DIST
(cumulative distribution) value is >= the argument FLOAT
value—for example, 0.4
:
PERCENTILE_DIST(0.4) WITHIN GROUP (ORDER BY salary) OVER(PARTITION BY deptno)...
Given the following query:
SELECT CUME_DIST() OVER(ORDER BY salary) FROM table-name;
The smallest CUME_DIST
value that is greater than 0.4 is also the PERCENTILE_DISC
.
Behavior type
Immutable
Syntax
PERCENTILE_DISC ( percentile ) WITHIN GROUP (
ORDER BY expression [ ASC | DESC ] ) OVER (
[ window-partition-clause ] )
Parameters
percentile
- Percentile value, a
FLOAT
constant that ranges from 0 to 1 (inclusive).
WITHIN GROUP(ORDER BY
expression
)
- Specifies how to sort data within each group.
ORDER BY
takes only one column/expression that must be INTEGER
, FLOAT
, INTERVAL
, or NUMERIC
data type. NULL
values are discarded.
The WITHIN GROUP(ORDER BY)
clause does not guarantee the order of the SQL result. To order the final result , use the SQL
ORDER BY
clause set.
ASC | DESC
- Specifies the ordering sequence as ascending (default) or descending.
OVER()
- See Analytic Functions
Examples
This query computes the 20th percentile annual income by group for first 300 customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income,
PERCENTILE_DISC(.2) WITHIN GROUP(ORDER BY annual_income)
OVER (PARTITION BY customer_state) AS PERCENTILE_DISC
FROM customer_dimension
WHERE customer_state IN ('DC','WI')
AND customer_key < 300
ORDER BY customer_state, customer_key;
customer_state | customer_key | annual_income | PERCENTILE_DISC
----------------+--------------+---------------+-----------------
DC | 104 | 658383 | 417092
DC | 168 | 417092 | 417092
DC | 245 | 670205 | 417092
WI | 106 | 227279 | 227279
WI | 127 | 703889 | 227279
WI | 209 | 458607 | 227279
(6 rows)
See also
25 - RANK [analytic]
Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY clause.
Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY
clause.
RANK
executes as follows:
-
Sorts partition rows as specified by the ORDER BY
clause.
-
Compares the ORDER BY
values of the preceding row and current row and ranks the current row as follows:
-
If ORDER BY
values are the same, the current row gets the same ranking as the preceding row.
Note
Null values are considered equal. For detailed information on how null values are sorted, see
NULL sort order.
-
If the ORDER BY
values are different, DENSE_RANK
increments or decrements the current row's ranking by 1, plus the number of consecutive duplicate values in the rows that precede it.
The largest rank value is the equal to the total number of rows returned by the query.
Behavior type
Immutable
Syntax
RANK() OVER (
[ window-partition-clause ]
window-order-clause )
Parameters
OVER()
- See Analytic Functions
Compared with DENSE_RANK
RANK
can leave gaps in the ranking sequence, while
DENSE_RANK
does not.
Examples
The following query ranks by state all company customers that have been customers since 2007. In rows where the customer_since
dates are the same, RANK
assigns the rows equal ranking. When the customer_since
date changes, RANK
skips one or more rankings—for example, within CA
, from 12 to 14, and from 17 to 19.
=> SELECT customer_state, customer_name, customer_since,
RANK() OVER (PARTITION BY customer_state ORDER BY customer_since) AS rank
FROM customer_dimension WHERE customer_type='Company' AND customer_since > '01/01/2007'
ORDER BY customer_state;
customer_state | customer_name | customer_since | rank
----------------+---------------+----------------+------
AZ | Foodshop | 2007-01-20 | 1
AZ | Goldstar | 2007-08-11 | 2
CA | Metahope | 2007-01-05 | 1
CA | Foodgen | 2007-02-05 | 2
CA | Infohope | 2007-02-09 | 3
CA | Foodcom | 2007-02-19 | 4
CA | Amerihope | 2007-02-22 | 5
CA | Infostar | 2007-03-05 | 6
CA | Intracare | 2007-03-14 | 7
CA | Infocare | 2007-04-07 | 8
...
CO | Goldtech | 2007-02-19 | 1
CT | Foodmedia | 2007-02-11 | 1
CT | Metatech | 2007-02-20 | 2
CT | Infocorp | 2007-04-10 | 3
...
See also
SQL analytics
26 - ROW_NUMBER [analytic]
Assigns a sequence of unique numbers to each row in a partition, starting with 1.
Assigns a sequence of unique numbers to each row in a window partition, starting with 1. ROW_NUMBER and RANK are generally interchangeable, with the following differences:
-
ROW_NUMBER assigns a unique ordinal number to each row in the ordered set, starting with 1.
-
ROW_NUMBER() is a Vertica extension, while RANK conforms to the SQL-99 standard.
Behavior type
Immutable
Syntax
ROW_NUMBER () OVER (
[ window-partition-clause ]
[ window-order-clause ] )
Parameters
OVER()
- See Analytic Functions
Examples
The following ROW_NUMBER query partitions customers in the VMart table customer_dimension
by customer_regio
n. Within each partition, the function ranks those customers in order of seniority, as specified by its window order clause:
=> SELECT * FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY customer_region ORDER BY customer_since) AS most_senior,
customer_region, customer_name, customer_since FROM public.customer_dimension WHERE customer_type = 'Individual') sq
WHERE most_senior <= 5;
most_senior | customer_region | customer_name | customer_since
-------------+-----------------+----------------------+----------------
1 | West | Jack Y. Perkins | 1965-01-01
2 | West | Linda Q. Winkler | 1965-01-02
3 | West | Marcus K. Li | 1965-01-03
4 | West | Carla R. Jones | 1965-01-07
5 | West | Seth P. Young | 1965-01-09
1 | East | Kim O. Vu | 1965-01-01
2 | East | Alexandra L. Weaver | 1965-01-02
3 | East | Steve L. Webber | 1965-01-04
4 | East | Thom Y. Li | 1965-01-05
5 | East | Martha B. Farmer | 1965-01-07
1 | SouthWest | Martha V. Gauthier | 1965-01-01
2 | SouthWest | Jessica U. Goldberg | 1965-01-07
3 | SouthWest | Robert O. Stein | 1965-01-07
4 | SouthWest | Emily I. McCabe | 1965-01-18
5 | SouthWest | Jack E. Miller | 1965-01-25
1 | NorthWest | Julie O. Greenwood | 1965-01-08
2 | NorthWest | Amy X. McNulty | 1965-01-25
3 | NorthWest | Kevin S. Carcetti | 1965-02-09
4 | NorthWest | Sam K. Carcetti | 1965-03-16
5 | NorthWest | Alexandra X. Winkler | 1965-04-05
1 | MidWest | Michael Y. Meyer | 1965-01-01
2 | MidWest | Joanna W. Bauer | 1965-01-06
3 | MidWest | Amy E. Harris | 1965-01-08
4 | MidWest | Julie W. McCabe | 1965-01-09
5 | MidWest | William . Peterson | 1965-01-09
1 | South | Dean . Martin | 1965-01-01
2 | South | Ruth U. Williams | 1965-01-02
3 | South | Steve Y. Farmer | 1965-01-03
4 | South | Mark V. King | 1965-01-08
5 | South | Lucas Y. Young | 1965-01-10
(30 rows)
See also
27 - STDDEV [analytic]
Computes the statistical sample standard deviation of the current row with respect to the group within a.
Computes the statistical sample standard deviation of the current row with respect to the group within a window. STDDEV_SAMP
returns the same value as the square root of the variance defined for the
VAR_SAMP
function:
STDDEV( expression ) = SQRT(VAR_SAMP( expression ))
When VAR_SAMP
returns NULL
, this function returns NULL
.
Note
The nonstandard function
STDDEV
is provided for compatibility with other databases. It is semantically identical to
STDDEV_SAMP
.
Behavior type
Immutable
Syntax
STDDEV ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
- See Analytic Functions
Examples
The following example returns the standard deviations of salaries in the employee dimension table by job title Assistant Director:
=> SELECT employee_last_name, annual_salary,
STDDEV(annual_salary) OVER (ORDER BY hire_date) as "stddev"
FROM employee_dimension
WHERE job_title = 'Assistant Director';
employee_last_name | annual_salary | stddev
--------------------+---------------+------------------
Bauer | 85003 | NaN
Reyes | 91051 | 4276.58181261624
Overstreet | 53296 | 20278.6923394976
Gauthier | 97216 | 19543.7184537642
Jones | 82320 | 16928.0764028285
Fortin | 56166 | 18400.2738421652
Carcetti | 71135 | 16968.9453554483
Weaver | 74419 | 15729.0709901852
Stein | 85689 | 15040.5909495309
McNulty | 69423 | 14401.1524291943
Webber | 99091 | 15256.3160166536
Meyer | 74774 | 14588.6126417355
Garnett | 82169 | 14008.7223268494
Roy | 76974 | 13466.1270356647
Dobisz | 83486 | 13040.4887828347
Martin | 99702 | 13637.6804131055
Martin | 73589 | 13299.2838158566
...
See also
28 - STDDEV_POP [analytic]
Evaluates the statistical population standard deviation for each member of the group.
Computes the statistical population standard deviation and returns the square root of the population variance within a window. The STDDEV_POP()
return value is the same as the square root of the VAR_POP()
function:
STDDEV_POP( expression ) = SQRT(VAR_POP( expression ))
When VAR_POP
returns null, STDDEV_POP
returns null.
Behavior type
Immutable
Syntax
STDDEV_POP ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
- See Analytic Functions.
Examples
The following example returns the population standard deviations of salaries in the employee dimension table by job title Assistant Director:
=> SELECT employee_last_name, annual_salary,
STDDEV_POP(annual_salary) OVER (ORDER BY hire_date) as "stddev_pop"
FROM employee_dimension WHERE job_title = 'Assistant Director';
employee_last_name | annual_salary | stddev_pop
--------------------+---------------+------------------
Goldberg | 61859 | 0
Miller | 79582 | 8861.5
Goldberg | 74236 | 7422.74712548456
Campbell | 66426 | 6850.22125098891
Moore | 66630 | 6322.08223926257
Nguyen | 53530 | 8356.55480080699
Harris | 74115 | 8122.72288970008
Lang | 59981 | 8053.54776538731
Farmer | 60597 | 7858.70140687825
Nguyen | 78941 | 8360.63150784682
See also
29 - STDDEV_SAMP [analytic]
Computes the statistical sample standard deviation of the current row with respect to the group within a.
Computes the statistical sample standard deviation of the current row with respect to the group within a window. STDDEV_SAM
's return value is the same as the square root of the variance defined for the VAR_SAMP
function:
STDDEV( expression ) = SQRT(VAR_SAMP( expression ))
When VAR_SAMP
returns NULL
, STDDEV_SAMP
returns NULL.
Note
STDDEV_SAMP()
is semantically identical to the nonstandard function,
STDDEV()
.
Behavior type
Immutable
Syntax
STDDEV_SAMP ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument..
OVER()
- See Analytic Functions
Examples
The following example returns the sample standard deviations of salaries in the employee
dimension table by job title Assistant Director:
=> SELECT employee_last_name, annual_salary,
STDDEV(annual_salary) OVER (ORDER BY hire_date) as "stddev_samp"
FROM employee_dimension WHERE job_title = 'Assistant Director';
employee_last_name | annual_salary | stddev_samp
--------------------+---------------+------------------
Bauer | 85003 | NaN
Reyes | 91051 | 4276.58181261624
Overstreet | 53296 | 20278.6923394976
Gauthier | 97216 | 19543.7184537642
Jones | 82320 | 16928.0764028285
Fortin | 56166 | 18400.2738421652
Carcetti | 71135 | 16968.9453554483
Weaver | 74419 | 15729.0709901852
Stein | 85689 | 15040.5909495309
McNulty | 69423 | 14401.1524291943
Webber | 99091 | 15256.3160166536
Meyer | 74774 | 14588.6126417355
Garnett | 82169 | 14008.7223268494
Roy | 76974 | 13466.1270356647
Dobisz | 83486 | 13040.4887828347
...
See also
30 - SUM [analytic]
Computes the sum of an expression over a group of rows within a.
Computes the sum of an expression over a group of rows within a window. It returns a DOUBLE PRECISION
value for a floating-point expression. Otherwise, the return value is the same as the expression data type.
Behavior type
Immutable
Syntax
SUM ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
- See Analytic Functions
Overflow handling
If you encounter data overflow when using SUM
, use
SUM_FLOAT
which converts data to a floating point.
By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.
Examples
The following query returns the cumulative sum all of the returns made to stores in January:
=> SELECT calendar_month_name AS month, transaction_type, sales_quantity,
SUM(sales_quantity)
OVER (PARTITION BY calendar_month_name ORDER BY date_dimension.date_key) AS SUM
FROM store.store_sales_fact JOIN date_dimension
USING(date_key) WHERE calendar_month_name IN ('January')
AND transaction_type= 'return';
month | transaction_type | sales_quantity | SUM
---------+------------------+----------------+------
January | return | 7 | 651
January | return | 3 | 651
January | return | 7 | 651
January | return | 7 | 651
January | return | 7 | 651
January | return | 3 | 651
January | return | 7 | 651
January | return | 5 | 651
January | return | 1 | 651
January | return | 6 | 651
January | return | 6 | 651
January | return | 3 | 651
January | return | 9 | 651
January | return | 7 | 651
January | return | 6 | 651
January | return | 8 | 651
January | return | 7 | 651
January | return | 2 | 651
January | return | 4 | 651
January | return | 5 | 651
January | return | 7 | 651
January | return | 8 | 651
January | return | 4 | 651
January | return | 10 | 651
January | return | 6 | 651
...
See also
31 - VAR_POP [analytic]
Returns the statistical population variance of a non-null set of numbers (nulls are ignored) in a group within a.
Returns the statistical population variance of a non-null set of numbers (nulls are ignored) in a group within a window. Results are calculated by the sum of squares of the difference of expression
from the mean of expression
, divided by the number of rows remaining:
(SUM( expression * expression ) - SUM( expression ) * SUM( expression ) / COUNT( expression )) / COUNT( expression )
Behavior type
Immutable
Syntax
VAR_POP ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument
OVER()
- See Analytic Functions
Examples
The following example calculates the cumulative population in the store orders fact table of sales in January 2007:
=> SELECT date_ordered,
VAR_POP(SUM(total_order_cost))
OVER (ORDER BY date_ordered) "var_pop"
FROM store.store_orders_fact s
WHERE date_ordered BETWEEN '2007-01-01' AND '2007-01-31'
GROUP BY s.date_ordered;
date_ordered | var_pop
--------------+------------------
2007-01-01 | 0
2007-01-02 | 89870400
2007-01-03 | 3470302472
2007-01-04 | 4466755450.6875
2007-01-05 | 3816904780.80078
2007-01-06 | 25438212385.25
2007-01-07 | 22168747513.1016
2007-01-08 | 23445191012.7344
2007-01-09 | 39292879603.1113
2007-01-10 | 48080574326.9609
(10 rows)
See also
32 - VAR_SAMP [analytic]
Returns the sample variance of a non-NULL set of numbers (NULL values in the set are ignored) for each row of the group within a.
Returns the sample variance of a non-NULL
set of numbers (NULL
values in the set are ignored) for each row of the group within a window. Results are calculated as follows:
(SUM( expression * expression ) - SUM( expression ) * SUM( expression ) / COUNT( expression ) )
/ (COUNT( expression ) - 1 )
This function and
VARIANCE
differ in one way: given an input set of one element, VARIANCE
returns 0 and VAR_SAMP
returns NULL
.
Behavior type
Immutable
Syntax
VAR_SAMP ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument
OVER()
- See Analytic Functions
Null handling
-
VAR_SAMP
returns the sample variance of a set of numbers after it discards the NULL
values in the set.
-
If the function is applied to an empty set, then it returns NULL
.
Examples
The following example calculates the sample variance in the store orders fact table of sales in December 2007:
=> SELECT date_ordered,
VAR_SAMP(SUM(total_order_cost))
OVER (ORDER BY date_ordered) "var_samp"
FROM store.store_orders_fact s
WHERE date_ordered BETWEEN '2007-12-01' AND '2007-12-31'
GROUP BY s.date_ordered;
date_ordered | var_samp
--------------+------------------
2007-12-01 | NaN
2007-12-02 | 90642601088
2007-12-03 | 48030548449.3359
2007-12-04 | 32740062504.2461
2007-12-05 | 32100319112.6992
2007-12-06 | 26274166814.668
2007-12-07 | 23017490251.9062
2007-12-08 | 21099374085.1406
2007-12-09 | 27462205977.9453
2007-12-10 | 26288687564.1758
(10 rows)
See also
33 - VARIANCE [analytic]
Returns the sample variance of a non-NULL set of numbers (NULL values in the set are ignored) for each row of the group within a.
Returns the sample variance of a non-NULL
set of numbers (NULL
values in the set are ignored) for each row of the group within a window. Results are calculated as follows:
( SUM( expression * expression ) - SUM( expression ) * SUM( expression ) / COUNT( expression )) / (COUNT( expression ) - 1 )
VARIANCE
returns the variance of expression
, which is calculated as follows:
Note
The nonstandard function
VARIANCE
is provided for compatibility with other databases. It is semantically identical to
VAR_SAMP
.
Behavior type
Immutable
Syntax
VAR_SAMP ( expression ) OVER (
[ window-partition-clause ]
[ window-order-clause ]
[ window-frame-clause ] )
Parameters
expression
- Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
- See Analytic Functions
Examples
The following example calculates the cumulative variance in the store orders fact table of sales in December 2007:
=> SELECT date_ordered,
VARIANCE(SUM(total_order_cost))
OVER (ORDER BY date_ordered) "variance"
FROM store.store_orders_fact s
WHERE date_ordered BETWEEN '2007-12-01' AND '2007-12-31'
GROUP BY s.date_ordered;
date_ordered | variance
--------------+------------------
2007-12-01 | NaN
2007-12-02 | 2259129762
2007-12-03 | 1809012182.33301
2007-12-04 | 35138165568.25
2007-12-05 | 26644110029.3003
2007-12-06 | 25943125234
2007-12-07 | 23178202223.9048
2007-12-08 | 21940268901.1431
2007-12-09 | 21487676799.6108
2007-12-10 | 21521358853.4331
(10 rows)
See also