This is the multipage printable view of this section.
Click here to print.
Return to the regular view of this page.
Analytic functions
All analytic functions in this section with an aggregate counterpart are appended with [Analytics] in the heading to avoid confusion between the two function types.
Note
All analytic functions in this section with an aggregate counterpart are appended with [Analytics] in the heading to avoid confusion between the two function types.
Vertica analytics are SQL functions based on the ANSI 99 standard. These functions handle complex analysis and reporting tasks—for example:

Rank the longeststanding customers in a particular state.

Calculate the moving average of retail volume over a specified time.

Find the highest score among all students in the same grade.

Compare the current sales bonus that salespersons received against their previous bonus.
Analytic functions return aggregate results but they do not group the result set. They return the group value multiple times, once per record. You can sort group values, or partitions, using a window ORDER BY
clause, but the order affects only the function result set, not the entire query result set.
Syntax
General
analyticfunction(arguments) OVER(
[ windowpartitionclause ]
[ windoworderclause [ windowframeclause ] ]
)
With named window
analyticfunction(arguments) OVER(
[ namedwindow [ windowframeclause ] ]
)
Parameters
analyticfunction
(
arguments
)
 A Vertica analytic function and its arguments.
OVER
 Specifies how to partition, sort, and window frame function input with respect to the current row. The input data is the result set that the query returns after it evaluates
FROM
, WHERE
, GROUP BY
, and HAVING
clauses.
An empty OVER
clause provides the best performance for single threaded queries on a single node.
 windowpartitionclause
 Groups input rows according to one or more columns or expressions.
If you omit this clause, no grouping occurs and the analytic function processes all input rows as a single partition.
 windoworderclause
 Optionally specifies how to sort rows that are supplied to the analytic function. If the
OVER
clause also includes a partition clause, rows are sorted within each partition.

windowframeclause
 Only valid for some analytic functions, specifies as input a set of rows relative to the row that is currently being evaluated by the analytic function. After the function processes that row and its window, Vertica advances the current row and adjusts the window boundaries accordingly.
namedwindow
 The name of a window that you define in the same query with a windownameclause. This definition encapsulates window partitioning and sorting. Named windows are useful when the query invokes multiple analytic functions with similar
OVER
clauses.
A window name clause cannot specify a window frame clause. However, you can qualify the named window in an OVER
clause with a window frame clause.
Requirements
The following requirements apply to analytic functions:

All require an OVER
clause. Each function has its own OVER
clause requirements. For example, you can supply an empty OVER
clause for some analytic aggregate functions such as
SUM
. For other functions, window frame and order clauses might be required, or might be invalid.

Analytic functions can be invoked only in a query's SELECT
and ORDER BY
clauses.

Analytic functions cannot be nested. For example, the following query is not allowed:
=> SELECT MEDIAN(RANK() OVER(ORDER BY sal) OVER()).

WHERE
, GROUP BY
and HAVING
operators are technically not part of the analytic function. However, they determine input to that function.
See also
1  ARGMAX [analytic]
This function is patterned after the mathematical function argmax(f(x)), which returns the value of x that maximizes f(x).
This function is patterned after the mathematical function argmax(
f
(
x
))
, which returns the value of x
that maximizes f
(
x
)
. Similarly, ARGMAX takes two arguments target
and arg
, where both are columns or column expressions in the queried dataset. ARGMAX finds the row with the largest nonnull value in target
and returns the value of arg
in that row. If multiple rows contain the largest target
value, ARGMAX returns arg
from the first row that it finds.
Behavior type
Immutable
Syntax
ARGMAX ( target, arg ) OVER ( [ PARTITION BY expression[,...] ] [ windoworderclause ] )
Arguments
target
, arg
 Columns in the queried dataset.
OVER()
 Specifies the following window clauses:

PARTITION BY
expression
: Groups (partitions) input rows according to the values in expression
, which resolves to one or more columns in the queried dataset. If you omit this clause, ARGMAX processes all input rows as a single partition.

windoworderclause: Specifies how to sort input rows. If the OVER clause also includes a partition clause, rows are sorted separately within each partition.
Important
To ensure consistent results when multiple rows contain the largest target
value, include a window order clause that sorts on arg
.
For details, see Analytic Functions.
Examples
Create and populate table service_info
, which contains information on various services, their respective development groups, and their userbase. A NULL in the users
column indicates that the service has not been released, and so it cannot have users.
=> CREATE TABLE service_info(dev_group VARCHAR(10), product_name VARCHAR(30), users INT);
=> COPY t FROM stdin NULL AS 'null';
>> irischat48193
>> aspentrading3000
>> orchidcloud990322
>> irisvideo call 10203
>> daffodilstreaming44123
>> hydrangeapassword managernull
>> hydrangeatotp1837363
>> daffodilclip share3000
>> hydrangeae2e smsnull
>> rosecryptonull
>> irisforum48193
>> \.
ARGMAX returns the value in the product_name
column that maximizes the value in the users
column. In this case, ARGMAX returns totp
, which indicates that the totp
service has the largest user base:
=> SELECT dev_group, product_name, users, ARGMAX(users, product_name) OVER (ORDER BY dev_group ASC) FROM service_info;
dev_group  product_name  users  ARGMAX
+++
aspen  trading  3000  totp
daffodil  clip share  3000  totp
daffodil  streaming  44123  totp
hydrangea  e2e sms   totp
hydrangea  password manager   totp
hydrangea  totp  1837363  totp
iris  chat  48193  totp
iris  forum  48193  totp
iris  video call  10203  totp
orchid  cloud  990322  totp
rose  crypto   totp
(11 rows)
The next query partitions the data on dev_group
to identify the most popular service created by each development group. ARGMAX returns NULL if the partition's users
column contains only NULL values and breaks ties using the first value in product_name
from the top of the partition.
=> SELECT dev_group, product_name, users, ARGMAX(users, product_name) OVER (PARTITION BY dev_group ORDER BY product_name ASC) FROM service_info;
dev_group  product_name  users  ARGMAX
+++
iris  chat  48193  chat
iris  forum  48193  chat
iris  video call  10203  chat
orchid  cloud  990322  cloud
aspen  trading  3000  trading
daffodil  clip share  3000  streaming
daffodil  streaming  44123  streaming
rose  crypto  
hydrangea  e2e sms   totp
hydrangea  password manager   totp
hydrangea  totp  1837363  totp
(11 rows)
See also
ARGMIN [analytic]
2  ARGMIN [analytic]
This function is patterned after the mathematical function argmin(f(x)), which returns the value of x that minimizes f(x).
This function is patterned after the mathematical function argmin(
f
(
x
))
, which returns the value of x
that minimizes f
(
x
)
. Similarly, ARGMIN takes two arguments target
and arg
, where both are columns or column expressions in the queried dataset. ARGMIN finds the row with the smallest nonnull value in target
and returns the value of arg
in that row. If multiple rows contain the smallest target
value, ARGMIN returns arg
from the first row that it finds.
Behavior type
Immutable
Syntax
ARGMIN ( target, arg ) OVER ( [ PARTITION BY expression[,...] ] [ windoworderclause ] )
Arguments
target
, arg
 Columns in the queried dataset.
OVER()
 Specifies the following window clauses:

PARTITION BY
expression
: Groups (partitions) input rows according to the values in expression
, which resolves to one or more columns in the queried dataset. If you omit this clause, ARGMIN processes all input rows as a single partition.

windoworderclause: Specifies how to sort input rows. If the OVER
clause also includes a partition clause, rows are sorted separately within each partition.
Important
To ensure consistent results when multiple rows contain the smallest target
value, include a window order clause that sorts on arg
.
For details, see Analytic Functions.
Examples
Create and populate table service_info
, which contains information on various services, their respective development groups, and their userbase. A NULL in the users
column indicates that the service has not been released, and so it cannot have users.
=> CREATE TABLE service_info(dev_group VARCHAR(10), product_name VARCHAR(30), users INT);
=> COPY t FROM stdin NULL AS 'null';
>> irischat48193
>> aspentrading3000
>> orchidcloud990322
>> irisvideo call 10203
>> daffodilstreaming44123
>> hydrangeapassword managernull
>> hydrangeatotp1837363
>> daffodilclip share3000
>> hydrangeae2e smsnull
>> rosecryptonull
>> irisforum48193
>> \.
ARGMIN returns the value in the product_name
column that minimizes the value in the users
column. In this case, ARGMIN returns totp
, which indicates that the totp
service has the smallest user base:
=> SELECT dev_group, product_name, users, ARGMIN(users, product_name) OVER (ORDER BY dev_group ASC) FROM service_info;
dev_group  product_name  users  ARGMIN
+++
aspen  trading  3000  trading
daffodil  clip share  3000  trading
daffodil  streaming  44123  trading
hydrangea  e2e sms   trading
hydrangea  password manager   trading
hydrangea  totp  1837363  trading
iris  chat  48193  trading
iris  forum  48193  trading
iris  video call  10203  trading
orchid  cloud  990322  trading
rose  crypto   trading
(11 rows)
The next query partitions the data on dev_group
to identify the least popular service created by each development group. ARGMIN returns NULL if the partition's users
column contains only NULL values and breaks ties using the first value in product_name
from the top of the partition.
=> SELECT dev_group, product_name, users, ARGMIN(users, product_name) OVER (PARTITION BY dev_group ORDER BY product_name ASC) FROM service_info;
dev_group  product_name  users  ARGMIN
+++
iris  chat  48193  video call
iris  forum  48193  video call
iris  video call  10203  video call
orchid  cloud  990322  cloud
aspen  trading  3000  trading
daffodil  clip share  3000  clip share
daffodil  streaming  44123  clip share
rose  crypto  
hydrangea  e2e sms   totp
hydrangea  password manager   totp
hydrangea  totp  1837363  totp
(11 rows)
See also
ARGMAX [analytic]
3  AVG [analytic]
Computes an average of an expression in a group within a.
Computes an average of an expression in a group within a window. AVG
returns the same data type as the expression's numeric data type.
The AVG
analytic function differs from the
AVG
aggregate function, which computes the average of an expression over a group of rows.
Behavior type
Immutable
Syntax
AVG ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Any data that can be implicitly converted to a numeric data type.
OVER()
 See Analytic Functions.
Overflow handling
By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.
Examples
The following query finds the sales for that calendar month and returns a running/cumulative average (sometimes called a moving average) using the default window of RANGE UNBOUNDED PRECEDING AND CURRENT ROW
:
=> SELECT calendar_month_number_in_year Mo, SUM(product_price) Sales,
AVG(SUM(product_price)) OVER (ORDER BY calendar_month_number_in_year)::INTEGER Average
FROM product_dimension pd, date_dimension dm, inventory_fact if
WHERE dm.date_key = if.date_key AND pd.product_key = if.product_key GROUP BY Mo;
Mo  Sales  Average
++
1  23869547  23869547
2  19604661  21737104
3  22877913  22117374
4  22901263  22313346
5  23670676  22584812
6  22507600  22571943
7  21514089  22420821
8  24860684  22725804
9  21687795  22610470
10  23648921  22714315
11  21115910  22569005
12  24708317  22747281
(12 rows)
To return a moving average that is not a running (cumulative) average, the window can specify ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
:
=> SELECT calendar_month_number_in_year Mo, SUM(product_price) Sales,
AVG(SUM(product_price)) OVER (ORDER BY calendar_month_number_in_year
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)::INTEGER Average
FROM product_dimension pd, date_dimension dm, inventory_fact if
WHERE dm.date_key = if.date_key AND pd.product_key = if.product_key GROUP BY Mo;
Mo  Sales  Average
++
1  23869547  22117374
2  19604661  22313346
3  22877913  22584812
4  22901263  22312423
5  23670676  22694308
6  22507600  23090862
7  21514089  22848169
8  24860684  22843818
9  21687795  22565480
10  23648921  23204325
11  21115910  22790236
12  24708317  23157716
(12 rows)
See also
4  BOOL_AND [analytic]
Returns the Boolean value of an expression within a.
Returns the Boolean value of an expression within a window. If all input values are true, BOOL_AND
returns t
. Otherwise, it returns f
.
Behavior type
Immutable
Syntax
BOOL_AND ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 A Boolean data type or any nonBoolean data type that can be implicitly converted to a Boolean data type. The function returns a Boolean value.
OVER()
 See Analytic Functions.
Examples
The following example illustrates how you can use the BOOL_AND
, BOOL_OR
, and BOOL_XOR
analytic functions. The sample table, employee, includes a column for type of employee and years paid.
=> CREATE TABLE employee(emptype VARCHAR, yearspaid VARCHAR);
CREATE TABLE
Insert sample data into the table to show years paid. In more than one case, an employee could be paid more than once within one year.
=> INSERT INTO employee
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2015'
UNION ALL
SELECT 'contractor3', '2014'
UNION ALL
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2014'
UNION ALL
SELECT 'contractor3', '2015'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor5', '2015'
UNION ALL
SELECT 'contractor5', '2016';
OUTPUT

10
(1 row)
Query the table. The result shows employees that were paid twice in 2014 (BOOL_AND
), once or twice in 2014 (BOOL_OR
), and specifically not more than once in 2014 (BOOL_XOR
).
=> SELECT DISTINCT emptype,
BOOL_AND(yearspaid='2014') OVER (PARTITION BY emptype) AS paidtwicein2014,
BOOL_OR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidonceortwicein2014,
BOOL_XOR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidjustoncein2014
FROM employee;
emptype  paidtwicein2014  paidonceortwicein2014  paidjustoncein2014
+++
contractor1  t  t  f
contractor2  f  t  t
contractor3  f  t  t
contractor4  t  t  f
contractor5  f  f  f
(5 rows)
See also
5  BOOL_OR [analytic]
Returns the Boolean value of an expression within a.
Returns the Boolean value of an expression within a window. If at least one input value is true, BOOL_OR
returns t
. Otherwise, it returns f
.
Behavior type
Immutable
Syntax
BOOL_OR ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 A Boolean data type or any nonBoolean data type that can be implicitly converted to a Boolean data type. The function returns a Boolean value.
OVER()
 See Analytic Functions.
Examples
The following example illustrates how you can use the BOOL_AND
, BOOL_OR
, and BOOL_XOR
analytic functions. The sample table, employee, includes a column for type of employee and years paid.
=> CREATE TABLE employee(emptype VARCHAR, yearspaid VARCHAR);
CREATE TABLE
Insert sample data into the table to show years paid. In more than one case, an employee could be paid more than once within one year.
=> INSERT INTO employee
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2015'
UNION ALL
SELECT 'contractor3', '2014'
UNION ALL
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2014'
UNION ALL
SELECT 'contractor3', '2015'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor5', '2015'
UNION ALL
SELECT 'contractor5', '2016';
OUTPUT

10
(1 row)
Query the table. The result shows employees that were paid twice in 2014 (BOOL_AND
), once or twice in 2014 (BOOL_OR
), and specifically not more than once in 2014 (BOOL_XOR
).
=> SELECT DISTINCT emptype,
BOOL_AND(yearspaid='2014') OVER (PARTITION BY emptype) AS paidtwicein2014,
BOOL_OR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidonceortwicein2014,
BOOL_XOR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidjustoncein2014
FROM employee;
emptype  paidtwicein2014  paidonceortwicein2014  paidjustoncein2014
+++
contractor1  t  t  f
contractor2  f  t  t
contractor3  f  t  t
contractor4  t  t  f
contractor5  f  f  f
(5 rows)
See also
6  BOOL_XOR [analytic]
Returns the Boolean value of an expression within a.
Returns the Boolean value of an expression within a window. If only one input value is true, BOOL_XOR
returns t
. Otherwise, it returns f
.
Behavior type
Immutable
Syntax
BOOL_XOR ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 A Boolean data type or any nonBoolean data type that can be implicitly converted to a Boolean data type. The function returns a Boolean value.
OVER()
 See Analytic Functions.
Examples
The following example illustrates how you can use the BOOL_AND
, BOOL_OR
, and BOOL_XOR
analytic functions. The sample table, employee, includes a column for type of employee and years paid.
=> CREATE TABLE employee(emptype VARCHAR, yearspaid VARCHAR);
CREATE TABLE
Insert sample data into the table to show years paid. In more than one case, an employee could be paid more than once within one year.
=> INSERT INTO employee
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2015'
UNION ALL
SELECT 'contractor3', '2014'
UNION ALL
SELECT 'contractor1', '2014'
UNION ALL
SELECT 'contractor2', '2014'
UNION ALL
SELECT 'contractor3', '2015'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor4', '2014'
UNION ALL
SELECT 'contractor5', '2015'
UNION ALL
SELECT 'contractor5', '2016';
OUTPUT

10
(1 row)
Query the table. The result shows employees that were paid twice in 2014 (BOOL_AND
), once or twice in 2014 (BOOL_OR
), and specifically not more than once in 2014 (BOOL_XOR
).
=> SELECT DISTINCT emptype,
BOOL_AND(yearspaid='2014') OVER (PARTITION BY emptype) AS paidtwicein2014,
BOOL_OR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidonceortwicein2014,
BOOL_XOR(yearspaid='2014') OVER (PARTITION BY emptype) AS paidjustoncein2014
FROM employee;
emptype  paidtwicein2014  paidonceortwicein2014  paidjustoncein2014
+++
contractor1  t  t  f
contractor2  f  t  t
contractor3  f  t  t
contractor4  t  t  f
contractor5  f  f  f
(5 rows)
See also
7  CONDITIONAL_CHANGE_EVENT [analytic]
Assigns an event window number to each row, starting from 0, and increments by 1 when the result of evaluating the argument expression on the current row differs from that on the previous row.
Assigns an event window number to each row, starting from 0, and increments by 1 when the result of evaluating the argument expression on the current row differs from that on the previous row.
Behavior type
Immutable
Syntax
CONDITIONAL_CHANGE_EVENT ( expression ) OVER (
[ windowpartitionclause ]
windoworderclause )
Parameters
expression
 SQL scalar expression that is evaluated on an input record. The result of *
expression
*can be of any data type.
OVER()
 See Analytic Functions.
Notes
The analytic windoworderclause
is required but the windowpartitionclause
is optional.
Examples
=> SELECT CONDITIONAL_CHANGE_EVENT(bid)
OVER (PARTITION BY symbol ORDER BY ts) AS cce
FROM TickStore;
The system returns an error when no ORDER BY
clause is present:
=> SELECT CONDITIONAL_CHANGE_EVENT(bid)
OVER (PARTITION BY symbol) AS cce
FROM TickStore;
ERROR: conditional_change_event must contain an
ORDER BY clause within its analytic clause
For more examples, see Eventbased windows.
See also
8  CONDITIONAL_TRUE_EVENT [analytic]
Assigns an event window number to each row, starting from 0, and increments the number by 1 when the result of the boolean argument expression evaluates true.
Assigns an event window number to each row, starting from 0, and increments the number by 1 when the result of the boolean argument expression evaluates true. For example, given a sequence of values for column a, as follows:
( 1, 2, 3, 4, 5, 6 )
CONDITIONAL_TRUE_EVENT(a > 3)
returns 0, 0, 0, 1, 2, 3
.
Behavior type
Immutable
Syntax
CONDITIONAL_TRUE_EVENT ( booleanexpression ) OVER (
[ windowpartitionclause ]
windoworderclause )
Parameters
booleanexpression
 SQL scalar expression that is evaluated on an input record, type BOOLEAN.
OVER()
 See Analytic functions.
Notes
The analytic windoworderclause
is required but the windowpartitionclause
is optional.
Examples
> SELECT CONDITIONAL_TRUE_EVENT(bid > 10.6)
OVER(PARTITION BY bid ORDER BY ts) AS cte
FROM Tickstore;
The system returns an error if the ORDER BY
clause is omitted:
> SELECT CONDITIONAL_TRUE_EVENT(bid > 10.6)
OVER(PARTITION BY bid) AS cte
FROM Tickstore;
ERROR: conditional_true_event must contain an ORDER BY
clause within its analytic clause
For more examples, see Eventbased windows.
See also
9  COUNT [analytic]
Counts occurrences within a group within a.
Counts occurrences within a group within a window. If you specify * or some nonnull constant, COUNT()
counts all rows.
Behavior type
Immutable
Syntax
COUNT ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Returns the number of rows in each group for which the
expression
is not null. Can be any expression resulting in BIGINT.
OVER()
 See Analytic Functions.
Examples
Using the schema defined in Window framing, the following COUNT
function omits window order and window frame clauses; otherwise Vertica would treat it as a window aggregate. Think of the window of reporting aggregates as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
=> SELECT deptno, sal, empno, COUNT(sal)
OVER (PARTITION BY deptno) AS count FROM emp;
deptno  sal  empno  count
+++
10  101  1  2
10  104  4  2
20  110  10  6
20  110  9  6
20  109  7  6
20  109  6  6
20  109  8  6
20  109  11  6
30  105  5  3
30  103  3  3
30  102  2  3
Using ORDER BY sal
creates a moving window query with default window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
=> SELECT deptno, sal, empno, COUNT(sal)
OVER (PARTITION BY deptno ORDER BY sal) AS count
FROM emp;
deptno  sal  empno  count
+++
10  101  1  1
10  104  4  2
20  100  11  1
20  109  7  4
20  109  6  4
20  109  8  4
20  110  10  6
20  110  9  6
30  102  2  1
30  103  3  2
30  105  5  3
Using the VMart schema, the following query finds the number of employees who make less than or equivalent to the hourly rate of the current employee. The query returns a running/cumulative average (sometimes called a moving average) using the default window of RANGE UNBOUNDED PRECEDING AND CURRENT ROW
:
=> SELECT employee_last_name AS "last_name", hourly_rate, COUNT(*)
OVER (ORDER BY hourly_rate) AS moving_count from employee_dimension;
last_name  hourly_rate  moving_count
++
Gauthier  6  4
Taylor  6  4
Jefferson  6  4
Nielson  6  4
McNulty  6.01  11
Robinson  6.01  11
Dobisz  6.01  11
Williams  6.01  11
Kramer  6.01  11
Miller  6.01  11
Wilson  6.01  11
Vogel  6.02  14
Moore  6.02  14
Vogel  6.02  14
Carcetti  6.03  19
...
To return a moving average that is not also a running (cumulative) average, the window should specify ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
:
=> SELECT employee_last_name AS "last_name", hourly_rate, COUNT(*)
OVER (ORDER BY hourly_rate ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
AS moving_count from employee_dimension;
See also
10  CUME_DIST [analytic]
Calculates the cumulative distribution, or relative rank, of the current row with regard to other rows in the same partition within a .
Calculates the cumulative distribution, or relative rank, of the current row with regard to other rows in the same partition within a window.
CUME_DIST()
returns a number greater then 0 and less then or equal to 1, where the number represents the relative position of the specified row within a group of n
rows. For a row x
(assuming ASC
ordering), the CUME_DIST
of x
is the number of rows with values lower than or equal to the value of x
, divided by the number of rows in the partition. For example, in a group of three rows, the cumulative distribution values returned would be 1/3, 2/3, and 3/3.
Note
Because the result for a given row depends on the number of rows preceding that row in the same partition, you should always specify a windoworderclause
when you call this function.
Behavior type
Immutable
Syntax
CUME_DIST ( ) OVER (
[ windowpartitionclause ]
windoworderclause )
Parameters
OVER()
 See Analytic Functions.
Examples
The following example returns the cumulative distribution of sales for different transaction types within each month of the first quarter.
=> SELECT calendar_month_name AS month, tender_type, SUM(sales_quantity),
CUME_DIST()
OVER (PARTITION BY calendar_month_name ORDER BY SUM(sales_quantity)) AS
CUME_DIST
FROM store.store_sales_fact JOIN date_dimension
USING(date_key) WHERE calendar_month_name IN ('January','February','March')
AND tender_type NOT LIKE 'Other'
GROUP BY calendar_month_name, tender_type;
month  tender_type  SUM  CUME_DIST
+++
March  Credit  469858  0.25
March  Cash  470449  0.5
March  Check  473033  0.75
March  Debit  475103  1
January  Cash  441730  0.25
January  Debit  443922  0.5
January  Check  446297  0.75
January  Credit  450994  1
February  Check  425665  0.25
February  Debit  426726  0.5
February  Credit  430010  0.75
February  Cash  430767  1
(12 rows)
See also
11  DENSE_RANK [analytic]
Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY clause.
Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY
clause. A DENSE_RANK
function returns a sequence of ranking numbers without any gaps.
DENSE_RANK
executes as follows:

Sorts partition rows as specified by the ORDER BY
clause.

Compares the ORDER BY
values of the preceding row and current row and ranks the current row as follows:

If ORDER BY
values are the same, the current row gets the same ranking as the preceding row.
Note
Null values are considered equal. For detailed information on how null values are sorted, see
NULL sort order.

If the ORDER BY
values are different, DENSE_RANK
increments or decrements the current row's ranking by 1, depending whether sort order is ascending or descending.
DENSE_RANK
always changes the ranking by 1, so no gaps appear in the ranking sequence. The largest rank value is the number of unique ORDER BY
values returned by the query.
Behavior type
Immutable
Syntax
DENSE_RANK() OVER (
[ windowpartitionclause ]
windoworderclause )
Parameters
OVER()
 See Analytic Functions.
See Analytic Functions
Compared with RANK
RANK
leaves gaps in the ranking sequence, while DENSE_RANK
does not. The example below compares the behavior of the two functions.
Examples
The following query invokes RANK
and DENSE_RANK
to rank customers by annual income. The two functions return different rankings, as follows:

If annual_salary
contains duplicate values, RANK()
inserts duplicate rankings and then skips one or more values—for example, from 4 to 6 and 7 to 9.

In the parallel column Dense Rank
, DENSE_RANK()
also inserts duplicate rankings, but leaves no gaps in the rankings sequence:
=> SELECT employee_region region, employee_key, annual_salary,
RANK() OVER (PARTITION BY employee_region ORDER BY annual_salary) Rank,
DENSE_RANK() OVER (PARTITION BY employee_region ORDER BY annual_salary) "Dense Rank"
FROM employee_dimension;
region  employee_key  annual_salary  Rank  Dense Rank
++++
West  5248  1200  1  1
West  6880  1204  2  2
West  5700  1214  3  3
West  9857  1218  4  4
West  6014  1218  4  4
West  9221  1220  6  5
West  7646  1222  7  6
West  6621  1222  7  6
West  6488  1224  9  7
West  7659  1226  10  8
West  7432  1226  10  8
West  9905  1226  10  8
West  9021  1228  13  9
...
West  56  963104  2794  2152
West  100  992363  2795  2153
East  8353  1200  1  1
East  9743  1202  2  2
East  9975  1202  2  2
East  9205  1204  4  3
East  8894  1206  5  4
East  7740  1206  5  4
East  7324  1208  7  5
East  6505  1208  7  5
East  5404  1208  7  5
East  5010  1208  7  5
East  9114  1212  11  6
...
See also
SQL analytics
12  EXPONENTIAL_MOVING_AVERAGE [analytic]
Calculates the exponential moving average (EMA) of expression E with smoothing factor X.
Calculates the exponential moving average (EMA) of expression E
with smoothing factor X
. An EMA differs from a simple moving average in that it provides a more stable picture of changes to data over time.
The EMA is calculated by adding the previous EMA value to the current data point scaled by the smoothing factor, as in the following formula:
EMA
=
EMA0
+ (
X
* (
E

EMA0
))
where:

E
is the current data point

EMA0
is the previous row's EMA value.

X
is the smoothing factor.
This function also works at the row level. For example, EMA assumes the data in a given column is sampled at uniform intervals. If the users' data points are sampled at nonuniform intervals, they should run the time series gap filling and interpolation (GFI) operations before EMA()
Behavior type
Immutable
Syntax
EXPONENTIAL_MOVING_AVERAGE ( E, X ) OVER (
[ windowpartitionclause ]
windoworderclause )
Parameters
E
 The value whose average is calculated over a set of rows. Can be
INTEGER
, FLOAT
or NUMERIC
type and must be a constant.
X
 A positive
FLOAT
value between 0 and 1 that is used as the smoothing factor.
OVER()
 See Analytic Functions.
Examples
The following example uses time series gap filling and interpolation (GFI) first in a subquery, and then performs an EXPONENTIAL_MOVING_AVERAGE
operation on the subquery result.
Create a simple fourcolumn table:
=> CREATE TABLE ticker(
time TIMESTAMP,
symbol VARCHAR(8),
bid1 FLOAT,
bid2 FLOAT );
Insert some data, including nulls, so GFI can do its interpolation and gap filling:
=> INSERT INTO ticker VALUES ('20090712 03:00:00', 'ABC', 60.45, 60.44);
=> INSERT INTO ticker VALUES ('20090712 03:00:01', 'ABC', 60.49, 65.12);
=> INSERT INTO ticker VALUES ('20090712 03:00:02', 'ABC', 57.78, 59.25);
=> INSERT INTO ticker VALUES ('20090712 03:00:03', 'ABC', null, 65.12);
=> INSERT INTO ticker VALUES ('20090712 03:00:04', 'ABC', 67.88, null);
=> INSERT INTO ticker VALUES ('20090712 03:00:00', 'XYZ', 47.55, 40.15);
=> INSERT INTO ticker VALUES ('20090712 03:00:01', 'XYZ', 44.35, 46.78);
=> INSERT INTO ticker VALUES ('20090712 03:00:02', 'XYZ', 71.56, 75.78);
=> INSERT INTO ticker VALUES ('20090712 03:00:03', 'XYZ', 85.55, 70.21);
=> INSERT INTO ticker VALUES ('20090712 03:00:04', 'XYZ', 45.55, 58.65);
=> COMMIT;
Note
During gap filling and interpolation, Vertica takes the closest non null value on either side of the time slice and uses that value. For example, if you use a linear interpolation scheme and you do not specify
IGNORE NULLS
, and your data has one real value and one null, the result is null. If the value on either side is null, the result is null. See
When Time Series Data Contains Nulls for details.
Query the table that you just created to you can see the output:
=> SELECT * FROM ticker;
time  symbol  bid1  bid2
+++
20090712 03:00:00  ABC  60.45  60.44
20090712 03:00:01  ABC  60.49  65.12
20090712 03:00:02  ABC  57.78  59.25
20090712 03:00:03  ABC   65.12
20090712 03:00:04  ABC  67.88 
20090712 03:00:00  XYZ  47.55  40.15
20090712 03:00:01  XYZ  44.35  46.78
20090712 03:00:02  XYZ  71.56  75.78
20090712 03:00:03  XYZ  85.55  70.21
20090712 03:00:04  XYZ  45.55  58.65
(10 rows)
The following query processes the first and last values that belong to each 2second time slice in table trades
' column a
. The query then calculates the exponential moving average of expression fv and lv with a smoothing factor of 50%:
=> SELECT symbol, slice_time, fv, lv,
EXPONENTIAL_MOVING_AVERAGE(fv, 0.5)
OVER (PARTITION BY symbol ORDER BY slice_time) AS ema_first,
EXPONENTIAL_MOVING_AVERAGE(lv, 0.5)
OVER (PARTITION BY symbol ORDER BY slice_time) AS ema_last
FROM (
SELECT symbol, slice_time,
TS_FIRST_VALUE(bid1 IGNORE NULLS) as fv,
TS_LAST_VALUE(bid2 IGNORE NULLS) AS lv
FROM ticker TIMESERIES slice_time AS '2 seconds'
OVER (PARTITION BY symbol ORDER BY time) ) AS sq;
symbol  slice_time  fv  lv  ema_first  ema_last
+++++
ABC  20090712 03:00:00  60.45  65.12  60.45  65.12
ABC  20090712 03:00:02  57.78  65.12  59.115  65.12
ABC  20090712 03:00:04  67.88  65.12  63.4975  65.12
XYZ  20090712 03:00:00  47.55  46.78  47.55  46.78
XYZ  20090712 03:00:02  71.56  70.21  59.555  58.495
XYZ  20090712 03:00:04  45.55  58.65  52.5525  58.5725
(6 rows)
See also
13  FIRST_VALUE [analytic]
Lets you select the first value of a table or partition (determined by the windoworderclause) without having to use a self join.
Lets you select the first value of a table or partition (determined by the windoworderclause
) without having to use a self join. This function is useful when you want to use the first value as a baseline in calculations.
Use FIRST_VALUE()
with the windoworderclause
to produce deterministic results. If no window is specified for the current row, the default window is UNBOUNDED PRECEDING AND CURRENT ROW
.
Behavior type
Immutable
Syntax
FIRST_VALUE ( expression [ IGNORE NULLS ] ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Expression to evaluate—or example, a constant, column, nonanalytic function, function expression, or expressions involving any of these.
IGNORE NULLS
 Specifies to return the first nonnull value in the set, or
NULL
if all values are NULL
. If you omit this option and the first value in the set is null, the function returns NULL
.
OVER()
 See Analytic Functions.
Examples
The following query asks for the first value in the partitioned day of week, and illustrates the potential nondeterministic nature of FIRST_VALUE()
:
=> SELECT calendar_year, date_key, day_of_week, full_date_description,
FIRST_VALUE(full_date_description)
OVER(PARTITION BY calendar_month_number_in_year ORDER BY day_of_week)
AS "first_value"
FROM date_dimension
WHERE calendar_year=2003 AND calendar_month_number_in_year=1;
The first value returned is January 31, 2003; however, the next time the same query is run, the first value might be January 24 or January 3, or the 10th or 17th. This is because the analytic ORDER BY
column day_of_week
returns rows that contain ties (multiple Fridays). These repeated values make the ORDER BY
evaluation result nondeterministic, because rows that contain ties can be ordered in any way, and any one of those rows qualifies as being the first value of day_of_week
.
calendar_year  date_key  day_of_week  full_date_description  first_value
++++
2003  31  Friday  January 31, 2003  January 31, 2003
2003  24  Friday  January 24, 2003  January 31, 2003
2003  3  Friday  January 3, 2003  January 31, 2003
2003  10  Friday  January 10, 2003  January 31, 2003
2003  17  Friday  January 17, 2003  January 31, 2003
2003  6  Monday  January 6, 2003  January 31, 2003
2003  27  Monday  January 27, 2003  January 31, 2003
2003  13  Monday  January 13, 2003  January 31, 2003
2003  20  Monday  January 20, 2003  January 31, 2003
2003  11  Saturday  January 11, 2003  January 31, 2003
2003  18  Saturday  January 18, 2003  January 31, 2003
2003  25  Saturday  January 25, 2003  January 31, 2003
2003  4  Saturday  January 4, 2003  January 31, 2003
2003  12  Sunday  January 12, 2003  January 31, 2003
2003  26  Sunday  January 26, 2003  January 31, 2003
2003  5  Sunday  January 5, 2003  January 31, 2003
2003  19  Sunday  January 19, 2003  January 31, 2003
2003  23  Thursday  January 23, 2003  January 31, 2003
2003  2  Thursday  January 2, 2003  January 31, 2003
2003  9  Thursday  January 9, 2003  January 31, 2003
2003  16  Thursday  January 16, 2003  January 31, 2003
2003  30  Thursday  January 30, 2003  January 31, 2003
2003  21  Tuesday  January 21, 2003  January 31, 2003
2003  14  Tuesday  January 14, 2003  January 31, 2003
2003  7  Tuesday  January 7, 2003  January 31, 2003
2003  28  Tuesday  January 28, 2003  January 31, 2003
2003  22  Wednesday  January 22, 2003  January 31, 2003
2003  29  Wednesday  January 29, 2003  January 31, 2003
2003  15  Wednesday  January 15, 2003  January 31, 2003
2003  1  Wednesday  January 1, 2003  January 31, 2003
2003  8  Wednesday  January 8, 2003  January 31, 2003
(31 rows)
Note
The day_of_week
results are returned in alphabetical order because of lexical rules. The fact that each day does not appear ordered by the 7day week cycle (for example, starting with Sunday followed by Monday, Tuesday, and so on) has no affect on results.
To return deterministic results, modify the query so that it performs its analytic ORDER BY
operations on a unique field, such as date_key
:
=> SELECT calendar_year, date_key, day_of_week, full_date_description,
FIRST_VALUE(full_date_description) OVER
(PARTITION BY calendar_month_number_in_year ORDER BY date_key) AS "first_value"
FROM date_dimension WHERE calendar_year=2003;
FIRST_VALUE()
returns a first value of January 1 for the January partition and the first value of February 1 for the February partition. Also, the full_date_description
column contains no ties:
calendar_year  date_key  day_of_week  full_date_description  first_value
++++
2003  1  Wednesday  January 1, 2003  January 1, 2003
2003  2  Thursday  January 2, 2003  January 1, 2003
2003  3  Friday  January 3, 2003  January 1, 2003
2003  4  Saturday  January 4, 2003  January 1, 2003
2003  5  Sunday  January 5, 2003  January 1, 2003
2003  6  Monday  January 6, 2003  January 1, 2003
2003  7  Tuesday  January 7, 2003  January 1, 2003
2003  8  Wednesday  January 8, 2003  January 1, 2003
2003  9  Thursday  January 9, 2003  January 1, 2003
2003  10  Friday  January 10, 2003  January 1, 2003
2003  11  Saturday  January 11, 2003  January 1, 2003
2003  12  Sunday  January 12, 2003  January 1, 2003
2003  13  Monday  January 13, 2003  January 1, 2003
2003  14  Tuesday  January 14, 2003  January 1, 2003
2003  15  Wednesday  January 15, 2003  January 1, 2003
2003  16  Thursday  January 16, 2003  January 1, 2003
2003  17  Friday  January 17, 2003  January 1, 2003
2003  18  Saturday  January 18, 2003  January 1, 2003
2003  19  Sunday  January 19, 2003  January 1, 2003
2003  20  Monday  January 20, 2003  January 1, 2003
2003  21  Tuesday  January 21, 2003  January 1, 2003
2003  22  Wednesday  January 22, 2003  January 1, 2003
2003  23  Thursday  January 23, 2003  January 1, 2003
2003  24  Friday  January 24, 2003  January 1, 2003
2003  25  Saturday  January 25, 2003  January 1, 2003
2003  26  Sunday  January 26, 2003  January 1, 2003
2003  27  Monday  January 27, 2003  January 1, 2003
2003  28  Tuesday  January 28, 2003  January 1, 2003
2003  29  Wednesday  January 29, 2003  January 1, 2003
2003  30  Thursday  January 30, 2003  January 1, 2003
2003  31  Friday  January 31, 2003  January 1, 2003
2003  32  Saturday  February 1, 2003  February 1, 2003
2003  33  Sunday  February 2, 2003  February 1,2003
...
(365 rows)
See also
14  LAG [analytic]
Returns the value of the input expression at the given offset before the current row within a.
Returns the value of the input expression at the given offset before the current row within a window. This function lets you access more than one row in a table at the same time. This is useful for comparing values when the relative positions of rows can be reliably known. It also lets you avoid the more costly self join, which enhances query processing speed.
For information on getting the rows that follow, see LEAD.
Behavior type
Immutable
Syntax
LAG ( expression[, offset ] [, default ] ) OVER (
[ windowpartitionclause ]
windoworderclause )
Parameters
expression
 The expression to evaluate—for example, a constant, column, nonanalytic function, function expression, or expressions involving any of these.
offset
 Indicates how great is the lag. The default value is 1 (the previous row). This parameter must evaluate to a constant positive integer.
default
 The value returned if
offset
falls outside the bounds of the table or partition. This value must be a constant value or an expression that can be evaluated to a constant; its data type is coercible to that of the first argument.
Examples
This example sums the current balance by date in a table and also sums the previous balance from the last day. Given the inputs that follow, the data satisfies the following conditions:

For each some_id
, there is exactly 1 row for each date represented by month_date
.

For each some_id
, the set of dates is consecutive; that is, if there is a row for February 24 and a row for February 26, there would also be a row for February 25.

Each some_id
has the same set of dates.
=> CREATE TABLE balances (
month_date DATE,
current_bal INT,
some_id INT);
=> INSERT INTO balances values ('20090224', 10, 1);
=> INSERT INTO balances values ('20090225', 10, 1);
=> INSERT INTO balances values ('20090226', 10, 1);
=> INSERT INTO balances values ('20090224', 20, 2);
=> INSERT INTO balances values ('20090225', 20, 2);
=> INSERT INTO balances values ('20090226', 20, 2);
=> INSERT INTO balances values ('20090224', 30, 3);
=> INSERT INTO balances values ('20090225', 20, 3);
=> INSERT INTO balances values ('20090226', 30, 3);
Now run LAG to sum the current balance for each date and sum the previous balance from the last day:
=> SELECT month_date,
SUM(current_bal) as current_bal_sum,
SUM(previous_bal) as previous_bal_sum FROM
(SELECT month_date, current_bal,
LAG(current_bal, 1, 0) OVER
(PARTITION BY some_id ORDER BY month_date)
AS previous_bal FROM balances) AS subQ
GROUP BY month_date ORDER BY month_date;
month_date  current_bal_sum  previous_bal_sum
++
20090224  60  0
20090225  50  60
20090226  60  50
(3 rows)
Using the same example data, the following query would not be allowed because LAG is nested inside an aggregate function:
=> SELECT month_date,
SUM(current_bal) as current_bal_sum,
SUM(LAG(current_bal, 1, 0) OVER
(PARTITION BY some_id ORDER BY month_date)) AS previous_bal_sum
FROM some_table GROUP BY month_date ORDER BY month_date;
The following example uses the VMart database. LAG first returns the annual income from the previous row, and then it calculates the difference between the income in the current row from the income in the previous row:
=> SELECT occupation, customer_key, customer_name, annual_income,
LAG(annual_income, 1, 0) OVER (PARTITION BY occupation
ORDER BY annual_income) AS prev_income, annual_income 
LAG(annual_income, 1, 0) OVER (PARTITION BY occupation
ORDER BY annual_income) AS difference
FROM customer_dimension ORDER BY occupation, customer_key LIMIT 20;
occupation  customer_key  customer_name  annual_income  prev_income  difference
+++++
Accountant  15  Midori V. Peterson  692610  692535  75
Accountant  43  Midori S. Rodriguez  282359  280976  1383
Accountant  93  Robert P. Campbell  471722  471355  367
Accountant  102  Sam T. McNulty  901636  901561  75
Accountant  134  Martha B. Overstreet  705146  704335  811
Accountant  165  James C. Kramer  376841  376474  367
Accountant  225  Ben W. Farmer  70574  70449  125
Accountant  270  Jessica S. Lang  684204  682274  1930
Accountant  273  Mark X. Lampert  723294  722737  557
Accountant  295  Sharon K. Gauthier  29033  28412  621
Accountant  338  Anna S. Jackson  816858  815557  1301
Accountant  377  William I. Jones  915149  914872  277
Accountant  438  Joanna A. McCabe  147396  144482  2914
Accountant  452  Kim P. Brown  126023  124797  1226
Accountant  467  Meghan K. Carcetti  810528  810284  244
Accountant  478  Tanya E. Greenwood  639649  639029  620
Accountant  511  Midori P. Vogel  187246  185539  1707
Accountant  525  Alexander K. Moore  677433  677050  383
Accountant  550  Sam P. Reyes  735691  735355  336
Accountant  577  Robert U. Vu  616101  615439  662
(20 rows)
The next example uses LEAD and LAG to return the third row after the salary in the current row and fifth salary before the salary in the current row:
=> SELECT hire_date, employee_key, employee_last_name,
LEAD(hire_date, 1) OVER (ORDER BY hire_date) AS "next_hired" ,
LAG(hire_date, 1) OVER (ORDER BY hire_date) AS "last_hired"
FROM employee_dimension ORDER BY hire_date, employee_key;
hire_date  employee_key  employee_last_name  next_hired  last_hired
++++
19560411  2694  Farmer  19560512 
19560512  5486  Winkler  19560918  19560411
19560918  5525  McCabe  19570115  19560512
19570115  560  Greenwood  19570206  19560918
19570206  9781  Bauer  19570525  19570115
19570525  9506  Webber  19570704  19570206
19570704  6723  Kramer  19570707  19570525
19570707  5827  Garnett  19571111  19570704
19571111  373  Reyes  19571121  19570707
19571121  3874  Martin  19580206  19571111
(10 rows)
See also
15  LAST_VALUE [analytic]
Lets you select the last value of a table or partition (determined by the windoworderclause) without having to use a self join.
Lets you select the last value of a table or partition (determined by the windoworderclause
) without having to use a self join. LAST_VALUE
takes the last record from the partition after the window order clause. The function then computes the expression against the last record, and returns the results. This function is useful when you want to use the last value as a baseline in calculations.
Use LAST_VALUE()
with the windoworderclause
to produce deterministic results. If no window is specified for the current row, the default window is UNBOUNDED PRECEDING AND CURRENT ROW
.
Tip
Due to default window semantics,
LAST_VALUE
does not always return the last value of a partition. If you omit
windowframeclause from the analytic clause,
LAST_VALUE
operates on this default window. Although results can seem nonintuitive by not returning the bottom of the current partition, it returns the bottom of the window, which continues to change along with the current input row being processed. If you want to return the last value of a partition, use
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
. See examples below.
Behavior type
Immutable
Syntax
LAST_VALUE ( expression [ IGNORE NULLS ] ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Expression to evaluate—for example, a constant, column, nonanalytic function, function expression, or expressions involving any of these.
IGNORE NULLS
 Specifies to return the last nonnull value in the set, or
NULL
if all values are NULL
. If you omit this option and the last value in the set is null, the function returns NULL
.
OVER()
 See Analytic Functions.
Examples
Using the schema defined in Window framing in Analyzing Data, the following query does not show the highest salary value by department; instead it shows the highest salary value by department by salary.
=> SELECT deptno, sal, empno, LAST_VALUE(sal)
OVER (PARTITION BY deptno ORDER BY sal) AS lv
FROM emp;
deptno  sal  empno  lv
+++
10  101  1  101
10  104  4  104
20  100  11  100
20  109  7  109
20  109  6  109
20  109  8  109
20  110  10  110
20  110  9  110
30  102  2  102
30  103  3  103
30  105  5  105
If you include the window frame clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
, LAST_VALUE()
returns the highest salary by department, an accurate representation of the information:
=> SELECT deptno, sal, empno, LAST_VALUE(sal)
OVER (PARTITION BY deptno ORDER BY sal
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv
FROM emp;
deptno  sal  empno  lv
+++
10  101  1  104
10  104  4  104
20  100  11  110
20  109  7  110
20  109  6  110
20  109  8  110
20  110  10  110
20  110  9  110
30  102  2  105
30  103  3  105
30  105  5  105
For more examples, see FIRST_VALUE().
See also
16  LEAD [analytic]
Returns values from the row after the current row within a , letting you access more than one row in a table at the same time.
Returns values from the row after the current row within a window, letting you access more than one row in a table at the same time. This is useful for comparing values when the relative positions of rows can be reliably known. It also lets you avoid the more costly self join, which enhances query processing speed.
Behavior type
Immutable
Syntax
LEAD ( expression[, offset ] [, default ] ) OVER (
[ windowpartitionclause ]
windoworderclause )
Parameters
expression
 The expression to evaluate—for example, a constant, column, nonanalytic function, function expression, or expressions involving any of these.
offset
 Is an optional parameter that defaults to 1 (the next row). This parameter must evaluate to a constant positive integer.
default
 The value returned if
offset
falls outside the bounds of the table or partition. This value must be a constant value or an expression that can be evaluated to a constant; its data type is coercible to that of the first argument.
Examples
LEAD
finds the hire date of the employee hired just after the current row:
=> SELECT employee_region, hire_date, employee_key, employee_last_name,
LEAD(hire_date, 1) OVER (PARTITION BY employee_region ORDER BY hire_date) AS "next_hired"
FROM employee_dimension ORDER BY employee_region, hire_date, employee_key;
employee_region  hire_date  employee_key  employee_last_name  next_hired
++++
East  19560408  9218  Harris  19570206
East  19570206  7799  Stein  19570525
East  19570525  3687  Farmer  19570626
East  19570626  9474  Bauer  19570818
East  19570818  570  Jefferson  19570824
East  19570824  4363  Wilson  19580217
East  19580217  6457  McCabe  19580626
East  19580626  6196  Li  19580716
East  19580716  7749  Harris  19580918
East  19580918  9678  Sanchez  19581110
(10 rows)
The next example uses LEAD
and LAG
to return the third row after the salary in the current row and fifth salary before the salary in the current row.
=> SELECT hire_date, employee_key, employee_last_name,
LEAD(hire_date, 1) OVER (ORDER BY hire_date) AS "next_hired" ,
LAG(hire_date, 1) OVER (ORDER BY hire_date) AS "last_hired"
FROM employee_dimension ORDER BY hire_date, employee_key;
hire_date  employee_key  employee_last_name  next_hired  last_hired
++++
19560411  2694  Farmer  19560512 
19560512  5486  Winkler  19560918  19560411
19560918  5525  McCabe  19570115  19560512
19570115  560  Greenwood  19570206  19560918
19570206  9781  Bauer  19570525  19570115
19570525  9506  Webber  19570704  19570206
19570704  6723  Kramer  19570707  19570525
19570707  5827  Garnett  19571111  19570704
19571111  373  Reyes  19571121  19570707
19571121  3874  Martin  19580206  19571111
(10 rows)
The following example returns employee name and salary, along with the next highest and lowest salaries.
=> SELECT employee_last_name, annual_salary,
NVL(LEAD(annual_salary) OVER (ORDER BY annual_salary),
MIN(annual_salary) OVER()) "Next Highest",
NVL(LAG(annual_salary) OVER (ORDER BY annual_salary),
MAX(annual_salary) OVER()) "Next Lowest"
FROM employee_dimension;
employee_last_name  annual_salary  Next Highest  Next Lowest
+++
Nielson  1200  1200  995533
Lewis  1200  1200  1200
Harris  1200  1202  1200
Robinson  1202  1202  1200
Garnett  1202  1202  1202
Weaver  1202  1202  1202
Nielson  1202  1202  1202
McNulty  1202  1204  1202
Farmer  1204  1204  1202
Martin  1204  1204  1204
(10 rows)
The next example returns, for each assistant director in the employees table, the hire date of the director hired just after the director on the current row. For example, Jackson was hired on 20161228, and the next director hired was Bauer:
=> SELECT employee_last_name, hire_date,
LEAD(hire_date, 1) OVER (ORDER BY hire_date DESC) as "NextHired"
FROM employee_dimension WHERE job_title = 'Assistant Director';
employee_last_name  hire_date  NextHired
++
Jackson  20161228  20161226
Bauer  20161226  20161211
Miller  20161211  20161207
Fortin  20161207  20161127
Harris  20161127  20161115
Goldberg  20161115 
(5 rows)
See also
17  MAX [analytic]
Returns the maximum value of an expression within a.
Returns the maximum value of an expression within a window. The return value has the same type as the expression data type.
The analytic functions MIN()
and MAX()
can operate with Boolean values. The MAX()
function acts upon a Boolean data type or a value that can be implicitly converted to a Boolean value. If at least one input value is true, MAX()
returns t
(true). Otherwise, it returns f
(false). In the same scenario, the MIN()
function returns t
(true) if all input values are true. Otherwise, it returns f
.
Behavior type
Immutable
Syntax
MAX ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Any expression for which the maximum value is calculated, typically a column reference.
OVER()
 See Analytic Functions.
Examples
The following query computes the deviation between the employees' annual salary and the maximum annual salary in Massachusetts:
=> SELECT employee_state, annual_salary,
MAX(annual_salary)
OVER(PARTITION BY employee_state ORDER BY employee_key) max,
annual_salary MAX(annual_salary)
OVER(PARTITION BY employee_state ORDER BY employee_key) diff
FROM employee_dimension
WHERE employee_state = 'MA';
employee_state  annual_salary  max  diff
+++
MA  1918  995533  993615
MA  2058  995533  993475
MA  2586  995533  992947
MA  2500  995533  993033
MA  1318  995533  994215
MA  2072  995533  993461
MA  2656  995533  992877
MA  2148  995533  993385
MA  2366  995533  993167
MA  2664  995533  992869
(10 rows)
The following example shows you the difference between the MIN
and MAX
analytic functions when you use them with a Boolean value. The sample creates a table with two columns, adds two rows of data, and shows sample output for MIN
and MAX
.
CREATE TABLE min_max_functions (emp VARCHAR, torf BOOL);
INSERT INTO min_max_functions VALUES ('emp1', 1);
INSERT INTO min_max_functions VALUES ('emp1', 0);
SELECT DISTINCT emp,
min(torf) OVER (PARTITION BY emp) AS worksasbooleanand,
Max(torf) OVER (PARTITION BY emp) AS worksasbooleanor
FROM min_max_functions;
emp  worksasbooleanand  worksasbooleanor
++
emp1  f  t
(1 row)
See also
18  MEDIAN [analytic]
For each row, returns the median value of a value set within each partition.
For each row, returns the median value of a value set within each partition. MEDIAN
determines the argument with the highest numeric precedence, implicitly converts the remaining arguments to that data type, and returns that data type.
MEDIAN
is an alias of
PERCENTILE_CONT [analytic]
with an argument of 0.5 (50%).
Behavior type
Immutable
Syntax
MEDIAN ( expression ) OVER ( [ windowpartitionclause ] )
Parameters
expression
 Any
NUMERIC
data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the middle value or an interpolated value that would be the middle value once the values are sorted. Null values are ignored in the calculation.
OVER()
 If the
OVER
clause specifies windowpartitionclause
, MEDIAN
groups input rows according to one or more columns or expressions. If this clause is omitted, no grouping occurs and MEDIAN
processes all input rows as a single partition.
Examples
See Calculating a median value
See also
19  MIN [analytic]
Returns the minimum value of an expression within a.
Returns the minimum value of an expression within a window. The return value has the same type as the expression data type.
The analytic functions MIN()
and MAX()
can operate with Boolean values. The MAX()
function acts upon a Boolean data type or a value that can be implicitly converted to a Boolean value. If at least one input value is true, MAX()
returns t
(true). Otherwise, it returns f
(false). In the same scenario, the MIN()
function returns t
(true) if all input values are true. Otherwise, it returns f
.
Behavior type
Immutable
Syntax
MIN ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Any expression for which the minimum value is calculated, typically a column reference.
OVER()
 See Analytic Functions.
Examples
The following example shows how you can query to determine the deviation between the employees' annual salary and the minimum annual salary in Massachusetts:
=> SELECT employee_state, annual_salary,
MIN(annual_salary)
OVER(PARTITION BY employee_state ORDER BY employee_key) min,
annual_salary MIN(annual_salary)
OVER(PARTITION BY employee_state ORDER BY employee_key) diff
FROM employee_dimension
WHERE employee_state = 'MA';
employee_state  annual_salary  min  diff
+++
MA  1918  1204  714
MA  2058  1204  854
MA  2586  1204  1382
MA  2500  1204  1296
MA  1318  1204  114
MA  2072  1204  868
MA  2656  1204  1452
MA  2148  1204  944
MA  2366  1204  1162
MA  2664  1204  1460
(10 rows)
The following example shows you the difference between the MIN
and MAX
analytic functions when you use them with a Boolean value. The sample creates a table with two columns, adds two rows of data, and shows sample output for MIN
and MAX
.
CREATE TABLE min_max_functions (emp VARCHAR, torf BOOL);
INSERT INTO min_max_functions VALUES ('emp1', 1);
INSERT INTO min_max_functions VALUES ('emp1', 0);
SELECT DISTINCT emp,
min(torf) OVER (PARTITION BY emp) AS worksasbooleanand,
Max(torf) OVER (PARTITION BY emp) AS worksasbooleanor
FROM min_max_functions;
emp  worksasbooleanand  worksasbooleanor
++
emp1  f  t
(1 row)
See also
20  NTH_VALUE [analytic]
Returns the value evaluated at the row that is the nth row of the window (counting from 1).
Returns the value evaluated at the row that is the *n
*th row of the window (counting from 1). If the specified row does not exist, NTH_VALUE returns NULL
.
Behavior type
Immutable
Syntax
NTH_VALUE ( expression, rownumber [ IGNORE NULLS ] ) OVER (
[ windowframeclause ]
[ windoworderclause ])
Parameters
expression
 Expression to evaluate. The expression can be a constant, column name, nonanalytic function, function expression, or expressions that include any of these.
rownumber
 Specifies the row to evaluate, where
rownumber
evaluates to an integer ≥ 1.
IGNORE NULLS
 Specifies to return the first non
NULL
value in the set, or NULL
if all values are NULL
.
OVER()
 See Analytic Functions.
Examples
In the following example, for each tuple (current row) in table t1
, the window frame clause defines the window as follows:
ORDER BY b ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
For each window, n
for *n
*th value is a+1
. a
is the value of column a
in the tuple.
NTH_VALUE returns the result of the expression b+1
, where b
is the value of column b
in the *n
*th row, which is the a+1
row within the window.
=> SELECT * FROM t1 ORDER BY a;
a  b
+
1  10
2  20
2  21
3  30
4  40
5  50
6  60
(7 rows)
=> SELECT NTH_VALUE(b+1, a+1) OVER
(ORDER BY b ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) FROM t1;
?column?

22
31
(7 rows)
21  NTILE [analytic]
Equally divides an ordered data set (partition) into a {value} number of subsets within a , where the subsets are numbered 1 through the value in parameter constantvalue.
Equally divides an ordered data set (partition) into a {
value
}
number of subsets within a window, where the subsets are numbered 1 through the value in parameter constantvalue
. For example, if constantvalue
= 4 and the partition contains 20 rows, NTILE
divides the partition rows into four equal subsets of five rows. NTILE
assigns each row to a subset by giving row a number from 1 to 4. The rows in the first subset are assigned 1, the next five are assigned 2, and so on.
If the number of partition rows is not evenly divisible by the number of subsets, the rows are distributed so no subset is more than one row larger than any other subset, and the lowest subsets have extra rows. For example, if constantvalue
= 4 and the number of rows = 21, the first subset has six rows, the second subset has five rows, and so on.
If the number of subsets is greater than the number of rows, then a number of subsets equal to the number of rows is filled, and the remaining subsets are empty.
Behavior type
Immutable
Syntax
NTILE ( constantvalue ) OVER (
[ windowpartitionclause ]
windoworderclause )
Parameters
constantvalue
 Specifies the number of subsets , where
constantvalue
must resolve to a positive constant for each partition.
OVER()
 See Analytic Functions.
Examples
The following query assigns each month's sales total into one of four subsets:
=> SELECT calendar_month_name AS MONTH, SUM(sales_quantity),
NTILE(4) OVER (ORDER BY SUM(sales_quantity)) AS NTILE
FROM store.store_sales_fact JOIN date_dimension
USING(date_key)
GROUP BY calendar_month_name
ORDER BY NTILE;
MONTH  SUM  NTILE
++
November  2040726  1
June  2088528  1
February  2134708  1
April  2181767  2
January  2229220  2
October  2316363  2
September  2323914  3
March  2354409  3
August  2387017  3
July  2417239  4
May  2492182  4
December  2531842  4
(12 rows)
See also
22  PERCENT_RANK [analytic]
Calculates the relative rank of a row for a given row in a group within a by dividing that row’s rank less 1 by the number of rows in the partition, also less 1.
Calculates the relative rank of a row for a given row in a group within a window by dividing that row’s rank less 1 by the number of rows in the partition, also less 1. PERCENT_RANK
always returns values from 0 to 1 inclusive. The first row in any set has a PERCENT_RANK
of 0. The return value is NUMBER
.
( rank  1 ) / ( [ rows ]  1 )
In the preceding formula, rank
is the rank position of a row in the group and rows
is the total number of rows in the partition defined by the OVER()
clause.
Behavior type
Immutable
Syntax
PERCENT_RANK ( ) OVER (
[ windowpartitionclause ]
windoworderclause )
Parameters
OVER()
 See Analytic Functions
Examples
The following example finds the percent rank of gross profit for different states within each month of the first quarter:
=> SELECT calendar_month_name AS MONTH, store_state,
SUM(gross_profit_dollar_amount),
PERCENT_RANK() OVER (PARTITION BY calendar_month_name
ORDER BY SUM(gross_profit_dollar_amount)) AS PERCENT_RANK
FROM store.store_sales_fact JOIN date_dimension
USING(date_key)
JOIN store.store_dimension
USING (store_key)
WHERE calendar_month_name IN ('January','February','March')
AND store_state IN ('OR','IA','DC','NV','WI')
GROUP BY calendar_month_name, store_state
ORDER BY calendar_month_name, PERCENT_RANK;
MONTH  store_state  SUM  PERCENT_RANK
+++
February  IA  418490  0
February  OR  460588  0.25
February  DC  616553  0.5
February  WI  619204  0.75
February  NV  838039  1
January  OR  446528  0
January  IA  474501  0.25
January  DC  628496  0.5
January  WI  679382  0.75
January  NV  871824  1
March  IA  460282  0
March  OR  481935  0.25
March  DC  716063  0.5
March  WI  771575  0.75
March  NV  970878  1
(15 rows)
The following example calculates, for each employee, the percent rank of the employee's salary by their job title:
=> SELECT job_title, employee_last_name, annual_salary,
PERCENT_RANK()
OVER (PARTITION BY job_title ORDER BY annual_salary DESC) AS percent_rank
FROM employee_dimension
ORDER BY percent_rank, annual_salary;
job_title  employee_last_name  annual_salary  percent_rank
+++
Cashier  Fortin  3196  0
Delivery Person  Garnett  3196  0
Cashier  Vogel  3196  0
Customer Service  Sanchez  3198  0
Shelf Stocker  Jones  3198  0
Custodian  Li  3198  0
Customer Service  Kramer  3198  0
Greeter  McNulty  3198  0
Greeter  Greenwood  3198  0
Shift Manager  Miller  99817  0
Advertising  Vu  99853  0
Branch Manager  Jackson  99858  0
Marketing  Taylor  99928  0
Assistant Director  King  99973  0
Sales  Kramer  99973  0
Head of PR  Goldberg  199067  0
Regional Manager  Gauthier  199744  0
Director of HR  Moore  199896  0
Head of Marketing  Overstreet  199955  0
VP of Advertising  Meyer  199975  0
VP of Sales  Sanchez  199992  0
Founder  Gauthier  927335  0
CEO  Taylor  953373  0
Investor  Garnett  963104  0
CoFounder  Vu  977716  0
CFO  Vogel  983634  0
President  Sanchez  992363  0
Delivery Person  Li  3194  0.00114155251141553
Delivery Person  Robinson  3194  0.00114155251141553
Custodian  McCabe  3192  0.00126582278481013
Shelf Stocker  Moore  3196  0.00128040973111396
Branch Manager  Moore  99716  0.00186567164179104
...
See also
23  PERCENTILE_CONT [analytic]
An inverse distribution function where, for each row, PERCENTILE_CONT returns the value that would fall into the specified percentile among a set of values in each partition within a.
An inverse distribution function where, for each row, PERCENTILE_CONT returns the value that would fall into the specified percentile among a set of values in each partition within a window. For example, if the argument to the function is 0.5, the result of the function is the median of the data set (50th percentile). PERCENTILE_CONT assumes a continuous distribution data model. NULL values are ignored.
PERCENTILE_CONT computes the percentile by first computing the row number where the percentile row would exist. For example:
rownumber = 1 + percentilevalue * (numpartitionrows 1)
If rownumber
is a whole number (within an error of 0.00001), the percentile is the value of row rownumber
.
Otherwise, Vertica interpolates the percentile value between the value of the CEILING(
rownumber
)
row and the value of the FLOOR(
rownumber
)
row. In other words, the percentile is calculated as follows:
( CEILING( rownumber)  rownumber ) * ( value of FLOOR(rownumber) row )
+ ( rownumber  FLOOR(rownumber) ) * ( value of CEILING(rownumber) row)
Note
If the percentile value is 0.5, PERCENTILE_CONT returns the same result set as the function
MEDIAN.
Behavior type
Immutable
Syntax
PERCENTILE_CONT ( percentile ) WITHIN GROUP ( ORDER BY expression [ ASC  DESC ] ) OVER ( [ windowpartitionclause ] )
Parameters
percentile
 Percentile value, a FLOAT constant that ranges from 0 to 1 (inclusive).
WITHIN GROUP (ORDER BY
expression
)
 Specifies how to sort data within each group. ORDER BY takes only one column/expression that must be INTEGER, FLOAT, INTERVAL, or NUMERIC data type. NULL values are discarded.
The WITHIN GROUP(ORDER BY)
clause does not guarantee the order of the SQL result. To order the final result , use the SQL ORDER BY clause set.
ASC  DESC
 Specifies the ordering sequence as ascending (default) or descending.
Specifying ASC or DESC in the WITHIN GROUP
clause affects results as long as the percentile
is not 0.5
.
OVER()
 See Analytic Functions
Examples
This query computes the median annual income per group for the first 300 customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income)
OVER (PARTITION BY customer_state) AS PERCENTILE_CONT
FROM customer_dimension WHERE customer_state IN ('DC','WI') AND customer_key < 300
ORDER BY customer_state, customer_key;
customer_state  customer_key  annual_income  PERCENTILE_CONT
+++
DC  52  168312  483266.5
DC  118  798221  483266.5
WI  62  283043  377691
WI  139  472339  377691
(4 rows)
This query computes the median annual income per group for all customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income)
OVER (PARTITION BY customer_state) AS PERCENTILE_CONT
FROM customer_dimension WHERE customer_state IN ('DC','WI') ORDER BY customer_state, customer_key;
customer_state  customer_key  annual_income  PERCENTILE_CONT
+++
DC  52  168312  483266.5
DC  118  798221  483266.5
DC  622  220782  555088
DC  951  178453  555088
DC  972  961582  555088
DC  1286  760445  555088
DC  1434  44836  555088
...
WI  62  283043  377691
WI  139  472339  377691
WI  359  42242  517717
WI  364  867543  517717
WI  403  509031  517717
WI  455  32000  517717
WI  485  373129  517717
...
(1353 rows)
See also
24  PERCENTILE_DISC [analytic]
An inverse distribution function where, for each row, PERCENTILE_DISC returns the value that would fall into the specified percentile among a set of values in each partition within a.
An inverse distribution function where, for each row, PERCENTILE_DISC
returns the value that would fall into the specified percentile among a set of values in each partition within a window. PERCENTILE_DISC()
assumes a discrete distribution data model. NULL
values are ignored.
PERCENTILE_DISC
examines the cumulative distribution values in each group until it finds one that is greater than or equal to the specified percentile. Vertica computes the percentile where, for each row, PERCENTILE_DISC
outputs the first value of the WITHIN GROUP(ORDER BY)
column whose CUME_DIST
(cumulative distribution) value is >= the argument FLOAT
value—for example, 0.4
:
PERCENTILE_DIST(0.4) WITHIN GROUP (ORDER BY salary) OVER(PARTITION BY deptno)...
Given the following query:
SELECT CUME_DIST() OVER(ORDER BY salary) FROM tablename;
The smallest CUME_DIST
value that is greater than 0.4 is also the PERCENTILE_DISC
.
Behavior type
Immutable
Syntax
PERCENTILE_DISC ( percentile ) WITHIN GROUP (
ORDER BY expression [ ASC  DESC ] ) OVER (
[ windowpartitionclause ] )
Parameters
percentile
 Percentile value, a
FLOAT
constant that ranges from 0 to 1 (inclusive).
WITHIN GROUP(ORDER BY
expression
)
 Specifies how to sort data within each group.
ORDER BY
takes only one column/expression that must be INTEGER
, FLOAT
, INTERVAL
, or NUMERIC
data type. NULL
values are discarded.
The WITHIN GROUP(ORDER BY)
clause does not guarantee the order of the SQL result. To order the final result , use the SQL
ORDER BY
clause set.
ASC  DESC
 Specifies the ordering sequence as ascending (default) or descending.
OVER()
 See Analytic Functions
Examples
This query computes the 20th percentile annual income by group for first 300 customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income,
PERCENTILE_DISC(.2) WITHIN GROUP(ORDER BY annual_income)
OVER (PARTITION BY customer_state) AS PERCENTILE_DISC
FROM customer_dimension
WHERE customer_state IN ('DC','WI')
AND customer_key < 300
ORDER BY customer_state, customer_key;
customer_state  customer_key  annual_income  PERCENTILE_DISC
+++
DC  104  658383  417092
DC  168  417092  417092
DC  245  670205  417092
WI  106  227279  227279
WI  127  703889  227279
WI  209  458607  227279
(6 rows)
See also
25  RANK [analytic]
Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY clause.
Within each window partition, ranks all rows in the query results set according to the order specified by the window's ORDER BY
clause.
RANK
executes as follows:

Sorts partition rows as specified by the ORDER BY
clause.

Compares the ORDER BY
values of the preceding row and current row and ranks the current row as follows:

If ORDER BY
values are the same, the current row gets the same ranking as the preceding row.
Note
Null values are considered equal. For detailed information on how null values are sorted, see
NULL sort order.

If the ORDER BY
values are different, DENSE_RANK
increments or decrements the current row's ranking by 1, plus the number of consecutive duplicate values in the rows that precede it.
The largest rank value is the equal to the total number of rows returned by the query.
Behavior type
Immutable
Syntax
RANK() OVER (
[ windowpartitionclause ]
windoworderclause )
Parameters
OVER()
 See Analytic Functions
Compared with DENSE_RANK
RANK
can leave gaps in the ranking sequence, while
DENSE_RANK
does not.
Examples
The following query ranks by state all company customers that have been customers since 2007. In rows where the customer_since
dates are the same, RANK
assigns the rows equal ranking. When the customer_since
date changes, RANK
skips one or more rankings—for example, within CA
, from 12 to 14, and from 17 to 19.
=> SELECT customer_state, customer_name, customer_since,
RANK() OVER (PARTITION BY customer_state ORDER BY customer_since) AS rank
FROM customer_dimension WHERE customer_type='Company' AND customer_since > '01/01/2007'
ORDER BY customer_state;
customer_state  customer_name  customer_since  rank
+++
AZ  Foodshop  20070120  1
AZ  Goldstar  20070811  2
CA  Metahope  20070105  1
CA  Foodgen  20070205  2
CA  Infohope  20070209  3
CA  Foodcom  20070219  4
CA  Amerihope  20070222  5
CA  Infostar  20070305  6
CA  Intracare  20070314  7
CA  Infocare  20070407  8
...
CO  Goldtech  20070219  1
CT  Foodmedia  20070211  1
CT  Metatech  20070220  2
CT  Infocorp  20070410  3
...
See also
SQL analytics
26  ROW_NUMBER [analytic]
Assigns a sequence of unique numbers to each row in a partition, starting with 1.
Assigns a sequence of unique numbers to each row in a window partition, starting with 1. ROW_NUMBER and RANK are generally interchangeable, with the following differences:

ROW_NUMBER assigns a unique ordinal number to each row in the ordered set, starting with 1.

ROW_NUMBER() is a Vertica extension, while RANK conforms to the SQL99 standard.
Behavior type
Immutable
Syntax
ROW_NUMBER () OVER (
[ windowpartitionclause ]
[ windoworderclause ] )
Parameters
OVER()
 See Analytic Functions
Examples
The following ROW_NUMBER query partitions customers in the VMart table customer_dimension
by customer_regio
n. Within each partition, the function ranks those customers in order of seniority, as specified by its window order clause:
=> SELECT * FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY customer_region ORDER BY customer_since) AS most_senior,
customer_region, customer_name, customer_since FROM public.customer_dimension WHERE customer_type = 'Individual') sq
WHERE most_senior <= 5;
most_senior  customer_region  customer_name  customer_since
+++
1  West  Jack Y. Perkins  19650101
2  West  Linda Q. Winkler  19650102
3  West  Marcus K. Li  19650103
4  West  Carla R. Jones  19650107
5  West  Seth P. Young  19650109
1  East  Kim O. Vu  19650101
2  East  Alexandra L. Weaver  19650102
3  East  Steve L. Webber  19650104
4  East  Thom Y. Li  19650105
5  East  Martha B. Farmer  19650107
1  SouthWest  Martha V. Gauthier  19650101
2  SouthWest  Jessica U. Goldberg  19650107
3  SouthWest  Robert O. Stein  19650107
4  SouthWest  Emily I. McCabe  19650118
5  SouthWest  Jack E. Miller  19650125
1  NorthWest  Julie O. Greenwood  19650108
2  NorthWest  Amy X. McNulty  19650125
3  NorthWest  Kevin S. Carcetti  19650209
4  NorthWest  Sam K. Carcetti  19650316
5  NorthWest  Alexandra X. Winkler  19650405
1  MidWest  Michael Y. Meyer  19650101
2  MidWest  Joanna W. Bauer  19650106
3  MidWest  Amy E. Harris  19650108
4  MidWest  Julie W. McCabe  19650109
5  MidWest  William . Peterson  19650109
1  South  Dean . Martin  19650101
2  South  Ruth U. Williams  19650102
3  South  Steve Y. Farmer  19650103
4  South  Mark V. King  19650108
5  South  Lucas Y. Young  19650110
(30 rows)
See also
27  STDDEV [analytic]
Computes the statistical sample standard deviation of the current row with respect to the group within a.
Computes the statistical sample standard deviation of the current row with respect to the group within a window. STDDEV_SAMP
returns the same value as the square root of the variance defined for the
VAR_SAMP
function:
STDDEV( expression ) = SQRT(VAR_SAMP( expression ))
When VAR_SAMP
returns NULL
, this function returns NULL
.
Note
The nonstandard function
STDDEV
is provided for compatibility with other databases. It is semantically identical to
STDDEV_SAMP
.
Behavior type
Immutable
Syntax
STDDEV ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Any
NUMERIC
data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
 See Analytic Functions
Examples
The following example returns the standard deviations of salaries in the employee dimension table by job title Assistant Director:
=> SELECT employee_last_name, annual_salary,
STDDEV(annual_salary) OVER (ORDER BY hire_date) as "stddev"
FROM employee_dimension
WHERE job_title = 'Assistant Director';
employee_last_name  annual_salary  stddev
++
Bauer  85003  NaN
Reyes  91051  4276.58181261624
Overstreet  53296  20278.6923394976
Gauthier  97216  19543.7184537642
Jones  82320  16928.0764028285
Fortin  56166  18400.2738421652
Carcetti  71135  16968.9453554483
Weaver  74419  15729.0709901852
Stein  85689  15040.5909495309
McNulty  69423  14401.1524291943
Webber  99091  15256.3160166536
Meyer  74774  14588.6126417355
Garnett  82169  14008.7223268494
Roy  76974  13466.1270356647
Dobisz  83486  13040.4887828347
Martin  99702  13637.6804131055
Martin  73589  13299.2838158566
...
See also
28  STDDEV_POP [analytic]
Evaluates the statistical population standard deviation for each member of the group.
Computes the statistical population standard deviation and returns the square root of the population variance within a window. The STDDEV_POP()
return value is the same as the square root of the VAR_POP()
function:
STDDEV_POP( expression ) = SQRT(VAR_POP( expression ))
When VAR_POP
returns null, STDDEV_POP
returns null.
Behavior type
Immutable
Syntax
STDDEV_POP ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Any
NUMERIC
data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
 See Analytic Functions.
Examples
The following example returns the population standard deviations of salaries in the employee dimension table by job title Assistant Director:
=> SELECT employee_last_name, annual_salary,
STDDEV_POP(annual_salary) OVER (ORDER BY hire_date) as "stddev_pop"
FROM employee_dimension WHERE job_title = 'Assistant Director';
employee_last_name  annual_salary  stddev_pop
++
Goldberg  61859  0
Miller  79582  8861.5
Goldberg  74236  7422.74712548456
Campbell  66426  6850.22125098891
Moore  66630  6322.08223926257
Nguyen  53530  8356.55480080699
Harris  74115  8122.72288970008
Lang  59981  8053.54776538731
Farmer  60597  7858.70140687825
Nguyen  78941  8360.63150784682
See also
29  STDDEV_SAMP [analytic]
Computes the statistical sample standard deviation of the current row with respect to the group within a.
Computes the statistical sample standard deviation of the current row with respect to the group within a window. STDDEV_SAM
's return value is the same as the square root of the variance defined for the VAR_SAMP
function:
STDDEV( expression ) = SQRT(VAR_SAMP( expression ))
When VAR_SAMP
returns NULL
, STDDEV_SAMP
returns NULL.
Note
STDDEV_SAMP()
is semantically identical to the nonstandard function,
STDDEV()
.
Behavior type
Immutable
Syntax
STDDEV_SAMP ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Any
NUMERIC
data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument..
OVER()
 See Analytic Functions
Examples
The following example returns the sample standard deviations of salaries in the employee
dimension table by job title Assistant Director:
=> SELECT employee_last_name, annual_salary,
STDDEV(annual_salary) OVER (ORDER BY hire_date) as "stddev_samp"
FROM employee_dimension WHERE job_title = 'Assistant Director';
employee_last_name  annual_salary  stddev_samp
++
Bauer  85003  NaN
Reyes  91051  4276.58181261624
Overstreet  53296  20278.6923394976
Gauthier  97216  19543.7184537642
Jones  82320  16928.0764028285
Fortin  56166  18400.2738421652
Carcetti  71135  16968.9453554483
Weaver  74419  15729.0709901852
Stein  85689  15040.5909495309
McNulty  69423  14401.1524291943
Webber  99091  15256.3160166536
Meyer  74774  14588.6126417355
Garnett  82169  14008.7223268494
Roy  76974  13466.1270356647
Dobisz  83486  13040.4887828347
...
See also
30  SUM [analytic]
Computes the sum of an expression over a group of rows within a.
Computes the sum of an expression over a group of rows within a window. It returns a DOUBLE PRECISION
value for a floatingpoint expression. Otherwise, the return value is the same as the expression data type.
Behavior type
Immutable
Syntax
SUM ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Any
NUMERIC
data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
 See Analytic Functions
Overflow handling
If you encounter data overflow when using SUM
, use
SUM_FLOAT
which converts data to a floating point.
By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.
Examples
The following query returns the cumulative sum all of the returns made to stores in January:
=> SELECT calendar_month_name AS month, transaction_type, sales_quantity,
SUM(sales_quantity)
OVER (PARTITION BY calendar_month_name ORDER BY date_dimension.date_key) AS SUM
FROM store.store_sales_fact JOIN date_dimension
USING(date_key) WHERE calendar_month_name IN ('January')
AND transaction_type= 'return';
month  transaction_type  sales_quantity  SUM
+++
January  return  7  651
January  return  3  651
January  return  7  651
January  return  7  651
January  return  7  651
January  return  3  651
January  return  7  651
January  return  5  651
January  return  1  651
January  return  6  651
January  return  6  651
January  return  3  651
January  return  9  651
January  return  7  651
January  return  6  651
January  return  8  651
January  return  7  651
January  return  2  651
January  return  4  651
January  return  5  651
January  return  7  651
January  return  8  651
January  return  4  651
January  return  10  651
January  return  6  651
...
See also
31  VAR_POP [analytic]
Returns the statistical population variance of a nonnull set of numbers (nulls are ignored) in a group within a.
Returns the statistical population variance of a nonnull set of numbers (nulls are ignored) in a group within a window. Results are calculated by the sum of squares of the difference of expression
from the mean of expression
, divided by the number of rows remaining:
(SUM( expression * expression )  SUM( expression ) * SUM( expression ) / COUNT( expression )) / COUNT( expression )
Behavior type
Immutable
Syntax
VAR_POP ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Any
NUMERIC
data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument
OVER()
 See Analytic Functions
Examples
The following example calculates the cumulative population in the store orders fact table of sales in January 2007:
=> SELECT date_ordered,
VAR_POP(SUM(total_order_cost))
OVER (ORDER BY date_ordered) "var_pop"
FROM store.store_orders_fact s
WHERE date_ordered BETWEEN '20070101' AND '20070131'
GROUP BY s.date_ordered;
date_ordered  var_pop
+
20070101  0
20070102  89870400
20070103  3470302472
20070104  4466755450.6875
20070105  3816904780.80078
20070106  25438212385.25
20070107  22168747513.1016
20070108  23445191012.7344
20070109  39292879603.1113
20070110  48080574326.9609
(10 rows)
See also
32  VAR_SAMP [analytic]
Returns the sample variance of a nonNULL set of numbers (NULL values in the set are ignored) for each row of the group within a.
Returns the sample variance of a nonNULL
set of numbers (NULL
values in the set are ignored) for each row of the group within a window. Results are calculated as follows:
(SUM( expression * expression )  SUM( expression ) * SUM( expression ) / COUNT( expression ) )
/ (COUNT( expression )  1 )
This function and
VARIANCE
differ in one way: given an input set of one element, VARIANCE
returns 0 and VAR_SAMP
returns NULL
.
Behavior type
Immutable
Syntax
VAR_SAMP ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Any
NUMERIC
data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument
OVER()
 See Analytic Functions
Null handling

VAR_SAMP
returns the sample variance of a set of numbers after it discards the NULL
values in the set.

If the function is applied to an empty set, then it returns NULL
.
Examples
The following example calculates the sample variance in the store orders fact table of sales in December 2007:
=> SELECT date_ordered,
VAR_SAMP(SUM(total_order_cost))
OVER (ORDER BY date_ordered) "var_samp"
FROM store.store_orders_fact s
WHERE date_ordered BETWEEN '20071201' AND '20071231'
GROUP BY s.date_ordered;
date_ordered  var_samp
+
20071201  NaN
20071202  90642601088
20071203  48030548449.3359
20071204  32740062504.2461
20071205  32100319112.6992
20071206  26274166814.668
20071207  23017490251.9062
20071208  21099374085.1406
20071209  27462205977.9453
20071210  26288687564.1758
(10 rows)
See also
33  VARIANCE [analytic]
Returns the sample variance of a nonNULL set of numbers (NULL values in the set are ignored) for each row of the group within a.
Returns the sample variance of a nonNULL
set of numbers (NULL
values in the set are ignored) for each row of the group within a window. Results are calculated as follows:
( SUM( expression * expression )  SUM( expression ) * SUM( expression ) / COUNT( expression )) / (COUNT( expression )  1 )
VARIANCE
returns the variance of expression
, which is calculated as follows:
Note
The nonstandard function
VARIANCE
is provided for compatibility with other databases. It is semantically identical to
VAR_SAMP
.
Behavior type
Immutable
Syntax
VAR_SAMP ( expression ) OVER (
[ windowpartitionclause ]
[ windoworderclause ]
[ windowframeclause ] )
Parameters
expression
 Any NUMERIC data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
OVER()
 See Analytic Functions
Examples
The following example calculates the cumulative variance in the store orders fact table of sales in December 2007:
=> SELECT date_ordered,
VARIANCE(SUM(total_order_cost))
OVER (ORDER BY date_ordered) "variance"
FROM store.store_orders_fact s
WHERE date_ordered BETWEEN '20071201' AND '20071231'
GROUP BY s.date_ordered;
date_ordered  variance
+
20071201  NaN
20071202  2259129762
20071203  1809012182.33301
20071204  35138165568.25
20071205  26644110029.3003
20071206  25943125234
20071207  23178202223.9048
20071208  21940268901.1431
20071209  21487676799.6108
20071210  21521358853.4331
(10 rows)
See also