This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Aggregate functions

All functions in this section that have an analytic function counterpart are appended with [Aggregate] to avoid confusion between the two.

Aggregate functions summarize data over groups of rows from a query result set. The groups are specified using the GROUP BY clause. They are allowed only in the select list and in the HAVING and ORDER BY clauses of a SELECT statement (as described in Aggregate expressions).

Notes

  • Except for COUNT, these functions return a null value when no rows are selected. In particular, SUM of no rows returns NULL, not zero.

  • In some cases you can replace an expression that includes multiple aggregates with an single aggregate of an expression. For example SUM(x) + SUM(y) can be expressed as as SUM(x+y) (where x and y are NOT NULL).

  • Vertica does not support nested aggregate functions.

You can also use some of the simple aggregate functions as analytic (window) functions. See Analytic functions for details. See also SQL analytics.

1 - APPROXIMATE_COUNT_DISTINCT

Returns the number of distinct non-NULL values in a data set.

Returns the number of distinct non-NULL values in a data set.

Behavior type

Immutable

Syntax

APPROXIMATE_COUNT_DISTINCT ( expression[, error-tolerance ] )

Parameters

expression
Value to be evaluated using any data type that supports equality comparison.
error-tolerance

Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation.

You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance.

If you omit this argument, Vertica uses an error tolerance of 1.25(%).

Restrictions

APPROXIMATE_COUNT_DISTINCT and DISTINCT aggregates cannot be in the same query block.

Error tolerance

APPROXIMATE_COUNT_DISTINCT(x, error-tolerance) returns a value equal to COUNT(DISTINCT x), with an error that is lognormally distributed with standard deviation.

Parameter error-tolerance is optional. Supply this argument to specify the desired standard deviation. error-tolerance is defined as 2.17 standard deviations, which corresponds to a 97 percent confidence interval:

standard-deviation = error-tolerance / 2.17

For example:

  • error-tolerance = 1

    Default setting, corresponds to a standard deviation

    97 percent of the time, APPROXIMATE_COUNT_DISTINCT(x,5) returns a value between:

    • COUNT(DISTINCT x) * 0.99

    • COUNT(DISTINCT x) * 1.01

  • error-tolerance = 5

    97 percent of the time, APPROXIMATE_COUNT_DISTINCT(x) returns a value between:

    • COUNT(DISTINCT x) * 0.95

    • COUNT(DISTINCT x) * 1.05

A 99 percent confidence interval corresponds to 2.58 standard deviations. To set error-tolerance confidence level corresponding to 99 (instead of a 97) percent , multiply error-tolerance by 2.17 / 2.58 = 0.841.

For example, if you specify *error-tolerance* as 5 * 0.841 = 4.2, APPROXIMATE_COUNT_DISTINCT(x,4.2) returns values 99 percent of the time between:

  • COUNT (DISTINCT x) * 0.95

  • COUNT (DISTINCT x) * 1.05

Examples

Count the total number of distinct values in column product_key from table store.store_sales_fact:

=> SELECT COUNT(DISTINCT product_key) FROM store.store_sales_fact;
 COUNT
-------
 19982
(1 row)

Count the approximate number of distinct values in product_key with various error tolerances. The smaller the error tolerance, the closer the approximation:


=> SELECT APPROXIMATE_COUNT_DISTINCT(product_key,5) AS five_pct_accuracy,
   APPROXIMATE_COUNT_DISTINCT(product_key,1) AS one_pct_accuracy,
   APPROXIMATE_COUNT_DISTINCT(product_key,.88) AS point_eighteight_pct_accuracy
   FROM store.store_sales_fact;

 five_pct_accuracy | one_pct_accuracy | point_eighteight_pct_accuracy
-------------------+------------------+-------------------------------
             19431 |            19921 |                         19921
(1 row)

See also

Approximate count distinct functions

2 - APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS

Calculates the number of distinct non-NULL values from the synopsis objects created by APPROXIMATE_COUNT_DISTINCT_SYNOPSIS.

Calculates the number of distinct non-NULL values from the synopsis objects created by APPROXIMATE_COUNT_DISTINCT_SYNOPSIS.

Behavior type

Immutable

Syntax

APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS ( synopsis-obj[, error-tolerance ] )

Parameters

synopsis-obj
A synopsis object created by APPROXIMATE_COUNT_DISTINCT_SYNOPSIS.
error-tolerance

Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation.

You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance.

If you omit this argument, Vertica uses an error tolerance of 1.25(%).

For more details, see APPROXIMATE_COUNT_DISTINCT.

Restrictions

APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS and DISTINCT aggregates cannot be in the same query block.

Examples

The following examples review and compare different ways to obtain a count of unique values in a table column:

Return an exact count of unique values in column product_key, from table store.store_sales_fact:

=> \timing
Timing is on.
=> SELECT COUNT(DISTINCT product_key) from store.store_sales_fact;
 count
-------
 19982
(1 row)

Time: First fetch (1 row): 553.033 ms. All rows formatted: 553.075 ms

Return an approximate count of unique values in column product_key:

=> SELECT APPROXIMATE_COUNT_DISTINCT(product_key) as unique_product_keys
   FROM store.store_sales_fact;
 unique_product_keys
---------------------
               19921
(1 row)

Time: First fetch (1 row): 394.562 ms. All rows formatted: 394.600 ms

Create a synopsis object that represents a set of store.store_sales_fact data with unique product_key values, store the synopsis in the new table my_summary:


=> CREATE TABLE my_summary AS SELECT APPROXIMATE_COUNT_DISTINCT_SYNOPSIS (product_key) syn
   FROM store.store_sales_fact;
CREATE TABLE
Time: First fetch (0 rows): 582.662 ms. All rows formatted: 582.682 ms

Return a count from the saved synopsis:


=> SELECT APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS(syn) FROM my_summary;
 ApproxCountDistinctOfSynopsis
-------------------------------
                         19921
(1 row)

Time: First fetch (1 row): 105.295 ms. All rows formatted: 105.335 ms

See also

Approximate count distinct functions

3 - APPROXIMATE_COUNT_DISTINCT_SYNOPSIS

Summarizes the information of distinct non-NULL values and materializes the result set in a VARBINARY or LONG VARBINARY synopsis object.

Summarizes the information of distinct non-NULL values and materializes the result set in a VARBINARY or LONG VARBINARY synopsis object. The calculated result is within a specified range of error tolerance. You save the synopsis object in a Vertica table for use by APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS.

Behavior type

Immutable

Syntax

APPROXIMATE_COUNT_DISTINCT_SYNOPSIS ( expression[, error-tolerance] )

Parameters

expression
Value to evaluate using any data type that supports equality comparison.
error-tolerance

Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation.

You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance.

If you omit this argument, Vertica uses an error tolerance of 1.25(%).

For more details, see APPROXIMATE_COUNT_DISTINCT.

Restrictions

APPROXIMATE_COUNT_DISTINCT_SYNOPSIS and DISTINCT aggregates cannot be in the same query block.

Examples

See APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS.

See also

Approximate count distinct functions

4 - APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE

Aggregates multiple synopses into one new synopsis.

Aggregates multiple synopses into one new synopsis. This function is similar to APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS but returns one synopsis instead of the count estimate. The benefit of this function is that it speeds up final estimation when calling APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS.

For example, if you need to regularly estimate count distinct of users for a long period of time (such as several years) you can pre-accumulate synopses of days into one synopsis for a year.

Behavior type

Immutable

Syntax

APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE ( synopsis-obj [, error-tolerance] )

Parameters

synopsis-obj
An expression that can be evaluated to one or more synopses. Typically a synopsis-obj is generated as a binary string by either the APPROXIMATE_COUNT_DISTINCT or APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE function and is stored in a table column of type VARBINARY or LONG VARBINARY.
error-tolerance

Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation.

You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance.

If you omit this argument, Vertica uses an error tolerance of 1.25(%).

For more details, see APPROXIMATE_COUNT_DISTINCT.

Examples

See Approximate count distinct functions.

5 - APPROXIMATE_MEDIAN [aggregate]

Computes the approximate median of an expression over a group of rows.

Computes the approximate median of an expression over a group of rows. The function returns a FLOAT value.

APPROXIMATE_MEDIAN is an alias of APPROXIMATE_PERCENTILE [aggregate] with a parameter of 0.5.

Behavior type

Immutable

Syntax

APPROXIMATE_MEDIAN ( expression )

Parameters

expression
Any FLOAT or INTEGER data type. The function returns the approximate middle value or an interpolated value that would be the approximate middle value once the values are sorted. Null values are ignored in the calculation.

Examples

The following examples uses this table:

CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT) ORDER BY state;
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;

Calculate the approximate median of all sales in this table:

=> SELECT APPROXIMATE_MEDIAN (sales) FROM allsales;
APROXIMATE_MEDIAN
--------------------
                 20
(1 row)

Modify the query to group sales by state, and obtain the approximate median for each one:

=> SELECT state, APPROXIMATE_MEDIAN(sales) FROM allsales GROUP BY state;
 state | APPROXIMATE_MEDIAN
-------+--------------------
 MA    |                 35
 NY    |                 20
(2 rows)

See also

6 - APPROXIMATE_PERCENTILE [aggregate]

Computes the approximate percentile of an expression over a group of rows.

Computes the approximate percentile of an expression over a group of rows. This function returns a FLOAT value.

Behavior type

Immutable

Syntax

APPROXIMATE_PERCENTILE ( column-expression USING PARAMETERS percentiles='percentile-values' )

Arguments

column-expression
A column of FLOAT or INTEGER data types whose percentiles will be calculated. NULL values are ignored.

Parameters

percentiles
One or more (up to 1000) comma-separated FLOAT constants ranging from 0 to 1 inclusive, specifying the percentile values to be calculated.

Examples

The following examples use this table:

=> CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT) ORDER BY state;
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;

=> SELECT * FROM allsales;
 state | name | sales
-------+------+-------
 MA    | A    |    60
 NY    | B    |    20
 NY    | C    |    15
 NY    | F    |    40
 MA    | D    |    20
 MA    | E    |    50
 MA    | G    |    10
(7 rows)

Calculate the approximate percentile for sales in each state:

=> SELECT state, APPROXIMATE_PERCENTILE(sales USING PARAMETERS percentiles='0.5') AS median
FROM allsales GROUP BY state;
 state | median
-------+--------
 MA    |     35
 NY    |     20
(2 rows)

Calculate multiple approximate percentiles for sales in each state:

=> SELECT state, APPROXIMATE_PERCENTILE(sales USING PARAMETERS percentiles='0.5,1.0')
FROM allsales GROUP BY state;
 state | APPROXIMATE_PERCENTILE
-------+--------
 MA    |     [35.0,60.0]
 NY    |     [20.0,40.0]
(2 rows)

Calculate multiple approximate percentiles for sales in each state and show results for each percentile in separate columns:

=> SELECT ps[0] as q0, ps[1] as q1, ps[2] as q2, ps[3] as q3, ps[4] as q4
FROM (SELECT APPROXIMATE_PERCENTILE(sales USING PARAMETERS percentiles='0, 0.25, 0.5, 0.75, 1')
AS ps FROM allsales GROUP BY state) as s1;
  q0  |  q1  |  q2  |  q3  |  q4
------+------+------+------+------
 10.0 | 17.5 | 35.0 | 52.5 | 60.0
 15.0 | 17.5 | 20.0 | 30.0 | 40.0
(2 rows)

See also

7 - APPROXIMATE_QUANTILES

Computes an array of weighted, approximate percentiles of a column within some user-specified error.

Computes an array of weighted, approximate percentiles of a column within some user-specified error. This algorithm is similar to APPROXIMATE_PERCENTILE [aggregate], which instead returns a single percentile.

The performance of this function depends entirely on the specified epsilon and the size of the provided array.

The OVER clause for this function must be empty.

Behavior type

Immutable

Syntax

APPROXIMATE_QUANTILES ( column USING PARAMETERS [nquantiles=n], [epsilon=error] ) OVER() FROM table

Parameters

column
The INTEGER or FLOAT column for which to calculate the percentiles. NULL values are ignored.
n
An integer that specifies the number of desired quantiles in the returned array.

Default: 11

error
The allowed error for any returned percentile. Specifically, for an array of size N, the specified error ε (epsilon) for the φ-quantile guarantees that the rank r of the return value with respect to the rank ⌊φN⌋ of the exact value is such that:

⌊(φ-ε)N⌋ ≤ r ≤ ⌊(φ+ε)N⌋

For n quantiles, if the error ε is specified such that ε > 1/n, this function will return non-deterministic results.

Default: 0.001

table
The table containing column.

Examples

The following example uses this table:

=> CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT) ORDER BY state;
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;

=> SELECT * FROM allsales;
 state | name | sales
-------+------+-------
 MA    | A    |    60
 NY    | B    |    20
 NY    | C    |    15
 NY    | F    |    40
 MA    | D    |    20
 MA    | E    |    50
 MA    | G    |    10
(7 rows)

This call to APPROXIMATE_QUANTILES returns a 6-element array of approximate percentiles, one for each quantile. Each quantile relates to the percentile by a factor of 100. For example, the second entry in the output indicates that 15 is the 0.2-quantile of the input column, so 15 is the 20th percentile of the input column.

=> SELECT APPROXIMATE_QUANTILES(sales USING PARAMETERS nquantiles=6) OVER() FROM allsales;
 Quantile | Value
----------+-------
        0 |    10
      0.2 |    15
      0.4 |    20
      0.6 |    40
      0.8 |    50
        1 |    60
(6 rows)

8 - ARGMAX_AGG

Takes two arguments target and arg, where both are columns or column expressions in the queried dataset.

Takes two arguments target and arg, where both are columns or column expressions in the queried dataset. ARGMAX_AGG finds the row with the highest non-null value in target and returns the value of arg in that row. If multiple rows contain the highest target value, ARGMAX_AGG returns arg from the first row that it finds. Use the WITHIN GROUP ORDER BY clause to control which row ARGMAX_AGG finds first.

Behavior type

Immutable if the WITHIN GROUP ORDER BY clause specifies a column or set of columns that resolves to unique values within the group; otherwise Volatile.

Syntax

ARGMAX_AGG ( target, arg ) [ within-group-order-by-clause ]

Arguments

target, arg
Columns in the queried dataset.
within-group-order-by-clause
Sorts target values within each group of rows:
WITHIN GROUP (ORDER BY { column-expression[ sort-qualifiers ] }[,...])

sort-qualifiers:

   { ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] }

Use this clause to determine which row is returned when multiple rows contain the highest target value; otherwise, results are likely to vary with each iteration of the same query.

Examples

The following example calls ARGMAX_AGG in a WITH clause to find which employees in each region are at or near retirement age. If multiple employees within each region have the same age, ARGMAX_AGG chooses the employees with the highest salary level and returns with their IDs. The primary query returns with details on the employees selected from each region:

=> WITH r AS (SELECT employee_region, ARGMAX_AGG(employee_age, employee_key)
       WITHIN GROUP (ORDER BY annual_salary DESC) emp_id
       FROM employee_dim GROUP BY employee_region ORDER BY employee_region)
    SELECT r.employee_region, ed.annual_salary AS highest_salary, employee_key,
       ed.employee_first_name||' '||ed.employee_last_name AS employee_name, ed.employee_age
       FROM r JOIN employee_dim ed ON r.emp_id = ed.employee_key ORDER BY ed.employee_region;
         employee_region          | highest_salary | employee_key |  employee_name   | employee_age
----------------------------------+----------------+--------------+------------------+--------------
 East                             |         927335 |           70 | Sally Gauthier   |           65
 MidWest                          |         177716 |          869 | Rebecca McCabe   |           65
 NorthWest                        |         100300 |         7597 | Kim Jefferson    |           65
 South                            |         196454 |          275 | Alexandra Harris |           65
 SouthWest                        |         198669 |         1043 | Seth Stein       |           65
 West                             |         197203 |          681 | Seth Jones       |           65
(6 rows)

See also

ARGMIN_AGG

9 - ARGMIN_AGG

Takes two arguments target and arg, where both are columns or column expressions in the queried dataset.

Takes two arguments target and arg, where both are columns or column expressions in the queried dataset. ARGMIN_AGG finds the row with the lowest non-null value in target and returns the value of arg in that row. If multiple rows contain the lowest target value, ARGMIN_AGG returns arg from the first row that it finds. Use the WITHIN GROUP ORDER BY clause to control which row ARGMMIN_AGG finds first.

Behavior type

Immutable if the WITHIN GROUP ORDER BY clause specifies a column or set of columns that resolves to unique values within the group; otherwise Volatile.

Syntax

ARGMIN_AGG ( target, arg ) [ within-group-order-by-clause ]

Arguments

target, arg
Columns in the queried dataset.
within-group-order-by-clause
Sorts target values within each group of rows:
WITHIN GROUP (ORDER BY { column-expression[ sort-qualifiers ] }[,...])

sort-qualifiers:

   { ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] }

Use this clause to determine which row is returned when multiple rows contain the lowest target value; otherwise, results are likely to vary with each iteration of the same query.

Examples

The following example calls ARGMIN_AGG in a WITH clause to find the lowest salary among all employees in each region, and returns with the lowest-paid employee IDs. The primary query returns with the salary amounts and employee names:

=> WITH msr (employee_region, emp_id) AS
    (SELECT employee_region, argmin_agg(annual_salary, employee_key) lowest_paid_employee FROM employee_dim GROUP BY employee_region)
    SELECT msr.employee_region, ed.annual_salary AS lowest_salary, ed.employee_first_name||' '||ed.employee_last_name AS employee_name
     FROM msr JOIN employee_dim ed ON msr.emp_id = ed.employee_key ORDER BY annual_salary DESC;
         employee_region          | lowest_salary |  employee_name
----------------------------------+---------------+-----------------
 NorthWest                        |         20913 | Raja Garnett
 SouthWest                        |         20750 | Seth Moore
 West                             |         20443 | Midori Taylor
 South                            |         20363 | David Bauer
 East                             |         20306 | Craig Jefferson
 MidWest                          |         20264 | Dean Vu
(6 rows)

See also

ARGMAX_AGG

10 - AVG [aggregate]

Computes the average (arithmetic mean) of an expression over a group of rows.

Computes the average (arithmetic mean) of an expression over a group of rows. AVG always returns a DOUBLE PRECISION value.

The AVG aggregate function differs from the [AVG](/en/sql-reference/functions/analytic-functions/avg-analytic/) analytic function, which computes the average of an expression over a group of rows within a window.

Behavior type

Immutable

Syntax

AVG ( [ ALL | DISTINCT ] expression )

Parameters

ALL
Invokes the aggregate function for all rows in the group (default).
DISTINCT
Invokes the aggregate function for all distinct non-null values of the expression found in the group.
expression
The value whose average is calculated over a set of rows, any expression that can have a DOUBLE PRECISION result.

Overflow handling

By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.

Examples

The following query returns the average income from the customer table:

=> SELECT AVG(annual_income) FROM customer_dimension;
     AVG
--------------
 2104270.6485
(1 row)

See also

11 - BIT_AND

Takes the bitwise AND of all non-null input values.

Takes the bitwise AND of all non-null input values. If the input parameter is NULL, the return value is also NULL.

Behavior type

Immutable

Syntax

BIT_AND ( expression )

Parameters

expression
The BINARY or VARBINARY input value to evaluate. BIT_AND operates on VARBINARY types explicitly and on BINARY types implicitly through casts.

Returns

BIT_AND returns:

  • The same value as the argument data type.

  • 1 for each bit compared, if all bits are 1; otherwise 0.

If the columns are different lengths, the return values are treated as though they are all equal in length and are right-extended with zero bytes. For example, given a group containing hex values ff, null, and f, BIT_AND ignores the null value and extends the value f to f0.

Examples

The example that follows uses table t with a single column of VARBINARY data type:

=> CREATE TABLE t ( c VARBINARY(2) );
=> INSERT INTO t values(HEX_TO_BINARY('0xFF00'));
=> INSERT INTO t values(HEX_TO_BINARY('0xFFFF'));
=> INSERT INTO t values(HEX_TO_BINARY('0xF00F'));

Query table t to see column c output:

=> SELECT TO_HEX(c) FROM t;
 TO_HEX
--------
 ff00
 ffff
 f00f
(3 rows)

Query table t to get the AND value for column c:

=> SELECT TO_HEX(BIT_AND(c)) FROM t;
 TO_HEX
--------
 f000
(1 row)

The function is applied pairwise to all values in the group, resulting in f000, which is determined as follows:

  1. ff00 (record 1) is compared with ffff (record 2), which results in ff00.

  2. The result from the previous comparison is compared with f00f (record 3), which results in f000.

See also

Binary data types (BINARY and VARBINARY)

12 - BIT_OR

Takes the bitwise OR of all non-null input values.

Takes the bitwise OR of all non-null input values. If the input parameter is NULL, the return value is also NULL.

Behavior type

Immutable

Syntax

BIT_OR ( expression )

Parameters

expression
The BINARY or VARBINARY input value to evaluate. BIT_OR operates on VARBINARY types explicitly and on BINARY types implicitly through casts.

Returns

BIT_OR returns:

  • The same value as the argument data type.

  • 1 for each bit compared, if any bit is 1; otherwise 0.

If the columns are different lengths, the return values are treated as though they are all equal in length and are right-extended with zero bytes. For example, given a group containing hex values ff, null, and f, the function ignores the null value and extends the value f to f0.

Examples

The example that follows uses table t with a single column of VARBINARY data type:

=> CREATE TABLE t ( c VARBINARY(2) );
=> INSERT INTO t values(HEX_TO_BINARY('0xFF00'));
=> INSERT INTO t values(HEX_TO_BINARY('0xFFFF'));
=> INSERT INTO t values(HEX_TO_BINARY('0xF00F'));

Query table t to see column c output:

=> SELECT TO_HEX(c) FROM t;
 TO_HEX
--------
 ff00
 ffff
 f00f
(3 rows)

Query table t to get the OR value for column c:

=> SELECT TO_HEX(BIT_OR(c)) FROM t;
 TO_HEX
--------
 ffff
(1 row)

The function is applied pairwise to all values in the group, resulting in ffff, which is determined as follows:

  1. ff00 (record 1) is compared with ffff, which results in ffff.

  2. The ff00 result from the previous comparison is compared with f00f (record 3), which results in ffff.

See also

Binary data types (BINARY and VARBINARY)

13 - BIT_XOR

Takes the bitwise XOR of all non-null input values.

Takes the bitwise XOR of all non-null input values. If the input parameter is NULL, the return value is also NULL.

Behavior type

Immutable

Syntax

BIT_XOR ( expression )

Parameters

expression
The BINARY or VARBINARY input value to evaluate. BIT_XOR operates on VARBINARY types explicitly and on BINARY types implicitly through casts.

Returns

BIT_XOR returns:

  • The same value as the argument data type.

  • 1 for each bit compared, if there are an odd number of arguments with set bits; otherwise 0.

If the columns are different lengths, the return values are treated as though they are all equal in length and are right-extended with zero bytes. For example, given a group containing hex values ff, null, and f, the function ignores the null value and extends the value f to f0.

Examples

First create a sample table and projections with binary columns:

The example that follows uses table t with a single column of VARBINARY data type:

=> CREATE TABLE t ( c VARBINARY(2) );
=> INSERT INTO t values(HEX_TO_BINARY('0xFF00'));
=> INSERT INTO t values(HEX_TO_BINARY('0xFFFF'));
=> INSERT INTO t values(HEX_TO_BINARY('0xF00F'));

Query table t to see column c output:

=> SELECT TO_HEX(c) FROM t;
 TO_HEX
--------
 ff00
 ffff
 f00f
(3 rows)

Query table t to get the XOR value for column c:

=> SELECT TO_HEX(BIT_XOR(c)) FROM t;
 TO_HEX
--------
 f0f0
(1 row)

See also

Binary data types (BINARY and VARBINARY)

14 - BOOL_AND [aggregate]

Processes Boolean values and returns a Boolean value result.

Processes Boolean values and returns a Boolean value result. If all input values are true, BOOL_AND returns t. Otherwise it returns f (false).

Behavior type

Immutable

Syntax

BOOL_AND ( expression )

Parameters

expression
A Boolean data type or any non-Boolean data type that can be implicitly coerced to a Boolean data type.

Examples

The following example shows how to use aggregate functions BOOL_AND, BOOL_OR, and BOOL_XOR. The sample table mixers includes columns for models and colors.

=> CREATE TABLE mixers(model VARCHAR(20), colors VARCHAR(20));
CREATE TABLE

Insert sample data into the table. The sample adds two color fields for each model.

=> INSERT INTO mixers
SELECT 'beginner', 'green'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'advanced', 'green'
UNION ALL
SELECT 'advanced', 'blue'
UNION ALL
SELECT 'professional', 'blue'
UNION ALL
SELECT 'professional', 'green'
UNION ALL
SELECT 'beginner', 'green';
 OUTPUT
--------
      8
(1 row)

Query the table. The result shows models that have two blue (BOOL_AND), one or two blue (BOOL_OR), and specifically not more than one blue (BOOL_XOR) mixer.

=> SELECT model,
BOOL_AND(colors= 'blue')AS two_blue,
BOOL_OR(colors= 'blue')AS one_or_two_blue,
BOOL_XOR(colors= 'blue')AS specifically_not_more_than_one_blue
FROM mixers
GROUP BY model;

    model     | two_blue | one_or_two_blue | specifically_not_more_than_one_blue
--------------+----------+-----------------+-------------------------------------
 advanced     | f        | t               | t
 beginner     | f        | f               | f
 intermediate | t        | t               | f
 professional | f        | t               | t
(4 rows)

See also

15 - BOOL_OR [aggregate]

Processes Boolean values and returns a Boolean value result.

Processes Boolean values and returns a Boolean value result. If at least one input value is true, BOOL_OR returns t. Otherwise, it returns f.

Behavior type

Immutable

Syntax

BOOL_OR ( expression )

Parameters

expression
A Boolean data type or any non-Boolean data type that can be implicitly coerced to a Boolean data type.

Examples

The following example shows how to use aggregate functions BOOL_AND, BOOL_OR, and BOOL_XOR. The sample table mixers includes columns for models and colors.

=> CREATE TABLE mixers(model VARCHAR(20), colors VARCHAR(20));
CREATE TABLE

Insert sample data into the table. The sample adds two color fields for each model.

=> INSERT INTO mixers
SELECT 'beginner', 'green'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'advanced', 'green'
UNION ALL
SELECT 'advanced', 'blue'
UNION ALL
SELECT 'professional', 'blue'
UNION ALL
SELECT 'professional', 'green'
UNION ALL
SELECT 'beginner', 'green';
 OUTPUT
--------
      8
(1 row)

Query the table. The result shows models that have two blue (BOOL_AND), one or two blue (BOOL_OR), and specifically not more than one blue (BOOL_XOR) mixer.

=> SELECT model,
BOOL_AND(colors= 'blue')AS two_blue,
BOOL_OR(colors= 'blue')AS one_or_two_blue,
BOOL_XOR(colors= 'blue')AS specifically_not_more_than_one_blue
FROM mixers
GROUP BY model;

    model     | two_blue | one_or_two_blue | specifically_not_more_than_one_blue
--------------+----------+-----------------+-------------------------------------
 advanced     | f        | t               | t
 beginner     | f        | f               | f
 intermediate | t        | t               | f
 professional | f        | t               | t
(4 rows)

See also

16 - BOOL_XOR [aggregate]

Processes Boolean values and returns a Boolean value result.

Processes Boolean values and returns a Boolean value result. If specifically only one input value is true, BOOL_XOR returns t. Otherwise, it returns f.

Behavior type

Immutable

Syntax

BOOL_XOR ( expression )

Parameters

expression
A Boolean data type or any non-Boolean data type that can be implicitly coerced to a Boolean data type.

Examples

The following example shows how to use aggregate functions BOOL_AND, BOOL_OR, and BOOL_XOR. The sample table mixers includes columns for models and colors.

=> CREATE TABLE mixers(model VARCHAR(20), colors VARCHAR(20));
CREATE TABLE

Insert sample data into the table. The sample adds two color fields for each model.

=> INSERT INTO mixers
SELECT 'beginner', 'green'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'advanced', 'green'
UNION ALL
SELECT 'advanced', 'blue'
UNION ALL
SELECT 'professional', 'blue'
UNION ALL
SELECT 'professional', 'green'
UNION ALL
SELECT 'beginner', 'green';
 OUTPUT
--------
      8
(1 row)

Query the table. The result shows models that have two blue (BOOL_AND), one or two blue (BOOL_OR), and specifically not more than one blue (BOOL_XOR) mixer.

=> SELECT model,
BOOL_AND(colors= 'blue')AS two_blue,
BOOL_OR(colors= 'blue')AS one_or_two_blue,
BOOL_XOR(colors= 'blue')AS specifically_not_more_than_one_blue
FROM mixers
GROUP BY model;

    model     | two_blue | one_or_two_blue | specifically_not_more_than_one_blue
--------------+----------+-----------------+-------------------------------------
 advanced     | f        | t               | t
 beginner     | f        | f               | f
 intermediate | t        | t               | f
 professional | f        | t               | t
(4 rows)

See also

17 - CORR

Returns the DOUBLE PRECISION coefficient of correlation of a set of expression pairs, as per the Pearson correlation coefficient.

Returns the DOUBLE PRECISION coefficient of correlation of a set of expression pairs, as per the Pearson correlation coefficient. CORR eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.

Syntax

CORR ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT CORR (Annual_salary, Employee_age) FROM employee_dimension;
         CORR
----------------------
 -0.00719153413192422
(1 row)

18 - COUNT [aggregate]

Returns as a BIGINT the number of rows in each group where the expression is not NULL.

Returns as a BIGINT the number of rows in each group where the expression is not NULL. If the query has no GROUP BY clause, COUNT returns the number of table rows.

The COUNT aggregate function differs from the COUNT analytic function, which returns the number over a group of rows within a window.

Behavior type

Immutable

Syntax

COUNT ( [ * ] [ ALL | DISTINCT ] expression )

Parameters

*
Specifies to count all rows in the specified table or each group.
ALL | DISTINCT
Specifies how to count rows where expression has a non-null value:
  • ALL (default): Counts all rows where expression evaluates to a non-null value.

  • DISTINCT: Counts all rows where expression evaluates to a distinct non-null value.

expression
The column or expression whose non-null values are counted.

Examples

The following query returns the number of distinct values in the primary_key column of the date_dimension table:

=> SELECT COUNT (DISTINCT date_key) FROM date_dimension;

 COUNT
-------
  1826
(1 row)

This example returns all distinct values of evaluating the expression x+y for all inventory_fact records.

=> SELECT COUNT (DISTINCT date_key + product_key) FROM inventory_fact;

 COUNT
-------
 21560
(1 row)

You can create an equivalent query using the LIMIT keyword to restrict the number of rows returned:

=> SELECT COUNT(date_key + product_key) FROM inventory_fact GROUP BY date_key LIMIT 10;

 COUNT
-------
   173
    31
   321
   113
   286
    84
   244
   238
   145
   202
(10 rows)

This query returns the number of distinct values of date_key in all records with the specific distinct product_key value.

=> SELECT product_key, COUNT (DISTINCT date_key)  FROM  inventory_fact
   GROUP BY product_key LIMIT 10;

 product_key | count
-------------+-------
           1 |    12
           2 |    18
           3 |    13
           4 |    17
           5 |    11
           6 |    14
           7 |    13
           8 |    17
           9 |    15
          10 |    12
(10 rows)

This query counts each distinct product_key value in inventory_fact table with the constant 1.

=> SELECT product_key, COUNT (DISTINCT product_key) FROM inventory_fact
   GROUP BY product_key LIMIT 10;

 product_key | count
-------------+-------
           1 |     1
           2 |     1
           3 |     1
           4 |     1
           5 |     1
           6 |     1
           7 |     1
           8 |     1
           9 |     1
          10 |     1
(10 rows)

This query selects each distinct date_key value and counts the number of distinct product_key values for all records with the specific product_key value. It then sums the qty_in_stock values in all records with the specific product_key value and groups the results by date_key.

=> SELECT date_key, COUNT (DISTINCT product_key), SUM(qty_in_stock) FROM inventory_fact
   GROUP BY date_key LIMIT 10;

 date_key | count |  sum
----------+-------+--------
        1 |   173 |  88953
        2 |    31 |  16315
        3 |   318 | 156003
        4 |   113 |  53341
        5 |   285 | 148380
        6 |    84 |  42421
        7 |   241 | 119315
        8 |   238 | 122380
        9 |   142 |  70151
       10 |   202 |  95274
(10 rows)

This query selects each distinct product_key value and then counts the number of distinct date_key values for all records with the specific product_key value. It also counts the number of distinct warehouse_key values in all records with the specific product_key value.

=> SELECT product_key, COUNT (DISTINCT date_key), COUNT (DISTINCT warehouse_key) FROM inventory_fact
   GROUP BY product_key LIMIT 15;

 product_key | count | count
-------------+-------+-------
           1 |    12 |    12
           2 |    18 |    18
           3 |    13 |    12
           4 |    17 |    18
           5 |    11 |     9
           6 |    14 |    13
           7 |    13 |    13
           8 |    17 |    15
           9 |    15 |    14
          10 |    12 |    12
          11 |    11 |    11
          12 |    13 |    12
          13 |     9 |     7
          14 |    13 |    13
          15 |    18 |    17
(15 rows)

This query selects each distinct product_key value, counts the number of distinct date_key and warehouse_key values for all records with the specific product_key value, and then sums all qty_in_stock values in records with the specific product_key value. It then returns the number of product_version values in records with the specific product_key value.

=> SELECT product_key, COUNT (DISTINCT date_key),
      COUNT (DISTINCT warehouse_key),
      SUM (qty_in_stock),
      COUNT (product_version)
      FROM inventory_fact GROUP BY product_key LIMIT 15;

 product_key | count | count |  sum  | count
-------------+-------+-------+-------+-------
           1 |    12 |    12 |  5530 |    12
           2 |    18 |    18 |  9605 |    18
           3 |    13 |    12 |  8404 |    13
           4 |    17 |    18 | 10006 |    18
           5 |    11 |     9 |  4794 |    11
           6 |    14 |    13 |  7359 |    14
           7 |    13 |    13 |  7828 |    13
           8 |    17 |    15 |  9074 |    17
           9 |    15 |    14 |  7032 |    15
          10 |    12 |    12 |  5359 |    12
          11 |    11 |    11 |  6049 |    11
          12 |    13 |    12 |  6075 |    13
          13 |     9 |     7 |  3470 |     9
          14 |    13 |    13 |  5125 |    13
          15 |    18 |    17 |  9277 |    18
(15 rows)

The following example returns the number of warehouses from the warehouse dimension table:

=> SELECT COUNT(warehouse_name) FROM warehouse_dimension;

 COUNT
-------
   100
(1 row)

This next example returns the total number of vendors:

=> SELECT COUNT(*) FROM vendor_dimension;

 COUNT
-------
    50
(1 row)

See also

19 - COVAR_POP

Returns the population covariance for a set of expression pairs.

Returns the population covariance for a set of expression pairs. The return value is of type DOUBLE PRECISION. COVAR_POP eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.

Syntax

SELECT COVAR_POP ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT COVAR_POP (Annual_salary, Employee_age)
      FROM employee_dimension;
     COVAR_POP
-------------------
 -9032.34810730019
(1 row)

20 - COVAR_SAMP

Returns the sample covariance for a set of expression pairs.

Returns the sample covariance for a set of expression pairs. The return value is of type DOUBLE PRECISION. COVAR_SAMP eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.

Syntax

SELECT COVAR_SAMP ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT COVAR_SAMP (Annual_salary, Employee_age)
      FROM employee_dimension;
    COVAR_SAMP
-------------------
 -9033.25143244343
(1 row)

21 - GROUP_ID

Uniquely identifies duplicate sets for GROUP BY queries that return duplicate grouping sets.

Uniquely identifies duplicate sets for GROUP BY queries that return duplicate grouping sets. This function returns one or more integers, starting with zero (0), as identifiers.

For the number of duplicates n for a particular grouping, GROUP_ID returns a range of sequential numbers, 0 to n–1. For the first each unique group it encounters, GROUP_ID returns the value 0. If GROUP_ID finds the same grouping again, the function returns 1, then returns 2 for the next found grouping, and so on.

Behavior type

Immutable

Syntax

GROUP_ID ()

Examples

This example shows how GROUP_ID creates unique identifiers when a query produces duplicate groupings. For an expenses table, the following query groups the results by category of expense and year and rolls up the sum for those two columns. The results have duplicate groupings for category and NULL. The first grouping has a GROUP_ID of 0, and the second grouping has a GROUP_ID of 1.

=> SELECT Category, Year, SUM(Amount), GROUPING_ID(Category, Year),
   GROUP_ID() FROM expenses GROUP BY Category, ROLLUP(Category,Year)
   ORDER BY Category, Year, GROUPING_ID();
  Category   | Year |  SUM   | GROUPING_ID | GROUP_ID
-------------+------+--------+-------------+----------
 Books       | 2005 |  39.98 |           0 |        0
 Books       | 2007 |  29.99 |           0 |        0
 Books       | 2008 |  29.99 |           0 |        0
 Books       |      |  99.96 |           1 |        0
 Books       |      |  99.96 |           1 |        1
 Electricity | 2005 | 109.99 |           0 |        0
 Electricity | 2006 | 109.99 |           0 |        0
 Electricity | 2007 | 229.98 |           0 |        0
 Electricity |      | 449.96 |           1 |        1
 Electricity |      | 449.96 |           1 |        0

See also

22 - GROUPING

Disambiguates the use of NULL values when GROUP BY queries with multilevel aggregates generate NULL values to identify subtotals in grouping columns.

Disambiguates the use of NULL values when GROUP BY queries with multilevel aggregates generate NULL values to identify subtotals in grouping columns. Such NULL values from the original data can also occur in rows. GROUPING returns 1, if the value of expression is:

  • NULL, representing an aggregated value

  • 0 for any other value, including NULL values in rows

Behavior type

Immutable

Syntax

GROUPING ( expression )

Parameters

expression
An expression in the GROUP BY clause

Examples

The following query uses the GROUPING function, taking one of the GROUP BY expressions as an argument. For each row, GROUPING returns one of the following:

  • 0: The column is part of the group for that row

  • 1: The column is not part of the group for that row

The 1 in the GROUPING(Year) column for electricity and books indicates that these values are subtotals. The right-most column values for both GROUPING(Category) and GROUPING(Year) are 1. This value indicates that neither column contributed to the GROUP BY. The final row represents the total sales.

=> SELECT Category, Year, SUM(Amount),
   GROUPING(Category), GROUPING(Year) FROM expenses
   GROUP BY ROLLUP(Category, Year) ORDER BY Category, Year, GROUPING_ID();
   Category  | Year |  SUM   | GROUPING | GROUPING
-------------+------+--------+----------+----------
 Books       | 2005 |  39.98 |        0 |        0
 Books       | 2007 |  29.99 |        0 |        0
 Books       | 2008 |  29.99 |        0 |        0
 Books       |      |  99.96 |        0 |        1
 Electricity | 2005 | 109.99 |        0 |        0
 Electricity | 2006 | 109.99 |        0 |        0
 Electricity | 2007 | 229.98 |        0 |        0
 Electricity |      | 449.96 |        0 |        1
             |      | 549.92 |        1 |        1

See also

23 - GROUPING_ID

Concatenates the set of Boolean values generated by the GROUPING function into a bit vector.

Concatenates the set of Boolean values generated by the GROUPING function into a bit vector. GROUPING_ID treats the bit vector as a binary number and returns it as a base-10 value that identifies the grouping set combination.

By using GROUPING_ID you avoid the need for multiple, individual GROUPING functions. GROUPING_ID simplifies row-filtering conditions, because rows of interest are identified using a single return from GROUPING_ID = n. Use GROUPING_ID to identify grouping combinations.

Behavior type

Immutable

Syntax

GROUPING_ID ( [expression[,...] )
expression
An expression that matches one of the expressions in the GROUP BY clause.

If the GROUP BY clause includes a list of expressions, GROUPING_ID returns a number corresponding to the GROUPING bit vector associated with a row.

Examples

This example shows how calling GROUPING_ID without an expression returns the GROUPING bit vector associated with a full set of multilevel aggregate expressions. The GROUPING_ID value is comparable to GROUPING_ID(a,b) because GROUPING_ID() includes all columns in the GROUP BY ROLLUP:

=> SELECT a,b,COUNT(*), GROUPING_ID() FROM T GROUP BY ROLLUP(a,b);

In the following query, the GROUPING(Category) and GROUPING(Year) columns have three combinations:

  • 0,0

  • 0,1

  • 1,1

=> SELECT Category, Year, SUM(Amount),
   GROUPING(Category), GROUPING(Year) FROM expenses
   GROUP BY ROLLUP(Category, Year) ORDER BY Category, Year, GROUPING_ID();
  Category   | Year |  SUM   | GROUPING | GROUPING
-------------+------+--------+----------+----------
 Books       | 2005 |  39.98 |        0 |        0
 Books       | 2007 |  29.99 |        0 |        0
 Books       | 2008 |  29.99 |        0 |        0
 Books       |      |  99.96 |        0 |        1
 Electricity | 2005 | 109.99 |        0 |        0
 Electricity | 2006 | 109.99 |        0 |        0
 Electricity | 2007 | 229.98 |        0 |        0
 Electricity |      | 449.96 |        0 |        1
             |      | 549.92 |        1 |        1

GROUPING_ID converts these values as follows:

Binary Set Values Decimal Equivalents
00 0
01 1
11 3
0 Category, Year

The following query returns the single number for each GROUP BY level that appears in the gr_id column:

=> SELECT Category, Year, SUM(Amount),
   GROUPING(Category),GROUPING(Year),GROUPING_ID(Category,Year) AS gr_id
   FROM expenses GROUP BY ROLLUP(Category, Year);
  Category   | Year |  SUM   | GROUPING | GROUPING | gr_id
-------------+------+--------+----------+----------+-------
 Books       | 2008 |  29.99 |        0 |        0 |     0
 Books       | 2005 |  39.98 |        0 |        0 |     0
 Electricity | 2007 | 229.98 |        0 |        0 |     0
 Books       | 2007 |  29.99 |        0 |        0 |     0
 Electricity | 2005 | 109.99 |        0 |        0 |     0
 Electricity |      | 449.96 |        0 |        1 |     1
             |      | 549.92 |        1 |        1 |     3
 Electricity | 2006 | 109.99 |        0 |        0 |     0
 Books       |      |  99.96 |        0 |        1 |     1

The gr_id value determines the GROUP BY level for each row:

GROUP BY Level
GROUP BY Row Level
3
Total sum
1
Category
0
Category, year

You can also use the DECODE function to give the values more meaning by comparing each search value individually:

=> SELECT Category, Year, SUM(AMOUNT), DECODE(GROUPING_ID(Category, Year),
       3, 'Total',
       1, 'Category',
       0, 'Category,Year')
   AS GROUP_NAME FROM expenses GROUP BY ROLLUP(Category, Year);
  Category   | Year |  SUM   |  GROUP_NAME
-------------+------+--------+---------------
 Electricity | 2006 | 109.99 | Category,Year
 Books       |      |  99.96 | Category
 Electricity | 2007 | 229.98 | Category,Year
 Books       | 2007 |  29.99 | Category,Year
 Electricity | 2005 | 109.99 | Category,Year
 Electricity |      | 449.96 | Category
             |      | 549.92 | Total
 Books       | 2005 |  39.98 | Category,Year
 Books       | 2008 |  29.99 | Category,Year

See also

24 - LISTAGG

Transforms non-null values from a group of rows into a list of values that are delimited by commas (default) or a configurable separator.

Transforms non-null values from a group of rows into a list of values that are delimited by commas (default) or a configurable separator. LISTAGG can be used to denormalize rows into a string of concatenated values.

Behavior type

Immutable if the WITHIN GROUP ORDER BY clause specifies a column or set of columns that resolves to unique values within the aggregated list; otherwise Volatile.

Syntax

LISTAGG ( aggregate-expression [ USING PARAMETERS parameter=value][,...] ] ) [ within-group-order-by-clause ]

Arguments

aggregate-expression
Aggregation of one or more columns or column expressions to select from the source table or view.

LISTAGG does not support spatial data types directly. In order to pass column data of this type, convert the data to strings with the geospatial function ST_AsText.

within-group-order-by-clause
Sorts aggregated values within each group of rows, where column-expression is typically a column in aggregate-expression:
WITHIN GROUP (ORDER BY { column-expression[ sort-qualifiers ] }[,...])

sort-qualifiers:

   { ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] }

Parameters

Parameter name Set to...
max_length

An integer or integer expression that specifies in bytes the maximum length of the result, up to 32M.

Default: 1024

separator

Separator string of length 0 to 80, inclusive. A length of 0 concatenates the output with no separators.

Default: comma (,)

on_overflow

Specifies behavior when the result overflows the max_length setting, one of the following strings:

  • ERROR (default): Return an error when overflow occurs.

  • TRUNCATE: Remove any characters that exceed max_length setting from the query result, and return the truncated string.

Privileges

None

Examples

In the following query, the aggregated results in the CityState column use the string " | " as a separator. The outer GROUP BY clause groups the output rows according to their Region values. Within each group, the aggregated list items are sorted according to their city values, as per the WITHIN GROUP ORDER BY clause:

=> \x
Expanded display is on.
=> WITH cd AS (SELECT DISTINCT (customer_city) city, customer_state, customer_region FROM customer_dimension)
SELECT customer_region Region, LISTAGG(city||', '||customer_state USING PARAMETERS separator=' | ')
   WITHIN GROUP (ORDER BY city) CityAndState FROM cd GROUP BY region ORDER BY region;
-[ RECORD 1 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region       | East
CityAndState | Alexandria, VA | Allentown, PA | Baltimore, MD | Boston, MA | Cambridge, MA | Charlotte, NC | Clarksville, TN | Columbia, SC | Elizabeth, NJ | Erie, PA | Fayetteville, NC | Hartford, CT | Lowell, MA | Manchester, NH | Memphis, TN | Nashville, TN | New Haven, CT | New York, NY | Philadelphia, PA | Portsmouth, VA | Stamford, CT | Sterling Heights, MI | Washington, DC | Waterbury, CT
-[ RECORD 2 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region       | MidWest
CityAndState | Ann Arbor, MI | Cedar Rapids, IA | Chicago, IL | Columbus, OH | Detroit, MI | Evansville, IN | Flint, MI | Gary, IN | Green Bay, WI | Indianapolis, IN | Joliet, IL | Lansing, MI | Livonia, MI | Milwaukee, WI | Naperville, IL | Peoria, IL | Sioux Falls, SD | South Bend, IN | Springfield, IL
-[ RECORD 3 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region       | NorthWest
CityAndState | Bellevue, WA | Portland, OR | Seattle, WA
-[ RECORD 4 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region       | South
CityAndState | Abilene, TX | Athens, GA | Austin, TX | Beaumont, TX | Cape Coral, FL | Carrollton, TX | Clearwater, FL | Coral Springs, FL | Dallas, TX | El Paso, TX | Fort Worth, TX | Grand Prairie, TX | Houston, TX | Independence, MS | Jacksonville, FL | Lafayette, LA | McAllen, TX | Mesquite, TX | San Antonio, TX | Savannah, GA | Waco, TX | Wichita Falls, TX
-[ RECORD 5 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region       | SouthWest
CityAndState | Arvada, CO | Denver, CO | Fort Collins, CO | Gilbert, AZ | Las Vegas, NV | North Las Vegas, NV | Peoria, AZ | Phoenix, AZ | Pueblo, CO | Topeka, KS | Westminster, CO
-[ RECORD 6 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region       | West
CityAndState | Berkeley, CA | Burbank, CA | Concord, CA | Corona, CA | Costa Mesa, CA | Daly City, CA | Downey, CA | El Monte, CA | Escondido, CA | Fontana, CA | Fullerton, CA | Inglewood, CA | Lancaster, CA | Los Angeles, CA | Norwalk, CA | Orange, CA | Palmdale, CA | Pasadena, CA | Provo, UT | Rancho Cucamonga, CA | San Diego, CA | San Francisco, CA | San Jose, CA | Santa Clara, CA | Simi Valley, CA | Sunnyvale, CA | Thousand Oaks, CA | Vallejo, CA | Ventura, CA | West Covina, CA | West Valley City, UT

25 - MAX [aggregate]

Returns the greatest value of an expression over a group of rows.

Returns the greatest value of an expression over a group of rows. The return value has the same type as the expression data type.

The MAX analytic function function differs from the aggregate function, in that it returns the maximum value of an expression over a group of rows within a window.

Aggregate functions MIN and MAX can operate with Boolean values. MAX can act upon a Boolean data type or a value that can be implicitly converted to a Boolean. If at least one input value is true, MAX returns t (true). Otherwise, it returns f (false). In the same scenario, MIN returns t (true) if all input values are true. Otherwise it returns f.

Behavior type

Immutable

Syntax

MAX ( expression )

Parameters

expression
Any expression for which the maximum value is calculated, typically a column reference.

Examples

The following query returns the largest value in column sales_dollar_amount.

=> SELECT MAX(sales_dollar_amount) AS highest_sale FROM store.store_sales_fact;
 highest_sale
--------------
          600
(1 row)

The following example shows you the difference between the MIN and MAX aggregate functions when you use them with a Boolean value. The sample creates a table, adds two rows of data, and shows sample output for MIN and MAX.

See also

Data aggregation

26 - MIN [aggregate]

Returns the smallest value of an expression over a group of rows.

Returns the smallest value of an expression over a group of rows. The return value has the same type as the expression data type.

The MIN analytic function differs from the aggregate function, in that it returns the minimum value of an expression over a group of rows within a window.

Aggregate functions MIN and MAX can operate with Boolean values. MAX can act upon a Boolean data type or a value that can be implicitly converted to a Boolean. If at least one input value is true, MAX returns t (true). Otherwise, it returns f (false). In the same scenario, MIN returns t (true) if all input values are true. Otherwise it returns f.

Behavior type

Immutable

Syntax

MIN ( expression )

Parameters

expression
Any expression for which the minimum value is calculated, typically a column reference.

Examples

The following query returns the lowest salary from the employee dimension table.

This example shows how you can query to return the lowest salary from the employee dimension table.

=> SELECT MIN(annual_salary) AS lowest_paid FROM employee_dimension;
 lowest_paid
-------------
        1200
(1 row)

The following example shows you the difference between the MIN and MAX aggregate functions when you use them with a Boolean value. The sample creates a table, adds two rows of data, and shows sample output for MIN and MAX.

See also

Data aggregation

27 - REGR_AVGX

Returns the DOUBLE PRECISION average of the independent expression in an expression pair.

Returns the DOUBLE PRECISION average of the independent expression in an expression pair. REGR_AVGX eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_AVGX returns NULL.

Syntax

SELECT REGR_AVGX ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT REGR_AVGX (Annual_salary, Employee_age)
      FROM employee_dimension;
 REGR_AVGX
-----------
    39.321
(1 row)

28 - REGR_AVGY

Returns the DOUBLE PRECISION average of the dependent expression in an expression pair.

Returns the DOUBLE PRECISION average of the dependent expression in an expression pair. The function eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.

Syntax

REGR_AVGY ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT REGR_AVGY (Annual_salary, Employee_age)
      FROM employee_dimension;
 REGR_AVGY
------------
 58354.4913
(1 row)

29 - REGR_COUNT

Returns the count of all rows in an expression pair.

Returns the count of all rows in an expression pair. The function eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns 0.

Syntax

SELECT REGR_COUNT ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT REGR_COUNT (Annual_salary, Employee_age) FROM employee_dimension;
 REGR_COUNT
------------
      10000
(1 row)

30 - REGR_INTERCEPT

Returns the y-intercept of the regression line determined by a set of expression pairs.

Returns the y-intercept of the regression line determined by a set of expression pairs. The return value is of type DOUBLE PRECISION. REGR_INTERCEPT eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_INTERCEPT returns NULL.

Syntax

SELECT REGR_INTERCEPT ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT REGR_INTERCEPT (Annual_salary, Employee_age) FROM employee_dimension;
  REGR_INTERCEPT
------------------
 59929.5490163437
(1 row)

31 - REGR_R2

Returns the square of the correlation coefficient of a set of expression pairs.

Returns the square of the correlation coefficient of a set of expression pairs. The return value is of type DOUBLE PRECISION. REGR_R2 eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_R2 returns NULL.

Syntax

SELECT REGR_R2 ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT REGR_R2 (Annual_salary, Employee_age) FROM employee_dimension;
       REGR_R2
----------------------
 5.17181631706311e-05
(1 row)

32 - REGR_SLOPE

Returns the slope of the regression line, determined by a set of expression pairs.

Returns the slope of the regression line, determined by a set of expression pairs. The return value is of type DOUBLE PRECISION. REGR_SLOPE eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_SLOPE returns NULL.

Syntax

SELECT REGR_SLOPE ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT REGR_SLOPE (Annual_salary, Employee_age) FROM employee_dimension;
    REGR_SLOPE
------------------
 -40.056400303749
(1 row)

33 - REGR_SXX

Returns the sum of squares of the difference between the independent expression (expression2) and its average.

Returns the sum of squares of the difference between the independent expression (expression2) and its average.

That is, REGR_SXX returns: ∑[(expression2 - average(expression2)(expression2 - average(expression2)]

The return value is of type DOUBLE PRECISION. REGR_SXX eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_SXX returns NULL.

Syntax

SELECT REGR_SXX ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT REGR_SXX (Annual_salary, Employee_age) FROM employee_dimension;
  REGR_SXX
------------
 2254907.59
(1 row)

34 - REGR_SXY

Returns the sum of products of the difference between the dependent expression (expression1) and its average and the difference between the independent expression (expression2) and its average.

Returns the sum of products of the difference between the dependent expression (expression1) and its average and the difference between the independent expression (expression2) and its average.

That is, REGR_SXY returns: ∑[(expression1 - average(expression1)(expression2 - average(expression2))]

The return value is of type DOUBLE PRECISION. REGR_SXY eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_SXY returns NULL.

Syntax

SELECT REGR_SXY ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT REGR_SXY (Annual_salary, Employee_age) FROM employee_dimension;
     REGR_SXY
-------------------
 -90323481.0730019
(1 row)

35 - REGR_SYY

Returns the sum of squares of the difference between the dependent expression (expression1) and its average.

Returns the sum of squares of the difference between the dependent expression (expression1) and its average.

That is, REGR_SYY returns: ∑[(expression1 - average(expression1)(expression1 - average(expression1)]

The return value is of type DOUBLE PRECISION. REGR_SYY eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_SYY returns NULL.

Syntax

SELECT REGR_SYY ( expression1, expression2 )

Parameters

expression1
The dependent DOUBLE PRECISION expression
expression2
The independent DOUBLE PRECISION expression

Examples

=> SELECT REGR_SYY (Annual_salary, Employee_age) FROM employee_dimension;
     REGR_SYY
------------------
 69956728794707.2
(1 row)

36 - STDDEV [aggregate]

Evaluates the statistical sample standard deviation for each member of the group.

Evaluates the statistical sample standard deviation for each member of the group. The return value is the same as the square root of VAR_SAMP:

STDDEV(expression) = SQRT(VAR_SAMP(expression))

Behavior type

Immutable

Syntax

STDDEV ( expression )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. STDDEV returns the same data type as expression.
  • Nonstandard function STDDEV is provided for compatibility with other databases. It is semantically identical to STDDEV_SAMP.

  • This aggregate function differs from analytic function STDDEV, which computes the statistical sample standard deviation of the current row with respect to the group of rows within a window.

  • When VAR_SAMP returns NULL, STDDEV returns NULL.

Examples

The following example returns the statistical sample standard deviation for each household ID from the customer_dimension table of the VMart example database:

=> SELECT STDDEV(household_id) FROM customer_dimension;
   STDDEV
-----------------
 8651.5084240071

37 - STDDEV_POP [aggregate]

Evaluates the statistical population standard deviation for each member of the group.

Evaluates the statistical population standard deviation for each member of the group.

Behavior type

Immutable

Syntax

STDDEV_POP ( expression )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. STDDEV_POP returns the same data type as expression.
  • This function differs from the analytic function STDDEV_POP, which evaluates the statistical population standard deviation for each member of the group of rows within a window.

  • STDDEV_POP returns the same value as the square root of VAR_POP:

    STDDEV_POP(expression) = SQRT(VAR_POP(expression))
    
  • When VAR_SAMP returns NULL, this function returns NULL.

Examples

The following example returns the statistical population standard deviation for each household ID in the customer table.

=> SELECT STDDEV_POP(household_id) FROM customer_dimension;
   STDDEV_POP
------------------
 8651.41895973367
(1 row)

See also

38 - STDDEV_SAMP [aggregate]

Evaluates the statistical sample standard deviation for each member of the group.

Evaluates the statistical sample standard deviation for each member of the group. The return value is the same as the square root of VAR_SAMP:

STDDEV_SAMP(expression) = SQRT(VAR_SAMP(expression))

Behavior type

Immutable

Syntax

STDDEV_SAMP ( expression )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. STDDEV_SAMP returns the same data type as expression.
  • STDDEV_SAMP is semantically identical to nonstandard function STDDEV, which is provided for compatibility with other databases.

  • This aggregate function differs from analytic function STDDEV_SAMP, which computes the statistical sample standard deviation of the current row with respect to the group of rows within a window.

  • When VAR_SAMP returns NULL, STDDEV_SAMP returns NULL.

Examples

The following example returns the statistical sample standard deviation for each household ID from the customer dimension table.

=> SELECT STDDEV_SAMP(household_id) FROM customer_dimension;
   stddev_samp
------------------
 8651.50842400771
(1 row)

39 - SUM [aggregate]

Computes the sum of an expression over a group of rows.

Computes the sum of an expression over a group of rows. SUM returns a DOUBLE PRECISION value for a floating-point expression. Otherwise, the return value is the same as the expression data type.

The SUM aggregate function differs from the SUM analytic function, which computes the sum of an expression over a group of rows within a window.

Behavior type

Immutable

Syntax

SUM ( [ ALL | DISTINCT ] expression )

Parameters

ALL
Invokes the aggregate function for all rows in the group (default)
DISTINCT
Invokes the aggregate function for all distinct non-null values of the expression found in the group
expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.

Overflow handling

If you encounter data overflow when using SUM(), use SUM_FLOAT which converts the data to a floating point.

By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.

Examples

The following query returns the total sum of the product_cost column.

=> SELECT SUM(product_cost) AS cost FROM product_dimension;
   cost
---------
 9042850
(1 row)

See also

40 - SUM_FLOAT [aggregate]

Computes the sum of an expression over a group of rows and returns a DOUBLE PRECISION value.

Computes the sum of an expression over a group of rows and returns a DOUBLE PRECISION value.

Behavior type

Immutable

Syntax

SUM_FLOAT ( [ ALL | DISTINCT ] expression )

Parameters

ALL
Invokes the aggregate function for all rows in the group (default).
DISTINCT
Invokes the aggregate function for all distinct non-null values of the expression found in the group.
expression
Any expression whose result is type DOUBLE PRECISION.

Overflow handling

By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.

Examples

The following query returns the floating-point sum of the average price from the product table:

=> SELECT SUM_FLOAT(average_competitor_price) AS cost FROM product_dimension;
   cost
----------
 18181102
(1 row)

41 - VAR_POP [aggregate]

Evaluates the population variance for each member of the group.

Evaluates the population variance for each member of the group. This is defined as the sum of squares of the difference of *expression*from the mean of expression, divided by the number of remaining rows:

(SUM(expression*expression) - SUM(expression)*SUM(expression) / COUNT(expression)) / COUNT(expression)

Behavior type

Immutable

Syntax

VAR_POP ( expression )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. VAR_POP returns the same data type as expression.

This aggregate function differs from analytic function VAR_POP, which computes the population variance of the current row with respect to the group of rows within a window.

Examples

The following example returns the population variance for each household ID in the customer table.

=> SELECT VAR_POP(household_id) FROM customer_dimension;
    var_pop
------------------
 74847050.0168393
(1 row)

42 - VAR_SAMP [aggregate]

Evaluates the sample variance for each row of the group.

Evaluates the sample variance for each row of the group. This is defined as the sum of squares of the difference of expression from the mean of expression divided by the number of remaining rows minus 1:

(SUM(expression*expression) - SUM(expression) *SUM(expression) / COUNT(expression)) / (COUNT(expression) -1)

Behavior type

Immutable

Syntax

VAR_SAMP ( expression )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. VAR_SAMP returns the same data type as expression.
  • VAR_SAMP is semantically identical to nonstandard function VARIANCE, which is provided for compatibility with other databases.

  • This aggregate function differs from analytic function VAR_SAMP, which computes the sample variance of the current row with respect to the group of rows within a window.

Examples

The following example returns the sample variance for each household ID in the customer table.

=> SELECT VAR_SAMP(household_id) FROM customer_dimension;
     var_samp
------------------
 74848598.0106764
(1 row)

See also

VARIANCE [aggregate]

43 - VARIANCE [aggregate]

Evaluates the sample variance for each row of the group.

Evaluates the sample variance for each row of the group. This is defined as the sum of squares of the difference of expression from the mean of expression divided by the number of remaining rows minus 1.

(SUM(expression*expression) - SUM(expression) *SUM(expression) /COUNT(expression)) / (COUNT(expression) -1)

Behavior type

Immutable

Syntax

VARIANCE ( expression )

Parameters

expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. VARIANCE returns the same data type as expression.

The nonstandard function VARIANCE is provided for compatibility with other databases. It is semantically identical to VAR_SAMP.

This aggregate function differs from analytic function VARIANCE, which computes the sample variance of the current row with respect to the group of rows within a window.

Examples

The following example returns the sample variance for each household ID in the customer table.

=> SELECT VARIANCE(household_id) FROM customer_dimension;
     variance
------------------
 74848598.0106764
(1 row)

See also

44 - WITHIN GROUP ORDER BY clause

Specifies how to sort rows that are grouped by aggregate functions, one of the following:.

Specifies how to sort rows that are grouped by aggregate functions, one of the following:

This clause is also supported for user-defined aggregate functions.

The order clause only specifies order within the result set of each group. The query can have its own ORDER BY clause, which has precedence over order that is specified by WITHIN GROUP ORDER BY, and orders the final result set.

Syntax

WITHIN GROUP (ORDER BY
  { column-expression [ ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] ]
  }[,...])

Parameters

column-expression
A column, constant, or arbitrary expression formed on columns, on which to sort grouped rows.
ASC | DESC
Specifies the ordering sequence as ascending (default) or descending.
NULLS {FIRST | LAST | AUTO}
Specifies whether to position null values first or last. Default positioning depends on whether the sort order is ascending or descending:
  • Ascending default: NULLS LAST

  • Descending default: NULLS FIRST

If you specify NULLS AUTO, Vertica chooses the positioning that is most efficient for this query, either NULLS FIRST or NULLS LAST.

If you omit all sort qualifiers, Vertica uses ASC NULLS LAST.

Examples

For usage examples, see these functions: