Aggregate functions
Note
All functions in this section that have an analytic function counterpart are appended with [Aggregate] to avoid confusion between the two.
Aggregate functions summarize data over groups of rows from a query result set. The groups are specified using the GROUP BY clause. They are allowed only in the select list and in the HAVING and ORDER BY clauses of a SELECT statement (as described in Aggregate expressions).
Notes
- Except for COUNT, these functions return a null value when no rows are selected. In particular, SUM of no rows returns NULL, not zero.
- In some cases you can replace an expression that includes multiple aggregates with a single aggregate of an expression. For example, SUM(x) + SUM(y) can be expressed as SUM(x+y) (where x and y are NOT NULL).
- Vertica does not support nested aggregate functions.
You can also use some of the simple aggregate functions as analytic (window) functions. See Analytic functions for details. See also SQL analytics.
1 - APPROXIMATE_COUNT_DISTINCT
Returns the number of distinct non-NULL values in a data set.
Behavior type
Immutable
Syntax
APPROXIMATE_COUNT_DISTINCT ( expression[, error-tolerance ] )
Parameters
expression
- Value to be evaluated using any data type that supports equality comparison.
error-tolerance
- Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation. You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance. If you omit this argument, Vertica uses an error tolerance of 1.25%.
Restrictions
APPROXIMATE_COUNT_DISTINCT and DISTINCT aggregates cannot be in the same query block.
Error tolerance
APPROXIMATE_COUNT_DISTINCT(x, error-tolerance) returns a value equal to COUNT(DISTINCT x), with an error that is lognormally distributed with some standard deviation.
The error-tolerance argument is optional. Supply this argument to specify the desired standard deviation. error-tolerance is defined as 2.17 standard deviations, which corresponds to a 97 percent confidence interval:
standard-deviation = error-tolerance / 2.17
For example:
- error-tolerance = 1 (a standard deviation of 1/2.17, about 0.46 percent): 97 percent of the time, APPROXIMATE_COUNT_DISTINCT(x, 1) returns a value between COUNT(DISTINCT x) * 0.99 and COUNT(DISTINCT x) * 1.01.
- error-tolerance = 5: 97 percent of the time, APPROXIMATE_COUNT_DISTINCT(x, 5) returns a value between COUNT(DISTINCT x) * 0.95 and COUNT(DISTINCT x) * 1.05.
A 99 percent confidence interval corresponds to 2.58 standard deviations. To set error-tolerance to a confidence level of 99 percent (instead of 97 percent), multiply error-tolerance by 2.17 / 2.58 = 0.841.
For example, if you specify error-tolerance as 5 * 0.841 = 4.2, then 99 percent of the time APPROXIMATE_COUNT_DISTINCT(x, 4.2) returns a value between COUNT(DISTINCT x) * 0.95 and COUNT(DISTINCT x) * 1.05.
Examples
Count the total number of distinct values in column product_key from table store.store_sales_fact:
=> SELECT COUNT(DISTINCT product_key) FROM store.store_sales_fact;
COUNT
-------
19982
(1 row)
Count the approximate number of distinct values in product_key with various error tolerances. The smaller the error tolerance, the closer the approximation:
=> SELECT APPROXIMATE_COUNT_DISTINCT(product_key,5) AS five_pct_accuracy,
APPROXIMATE_COUNT_DISTINCT(product_key,1) AS one_pct_accuracy,
APPROXIMATE_COUNT_DISTINCT(product_key,.88) AS point_eighteight_pct_accuracy
FROM store.store_sales_fact;
five_pct_accuracy | one_pct_accuracy | point_eighteight_pct_accuracy
-------------------+------------------+-------------------------------
19431 | 19921 | 19921
(1 row)
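Building on the error-tolerance discussion above, the following sketch (output omitted) shows the adjusted call for a 99 percent confidence interval, using the 4.2 tolerance computed earlier against the same sample table:
=> SELECT APPROXIMATE_COUNT_DISTINCT(product_key, 4.2) AS distinct_products_99pct_conf
   FROM store.store_sales_fact;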
See also
Approximate count distinct functions
2 - APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS
Calculates the number of distinct non-NULL values from the synopsis objects created by APPROXIMATE_COUNT_DISTINCT_SYNOPSIS.
Behavior type
Immutable
Syntax
APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS ( synopsis-obj[, error-tolerance ] )
Parameters
synopsis-obj
- A synopsis object created by APPROXIMATE_COUNT_DISTINCT_SYNOPSIS.
error-tolerance
- Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation. You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance. If you omit this argument, Vertica uses an error tolerance of 1.25%.
For more details, see APPROXIMATE_COUNT_DISTINCT.
Restrictions
APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS and DISTINCT aggregates cannot be in the same query block.
Examples
The following examples review and compare different ways to obtain a count of unique values in a table column:
Return an exact count of unique values in column product_key, from table store.store_sales_fact:
=> \timing
Timing is on.
=> SELECT COUNT(DISTINCT product_key) from store.store_sales_fact;
count
-------
19982
(1 row)
Time: First fetch (1 row): 553.033 ms. All rows formatted: 553.075 ms
Return an approximate count of unique values in column product_key:
=> SELECT APPROXIMATE_COUNT_DISTINCT(product_key) as unique_product_keys
FROM store.store_sales_fact;
unique_product_keys
---------------------
19921
(1 row)
Time: First fetch (1 row): 394.562 ms. All rows formatted: 394.600 ms
Create a synopsis object that represents a set of store.store_sales_fact data with unique product_key values, and store the synopsis in the new table my_summary:
=> CREATE TABLE my_summary AS SELECT APPROXIMATE_COUNT_DISTINCT_SYNOPSIS (product_key) syn
FROM store.store_sales_fact;
CREATE TABLE
Time: First fetch (0 rows): 582.662 ms. All rows formatted: 582.682 ms
Return a count from the saved synopsis:
=> SELECT APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS(syn) FROM my_summary;
ApproxCountDistinctOfSynopsis
-------------------------------
19921
(1 row)
Time: First fetch (1 row): 105.295 ms. All rows formatted: 105.335 ms
See also
Approximate count distinct functions
3 - APPROXIMATE_COUNT_DISTINCT_SYNOPSIS
Summarizes the information of distinct non-NULL values and materializes the result set in a VARBINARY or LONG VARBINARY synopsis object. The calculated result is within a specified range of error tolerance. You save the synopsis object in a Vertica table for use by APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS.
Behavior type
Immutable
Syntax
APPROXIMATE_COUNT_DISTINCT_SYNOPSIS ( expression[, error-tolerance] )
Parameters
expression
- Value to evaluate using any data type that supports equality comparison.
error-tolerance
- Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation. You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance. If you omit this argument, Vertica uses an error tolerance of 1.25%.
For more details, see APPROXIMATE_COUNT_DISTINCT.
Restrictions
APPROXIMATE_COUNT_DISTINCT_SYNOPSIS and DISTINCT aggregates cannot be in the same query block.
Examples
See APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS.
See also
Approximate count distinct functions
4 - APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE
Aggregates multiple synopses into one new synopsis. This function is similar to APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS but returns one synopsis instead of the count estimate. The benefit of this function is that it speeds up final estimation when calling APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS.
For example, if you need to regularly estimate the count of distinct users over a long period of time (such as several years), you can pre-accumulate synopses of days into one synopsis for a year.
Behavior type
Immutable
Syntax
APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE ( synopsis-obj [, error-tolerance] )
Parameters
synopsis-obj
- An expression that can be evaluated to one or more synopses. Typically, a synopsis-obj is generated as a binary string by either the APPROXIMATE_COUNT_DISTINCT_SYNOPSIS or APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE function and is stored in a table column of type VARBINARY or LONG VARBINARY.
error-tolerance
- Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation. You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance. If you omit this argument, Vertica uses an error tolerance of 1.25%.
For more details, see APPROXIMATE_COUNT_DISTINCT.
Examples
See Approximate count distinct functions.
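A minimal sketch of the pre-accumulation pattern described above, assuming a hypothetical daily_synopses table whose syn column (LONG VARBINARY) holds one synopsis per day, each produced by APPROXIMATE_COUNT_DISTINCT_SYNOPSIS:
=> CREATE TABLE yearly_synopsis AS
     SELECT APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE(syn) AS syn
     FROM daily_synopses;
=> SELECT APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS(syn) AS unique_users
   FROM yearly_synopsis;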
5 - APPROXIMATE_MEDIAN [aggregate]
Computes the approximate median of an expression over a group of rows. The function returns a FLOAT value.
APPROXIMATE_MEDIAN is an alias of APPROXIMATE_PERCENTILE [aggregate] with a parameter of 0.5.
Note
This function is best suited for large groups of data. If you have a small group of data, use the exact MEDIAN [analytic] function.
Behavior type
Immutable
Syntax
APPROXIMATE_MEDIAN ( expression )
Parameters
expression
- Any FLOAT or INTEGER data type. The function returns the approximate middle value or an interpolated value that would be the approximate middle value once the values are sorted. Null values are ignored in the calculation.
Examples
Tip
For optimal performance when using GROUP BY in your query, verify that your table is sorted on the GROUP BY column.
The following examples use this table:
CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT) ORDER BY state;
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;
Calculate the approximate median of all sales in this table:
=> SELECT APPROXIMATE_MEDIAN (sales) FROM allsales;
 APPROXIMATE_MEDIAN
--------------------
20
(1 row)
Modify the query to group sales by state, and obtain the approximate median for each one:
=> SELECT state, APPROXIMATE_MEDIAN(sales) FROM allsales GROUP BY state;
state | APPROXIMATE_MEDIAN
-------+--------------------
MA | 35
NY | 20
(2 rows)
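Because APPROXIMATE_MEDIAN is an alias of APPROXIMATE_PERCENTILE with a parameter of 0.5, the following sketch (output omitted) queries both against the same allsales table and should return matching values for each state:
=> SELECT state,
          APPROXIMATE_MEDIAN(sales) AS approx_median,
          APPROXIMATE_PERCENTILE(sales USING PARAMETERS percentiles='0.5') AS pctl_50
   FROM allsales GROUP BY state;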
6 - APPROXIMATE_PERCENTILE [aggregate]
Computes the approximate percentile of an expression over a group of rows. This function returns a FLOAT value.
Note
Use this function when many rows are aggregated into groups. If the number of aggregated rows is small, use the analytic function PERCENTILE_CONT.
Behavior type
Immutable
Syntax
APPROXIMATE_PERCENTILE ( column-expression USING PARAMETERS percentiles='percentile-values' )
Arguments
column-expression
- A column of FLOAT or INTEGER data types whose percentiles will be calculated. NULL values are ignored.
Parameters
percentiles
- One or more (up to 1000) comma-separated FLOAT constants ranging from 0 to 1 inclusive, specifying the percentile values to be calculated.
Note
The deprecated parameter percentile, which takes only a single float, continues to be supported for backwards compatibility.
Examples
Tip
For optimal performance when using GROUP BY in your query, verify that your table is sorted on the GROUP BY column.
The following examples use this table:
=> CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT) ORDER BY state;
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;
=> SELECT * FROM allsales;
state | name | sales
-------+------+-------
MA | A | 60
NY | B | 20
NY | C | 15
NY | F | 40
MA | D | 20
MA | E | 50
MA | G | 10
(7 rows)
Calculate the approximate percentile for sales in each state:
=> SELECT state, APPROXIMATE_PERCENTILE(sales USING PARAMETERS percentiles='0.5') AS median
FROM allsales GROUP BY state;
state | median
-------+--------
MA | 35
NY | 20
(2 rows)
Calculate multiple approximate percentiles for sales in each state:
=> SELECT state, APPROXIMATE_PERCENTILE(sales USING PARAMETERS percentiles='0.5,1.0')
FROM allsales GROUP BY state;
state | APPROXIMATE_PERCENTILE
-------+--------
MA | [35.0,60.0]
NY | [20.0,40.0]
(2 rows)
Calculate multiple approximate percentiles for sales in each state and show results for each percentile in separate columns:
=> SELECT ps[0] as q0, ps[1] as q1, ps[2] as q2, ps[3] as q3, ps[4] as q4
FROM (SELECT APPROXIMATE_PERCENTILE(sales USING PARAMETERS percentiles='0, 0.25, 0.5, 0.75, 1')
AS ps FROM allsales GROUP BY state) as s1;
q0 | q1 | q2 | q3 | q4
------+------+------+------+------
10.0 | 17.5 | 35.0 | 52.5 | 60.0
15.0 | 17.5 | 20.0 | 30.0 | 40.0
(2 rows)
7 - APPROXIMATE_QUANTILES
Computes an array of weighted, approximate percentiles of a column within some user-specified error. This algorithm is similar to APPROXIMATE_PERCENTILE [aggregate], which instead returns a single percentile.
The performance of this function depends entirely on the specified epsilon and the size of the provided array.
The OVER clause for this function must be empty.
Behavior type
Immutable
Syntax
APPROXIMATE_QUANTILES ( column USING PARAMETERS [nquantiles=n], [epsilon=error] ) OVER() FROM table
Parameters
column
- The INTEGER or FLOAT column for which to calculate the percentiles. NULL values are ignored.
n
- An integer that specifies the number of desired quantiles in the returned array.
Default: 11
error
- The allowed error for any returned percentile. Specifically, for an array of size N, the specified error ε (epsilon) for the φ-quantile guarantees that the rank r of the return value, with respect to the rank ⌊φN⌋ of the exact value, satisfies ⌊(φ-ε)N⌋ ≤ r ≤ ⌊(φ+ε)N⌋. For n quantiles, if the error ε is specified such that ε > 1/n, this function returns non-deterministic results. See the worked example after this parameter list.
Default: 0.001
table
- The table containing column.
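As a worked instance of the epsilon guarantee above: for N = 1000 rows, φ = 0.5, and the default ε = 0.001, the rank r of the returned 0.5-quantile satisfies ⌊(0.5 - 0.001) × 1000⌋ ≤ r ≤ ⌊(0.5 + 0.001) × 1000⌋, that is, 499 ≤ r ≤ 501, so the returned value lies within one rank of the exact median (rank 500).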
Examples
The following example uses this table:
=> CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT) ORDER BY state;
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;
=> SELECT * FROM allsales;
state | name | sales
-------+------+-------
MA | A | 60
NY | B | 20
NY | C | 15
NY | F | 40
MA | D | 20
MA | E | 50
MA | G | 10
(7 rows)
This call to APPROXIMATE_QUANTILES returns a 6-element array of approximate percentiles, one for each quantile. Each quantile relates to the percentile by a factor of 100. For example, the second entry in the output indicates that 15 is the 0.2-quantile of the input column, so 15 is the 20th percentile of the input column.
=> SELECT APPROXIMATE_QUANTILES(sales USING PARAMETERS nquantiles=6) OVER() FROM allsales;
Quantile | Value
----------+-------
0 | 10
0.2 | 15
0.4 | 20
0.6 | 40
0.8 | 50
1 | 60
(6 rows)
8 - ARGMAX_AGG
Takes two arguments target and arg, where both are columns or column expressions in the queried dataset. ARGMAX_AGG finds the row with the highest non-null value in target and returns the value of arg in that row. If multiple rows contain the highest target value, ARGMAX_AGG returns arg from the first row that it finds. Use the WITHIN GROUP ORDER BY clause to control which row ARGMAX_AGG finds first.
Behavior type
Immutable if the WITHIN GROUP ORDER BY clause specifies a column or set of columns that resolves to unique values within the group; otherwise Volatile.
Syntax
ARGMAX_AGG ( target, arg ) [ within-group-order-by-clause ]
Arguments
target
, arg
- Columns in the queried dataset.
Note
The target argument cannot reference a spatial data type column, GEOMETRY or GEOGRAPHY.
within-group-order-by-clause
- Sorts target values within each group of rows:
WITHIN GROUP (ORDER BY { column-expression [ sort-qualifiers ] }[,...])
sort-qualifiers:
{ ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] }
Use this clause to determine which row is returned when multiple rows contain the highest target value; otherwise, results are likely to vary with each iteration of the same query.
Tip
WITHIN GROUP ORDER BY can consume a large amount of memory per group. To minimize memory consumption, create projections that support GROUPBY PIPELINED.
Examples
The following example calls ARGMAX_AGG in a WITH clause to find which employees in each region are at or near retirement age. If multiple employees within each region have the same age, ARGMAX_AGG chooses the employees with the highest salary level and returns their IDs. The primary query returns details on the employees selected from each region:
=> WITH r AS (SELECT employee_region, ARGMAX_AGG(employee_age, employee_key)
WITHIN GROUP (ORDER BY annual_salary DESC) emp_id
FROM employee_dim GROUP BY employee_region ORDER BY employee_region)
SELECT r.employee_region, ed.annual_salary AS highest_salary, employee_key,
ed.employee_first_name||' '||ed.employee_last_name AS employee_name, ed.employee_age
FROM r JOIN employee_dim ed ON r.emp_id = ed.employee_key ORDER BY ed.employee_region;
employee_region | highest_salary | employee_key | employee_name | employee_age
----------------------------------+----------------+--------------+------------------+--------------
East | 927335 | 70 | Sally Gauthier | 65
MidWest | 177716 | 869 | Rebecca McCabe | 65
NorthWest | 100300 | 7597 | Kim Jefferson | 65
South | 196454 | 275 | Alexandra Harris | 65
SouthWest | 198669 | 1043 | Seth Stein | 65
West | 197203 | 681 | Seth Jones | 65
(6 rows)
See also
ARGMIN_AGG
9 - ARGMIN_AGG
Takes two arguments target and arg, where both are columns or column expressions in the queried dataset. ARGMIN_AGG finds the row with the lowest non-null value in target and returns the value of arg in that row. If multiple rows contain the lowest target value, ARGMIN_AGG returns arg from the first row that it finds. Use the WITHIN GROUP ORDER BY clause to control which row ARGMIN_AGG finds first.
Behavior type
Immutable if the WITHIN GROUP ORDER BY clause specifies a column or set of columns that resolves to unique values within the group; otherwise Volatile.
Syntax
ARGMIN_AGG ( target, arg ) [ within-group-order-by-clause ]
Arguments
target
, arg
- Columns in the queried dataset.
Note
The target argument cannot reference a spatial data type column, GEOMETRY or GEOGRAPHY.
within-group-order-by-clause
- Sorts target values within each group of rows:
WITHIN GROUP (ORDER BY { column-expression [ sort-qualifiers ] }[,...])
sort-qualifiers:
{ ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] }
Use this clause to determine which row is returned when multiple rows contain the lowest target value; otherwise, results are likely to vary with each iteration of the same query.
Tip
WITHIN GROUP ORDER BY can consume a large amount of memory per group. To minimize memory consumption, create projections that support GROUPBY PIPELINED.
Examples
The following example calls ARGMIN_AGG in a WITH clause to find the lowest salary among all employees in each region, and returns the lowest-paid employee IDs. The primary query returns the salary amounts and employee names:
=> WITH msr (employee_region, emp_id) AS
(SELECT employee_region, argmin_agg(annual_salary, employee_key) lowest_paid_employee FROM employee_dim GROUP BY employee_region)
SELECT msr.employee_region, ed.annual_salary AS lowest_salary, ed.employee_first_name||' '||ed.employee_last_name AS employee_name
FROM msr JOIN employee_dim ed ON msr.emp_id = ed.employee_key ORDER BY annual_salary DESC;
employee_region | lowest_salary | employee_name
----------------------------------+---------------+-----------------
NorthWest | 20913 | Raja Garnett
SouthWest | 20750 | Seth Moore
West | 20443 | Midori Taylor
South | 20363 | David Bauer
East | 20306 | Craig Jefferson
MidWest | 20264 | Dean Vu
(6 rows)
See also
ARGMAX_AGG
10 - AVG [aggregate]
Computes the average (arithmetic mean) of an expression over a group of rows. AVG always returns a DOUBLE PRECISION value.
The AVG aggregate function differs from the AVG analytic function, which computes the average of an expression over a group of rows within a window.
Behavior type
Immutable
Syntax
AVG ( [ ALL | DISTINCT ] expression )
Parameters
ALL
- Invokes the aggregate function for all rows in the group (default).
DISTINCT
- Invokes the aggregate function for all distinct non-null values of the expression found in the group.
expression
- The value whose average is calculated over a set of rows, any expression that can have a DOUBLE PRECISION result.
Overflow handling
By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, see Numeric data type overflow with SUM, SUM_FLOAT, and AVG.
Examples
The following query returns the average income from the customer table:
=> SELECT AVG(annual_income) FROM customer_dimension;
AVG
--------------
2104270.6485
(1 row)
11 - BIT_AND
Takes the bitwise AND of all non-null input values. If the input parameter is NULL, the return value is also NULL.
Behavior type
Immutable
Syntax
BIT_AND ( expression )
Parameters
expression
- The BINARY or VARBINARY input value to evaluate. BIT_AND operates on VARBINARY types explicitly and on BINARY types implicitly through casts.
Returns
BIT_AND returns:
- The same value as the argument data type.
- 1 for each bit compared, if all bits are 1; otherwise 0.
If the columns are different lengths, the return values are treated as though they are all equal in length and are right-extended with zero bytes. For example, given a group containing hex values ff, null, and f, BIT_AND ignores the null value and extends the value f to f0.
Examples
The example that follows uses table t with a single column of VARBINARY data type:
=> CREATE TABLE t ( c VARBINARY(2) );
=> INSERT INTO t values(HEX_TO_BINARY('0xFF00'));
=> INSERT INTO t values(HEX_TO_BINARY('0xFFFF'));
=> INSERT INTO t values(HEX_TO_BINARY('0xF00F'));
Query table t to see column c output:
=> SELECT TO_HEX(c) FROM t;
TO_HEX
--------
ff00
ffff
f00f
(3 rows)
Query table t to get the AND value for column c:
=> SELECT TO_HEX(BIT_AND(c)) FROM t;
TO_HEX
--------
f000
(1 row)
The function is applied pairwise to all values in the group, resulting in f000, which is determined as follows:
- ff00 (record 1) is compared with ffff (record 2), which results in ff00.
- The result from the previous comparison is compared with f00f (record 3), which results in f000.
See also
Binary data types (BINARY and VARBINARY)
12 - BIT_OR
Takes the bitwise OR of all non-null input values. If the input parameter is NULL, the return value is also NULL.
Behavior type
Immutable
Syntax
BIT_OR ( expression )
Parameters
expression
- The BINARY or VARBINARY input value to evaluate. BIT_OR operates on VARBINARY types explicitly and on BINARY types implicitly through casts.
Returns
BIT_OR returns:
- The same value as the argument data type.
- 1 for each bit compared, if at least one bit is 1; otherwise 0.
If the columns are different lengths, the return values are treated as though they are all equal in length and are right-extended with zero bytes. For example, given a group containing hex values ff, null, and f, the function ignores the null value and extends the value f to f0.
Examples
The example that follows uses table t with a single column of VARBINARY data type:
=> CREATE TABLE t ( c VARBINARY(2) );
=> INSERT INTO t values(HEX_TO_BINARY('0xFF00'));
=> INSERT INTO t values(HEX_TO_BINARY('0xFFFF'));
=> INSERT INTO t values(HEX_TO_BINARY('0xF00F'));
Query table t to see column c output:
=> SELECT TO_HEX(c) FROM t;
TO_HEX
--------
ff00
ffff
f00f
(3 rows)
Query table t to get the OR value for column c:
=> SELECT TO_HEX(BIT_OR(c)) FROM t;
TO_HEX
--------
ffff
(1 row)
The function is applied pairwise to all values in the group, resulting in ffff, which is determined as follows:
- ff00 (record 1) is compared with ffff (record 2), which results in ffff.
- The result of the previous comparison, ffff, is compared with f00f (record 3), which results in ffff.
See also
Binary data types (BINARY and VARBINARY)
13 - BIT_XOR
Takes the bitwise XOR of all non-null input values. If the input parameter is NULL, the return value is also NULL.
Behavior type
Immutable
Syntax
BIT_XOR ( expression )
Parameters
expression
- The BINARY or VARBINARY input value to evaluate. BIT_XOR operates on VARBINARY types explicitly and on BINARY types implicitly through casts.
Returns
BIT_XOR returns:
- The same value as the argument data type.
- 1 for each bit compared, if there are an odd number of arguments with set bits; otherwise 0.
If the columns are different lengths, the return values are treated as though they are all equal in length and are right-extended with zero bytes. For example, given a group containing hex values ff, null, and f, the function ignores the null value and extends the value f to f0.
Examples
First create a sample table t with a single column of VARBINARY data type:
=> CREATE TABLE t ( c VARBINARY(2) );
=> INSERT INTO t values(HEX_TO_BINARY('0xFF00'));
=> INSERT INTO t values(HEX_TO_BINARY('0xFFFF'));
=> INSERT INTO t values(HEX_TO_BINARY('0xF00F'));
Query table t to see column c output:
=> SELECT TO_HEX(c) FROM t;
TO_HEX
--------
ff00
ffff
f00f
(3 rows)
Query table t to get the XOR value for column c:
=> SELECT TO_HEX(BIT_XOR(c)) FROM t;
TO_HEX
--------
f0f0
(1 row)
The function is applied pairwise to all values in the group, resulting in f0f0, which is determined as follows:
- ff00 (record 1) is compared with ffff (record 2), which results in 00ff.
- The result from the previous comparison is compared with f00f (record 3), which results in f0f0.
See also
Binary data types (BINARY and VARBINARY)
14 - BOOL_AND [aggregate]
Processes Boolean values and returns a Boolean value result. If all input values are true, BOOL_AND returns t. Otherwise it returns f (false).
Behavior type
Immutable
Syntax
BOOL_AND ( expression )
Parameters
expression
- A Boolean data type or any non-Boolean data type that can be implicitly coerced to a Boolean data type.
Examples
The following example shows how to use aggregate functions BOOL_AND, BOOL_OR, and BOOL_XOR. The sample table mixers includes columns for models and colors.
=> CREATE TABLE mixers(model VARCHAR(20), colors VARCHAR(20));
CREATE TABLE
Insert sample data into the table. The sample adds two color fields for each model.
=> INSERT INTO mixers
SELECT 'beginner', 'green'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'advanced', 'green'
UNION ALL
SELECT 'advanced', 'blue'
UNION ALL
SELECT 'professional', 'blue'
UNION ALL
SELECT 'professional', 'green'
UNION ALL
SELECT 'beginner', 'green';
OUTPUT
--------
8
(1 row)
Query the table. The result shows models that have two blue (BOOL_AND), one or two blue (BOOL_OR), and specifically not more than one blue (BOOL_XOR) mixer.
=> SELECT model,
BOOL_AND(colors= 'blue')AS two_blue,
BOOL_OR(colors= 'blue')AS one_or_two_blue,
BOOL_XOR(colors= 'blue')AS specifically_not_more_than_one_blue
FROM mixers
GROUP BY model;
model | two_blue | one_or_two_blue | specifically_not_more_than_one_blue
--------------+----------+-----------------+-------------------------------------
advanced | f | t | t
beginner | f | f | f
intermediate | t | t | f
professional | f | t | t
(4 rows)
15 - BOOL_OR [aggregate]
Processes Boolean values and returns a Boolean value result. If at least one input value is true, BOOL_OR returns t. Otherwise, it returns f.
Behavior type
Immutable
Syntax
BOOL_OR ( expression )
Parameters
expression
- A Boolean data type or any non-Boolean data type that can be implicitly coerced to a Boolean data type.
Examples
The following example shows how to use aggregate functions BOOL_AND, BOOL_OR, and BOOL_XOR. The sample table mixers includes columns for models and colors.
=> CREATE TABLE mixers(model VARCHAR(20), colors VARCHAR(20));
CREATE TABLE
Insert sample data into the table. The sample adds two color fields for each model.
=> INSERT INTO mixers
SELECT 'beginner', 'green'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'advanced', 'green'
UNION ALL
SELECT 'advanced', 'blue'
UNION ALL
SELECT 'professional', 'blue'
UNION ALL
SELECT 'professional', 'green'
UNION ALL
SELECT 'beginner', 'green';
OUTPUT
--------
8
(1 row)
Query the table. The result shows models that have two blue (BOOL_AND), one or two blue (BOOL_OR), and specifically not more than one blue (BOOL_XOR) mixer.
=> SELECT model,
BOOL_AND(colors= 'blue')AS two_blue,
BOOL_OR(colors= 'blue')AS one_or_two_blue,
BOOL_XOR(colors= 'blue')AS specifically_not_more_than_one_blue
FROM mixers
GROUP BY model;
model | two_blue | one_or_two_blue | specifically_not_more_than_one_blue
--------------+----------+-----------------+-------------------------------------
advanced | f | t | t
beginner | f | f | f
intermediate | t | t | f
professional | f | t | t
(4 rows)
16 - BOOL_XOR [aggregate]
Processes Boolean values and returns a Boolean value result. If specifically only one input value is true, BOOL_XOR returns t. Otherwise, it returns f.
Behavior type
Immutable
Syntax
BOOL_XOR ( expression )
Parameters
expression
- A Boolean data type or any non-Boolean data type that can be implicitly coerced to a Boolean data type.
Examples
The following example shows how to use aggregate functions BOOL_AND, BOOL_OR, and BOOL_XOR. The sample table mixers includes columns for models and colors.
=> CREATE TABLE mixers(model VARCHAR(20), colors VARCHAR(20));
CREATE TABLE
Insert sample data into the table. The sample adds two color fields for each model.
=> INSERT INTO mixers
SELECT 'beginner', 'green'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'advanced', 'green'
UNION ALL
SELECT 'advanced', 'blue'
UNION ALL
SELECT 'professional', 'blue'
UNION ALL
SELECT 'professional', 'green'
UNION ALL
SELECT 'beginner', 'green';
OUTPUT
--------
8
(1 row)
Query the table. The result shows models that have two blue (BOOL_AND), one or two blue (BOOL_OR), and specifically not more than one blue (BOOL_XOR) mixer.
=> SELECT model,
BOOL_AND(colors= 'blue')AS two_blue,
BOOL_OR(colors= 'blue')AS one_or_two_blue,
BOOL_XOR(colors= 'blue')AS specifically_not_more_than_one_blue
FROM mixers
GROUP BY model;
model | two_blue | one_or_two_blue | specifically_not_more_than_one_blue
--------------+----------+-----------------+-------------------------------------
advanced | f | t | t
beginner | f | f | f
intermediate | t | t | f
professional | f | t | t
(4 rows)
17 - CORR
Returns the DOUBLE PRECISION coefficient of correlation of a set of expression pairs, as per the Pearson correlation coefficient. CORR eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.
Syntax
CORR ( expression1, expression2 )
Parameters
expression1
- The dependent DOUBLE PRECISION expression
expression2
- The independent DOUBLE PRECISION expression
Examples
=> SELECT CORR (Annual_salary, Employee_age) FROM employee_dimension;
CORR
----------------------
-0.00719153413192422
(1 row)
18 - COUNT [aggregate]
Returns as a BIGINT the number of rows in each group where the expression is not NULL. If the query has no GROUP BY clause, COUNT returns the number of table rows.
The COUNT aggregate function differs from the COUNT analytic function, which returns the number over a group of rows within a window.
Behavior type
Immutable
Syntax
COUNT ( [ * ] [ ALL | DISTINCT ] expression )
Parameters
*
- Specifies to count all rows in the specified table or each group.
ALL | DISTINCT
- Specifies how to count rows where expression has a non-null value:
  - ALL (default): Counts all rows in which expression evaluates to a non-null value.
  - DISTINCT: Counts all rows in which expression evaluates to a distinct non-null value.
expression
- The column or expression whose non-null values are counted.
Examples
The following query returns the number of distinct values in a column:
=> SELECT COUNT (DISTINCT date_key) FROM date_dimension;
COUNT
-------
1826
(1 row)
This example returns the number of distinct return values from an expression:
=> SELECT COUNT (DISTINCT date_key + product_key) FROM inventory_fact;
COUNT
-------
21560
(1 row)
You can create an equivalent query using the LIMIT keyword to restrict the number of rows returned:
=> SELECT COUNT(date_key + product_key) FROM inventory_fact GROUP BY date_key LIMIT 10;
COUNT
-------
173
31
321
113
286
84
244
238
145
202
(10 rows)
The following query uses GROUP BY to count distinct values within groups:
=> SELECT product_key, COUNT (DISTINCT date_key) FROM inventory_fact
GROUP BY product_key LIMIT 10;
product_key | count
-------------+-------
1 | 12
2 | 18
3 | 13
4 | 17
5 | 11
6 | 14
7 | 13
8 | 17
9 | 15
10 | 12
(10 rows)
The following query returns the number of distinct products and the total inventory within each date key:
=> SELECT date_key, COUNT (DISTINCT product_key), SUM(qty_in_stock) FROM inventory_fact
GROUP BY date_key LIMIT 10;
date_key | count | sum
----------+-------+--------
1 | 173 | 88953
2 | 31 | 16315
3 | 318 | 156003
4 | 113 | 53341
5 | 285 | 148380
6 | 84 | 42421
7 | 241 | 119315
8 | 238 | 122380
9 | 142 | 70151
10 | 202 | 95274
(10 rows)
This query selects each distinct product_key value and then counts the number of distinct date_key values for all records with the specific product_key value. It also counts the number of distinct warehouse_key values in all records with the specific product_key value:
=> SELECT product_key, COUNT (DISTINCT date_key), COUNT (DISTINCT warehouse_key) FROM inventory_fact
GROUP BY product_key LIMIT 15;
product_key | count | count
-------------+-------+-------
1 | 12 | 12
2 | 18 | 18
3 | 13 | 12
4 | 17 | 18
5 | 11 | 9
6 | 14 | 13
7 | 13 | 13
8 | 17 | 15
9 | 15 | 14
10 | 12 | 12
11 | 11 | 11
12 | 13 | 12
13 | 9 | 7
14 | 13 | 13
15 | 18 | 17
(15 rows)
This query selects each distinct product_key value, counts the number of distinct date_key and warehouse_key values for all records with the specific product_key value, and then sums all qty_in_stock values in records with the specific product_key value. It then returns the number of product_version values in records with the specific product_key value:
=> SELECT product_key, COUNT (DISTINCT date_key),
COUNT (DISTINCT warehouse_key),
SUM (qty_in_stock),
COUNT (product_version)
FROM inventory_fact GROUP BY product_key LIMIT 15;
product_key | count | count | sum | count
-------------+-------+-------+-------+-------
1 | 12 | 12 | 5530 | 12
2 | 18 | 18 | 9605 | 18
3 | 13 | 12 | 8404 | 13
4 | 17 | 18 | 10006 | 18
5 | 11 | 9 | 4794 | 11
6 | 14 | 13 | 7359 | 14
7 | 13 | 13 | 7828 | 13
8 | 17 | 15 | 9074 | 17
9 | 15 | 14 | 7032 | 15
10 | 12 | 12 | 5359 | 12
11 | 11 | 11 | 6049 | 11
12 | 13 | 12 | 6075 | 13
13 | 9 | 7 | 3470 | 9
14 | 13 | 13 | 5125 | 13
15 | 18 | 17 | 9277 | 18
(15 rows)
19 - COVAR_POP
Returns the population covariance for a set of expression pairs. The return value is of type DOUBLE PRECISION. COVAR_POP eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.
Syntax
SELECT COVAR_POP ( expression1, expression2 )
Parameters
expression1
- The dependent DOUBLE PRECISION expression
expression2
- The independent DOUBLE PRECISION expression
Examples
=> SELECT COVAR_POP (Annual_salary, Employee_age)
FROM employee_dimension;
COVAR_POP
-------------------
-9032.34810730019
(1 row)
20 - COVAR_SAMP
Returns the sample covariance for a set of expression pairs. The return value is of type DOUBLE PRECISION. COVAR_SAMP eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.
Syntax
SELECT COVAR_SAMP ( expression1, expression2 )
Parameters
expression1
- The dependent DOUBLE PRECISION expression
expression2
- The independent DOUBLE PRECISION expression
Examples
=> SELECT COVAR_SAMP (Annual_salary, Employee_age)
FROM employee_dimension;
COVAR_SAMP
-------------------
-9033.25143244343
(1 row)
21 - GROUP_ID
Uniquely identifies duplicate sets for GROUP BY queries that return duplicate grouping sets. This function returns one or more integers, starting with zero (0), as identifiers.
For the number of duplicates n for a particular grouping, GROUP_ID returns a range of sequential numbers, 0 to n-1. For the first unique group it encounters, GROUP_ID returns the value 0. If GROUP_ID finds the same grouping again, the function returns 1, then returns 2 for the next found grouping, and so on.
Behavior type
Immutable
Syntax
GROUP_ID ()
Examples
This example shows how GROUP_ID creates unique identifiers when a query produces duplicate groupings. For an expenses table, the following query groups the results by category of expense and year and rolls up the sum for those two columns. The results have duplicate groupings for category and NULL. The first grouping has a GROUP_ID of 0, and the second grouping has a GROUP_ID of 1.
=> SELECT Category, Year, SUM(Amount), GROUPING_ID(Category, Year),
GROUP_ID() FROM expenses GROUP BY Category, ROLLUP(Category,Year)
ORDER BY Category, Year, GROUPING_ID();
Category | Year | SUM | GROUPING_ID | GROUP_ID
-------------+------+--------+-------------+----------
Books | 2005 | 39.98 | 0 | 0
Books | 2007 | 29.99 | 0 | 0
Books | 2008 | 29.99 | 0 | 0
Books | | 99.96 | 1 | 0
Books | | 99.96 | 1 | 1
Electricity | 2005 | 109.99 | 0 | 0
Electricity | 2006 | 109.99 | 0 | 0
Electricity | 2007 | 229.98 | 0 | 0
Electricity | | 449.96 | 1 | 1
Electricity | | 449.96 | 1 | 0
22 - GROUPING
Disambiguates the use of NULL values when GROUP BY queries with multilevel aggregates generate NULL values to identify subtotals in grouping columns. Such NULL values from the original data can also occur in rows. GROUPING returns:
- 1, if the value of expression is NULL, representing an aggregated value
- 0 for any other value, including NULL values in rows
Behavior type
Immutable
Syntax
GROUPING ( expression )
Parameters
expression
- An expression in the
GROUP BY
clause
Examples
The following query uses the GROUPING function, taking one of the GROUP BY expressions as an argument. For each row, GROUPING returns 0 or 1, as described above.
The 1 in the GROUPING(Year) column for electricity and books indicates that these values are subtotals. The right-most column values for both GROUPING(Category) and GROUPING(Year) are 1. This value indicates that neither column contributed to the GROUP BY. The final row represents the total sales.
=> SELECT Category, Year, SUM(Amount),
GROUPING(Category), GROUPING(Year) FROM expenses
GROUP BY ROLLUP(Category, Year) ORDER BY Category, Year, GROUPING_ID();
Category | Year | SUM | GROUPING | GROUPING
-------------+------+--------+----------+----------
Books | 2005 | 39.98 | 0 | 0
Books | 2007 | 29.99 | 0 | 0
Books | 2008 | 29.99 | 0 | 0
Books | | 99.96 | 0 | 1
Electricity | 2005 | 109.99 | 0 | 0
Electricity | 2006 | 109.99 | 0 | 0
Electricity | 2007 | 229.98 | 0 | 0
Electricity | | 449.96 | 0 | 1
| | 549.92 | 1 | 1
23 - GROUPING_ID
Concatenates the set of Boolean values generated by the GROUPING function into a bit vector. GROUPING_ID treats the bit vector as a binary number and returns it as a base-10 value that identifies the grouping set combination.
By using GROUPING_ID you avoid the need for multiple, individual GROUPING functions. GROUPING_ID simplifies row-filtering conditions, because rows of interest are identified using a single return from GROUPING_ID = n. Use GROUPING_ID to identify grouping combinations.
Behavior type
Immutable
Syntax
GROUPING_ID ( [expression[,...]] )
expression
- An expression that matches one of the expressions in the GROUP BY clause.
If the GROUP BY clause includes a list of expressions, GROUPING_ID returns a number corresponding to the GROUPING bit vector associated with a row.
Examples
This example shows how calling GROUPING_ID without an expression returns the GROUPING bit vector associated with a full set of multilevel aggregate expressions. The GROUPING_ID value is comparable to GROUPING_ID(a,b) because GROUPING_ID() includes all columns in the GROUP BY ROLLUP:
=> SELECT a,b,COUNT(*), GROUPING_ID() FROM T GROUP BY ROLLUP(a,b);
In the following query, the GROUPING(Category) and GROUPING(Year) columns have three combinations:
=> SELECT Category, Year, SUM(Amount),
GROUPING(Category), GROUPING(Year) FROM expenses
GROUP BY ROLLUP(Category, Year) ORDER BY Category, Year, GROUPING_ID();
Category | Year | SUM | GROUPING | GROUPING
-------------+------+--------+----------+----------
Books | 2005 | 39.98 | 0 | 0
Books | 2007 | 29.99 | 0 | 0
Books | 2008 | 29.99 | 0 | 0
Books | | 99.96 | 0 | 1
Electricity | 2005 | 109.99 | 0 | 0
Electricity | 2006 | 109.99 | 0 | 0
Electricity | 2007 | 229.98 | 0 | 0
Electricity | | 449.96 | 0 | 1
| | 549.92 | 1 | 1
GROUPING_ID converts these values as follows:
 Binary Set Values | Decimal Equivalents
-------------------+---------------------
 00                | 0
 01                | 1
 11                | 3
The following query returns the single number for each GROUP BY level that appears in the gr_id column:
=> SELECT Category, Year, SUM(Amount),
GROUPING(Category),GROUPING(Year),GROUPING_ID(Category,Year) AS gr_id
FROM expenses GROUP BY ROLLUP(Category, Year);
Category | Year | SUM | GROUPING | GROUPING | gr_id
-------------+------+--------+----------+----------+-------
Books | 2008 | 29.99 | 0 | 0 | 0
Books | 2005 | 39.98 | 0 | 0 | 0
Electricity | 2007 | 229.98 | 0 | 0 | 0
Books | 2007 | 29.99 | 0 | 0 | 0
Electricity | 2005 | 109.99 | 0 | 0 | 0
Electricity | | 449.96 | 0 | 1 | 1
| | 549.92 | 1 | 1 | 3
Electricity | 2006 | 109.99 | 0 | 0 | 0
Books | | 99.96 | 0 | 1 | 1
The gr_id value determines the GROUP BY level for each row:
 GROUP BY Level | GROUP BY Row Level
----------------+--------------------
 3              | Total sum
 1              | Category
 0              | Category, year
You can also use the DECODE function to give the values more meaning by comparing each search value individually:
=> SELECT Category, Year, SUM(AMOUNT), DECODE(GROUPING_ID(Category, Year),
3, 'Total',
1, 'Category',
0, 'Category,Year')
AS GROUP_NAME FROM expenses GROUP BY ROLLUP(Category, Year);
Category | Year | SUM | GROUP_NAME
-------------+------+--------+---------------
Electricity | 2006 | 109.99 | Category,Year
Books | | 99.96 | Category
Electricity | 2007 | 229.98 | Category,Year
Books | 2007 | 29.99 | Category,Year
Electricity | 2005 | 109.99 | Category,Year
Electricity | | 449.96 | Category
| | 549.92 | Total
Books | 2005 | 39.98 | Category,Year
Books | 2008 | 29.99 | Category,Year
24 - LISTAGG
Transforms non-null values from a group of rows into a list of values that are delimited by commas (default) or a configurable separator. LISTAGG can be used to denormalize rows into a string of concatenated values.
Behavior type
Immutable if the WITHIN GROUP ORDER BY clause specifies a column or set of columns that resolves to unique values within the aggregated list; otherwise Volatile.
Syntax
LISTAGG ( aggregate-expression [ USING PARAMETERS parameter=value][,...] ] ) [ within-group-order-by-clause ]
Arguments
aggregate-expression
- Aggregation of one or more columns or column expressions to select from the source table or view.
LISTAGG does not support spatial data types directly. In order to pass column data of this type, convert the data to strings with the geospatial function ST_AsText.
Caution
Converted spatial data frequently contains commas. LISTAGG uses comma as the default separator character. To avoid ambiguous output, override this default by setting the function's separator parameter to another character. A sketch of this pattern appears at the end of the Examples section below.
within-group-order-by-clause
- Sorts aggregated values within each group of rows, where column-expression is typically a column in aggregate-expression:
WITHIN GROUP (ORDER BY { column-expression [ sort-qualifiers ] }[,...])
sort-qualifiers:
{ ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] }
Tip
WITHIN GROUP ORDER BY can consume a large amount of memory per group. Including wide strings in the aggregate expression can also adversely affect performance. To minimize memory consumption, create projections that support GROUPBY PIPELINED.
Parameters
max_length
- An integer or integer expression that specifies in bytes the maximum length of the result, up to 32M. Default: 1024
separator
- Separator string of length 0 to 80, inclusive. A length of 0 concatenates the output with no separators. Default: comma (,)
on_overflow
- Specifies behavior when the result overflows the max_length setting, one of the following strings:
  - ERROR (default): Return an error when overflow occurs.
  - TRUNCATE: Remove any characters that exceed the max_length setting from the query result, and return the truncated string.
Privileges
None
Examples
In the following query, the aggregated results in the CityAndState column use the string " | " as a separator. The outer GROUP BY clause groups the output rows according to their Region values. Within each group, the aggregated list items are sorted according to their city values, as per the WITHIN GROUP ORDER BY clause:
=> \x
Expanded display is on.
=> WITH cd AS (SELECT DISTINCT (customer_city) city, customer_state, customer_region FROM customer_dimension)
SELECT customer_region Region, LISTAGG(city||', '||customer_state USING PARAMETERS separator=' | ')
WITHIN GROUP (ORDER BY city) CityAndState FROM cd GROUP BY region ORDER BY region;
-[ RECORD 1 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | East
CityAndState | Alexandria, VA | Allentown, PA | Baltimore, MD | Boston, MA | Cambridge, MA | Charlotte, NC | Clarksville, TN | Columbia, SC | Elizabeth, NJ | Erie, PA | Fayetteville, NC | Hartford, CT | Lowell, MA | Manchester, NH | Memphis, TN | Nashville, TN | New Haven, CT | New York, NY | Philadelphia, PA | Portsmouth, VA | Stamford, CT | Sterling Heights, MI | Washington, DC | Waterbury, CT
-[ RECORD 2 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | MidWest
CityAndState | Ann Arbor, MI | Cedar Rapids, IA | Chicago, IL | Columbus, OH | Detroit, MI | Evansville, IN | Flint, MI | Gary, IN | Green Bay, WI | Indianapolis, IN | Joliet, IL | Lansing, MI | Livonia, MI | Milwaukee, WI | Naperville, IL | Peoria, IL | Sioux Falls, SD | South Bend, IN | Springfield, IL
-[ RECORD 3 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | NorthWest
CityAndState | Bellevue, WA | Portland, OR | Seattle, WA
-[ RECORD 4 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | South
CityAndState | Abilene, TX | Athens, GA | Austin, TX | Beaumont, TX | Cape Coral, FL | Carrollton, TX | Clearwater, FL | Coral Springs, FL | Dallas, TX | El Paso, TX | Fort Worth, TX | Grand Prairie, TX | Houston, TX | Independence, MS | Jacksonville, FL | Lafayette, LA | McAllen, TX | Mesquite, TX | San Antonio, TX | Savannah, GA | Waco, TX | Wichita Falls, TX
-[ RECORD 5 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | SouthWest
CityAndState | Arvada, CO | Denver, CO | Fort Collins, CO | Gilbert, AZ | Las Vegas, NV | North Las Vegas, NV | Peoria, AZ | Phoenix, AZ | Pueblo, CO | Topeka, KS | Westminster, CO
-[ RECORD 6 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | West
CityAndState | Berkeley, CA | Burbank, CA | Concord, CA | Corona, CA | Costa Mesa, CA | Daly City, CA | Downey, CA | El Monte, CA | Escondido, CA | Fontana, CA | Fullerton, CA | Inglewood, CA | Lancaster, CA | Los Angeles, CA | Norwalk, CA | Orange, CA | Palmdale, CA | Pasadena, CA | Provo, UT | Rancho Cucamonga, CA | San Diego, CA | San Francisco, CA | San Jose, CA | Santa Clara, CA | Simi Valley, CA | Sunnyvale, CA | Thousand Oaks, CA | Vallejo, CA | Ventura, CA | West Covina, CA | West Valley City, UT
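As noted in the caution under Arguments, converted spatial data frequently contains commas, so override the default separator when aggregating it. The following is a minimal sketch of that pattern (output omitted); it assumes a hypothetical locations table with a VARCHAR region column and a GEOMETRY column geom:
=> SELECT region,
          LISTAGG(ST_AsText(geom) USING PARAMETERS separator=' | ') AS shapes
   FROM locations GROUP BY region;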
25 - MAX [aggregate]
Returns the greatest value of an expression over a group of rows. The return value has the same type as the expression data type.
The MAX analytic function differs from the aggregate function, in that it returns the maximum value of an expression over a group of rows within a window.
Aggregate functions MIN and MAX can operate with Boolean values. MAX can act upon a Boolean data type or a value that can be implicitly converted to a Boolean. If at least one input value is true, MAX returns t (true). Otherwise, it returns f (false). In the same scenario, MIN returns t (true) if all input values are true. Otherwise it returns f.
Behavior type
Immutable
Syntax
MAX ( expression )
Parameters
expression
- Any expression for which the maximum value is calculated, typically a column reference.
Examples
The following query returns the largest value in column sales_dollar_amount.
=> SELECT MAX(sales_dollar_amount) AS highest_sale FROM store.store_sales_fact;
highest_sale
--------------
600
(1 row)
The following example shows you the difference between the MIN and MAX aggregate functions when you use them with a Boolean value. The sample creates a table, adds two rows of data, and shows sample output for MIN and MAX.
=> CREATE TABLE min_max_functions (torf BOOL);
=> INSERT INTO min_max_functions VALUES (1);
=> INSERT INTO min_max_functions VALUES (0);
=> SELECT * FROM min_max_functions;
torf
------
t
f
(2 rows)
=> SELECT min(torf) FROM min_max_functions;
min
-----
f
(1 row)
=> SELECT max(torf) FROM min_max_functions;
max
-----
t
(1 row)
See also
Data aggregation
26 - MIN [aggregate]
Returns the smallest value of an expression over a group of rows. The return value has the same type as the expression data type.
The MIN analytic function differs from the aggregate function, in that it returns the minimum value of an expression over a group of rows within a window.
Aggregate functions MIN and MAX can operate with Boolean values. MAX can act upon a Boolean data type or a value that can be implicitly converted to a Boolean. If at least one input value is true, MAX returns t (true). Otherwise, it returns f (false). In the same scenario, MIN returns t (true) if all input values are true. Otherwise it returns f.
.
Behavior type
Immutable
Syntax
MIN ( expression )
Parameters
expression
- Any expression for which the minimum value is calculated, typically a column reference.
Examples
The following query returns the lowest salary from the employee dimension table.
=> SELECT MIN(annual_salary) AS lowest_paid FROM employee_dimension;
lowest_paid
-------------
1200
(1 row)
The following example shows you the difference between the MIN and MAX aggregate functions when you use them with a Boolean value. The sample creates a table, adds two rows of data, and shows sample output for MIN and MAX.
=> CREATE TABLE min_max_functions (torf BOOL);
=> INSERT INTO min_max_functions VALUES (1);
=> INSERT INTO min_max_functions VALUES (0);
=> SELECT * FROM min_max_functions;
torf
------
t
f
(2 rows)
=> SELECT min(torf) FROM min_max_functions;
min
-----
f
(1 row)
=> SELECT max(torf) FROM min_max_functions;
max
-----
t
(1 row)
See also
Data aggregation
27 - REGR_AVGX
Returns the DOUBLE PRECISION average of the independent expression in an expression pair. REGR_AVGX eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_AVGX returns NULL.
Syntax
SELECT REGR_AVGX ( expression1, expression2 )
Parameters
expression1
- The dependent
DOUBLE PRECISION
expression
expression2
- The independent
DOUBLE PRECISION
expression
Examples
=> SELECT REGR_AVGX (Annual_salary, Employee_age)
FROM employee_dimension;
REGR_AVGX
-----------
39.321
(1 row)
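To see the pair-elimination rule in isolation, the following sketch uses a hypothetical regr_pairs table. The (NULL, 3) pair is discarded even though its x value is non-NULL, so REGR_AVGX(y, x) differs from a plain AVG(x):
=> CREATE TABLE regr_pairs (y FLOAT, x FLOAT);
=> INSERT INTO regr_pairs VALUES (10, 1);
=> INSERT INTO regr_pairs VALUES (20, 2);
=> INSERT INTO regr_pairs VALUES (NULL, 3);
=> SELECT REGR_AVGX(y, x) AS regr_avgx, AVG(x) AS avg_x FROM regr_pairs;
-- Expected: regr_avgx = 1.5 (the pair with NULL y is dropped), avg_x = 2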
28 - REGR_AVGY
Returns the DOUBLE PRECISION average of the dependent expression in an expression pair.
Returns the DOUBLE PRECISION
average of the dependent expression in an expression pair. The function eliminates expression pairs where either expression in the pair is NULL
. If no rows remain, the function returns NULL
.
Syntax
REGR_AVGY ( expression1, expression2 )
Parameters
expression1
- The dependent
DOUBLE PRECISION
expression
expression2
- The independent
DOUBLE PRECISION
expression
Examples
=> SELECT REGR_AVGY (Annual_salary, Employee_age)
FROM employee_dimension;
REGR_AVGY
------------
58354.4913
(1 row)
29 - REGR_COUNT
Returns the count of all rows in an expression pair.
Returns the count of all rows in an expression pair. The function eliminates expression pairs where either expression in the pair is NULL
. If no rows remain, the function returns 0
.
Syntax
SELECT REGR_COUNT ( expression1, expression2 )
Parameters
expression1
- The dependent
DOUBLE PRECISION
expression
expression2
- The independent
DOUBLE PRECISION
expression
Examples
=> SELECT REGR_COUNT (Annual_salary, Employee_age) FROM employee_dimension;
REGR_COUNT
------------
10000
(1 row)
30 - REGR_INTERCEPT
Returns the y-intercept of the regression line determined by a set of expression pairs.
Returns the y-intercept of the regression line determined by a set of expression pairs. The return value is of type DOUBLE PRECISION
. REGR_INTERCEPT
eliminates expression pairs where either expression in the pair is NULL
. If no rows remain, REGR_INTERCEPT
returns NULL
.
Syntax
SELECT REGR_INTERCEPT ( expression1, expression2 )
Parameters
expression1
- The dependent
DOUBLE PRECISION
expression
expression2
- The independent
DOUBLE PRECISION
expression
Examples
=> SELECT REGR_INTERCEPT (Annual_salary, Employee_age) FROM employee_dimension;
REGR_INTERCEPT
------------------
59929.5490163437
(1 row)
31 - REGR_R2
Returns the square of the correlation coefficient of a set of expression pairs.
Returns the square of the correlation coefficient of a set of expression pairs. The return value is of type DOUBLE PRECISION
. REGR_R2
eliminates expression pairs where either expression in the pair is NULL
. If no rows remain, REGR_R2
returns NULL
.
Syntax
SELECT REGR_R2 ( expression1, expression2 )
Parameters
expression1
- The dependent
DOUBLE PRECISION
expression
expression2
- The independent
DOUBLE PRECISION
expression
Examples
=> SELECT REGR_R2 (Annual_salary, Employee_age) FROM employee_dimension;
REGR_R2
----------------------
5.17181631706311e-05
(1 row)
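For simple linear regression, REGR_R2 equals the square of the correlation coefficient, so you can cross-check it against the CORR aggregate. A sketch using the same columns as above:
=> SELECT REGR_R2(Annual_salary, Employee_age) AS r2,
          POWER(CORR(Annual_salary, Employee_age), 2) AS corr_squared
   FROM employee_dimension;
-- Both expressions should agree up to floating-point rounding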
32 - REGR_SLOPE
Returns the slope of the regression line, determined by a set of expression pairs.
Returns the slope of the regression line, determined by a set of expression pairs. The return value is of type DOUBLE PRECISION
. REGR_SLOPE
eliminates expression pairs where either expression in the pair is NULL
. If no rows remain, REGR_SLOPE
returns NULL
.
Syntax
SELECT REGR_SLOPE ( expression1, expression2 )
Parameters
expression1
- The dependent
DOUBLE PRECISION
expression
expression2
- The independent
DOUBLE PRECISION
expression
Examples
=> SELECT REGR_SLOPE (Annual_salary, Employee_age) FROM employee_dimension;
REGR_SLOPE
------------------
-40.056400303749
(1 row)
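REGR_SLOPE and REGR_INTERCEPT together define the fitted line y = intercept + slope * x. The following sketch uses that relationship to predict a salary at a chosen age (40 is an arbitrary illustrative value):
=> SELECT REGR_INTERCEPT(Annual_salary, Employee_age)
        + REGR_SLOPE(Annual_salary, Employee_age) * 40 AS predicted_salary_at_40
   FROM employee_dimension;
-- With the sample outputs above: 59929.55 + (-40.06 * 40) ≈ 58327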
33 - REGR_SXX
Returns the sum of squares of the difference between the independent expression (expression2) and its average.
Returns the sum of squares of the difference between the independent expression (expression2
) and its average.
That is, REGR_SXX returns: ∑[(expression2 - average(expression2)) * (expression2 - average(expression2))]
The return value is of type DOUBLE PRECISION
. REGR_SXX
eliminates expression pairs where either expression in the pair is NULL
. If no rows remain, REGR_SXX
returns NULL
.
Syntax
SELECT REGR_SXX ( expression1, expression2 )
Parameters
expression1
- The dependent
DOUBLE PRECISION
expression
expression2
- The independent
DOUBLE PRECISION
expression
Examples
=> SELECT REGR_SXX (Annual_salary, Employee_age) FROM employee_dimension;
REGR_SXX
------------
2254907.59
(1 row)
34 - REGR_SXY
Returns the sum of products of the difference between the dependent expression (expression1) and its average and the difference between the independent expression (expression2) and its average.
Returns the sum of products of the difference between the dependent expression (expression1
) and its average and the difference between the independent expression (expression2
) and its average.
That is, REGR_SXY returns: ∑[(expression1 - average(expression1)) * (expression2 - average(expression2))]
The return value is of type DOUBLE PRECISION
. REGR_SXY
eliminates expression pairs where either expression in the pair is NULL
. If no rows remain, REGR_SXY
returns NULL
.
Syntax
SELECT REGR_SXY ( expression1, expression2 )
Parameters
expression1
- The dependent
DOUBLE PRECISION
expression
expression2
- The independent
DOUBLE PRECISION
expression
Examples
=> SELECT REGR_SXY (Annual_salary, Employee_age) FROM employee_dimension;
REGR_SXY
-------------------
-90323481.0730019
(1 row)
35 - REGR_SYY
Returns the sum of squares of the difference between the dependent expression (expression1) and its average.
Returns the sum of squares of the difference between the dependent expression (expression1
) and its average.
That is, REGR_SYY returns: ∑[(expression1 - average(expression1)) * (expression1 - average(expression1))]
The return value is of type DOUBLE PRECISION
. REGR_SYY
eliminates expression pairs where either expression in the pair is NULL
. If no rows remain, REGR_SYY
returns NULL
.
Syntax
SELECT REGR_SYY ( expression1, expression2 )
Parameters
expression1
- The dependent
DOUBLE PRECISION
expression
expression2
- The independent
DOUBLE PRECISION
expression
Examples
=> SELECT REGR_SYY (Annual_salary, Employee_age) FROM employee_dimension;
REGR_SYY
------------------
69956728794707.2
(1 row)
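These sums tie the regression aggregates together: REGR_SLOPE = REGR_SXY / REGR_SXX, and REGR_R2 = REGR_SXY² / (REGR_SXX * REGR_SYY), provided the denominators are nonzero. A sketch that verifies the slope identity with the same columns as the examples above:
=> SELECT REGR_SXY(Annual_salary, Employee_age)
        / REGR_SXX(Annual_salary, Employee_age) AS slope_from_sums,
          REGR_SLOPE(Annual_salary, Employee_age) AS slope
   FROM employee_dimension;
-- slope_from_sums: -90323481.073 / 2254907.59 ≈ -40.0564, matching REGR_SLOPE above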
36 - STDDEV [aggregate]
Evaluates the statistical sample standard deviation for each member of the group.
Evaluates the statistical sample standard deviation for each member of the group. The return value is the same as the square root of
VAR_SAMP
:
STDDEV(expression) = SQRT(VAR_SAMP(expression))
Behavior type
Immutable
Syntax
STDDEV ( expression )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. STDDEV
returns the same data type as expression
.
-
Nonstandard function STDDEV
is provided for compatibility with other databases. It is semantically identical to
STDDEV_SAMP
.
-
This aggregate function differs from analytic function
STDDEV
, which computes the statistical sample standard deviation of the current row with respect to the group of rows within a window.
-
When
VAR_SAMP
returns NULL
, STDDEV
returns NULL
.
Examples
The following example returns the statistical sample standard deviation for each household ID from the customer_dimension
table of the VMart example database:
=> SELECT STDDEV(household_id) FROM customer_dimension;
STDDEV
-----------------
8651.5084240071
(1 row)
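Because STDDEV is defined as the square root of VAR_SAMP, you can verify the two directly; a minimal sketch on the same column:
=> SELECT STDDEV(household_id) AS stddev,
          SQRT(VAR_SAMP(household_id)) AS sqrt_var_samp
   FROM customer_dimension;
-- Both columns should return the same value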
37 - STDDEV_POP [aggregate]
Evaluates the statistical population standard deviation for each member of the group.
Evaluates the statistical population standard deviation for each member of the group.
Behavior type
Immutable
Syntax
STDDEV_POP ( expression )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. STDDEV_POP
returns the same data type as expression
.
-
This function differs from the analytic function
STDDEV_POP
, which evaluates the statistical population standard deviation for each member of the group of rows within a window.
-
STDDEV_POP
returns the same value as the square root of
VAR_POP
:
STDDEV_POP(expression) = SQRT(VAR_POP(expression))
-
When
VAR_POP
returns NULL
, this function returns NULL
.
Examples
The following example returns the statistical population standard deviation for each household ID in the customer
table.
=> SELECT STDDEV_POP(household_id) FROM customer_dimension;
STDDEV_POP
------------------
8651.41895973367
(1 row)
38 - STDDEV_SAMP [aggregate]
Evaluates the statistical sample standard deviation for each member of the group.
Evaluates the statistical sample standard deviation for each member of the group. The return value is the same as the square root of
VAR_SAMP
:
STDDEV_SAMP(expression) = SQRT(VAR_SAMP(expression))
Behavior type
Immutable
Syntax
STDDEV_SAMP ( expression )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. STDDEV_SAMP
returns the same data type as expression
.
-
STDDEV_SAMP
is semantically identical to nonstandard function
STDDEV
, which is provided for compatibility with other databases.
-
This aggregate function differs from analytic function
STDDEV_SAMP
, which computes the statistical sample standard deviation of the current row with respect to the group of rows within a window.
-
When
VAR_SAMP
returns NULL
, STDDEV_SAMP
returns NULL
.
Examples
The following example returns the statistical sample standard deviation for each household ID from the customer
dimension table.
=> SELECT STDDEV_SAMP(household_id) FROM customer_dimension;
stddev_samp
------------------
8651.50842400771
(1 row)
39 - SUM [aggregate]
Computes the sum of an expression over a group of rows.
Computes the sum of an expression over a group of rows. SUM
returns a DOUBLE PRECISION
value for a floating-point expression. Otherwise, the return value is the same as the expression data type.
The SUM
aggregate function differs from the
SUM
analytic function, which computes the sum of an expression over a group of rows within a window.
Behavior type
Immutable
Syntax
SUM ( [ ALL | DISTINCT ] expression )
Parameters
ALL
- Invokes the aggregate function for all rows in the group (default)
DISTINCT
- Invokes the aggregate function for all distinct non-null values of the expression found in the group
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
Overflow handling
If you encounter data overflow when using SUM(), use SUM_FLOAT, which converts the data to a floating-point value.
By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, see Numeric data type overflow with SUM, SUM_FLOAT, and AVG.
Examples
The following query returns the total sum of the product_cost
column.
=> SELECT SUM(product_cost) AS cost FROM product_dimension;
cost
---------
9042850
(1 row)
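The DISTINCT keyword sums each distinct non-null value only once, which can differ markedly from the default ALL behavior, and SUM_FLOAT is the documented fallback when a plain SUM risks numeric overflow. A sketch on the same table:
=> SELECT SUM(product_cost) AS total_cost,
          SUM(DISTINCT product_cost) AS distinct_cost,
          SUM_FLOAT(product_cost) AS total_cost_float
   FROM product_dimension;
-- distinct_cost counts each distinct product_cost value once, so it is usually smaller than total_cost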
40 - SUM_FLOAT [aggregate]
Computes the sum of an expression over a group of rows and returns a DOUBLE PRECISION value.
Computes the sum of an expression over a group of rows and returns a DOUBLE PRECISION
value.
Behavior type
Immutable
Syntax
SUM_FLOAT ( [ ALL | DISTINCT ] expression )
Parameters
ALL
- Invokes the aggregate function for all rows in the group (default).
DISTINCT
- Invokes the aggregate function for all distinct non-null values of the expression found in the group.
expression
- Any expression whose result is type
DOUBLE PRECISION
.
Overflow handling
By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, see Numeric data type overflow with SUM, SUM_FLOAT, and AVG.
Examples
The following query returns the floating-point sum of the average price from the product table:
=> SELECT SUM_FLOAT(average_competitor_price) AS cost FROM product_dimension;
cost
----------
18181102
(1 row)
41 - TS_FIRST_VALUE
Processes the data that belongs to each time slice.
Processes the data that belongs to each time slice. A time series aggregate (TSA) function, TS_FIRST_VALUE
returns the value at the start of the time slice. If that time slice is missing, the value is interpolated from the values in the previous (and next) time slices, according to the specified interpolation scheme, constant or linear.
TS_FIRST_VALUE
returns one output row per time slice, or one output row per partition per time slice if partition expressions are specified.
Behavior type
Immutable
Syntax
TS_FIRST_VALUE ( expression [ IGNORE NULLS ] [, { 'CONST' | 'LINEAR' } ] )
Parameters
expression
- An
INTEGER
or FLOAT
expression on which to aggregate and interpolate.
IGNORE NULLS
- The
IGNORE NULLS
behavior changes depending on a CONST
or LINEAR
interpolation scheme. See When Time Series Data Contains Nulls in Analyzing Data for details.
'CONST' | 'LINEAR'
- Specifies the interpolation value as constant (CONST) or linear (LINEAR).
Requirements
You must use an ORDER BY
clause with a TIMESTAMP
column.
Multiple time series aggregate functions
The same query can call multiple time series aggregate functions. They share the same gap-filling policy as defined by the TIMESERIES clause; however, each time series aggregate function can specify its own interpolation policy. For example:
=> SELECT slice_time, symbol,
TS_FIRST_VALUE(bid, 'const') fv_c,
TS_FIRST_VALUE(bid, 'linear') fv_l,
TS_LAST_VALUE(bid, 'const') lv_c
FROM TickStore
TIMESERIES slice_time AS '3 seconds'
OVER(PARTITION BY symbol ORDER BY ts);
Examples
See Gap Filling and Interpolation in Analyzing Data.
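As a self-contained sketch, the following creates a small tick table (the TickStore table and its ts, symbol, and bid columns are assumed here, mirroring the query above) and takes the first bid per 3-second slice with constant interpolation:
=> CREATE TABLE TickStore (ts TIMESTAMP, symbol VARCHAR(8), bid FLOAT);
=> INSERT INTO TickStore VALUES ('2024-01-01 10:00:00', 'ABC', 10.0);
=> INSERT INTO TickStore VALUES ('2024-01-01 10:00:07', 'ABC', 10.5);
=> SELECT slice_time, symbol, TS_FIRST_VALUE(bid, 'const') AS first_bid
   FROM TickStore
   TIMESERIES slice_time AS '3 seconds' OVER (PARTITION BY symbol ORDER BY ts);
-- The empty 10:00:03 slice is gap-filled; 'const' carries 10.0 forward from the prior slice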
42 - TS_LAST_VALUE
Processes the data that belongs to each time slice.
Processes the data that belongs to each time slice. A time series aggregate (TSA) function, TS_LAST_VALUE
returns the value at the end of the time slice. If that time slice is missing, the value is interpolated from the values in the previous (and next) time slices, according to the specified interpolation scheme, constant or linear.
TS_LAST_VALUE
returns one output row per time slice, or one output row per partition per time slice if partition expressions are specified.
Behavior type
Immutable
Syntax
TS_LAST_VALUE ( expression [ IGNORE NULLS ] [, { 'CONST' | 'LINEAR' } ] )
Parameters
expression
- An
INTEGER
or FLOAT
expression on which to aggregate and interpolate.
IGNORE NULLS
- The
IGNORE NULLS
behavior changes depending on a CONST
or LINEAR
interpolation scheme. See When Time Series Data Contains Nulls in Analyzing Data for details.
'CONST' | 'LINEAR'
- Specifies the interpolation value as constant (CONST) or linear (LINEAR).
Requirements
You must use the ORDER BY
clause with a TIMESTAMP
column.
Multiple time series aggregate functions
The same query can call multiple time series aggregate functions. They share the same gap-filling policy as defined by the TIMESERIES clause; however, each time series aggregate function can specify its own interpolation policy. For example:
=> SELECT slice_time, symbol,
TS_FIRST_VALUE(bid, 'const') fv_c,
TS_FIRST_VALUE(bid, 'linear') fv_l,
TS_LAST_VALUE(bid, 'const') lv_c
FROM TickStore
TIMESERIES slice_time AS '3 seconds'
OVER(PARTITION BY symbol ORDER BY ts);
Examples
See Gap Filling and Interpolation in Analyzing Data.
43 - VAR_POP [aggregate]
Evaluates the population variance for each member of the group.
Evaluates the population variance for each member of the group. This is defined as the sum of squares of the difference of expression from the mean of expression, divided by the number of remaining rows:
(SUM(expression*expression) - SUM(expression)*SUM(expression) / COUNT(expression)) / COUNT(expression)
Behavior type
Immutable
Syntax
VAR_POP ( expression )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. VAR_POP
returns the same data type as expression
.
This aggregate function differs from analytic function
VAR_POP
, which computes the population variance of the current row with respect to the group of rows within a window.
Examples
The following example returns the population variance for each household ID in the customer
table.
=> SELECT VAR_POP(household_id) FROM customer_dimension;
var_pop
------------------
74847050.0168393
(1 row)
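You can reproduce the formula above directly; a minimal sketch (casting to FLOAT to avoid any integer overflow in the intermediate sums):
=> SELECT VAR_POP(household_id) AS var_pop,
          ( SUM(household_id::FLOAT * household_id)
            - SUM(household_id::FLOAT) * SUM(household_id::FLOAT) / COUNT(household_id) )
          / COUNT(household_id) AS var_pop_by_hand
   FROM customer_dimension;
-- The two columns should agree up to floating-point rounding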
44 - VAR_SAMP [aggregate]
Evaluates the sample variance for each row of the group.
Evaluates the sample variance for each row of the group. This is defined as the sum of squares of the difference of expression
from the mean of expression
divided by the number of remaining rows minus 1:
(SUM(expression*expression) - SUM(expression)*SUM(expression) / COUNT(expression)) / (COUNT(expression) - 1)
Behavior type
Immutable
Syntax
VAR_SAMP ( expression )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. VAR_SAMP
returns the same data type as expression
.
-
VAR_SAMP
is semantically identical to nonstandard function
VARIANCE
, which is provided for compatibility with other databases.
-
This aggregate function differs from analytic function
VAR_SAMP
, which computes the sample variance of the current row with respect to the group of rows within a window.
Examples
The following example returns the sample variance for each household ID in the customer
table.
=> SELECT VAR_SAMP(household_id) FROM customer_dimension;
var_samp
------------------
74848598.0106764
(1 row)
See also
VARIANCE [aggregate]
45 - VARIANCE [aggregate]
Evaluates the sample variance for each row of the group.
Evaluates the sample variance for each row of the group. This is defined as the sum of squares of the difference of expression
from the mean of expression
divided by the number of remaining rows minus 1.
(SUM(expression*expression) - SUM(expression)*SUM(expression) / COUNT(expression)) / (COUNT(expression) - 1)
Behavior type
Immutable
Syntax
VARIANCE ( expression )
Parameters
expression
- Any
NUMERIC
data type or any non-numeric data type that can be implicitly converted to a numeric data type. VARIANCE
returns the same data type as expression
.
The nonstandard function VARIANCE
is provided for compatibility with other databases. It is semantically identical to
VAR_SAMP
.
This aggregate function differs from analytic function
VARIANCE
, which computes the sample variance of the current row with respect to the group of rows within a window.
Examples
The following example returns the sample variance for each household ID in the customer
table.
=> SELECT VARIANCE(household_id) FROM customer_dimension;
variance
------------------
74848598.0106764
(1 row)
46 - WITHIN GROUP ORDER BY clause
Specifies how to sort rows that are grouped by aggregate functions.
Specifies how to sort rows that are grouped by aggregate functions.
This clause is also supported for user-defined aggregate functions.
The order clause only specifies order within the result set of each group. The query can have its own ORDER BY clause, which takes precedence over the order specified by WITHIN GROUP ORDER BY and sorts the final result set.
Syntax
WITHIN GROUP (ORDER BY
{ column-expression [ ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] ]
}[,...])
Parameters
column-expression
- A column, constant, or arbitrary expression formed on columns, on which to sort grouped rows.
ASC | DESC
- Specifies the ordering sequence as ascending (default) or descending.
NULLS {FIRST | LAST | AUTO}
- Specifies whether to position null values first or last. Default positioning depends on whether the sort order is ascending or descending: ascending order places nulls last, and descending order places nulls first.
If you specify NULLS AUTO
, Vertica chooses the positioning that is most efficient for this query, either NULLS FIRST
or NULLS LAST
.
If you omit all sort qualifiers, Vertica uses ASC NULLS LAST
.
Examples
For usage examples, see the aggregate functions that support this clause.
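A minimal sketch of the clause, assuming Vertica's LISTAGG aggregate and the employee_dimension columns used earlier in this section; within each region, last names are concatenated from highest to lowest salary:
=> SELECT employee_region,
          LISTAGG(employee_last_name)
            WITHIN GROUP (ORDER BY annual_salary DESC NULLS LAST) AS names_by_salary
   FROM employee_dimension
   GROUP BY employee_region;
-- employee_region and employee_last_name are assumed VMart columns; the query-level GROUP BY
-- defines the groups, while WITHIN GROUP (ORDER BY ...) orders rows inside each group before aggregation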