Aggregate functions summarize data over groups of rows from a query result set. The groups are specified using the GROUP BY clause. They are allowed only in the select list and in the HAVING and ORDER BY clauses of a SELECT statement (as described in Aggregate expressions).
Except for COUNT, these functions return a null value when no rows are selected. In particular, SUM of no rows returns NULL, not zero.
In some cases, you can replace an expression that includes multiple aggregates with a single aggregate of an expression. For example SUM(x) + SUM(y) can be expressed as as SUM(x+y) if neither argument is NULL.
Vertica does not support nested aggregate functions.
You can use some of the simple aggregate functions as analytic (window) functions. See Analytic functions for details. See also SQL analytics.
Value to be evaluated using any data type that supports equality comparison.
error-tolerance
Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation.
You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance.
If you omit this argument, Vertica uses an error tolerance of 1.25(%).
Restrictions
APPROXIMATE_COUNT_DISTINCT and DISTINCT aggregates cannot be in the same query block.
Error tolerance
APPROXIMATE_COUNT_DISTINCT(x, error-tolerance) returns a value equal to COUNT(DISTINCT x), with an error that is lognormally distributed with standard deviation.
Parameter error-tolerance is optional. Supply this argument to specify the desired standard deviation. error-tolerance is defined as 2.17 standard deviations, which corresponds to a 97 percent confidence interval:
standard-deviation = error-tolerance / 2.17
For example:
error-tolerance = 1
Default setting, corresponds to a standard deviation
97 percent of the time, APPROXIMATE_COUNT_DISTINCT(x,5) returns a value between:
COUNT(DISTINCT x) * 0.99
COUNT(DISTINCT x) * 1.01
error-tolerance = 5
97 percent of the time, APPROXIMATE_COUNT_DISTINCT(x) returns a value between:
COUNT(DISTINCT x) * 0.95
COUNT(DISTINCT x) * 1.05
A 99 percent confidence interval corresponds to 2.58 standard deviations. To set error-tolerance confidence level corresponding to 99 (instead of a 97) percent , multiply error-tolerance by 2.17 / 2.58 = 0.841.
For example, if you specify error-tolerance as 5 * 0.841 = 4.2, APPROXIMATE_COUNT_DISTINCT(x,4.2) returns values 99 percent of the time between:
COUNT (DISTINCT x) * 0.95
COUNT (DISTINCT x) * 1.05
Examples
Count the total number of distinct values in column product_key from table store.store_sales_fact:
Count the approximate number of distinct values in product_key with various error tolerances. The smaller the error tolerance, the closer the approximation:
=> SELECT APPROXIMATE_COUNT_DISTINCT(product_key,5) AS five_pct_accuracy,
APPROXIMATE_COUNT_DISTINCT(product_key,1) AS one_pct_accuracy,
APPROXIMATE_COUNT_DISTINCT(product_key,.88) AS point_eighteight_pct_accuracy
FROM store.store_sales_fact;
five_pct_accuracy | one_pct_accuracy | point_eighteight_pct_accuracy
-------------------+------------------+-------------------------------
19431 | 19921 | 19921
(1 row)
A synopsis object created by APPROXIMATE_COUNT_DISTINCT_SYNOPSIS.
error-tolerance
Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation.
You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance.
If you omit this argument, Vertica uses an error tolerance of 1.25(%).
APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS and DISTINCT aggregates cannot be in the same query block.
Examples
The following examples review and compare different ways to obtain a count of unique values in a table column:
Return an exact count of unique values in column product_key, from table store.store_sales_fact:
=> \timing
Timing is on.
=> SELECT COUNT(DISTINCT product_key) from store.store_sales_fact;
count
-------
19982
(1 row)
Time: First fetch (1 row): 553.033 ms. All rows formatted: 553.075 ms
Return an approximate count of unique values in column product_key:
=> SELECT APPROXIMATE_COUNT_DISTINCT(product_key) as unique_product_keys
FROM store.store_sales_fact;
unique_product_keys
---------------------
19921
(1 row)
Time: First fetch (1 row): 394.562 ms. All rows formatted: 394.600 ms
Create a synopsis object that represents a set of store.store_sales_fact data with unique product_key values, store the synopsis in the new table my_summary:
=> CREATE TABLE my_summary AS SELECT APPROXIMATE_COUNT_DISTINCT_SYNOPSIS (product_key) syn
FROM store.store_sales_fact;
CREATE TABLE
Time: First fetch (0 rows): 582.662 ms. All rows formatted: 582.682 ms
Return a count from the saved synopsis:
=> SELECT APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS(syn) FROM my_summary;
ApproxCountDistinctOfSynopsis
-------------------------------
19921
(1 row)
Time: First fetch (1 row): 105.295 ms. All rows formatted: 105.335 ms
Summarizes the information of distinct non-NULL values and materializes the result set in a VARBINARY or LONG VARBINARY synopsis object.
Summarizes the information of distinct non-NULL values and materializes the result set in a VARBINARY or LONG VARBINARY synopsis object. The calculated result is within a specified range of error tolerance. You save the synopsis object in a Vertica table for use by APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS.
Value to evaluate using any data type that supports equality comparison.
error-tolerance
Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation.
You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance.
If you omit this argument, Vertica uses an error tolerance of 1.25(%).
Aggregates multiple synopses into one new synopsis.
Aggregates multiple synopses into one new synopsis. This function is similar to APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS but returns one synopsis instead of the count estimate. The benefit of this function is that it speeds up final estimation when calling APPROXIMATE_COUNT_DISTINCT_OF_SYNOPSIS.
For example, if you need to regularly estimate count distinct of users for a long period of time (such as several years) you can pre-accumulate synopses of days into one synopsis for a year.
An expression that can be evaluated to one or more synopses. Typically a synopsis-obj is generated as a binary string by either the APPROXIMATE_COUNT_DISTINCT or APPROXIMATE_COUNT_DISTINCT_SYNOPSIS_MERGE function and is stored in a table column of type VARBINARY or LONG VARBINARY.
error-tolerance
Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation.
You can set error-tolerance to a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance.
If you omit this argument, Vertica uses an error tolerance of 1.25(%).
Any FLOAT or INTEGER data type. The function returns the approximate middle value or an interpolated value that would be the approximate middle value once the values are sorted. Null values are ignored in the calculation.
Examples
Tip
For optimal performance when using GROUP BY in your query, verify that your table is sorted on the GROUP BY column.
The following examples uses this table:
CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT) ORDER BY state;
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;
Calculate the approximate median of all sales in this table:
Modify the query to group sales by state, and obtain the approximate median for each one:
=> SELECT state, APPROXIMATE_MEDIAN(sales) FROM allsales GROUP BY state;
state | APPROXIMATE_MEDIAN
-------+--------------------
MA | 35
NY | 20
(2 rows)
Computes the approximate percentile of an expression over a group of rows.
Computes the approximate percentile of an expression over a group of rows. This function returns a FLOAT value.
Note
Note: Use this function when many rows are aggregated into groups. If the number of aggregated rows is small, use the analytic function PERCENTILE_CONT.
APPROXIMATE_PERCENTILE ( column-expression USING PARAMETERS percentiles='percentile-values' )
Arguments
column-expression
A column of FLOAT or INTEGER data types whose percentiles will be calculated. NULL values are ignored.
Parameters
percentiles
One or more (up to 1000) comma-separated FLOAT constants ranging from 0 to 1 inclusive, specifying the percentile values to be calculated.
Note
Note: The deprecated parameter percentile, which takes only a single float, continues to be supported for backwards-compatibility.
Examples
Tip
For optimal performance when using GROUP BY in your query, verify that your table is sorted on the GROUP BY column.
The following examples use this table:
=> CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT) ORDER BY state;
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;
=> SELECT * FROM allsales;
state | name | sales
-------+------+-------
MA | A | 60
NY | B | 20
NY | C | 15
NY | F | 40
MA | D | 20
MA | E | 50
MA | G | 10
(7 rows)
Calculate the approximate percentile for sales in each state:
=> SELECT state, APPROXIMATE_PERCENTILE(sales USING PARAMETERS percentiles='0.5') AS median
FROM allsales GROUP BY state;
state | median
-------+--------
MA | 35
NY | 20
(2 rows)
Calculate multiple approximate percentiles for sales in each state:
=> SELECT state, APPROXIMATE_PERCENTILE(sales USING PARAMETERS percentiles='0.5,1.0')
FROM allsales GROUP BY state;
state | APPROXIMATE_PERCENTILE
-------+--------
MA | [35.0,60.0]
NY | [20.0,40.0]
(2 rows)
Calculate multiple approximate percentiles for sales in each state and show results for each percentile in separate columns:
=> SELECT ps[0] as q0, ps[1] as q1, ps[2] as q2, ps[3] as q3, ps[4] as q4
FROM (SELECT APPROXIMATE_PERCENTILE(sales USING PARAMETERS percentiles='0, 0.25, 0.5, 0.75, 1')
AS ps FROM allsales GROUP BY state) as s1;
q0 | q1 | q2 | q3 | q4
------+------+------+------+------
10.0 | 17.5 | 35.0 | 52.5 | 60.0
15.0 | 17.5 | 20.0 | 30.0 | 40.0
(2 rows)
Computes an array of weighted, approximate percentiles of a column within some user-specified error.
Computes an array of weighted, approximate percentiles of a column within some user-specified error. This algorithm is similar to APPROXIMATE_PERCENTILE [aggregate], which instead returns a single percentile.
The performance of this function depends entirely on the specified epsilon and the size of the provided array.
APPROXIMATE_QUANTILES ( column USING PARAMETERS [nquantiles=n], [epsilon=error] ) OVER() FROM table
Parameters
column
The INTEGER or FLOAT column for which to calculate the percentiles. NULL values are ignored.
n
An integer that specifies the number of desired quantiles in the returned array.
Default: 11
error
The allowed error for any returned percentile. Specifically, for an array of size N, the specified error ε (epsilon) for the φ-quantile guarantees that the rank r of the return value with respect to the rank ⌊φN⌋ of the exact value is such that:
⌊(φ-ε)N⌋ ≤ r ≤ ⌊(φ+ε)N⌋
For n quantiles, if the error ε is specified such that ε > 1/n, this function will return non-deterministic results.
Default: 0.001
table
The table containing column.
Examples
The following example uses this table:
=> CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT) ORDER BY state;
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;
=> SELECT * FROM allsales;
state | name | sales
-------+------+-------
MA | A | 60
NY | B | 20
NY | C | 15
NY | F | 40
MA | D | 20
MA | E | 50
MA | G | 10
(7 rows)
This call to APPROXIMATE_QUANTILES returns a 6-element array of approximate percentiles, one for each quantile. Each quantile relates to the percentile by a factor of 100. For example, the second entry in the output indicates that 15 is the 0.2-quantile of the input column, so 15 is the 20th percentile of the input column.
Takes two arguments target and arg, where both are columns or column expressions in the queried dataset.
Takes two arguments target and arg, where both are columns or column expressions in the queried dataset. ARGMAX_AGG finds the row with the highest non-null value in target and returns the value of arg in that row. If multiple rows contain the highest target value, ARGMAX_AGG returns arg from the first row that it finds. Use the WITHIN GROUP ORDER BY clause to control which row ARGMAX_AGG finds first.
Behavior type
Immutable if the WITHIN GROUP ORDER BY clause specifies a column or set of columns that resolves to unique values within the group; otherwise Volatile.
WITHIN GROUP (ORDER BY { column-expression[ sort-qualifiers ] }[,...])
sort-qualifiers:
{ ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] }
Use this clause to determine which row is returned when multiple rows contain the highest target value; otherwise, results are likely to vary with each iteration of the same query.
Tip
WITHIN GROUP ORDER BY can consume a large amount of memory per group. To minimize memory consumption, create projections that support GROUPBY PIPELINED.
Examples
The following example calls ARGMAX_AGG in a WITH clause to find which employees in each region are at or near retirement age. If multiple employees within each region have the same age, ARGMAX_AGG chooses the employees with the highest salary level and returns with their IDs. The primary query returns with details on the employees selected from each region:
=> WITH r AS (SELECT employee_region, ARGMAX_AGG(employee_age, employee_key)
WITHIN GROUP (ORDER BY annual_salary DESC) emp_id
FROM employee_dim GROUP BY employee_region ORDER BY employee_region)
SELECT r.employee_region, ed.annual_salary AS highest_salary, employee_key,
ed.employee_first_name||' '||ed.employee_last_name AS employee_name, ed.employee_age
FROM r JOIN employee_dim ed ON r.emp_id = ed.employee_key ORDER BY ed.employee_region;
employee_region | highest_salary | employee_key | employee_name | employee_age
----------------------------------+----------------+--------------+------------------+--------------
East | 927335 | 70 | Sally Gauthier | 65
MidWest | 177716 | 869 | Rebecca McCabe | 65
NorthWest | 100300 | 7597 | Kim Jefferson | 65
South | 196454 | 275 | Alexandra Harris | 65
SouthWest | 198669 | 1043 | Seth Stein | 65
West | 197203 | 681 | Seth Jones | 65
(6 rows)
Takes two arguments target and arg, where both are columns or column expressions in the queried dataset.
Takes two arguments target and arg, where both are columns or column expressions in the queried dataset. ARGMIN_AGG finds the row with the lowest non-null value in target and returns the value of arg in that row. If multiple rows contain the lowest target value, ARGMIN_AGG returns arg from the first row that it finds. Use the WITHIN GROUP ORDER BY clause to control which row ARGMMIN_AGG finds first.
Behavior type
Immutable if the WITHIN GROUP ORDER BY clause specifies a column or set of columns that resolves to unique values within the group; otherwise Volatile.
WITHIN GROUP (ORDER BY { column-expression[ sort-qualifiers ] }[,...])
sort-qualifiers:
{ ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] }
Use this clause to determine which row is returned when multiple rows contain the lowest target value; otherwise, results are likely to vary with each iteration of the same query.
Tip
WITHIN GROUP ORDER BY can consume a large amount of memory per group. To minimize memory consumption, create projections that support GROUPBY PIPELINED.
Examples
The following example calls ARGMIN_AGG in a WITH clause to find the lowest salary among all employees in each region, and returns with the lowest-paid employee IDs. The primary query returns with the salary amounts and employee names:
=> WITH msr (employee_region, emp_id) AS
(SELECT employee_region, argmin_agg(annual_salary, employee_key) lowest_paid_employee FROM employee_dim GROUP BY employee_region)
SELECT msr.employee_region, ed.annual_salary AS lowest_salary, ed.employee_first_name||' '||ed.employee_last_name AS employee_name
FROM msr JOIN employee_dim ed ON msr.emp_id = ed.employee_key ORDER BY annual_salary DESC;
employee_region | lowest_salary | employee_name
----------------------------------+---------------+-----------------
NorthWest | 20913 | Raja Garnett
SouthWest | 20750 | Seth Moore
West | 20443 | Midori Taylor
South | 20363 | David Bauer
East | 20306 | Craig Jefferson
MidWest | 20264 | Dean Vu
(6 rows)
Invokes the aggregate function for all rows in the group (default).
DISTINCT
Invokes the aggregate function for all distinct non-null values of the expression found in the group.
expression
The value whose average is calculated over a set of rows, any expression that can have a DOUBLE PRECISION result.
Overflow handling
By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.
Examples
The following query returns the average income from the customer table:
=> SELECT AVG(annual_income) FROM customer_dimension;
AVG
--------------
2104270.6485
(1 row)
The BINARY or VARBINARY input value to evaluate. BIT_AND operates on VARBINARY types explicitly and on BINARY types implicitly through casts.
Returns
BIT_AND returns:
The same value as the argument data type.
1 for each bit compared, if all bits are 1; otherwise 0.
If the columns are different lengths, the return values are treated as though they are all equal in length and are right-extended with zero bytes. For example, given a group containing hex values ff, null, and f, BIT_AND ignores the null value and extends the value f to f0.
Examples
The example that follows uses table t with a single column of VARBINARY data type:
=> CREATE TABLE t ( c VARBINARY(2) );
=> INSERT INTO t values(HEX_TO_BINARY('0xFF00'));
=> INSERT INTO t values(HEX_TO_BINARY('0xFFFF'));
=> INSERT INTO t values(HEX_TO_BINARY('0xF00F'));
The BINARY or VARBINARY input value to evaluate. BIT_OR operates on VARBINARY types explicitly and on BINARY types implicitly through casts.
Returns
BIT_OR returns:
The same value as the argument data type.
1 for each bit compared, if any bit is 1; otherwise 0.
If the columns are different lengths, the return values are treated as though they are all equal in length and are right-extended with zero bytes. For example, given a group containing hex values ff, null, and f, the function ignores the null value and extends the value f to f0.
Examples
The example that follows uses table t with a single column of VARBINARY data type:
=> CREATE TABLE t ( c VARBINARY(2) );
=> INSERT INTO t values(HEX_TO_BINARY('0xFF00'));
=> INSERT INTO t values(HEX_TO_BINARY('0xFFFF'));
=> INSERT INTO t values(HEX_TO_BINARY('0xF00F'));
The BINARY or VARBINARY input value to evaluate. BIT_XOR operates on VARBINARY types explicitly and on BINARY types implicitly through casts.
Returns
BIT_XOR returns:
The same value as the argument data type.
1 for each bit compared, if there are an odd number of arguments with set bits; otherwise 0.
If the columns are different lengths, the return values are treated as though they are all equal in length and are right-extended with zero bytes. For example, given a group containing hex values ff, null, and f, the function ignores the null value and extends the value f to f0.
Examples
First create a sample table and projections with binary columns:
The example that follows uses table t with a single column of VARBINARY data type:
=> CREATE TABLE t ( c VARBINARY(2) );
=> INSERT INTO t values(HEX_TO_BINARY('0xFF00'));
=> INSERT INTO t values(HEX_TO_BINARY('0xFFFF'));
=> INSERT INTO t values(HEX_TO_BINARY('0xF00F'));
A Boolean data type or any non-Boolean data type that can be implicitly coerced to a Boolean data type.
Examples
The following example shows how to use aggregate functions BOOL_AND, BOOL_OR, and BOOL_XOR. The sample table mixers includes columns for models and colors.
Insert sample data into the table. The sample adds two color fields for each model.
=> INSERT INTO mixers
SELECT 'beginner', 'green'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'advanced', 'green'
UNION ALL
SELECT 'advanced', 'blue'
UNION ALL
SELECT 'professional', 'blue'
UNION ALL
SELECT 'professional', 'green'
UNION ALL
SELECT 'beginner', 'green';
OUTPUT
--------
8
(1 row)
Query the table. The result shows models that have two blue (BOOL_AND), one or two blue (BOOL_OR), and specifically not more than one blue (BOOL_XOR) mixer.
=> SELECT model,
BOOL_AND(colors= 'blue')AS two_blue,
BOOL_OR(colors= 'blue')AS one_or_two_blue,
BOOL_XOR(colors= 'blue')AS specifically_not_more_than_one_blue
FROM mixers
GROUP BY model;
model | two_blue | one_or_two_blue | specifically_not_more_than_one_blue
--------------+----------+-----------------+-------------------------------------
advanced | f | t | t
beginner | f | f | f
intermediate | t | t | f
professional | f | t | t
(4 rows)
A Boolean data type or any non-Boolean data type that can be implicitly coerced to a Boolean data type.
Examples
The following example shows how to use aggregate functions BOOL_AND, BOOL_OR, and BOOL_XOR. The sample table mixers includes columns for models and colors.
Insert sample data into the table. The sample adds two color fields for each model.
=> INSERT INTO mixers
SELECT 'beginner', 'green'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'advanced', 'green'
UNION ALL
SELECT 'advanced', 'blue'
UNION ALL
SELECT 'professional', 'blue'
UNION ALL
SELECT 'professional', 'green'
UNION ALL
SELECT 'beginner', 'green';
OUTPUT
--------
8
(1 row)
Query the table. The result shows models that have two blue (BOOL_AND), one or two blue (BOOL_OR), and specifically not more than one blue (BOOL_XOR) mixer.
=> SELECT model,
BOOL_AND(colors= 'blue')AS two_blue,
BOOL_OR(colors= 'blue')AS one_or_two_blue,
BOOL_XOR(colors= 'blue')AS specifically_not_more_than_one_blue
FROM mixers
GROUP BY model;
model | two_blue | one_or_two_blue | specifically_not_more_than_one_blue
--------------+----------+-----------------+-------------------------------------
advanced | f | t | t
beginner | f | f | f
intermediate | t | t | f
professional | f | t | t
(4 rows)
Processes Boolean values and returns a Boolean value result.
Processes Boolean values and returns a Boolean value result. If specifically only one input value is true, BOOL_XOR returns t. Otherwise, it returns f.
A Boolean data type or any non-Boolean data type that can be implicitly coerced to a Boolean data type.
Examples
The following example shows how to use aggregate functions BOOL_AND, BOOL_OR, and BOOL_XOR. The sample table mixers includes columns for models and colors.
Insert sample data into the table. The sample adds two color fields for each model.
=> INSERT INTO mixers
SELECT 'beginner', 'green'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'intermediate', 'blue'
UNION ALL
SELECT 'advanced', 'green'
UNION ALL
SELECT 'advanced', 'blue'
UNION ALL
SELECT 'professional', 'blue'
UNION ALL
SELECT 'professional', 'green'
UNION ALL
SELECT 'beginner', 'green';
OUTPUT
--------
8
(1 row)
Query the table. The result shows models that have two blue (BOOL_AND), one or two blue (BOOL_OR), and specifically not more than one blue (BOOL_XOR) mixer.
=> SELECT model,
BOOL_AND(colors= 'blue')AS two_blue,
BOOL_OR(colors= 'blue')AS one_or_two_blue,
BOOL_XOR(colors= 'blue')AS specifically_not_more_than_one_blue
FROM mixers
GROUP BY model;
model | two_blue | one_or_two_blue | specifically_not_more_than_one_blue
--------------+----------+-----------------+-------------------------------------
advanced | f | t | t
beginner | f | f | f
intermediate | t | t | f
professional | f | t | t
(4 rows)
Returns the DOUBLE PRECISION coefficient of correlation of a set of expression pairs, as per the Pearson correlation coefficient.
Returns the DOUBLE PRECISION coefficient of correlation of a set of expression pairs, as per the Pearson correlation coefficient. CORR eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.
Returns as a BIGINT the number of rows in each group where the expression is not NULL.
Returns as a BIGINT the number of rows in each group where the expression is not NULL. If the query has no GROUP BY clause, COUNT returns the number of table rows.
The COUNT aggregate function differs from the COUNT analytic function, which returns the number over a group of rows within a window.
This query selects each distinct product_key value and then counts the number of distinct date_key values for all records with the specific product_key value. It also counts the number of distinct warehouse_key values in all records with the specific product_key value:
This query selects each distinct product_key value, counts the number of distinct date_key and warehouse_key values for all records with the specific product_key value, and then sums all qty_in_stock values in records with the specific product_key value. It then returns the number of product_version values in records with the specific product_key value:
Returns the population covariance for a set of expression pairs.
Returns the population covariance for a set of expression pairs. The return value is of type DOUBLE PRECISION. COVAR_POP eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.
Returns the sample covariance for a set of expression pairs.
Returns the sample covariance for a set of expression pairs. The return value is of type DOUBLE PRECISION. COVAR_SAMP eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.
Uniquely identifies duplicate sets for GROUP BY queries that return duplicate grouping sets.
Uniquely identifies duplicate sets for GROUP BY queries that return duplicate grouping sets. This function returns one or more integers, starting with zero (0), as identifiers.
For the number of duplicates n for a particular grouping, GROUP_ID returns a range of sequential numbers, 0 to n–1. For the first each unique group it encounters, GROUP_ID returns the value 0. If GROUP_ID finds the same grouping again, the function returns 1, then returns 2 for the next found grouping, and so on.
This example shows how GROUP_ID creates unique identifiers when a query produces duplicate groupings. For an expenses table, the following query groups the results by category of expense and year and rolls up the sum for those two columns. The results have duplicate groupings for category and NULL. The first grouping has a GROUP_ID of 0, and the second grouping has a GROUP_ID of 1.
Disambiguates the use of NULL values when GROUP BY queries with multilevel aggregates generate NULL values to identify subtotals in grouping columns.
Disambiguates the use of NULL values when GROUP BY queries with multilevel aggregates generate NULL values to identify subtotals in grouping columns. Such NULL values from the original data can also occur in rows. GROUPING returns 1, if the value of expression is:
NULL, representing an aggregated value
0 for any other value, including NULL values in rows
The following query uses the GROUPING function, taking one of the GROUP BY expressions as an argument. For each row, GROUPING returns one of the following:
0: The column is part of the group for that row
1: The column is not part of the group for that row
The 1 in the GROUPING(Year) column for electricity and books indicates that these values are subtotals. The right-most column values for both GROUPING(Category) and GROUPING(Year) are 1. This value indicates that neither column contributed to the GROUP BY. The final row represents the total sales.
Concatenates the set of Boolean values generated by the GROUPING function into a bit vector.
Concatenates the set of Boolean values generated by the GROUPING function into a bit vector. GROUPING_ID treats the bit vector as a binary number and returns it as a base-10 value that identifies the grouping set combination.
By using GROUPING_ID you avoid the need for multiple, individual GROUPING functions. GROUPING_ID simplifies row-filtering conditions, because rows of interest are identified using a single return from GROUPING_ID = n. Use GROUPING_ID to identify grouping combinations.
An expression that matches one of the expressions in the GROUP BY clause.
If the GROUP BY clause includes a list of expressions, GROUPING_ID returns a number corresponding to the GROUPING bit vector associated with a row.
Examples
This example shows how calling GROUPING_ID without an expression returns the GROUPING bit vector associated with a full set of multilevel aggregate expressions. The GROUPING_ID value is comparable to GROUPING_ID(a,b) because GROUPING_ID() includes all columns in the GROUP BY ROLLUP:
=> SELECT a,b,COUNT(*), GROUPING_ID() FROM T GROUP BY ROLLUP(a,b);
In the following query, the GROUPING(Category) and GROUPING(Year) columns have three combinations:
Transforms non-null values from a group of rows into a list of values that are delimited by commas (default) or a configurable separator.
Transforms non-null values from a group of rows into a list of values that are delimited by commas (default) or a configurable separator. LISTAGG can be used to denormalize rows into a string of concatenated values.
Behavior type
Immutable if the WITHIN GROUP ORDER BY clause specifies a column or set of columns that resolves to unique values within the aggregated list; otherwise Volatile.
Aggregation of one or more columns or column expressions to select from the source table or view.
LISTAGG does not support spatial data types directly. In order to pass column data of this type, convert the data to strings with the geospatial function ST_AsText.
Caution
Converted spatial data frequently contains commas. LISTAGG uses comma as the default separator character. To avoid ambiguous output, override this default by setting the function's separator parameter to another character.
Sorts aggregated values within each group of rows, where column-expression is typically a column in aggregate-expression:
WITHIN GROUP (ORDER BY { column-expression[ sort-qualifiers ] }[,...])
sort-qualifiers:
{ ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] }
Tip
WITHIN GROUP ORDER BY can consume a large amount of memory per group. Including wide strings in the aggregate expression can also adversely affect performance. To minimize memory consumption, create projections that support GROUPBY PIPELINED.
Parameters
Parameter name
Set to...
max_length
An integer or integer expression that specifies in bytes the maximum length of the result, up to 32M.
Default: 1024
separator
Separator string of length 0 to 80, inclusive. A length of 0 concatenates the output with no separators.
Default: comma (,)
on_overflow
Specifies behavior when the result overflows the max_length setting, one of the following strings:
ERROR (default): Return an error when overflow occurs.
TRUNCATE: Remove any characters that exceed max_length setting from the query result, and return the truncated string.
Privileges
None
Examples
In the following query, the aggregated results in the CityState column use the string " | " as a separator. The outer GROUP BY clause groups the output rows according to their Region values. Within each group, the aggregated list items are sorted according to their city values, as per the WITHIN GROUP ORDER BY clause:
=> \x
Expanded display is on.
=> WITH cd AS (SELECT DISTINCT (customer_city) city, customer_state, customer_region FROM customer_dimension)
SELECT customer_region Region, LISTAGG(city||', '||customer_state USING PARAMETERS separator=' | ')
WITHIN GROUP (ORDER BY city) CityAndState FROM cd GROUP BY region ORDER BY region;
-[ RECORD 1 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | East
CityAndState | Alexandria, VA | Allentown, PA | Baltimore, MD | Boston, MA | Cambridge, MA | Charlotte, NC | Clarksville, TN | Columbia, SC | Elizabeth, NJ | Erie, PA | Fayetteville, NC | Hartford, CT | Lowell, MA | Manchester, NH | Memphis, TN | Nashville, TN | New Haven, CT | New York, NY | Philadelphia, PA | Portsmouth, VA | Stamford, CT | Sterling Heights, MI | Washington, DC | Waterbury, CT
-[ RECORD 2 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | MidWest
CityAndState | Ann Arbor, MI | Cedar Rapids, IA | Chicago, IL | Columbus, OH | Detroit, MI | Evansville, IN | Flint, MI | Gary, IN | Green Bay, WI | Indianapolis, IN | Joliet, IL | Lansing, MI | Livonia, MI | Milwaukee, WI | Naperville, IL | Peoria, IL | Sioux Falls, SD | South Bend, IN | Springfield, IL
-[ RECORD 3 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | NorthWest
CityAndState | Bellevue, WA | Portland, OR | Seattle, WA
-[ RECORD 4 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | South
CityAndState | Abilene, TX | Athens, GA | Austin, TX | Beaumont, TX | Cape Coral, FL | Carrollton, TX | Clearwater, FL | Coral Springs, FL | Dallas, TX | El Paso, TX | Fort Worth, TX | Grand Prairie, TX | Houston, TX | Independence, MS | Jacksonville, FL | Lafayette, LA | McAllen, TX | Mesquite, TX | San Antonio, TX | Savannah, GA | Waco, TX | Wichita Falls, TX
-[ RECORD 5 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | SouthWest
CityAndState | Arvada, CO | Denver, CO | Fort Collins, CO | Gilbert, AZ | Las Vegas, NV | North Las Vegas, NV | Peoria, AZ | Phoenix, AZ | Pueblo, CO | Topeka, KS | Westminster, CO
-[ RECORD 6 ]+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Region | West
CityAndState | Berkeley, CA | Burbank, CA | Concord, CA | Corona, CA | Costa Mesa, CA | Daly City, CA | Downey, CA | El Monte, CA | Escondido, CA | Fontana, CA | Fullerton, CA | Inglewood, CA | Lancaster, CA | Los Angeles, CA | Norwalk, CA | Orange, CA | Palmdale, CA | Pasadena, CA | Provo, UT | Rancho Cucamonga, CA | San Diego, CA | San Francisco, CA | San Jose, CA | Santa Clara, CA | Simi Valley, CA | Sunnyvale, CA | Thousand Oaks, CA | Vallejo, CA | Ventura, CA | West Covina, CA | West Valley City, UT
25 - MAX [aggregate]
Returns the greatest value of an expression over a group of rows.
Returns the greatest value of an expression over a group of rows. The return value has the same type as the expression data type.
The MAX analytic function function differs from the aggregate function, in that it returns the maximum value of an expression over a group of rows within a window.
Aggregate functions MIN and MAX can operate with Boolean values. MAX can act upon a Boolean data type or a value that can be implicitly converted to a Boolean. If at least one input value is true, MAX returns t (true). Otherwise, it returns f (false). In the same scenario, MIN returns t (true) if all input values are true. Otherwise it returns f.
Any expression for which the maximum value is calculated, typically a column reference.
Examples
The following query returns the largest value in column sales_dollar_amount.
=> SELECT MAX(sales_dollar_amount) AS highest_sale FROM store.store_sales_fact;
highest_sale
--------------
600
(1 row)
The following example shows you the difference between the MIN and MAX aggregate functions when you use them with a Boolean value. The sample creates a table, adds two rows of data, and shows sample output for MIN and MAX.
=> CREATE TABLE min_max_functions (torf BOOL);
=> INSERT INTO min_max_functions VALUES (1);
=> INSERT INTO min_max_functions VALUES (0);
=> SELECT * FROM min_max_functions;
torf
------
t
f
(2 rows)
=> SELECT min(torf) FROM min_max_functions;
min
-----
f
(1 row)
=> SELECT max(torf) FROM min_max_functions;
max
-----
t
(1 row)
Returns the smallest value of an expression over a group of rows.
Returns the smallest value of an expression over a group of rows. The return value has the same type as the expression data type.
The MIN analytic function differs from the aggregate function, in that it returns the minimum value of an expression over a group of rows within a window.
Aggregate functions MIN and MAX can operate with Boolean values. MAX can act upon a Boolean data type or a value that can be implicitly converted to a Boolean. If at least one input value is true, MAX returns t (true). Otherwise, it returns f (false). In the same scenario, MIN returns t (true) if all input values are true. Otherwise it returns f.
Any expression for which the minimum value is calculated, typically a column reference.
Examples
The following query returns the lowest salary from the employee dimension table.
This example shows how you can query to return the lowest salary from the employee dimension table.
=> SELECT MIN(annual_salary) AS lowest_paid FROM employee_dimension;
lowest_paid
-------------
1200
(1 row)
The following example shows you the difference between the MIN and MAX aggregate functions when you use them with a Boolean value. The sample creates a table, adds two rows of data, and shows sample output for MIN and MAX.
=> CREATE TABLE min_max_functions (torf BOOL);
=> INSERT INTO min_max_functions VALUES (1);
=> INSERT INTO min_max_functions VALUES (0);
=> SELECT * FROM min_max_functions;
torf
------
t
f
(2 rows)
=> SELECT min(torf) FROM min_max_functions;
min
-----
f
(1 row)
=> SELECT max(torf) FROM min_max_functions;
max
-----
t
(1 row)
Returns the DOUBLE PRECISION average of the independent expression in an expression pair.
Returns the DOUBLE PRECISION average of the independent expression in an expression pair. REGR_AVGX eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_AVGX returns NULL.
Returns the DOUBLE PRECISION average of the dependent expression in an expression pair.
Returns the DOUBLE PRECISION average of the dependent expression in an expression pair. The function eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns NULL.
Returns the count of all rows in an expression pair.
Returns the count of all rows in an expression pair. The function eliminates expression pairs where either expression in the pair is NULL. If no rows remain, the function returns 0.
Returns the y-intercept of the regression line determined by a set of expression pairs.
Returns the y-intercept of the regression line determined by a set of expression pairs. The return value is of type DOUBLE PRECISION. REGR_INTERCEPT eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_INTERCEPT returns NULL.
Returns the square of the correlation coefficient of a set of expression pairs.
Returns the square of the correlation coefficient of a set of expression pairs. The return value is of type DOUBLE PRECISION. REGR_R2 eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_R2 returns NULL.
Returns the slope of the regression line, determined by a set of expression pairs.
Returns the slope of the regression line, determined by a set of expression pairs. The return value is of type DOUBLE PRECISION. REGR_SLOPE eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_SLOPE returns NULL.
Returns the sum of squares of the difference between the independent expression (expression2) and its average.
Returns the sum of squares of the difference between the independent expression (expression2) and its average.
That is, REGR_SXX returns: ∑[(expression2 - average(expression2)(expression2 - average(expression2)]
The return value is of type DOUBLE PRECISION. REGR_SXX eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_SXX returns NULL.
Returns the sum of products of the difference between the dependent expression (expression1) and its average and the difference between the independent expression (expression2) and its average.
Returns the sum of products of the difference between the dependent expression (expression1) and its average and the difference between the independent expression (expression2) and its average.
That is, REGR_SXY returns: ∑[(expression1 - average(expression1)(expression2 - average(expression2))]
The return value is of type DOUBLE PRECISION. REGR_SXY eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_SXY returns NULL.
Returns the sum of squares of the difference between the dependent expression (expression1) and its average.
Returns the sum of squares of the difference between the dependent expression (expression1) and its average.
That is, REGR_SYY returns: ∑[(expression1 - average(expression1)(expression1 - average(expression1)]
The return value is of type DOUBLE PRECISION. REGR_SYY eliminates expression pairs where either expression in the pair is NULL. If no rows remain, REGR_SYY returns NULL.
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. STDDEV returns the same data type as expression.
Related functions
Nonstandard function STDDEV is provided for compatibility with other databases. It is semantically identical to
STDDEV_SAMP.
This aggregate function differs from analytic function
STDDEV, which computes the statistical sample standard deviation of the current row with respect to the group of rows within a window.
The following example returns the statistical sample standard deviation for each household ID from the customer_dimension table of the VMart example database:
=> SELECT STDDEV(household_id) FROM customer_dimension;
STDDEV
-----------------
8651.5084240071
37 - STDDEV_POP [aggregate]
Evaluates the statistical population standard deviation for each member of the group.
Evaluates the statistical population standard deviation for each member of the group.
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. STDDEV_POP returns the same data type as expression.
Related functions
This function differs from the analytic function
STDDEV_POP, which evaluates the statistical population standard deviation for each member of the group of rows within a window.
STDDEV_POP returns the same value as the square root of
VAR_POP:
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. STDDEV_SAMP returns the same data type as expression.
Related functions
STDDEV_SAMP is semantically identical to nonstandard function
STDDEV, which is provided for compatibility with other databases.
This aggregate function differs from analytic function
STDDEV_SAMP, which computes the statistical sample standard deviation of the current row with respect to the group of rows within a window.
When
VAR_SAMP returns NULL, STDDEV_SAMP returns NULL.
Examples
The following example returns the statistical sample standard deviation for each household ID from the customer dimension table.
=> SELECT STDDEV_SAMP(household_id) FROM customer_dimension;
stddev_samp
------------------
8651.50842400771
(1 row)
39 - SUM [aggregate]
Computes the sum of an expression over a group of rows.
Computes the sum of an expression over a group of rows. SUM returns a DOUBLE PRECISION value for a floating-point expression. Otherwise, the return value is the same as the expression data type.
The SUM aggregate function differs from the
SUM analytic function, which computes the sum of an expression over a group of rows within a window.
Invokes the aggregate function for all rows in the group (default)
DISTINCT
Invokes the aggregate function for all distinct non-null values of the expression found in the group
expression
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
Overflow handling
If you encounter data overflow when using SUM(), use
SUM_FLOAT which converts the data to a floating point.
By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.
Examples
The following query returns the total sum of the product_cost column.
=> SELECT SUM(product_cost) AS cost FROM product_dimension;
cost
---------
9042850
(1 row)
Invokes the aggregate function for all rows in the group (default).
DISTINCT
Invokes the aggregate function for all distinct non-null values of the expression found in the group.
expression
Any expression whose result is type DOUBLE PRECISION.
Overflow handling
By default, Vertica allows silent numeric overflow when you call this function on numeric data types. For more information on this behavior and how to change it, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.
Examples
The following query returns the floating-point sum of the average price from the product table:
=> SELECT SUM_FLOAT(average_competitor_price) AS cost FROM product_dimension;
cost
----------
18181102
(1 row)
41 - TS_FIRST_VALUE
Processes the data that belongs to each time slice.
Processes the data that belongs to each time slice. A time series aggregate (TSA) function, TS_FIRST_VALUE returns the value at the start of the time slice, where an interpolation scheme is applied if the timeslice is missing, in which case the value is determined by the values corresponding to the previous (and next) timeslices based on the interpolation scheme of const (linear).
TS_FIRST_VALUE returns one output row per time slice, or one output row per partition per time slice if partition expressions are specified
An INTEGER or FLOAT expression on which to aggregate and interpolate.
IGNORE NULLS
The IGNORE NULLS behavior changes depending on a CONST or LINEAR interpolation scheme. See When Time Series Data Contains Nulls in Analyzing Data for details.
'CONST' | 'LINEAR'
Specifies the interpolation value as constant or linear:
CONST (default): New value is interpolated based on previous input records.
LINEAR: Values are interpolated in a linear slope based on the specified time slice.
Requirements
You must use an ORDER BY clause with a TIMESTAMP column.
Multiple time series aggregate functions
The same query can call multiple time series aggregate functions. They share the same gap-filling policy as defined by the TIMESERIES clause; however, each time series aggregate function can specify its own interpolation policy. For example:
=> SELECT slice_time, symbol,
TS_FIRST_VALUE(bid, 'const') fv_c,
TS_FIRST_VALUE(bid, 'linear') fv_l,
TS_LAST_VALUE(bid, 'const') lv_c
FROM TickStore
TIMESERIES slice_time AS '3 seconds'
OVER(PARTITION BY symbol ORDER BY ts);
Processes the data that belongs to each time slice.
Processes the data that belongs to each time slice. A time series aggregate (TSA) function, TS_LAST_VALUE returns the value at the end of the time slice, where an interpolation scheme is applied if the timeslice is missing. In this case the value is determined by the values corresponding to the previous (and next) timeslices based on the interpolation scheme of const (linear).
TS_LAST_VALUE returns one output row per time slice, or one output row per partition per time slice if partition expressions are specified.
An INTEGER or FLOAT expression on which to aggregate and interpolate.
IGNORE NULLS
The IGNORE NULLS behavior changes depending on a CONST or LINEAR interpolation scheme. See When Time Series Data Contains Nulls in Analyzing Data for details.
'CONST' | 'LINEAR'
Specifies the interpolation value as constant or linear:
CONST (default): New value is interpolated based on previous input records.
LINEAR: Values are interpolated in a linear slope based on the specified time slice.
Requirements
You must use the ORDER BY clause with a TIMESTAMP column.
Multiple time series aggregate functions
The same query can call multiple time series aggregate functions. They share the same gap-filling policy as defined by the TIMESERIES clause; however, each time series aggregate function can specify its own interpolation policy. For example:
=> SELECT slice_time, symbol,
TS_FIRST_VALUE(bid, 'const') fv_c,
TS_FIRST_VALUE(bid, 'linear') fv_l,
TS_LAST_VALUE(bid, 'const') lv_c
FROM TickStore
TIMESERIES slice_time AS '3 seconds'
OVER(PARTITION BY symbol ORDER BY ts);
Evaluates the population variance for each member of the group.
Evaluates the population variance for each member of the group. This is defined as the sum of squares of the difference of *expression*from the mean of expression, divided by the number of remaining rows:
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. VAR_POP returns the same data type as expression.
Related functions
This aggregate function differs from analytic function
VAR_POP, which computes the population variance of the current row with respect to the group of rows within a window.
Examples
The following example returns the population variance for each household ID in the customer table.
=> SELECT VAR_POP(household_id) FROM customer_dimension;
var_pop
------------------
74847050.0168393
(1 row)
44 - VAR_SAMP [aggregate]
Evaluates the sample variance for each row of the group.
Evaluates the sample variance for each row of the group. This is defined as the sum of squares of the difference of expression from the mean of expression divided by the number of remaining rows minus 1:
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. VAR_SAMP returns the same data type as expression.
Related functions
VAR_SAMP is semantically identical to nonstandard function
VARIANCE, which is provided for compatibility with other databases.
This aggregate function differs from analytic function
VAR_SAMP, which computes the sample variance of the current row with respect to the group of rows within a window.
Examples
The following example returns the sample variance for each household ID in the customer table.
=> SELECT VAR_SAMP(household_id) FROM customer_dimension;
var_samp
------------------
74848598.0106764
(1 row)
Evaluates the sample variance for each row of the group.
Evaluates the sample variance for each row of the group. This is defined as the sum of squares of the difference of expression from the mean of expression divided by the number of remaining rows minus 1.
Any NUMERIC data type or any non-numeric data type that can be implicitly converted to a numeric data type. VARIANCE returns the same data type as expression.
Related functions
The nonstandard function VARIANCE is provided for compatibility with other databases. It is semantically identical to
VAR_SAMP.
This aggregate function differs from analytic function
VARIANCE, which computes the sample variance of the current row with respect to the group of rows within a window.
Examples
The following example returns the sample variance for each household ID in the customer table.
=> SELECT VARIANCE(household_id) FROM customer_dimension;
variance
------------------
74848598.0106764
(1 row)
This clause is also supported for user-defined aggregate functions.
The order clause only specifies order within the result set of each group. The query can have its own ORDER BY clause, which has precedence over order that is specified by WITHIN GROUP ORDER BY, and orders the final result set.
Syntax
WITHIN GROUP (ORDER BY
{ column-expression [ ASC | DESC [ NULLS { FIRST | LAST | AUTO } ] ]
}[,...])
Parameters
column-expression
A column, constant, or arbitrary expression formed on columns, on which to sort grouped rows.
ASC | DESC
Specifies the ordering sequence as ascending (default) or descending.
NULLS {FIRST | LAST | AUTO}
Specifies whether to position null values first or last. Default positioning depends on whether the sort order is ascending or descending:
Ascending default: NULLS LAST
Descending default: NULLS FIRST
If you specify NULLS AUTO, Vertica chooses the positioning that is most efficient for this query, either NULLS FIRST or NULLS LAST.
If you omit all sort qualifiers, Vertica uses ASC NULLS LAST.