APPROXIMATE_COUNT_DISTINCT
Returns the number of distinct non-NULL values in a data set.
Behavior type
ImmutableSyntax
APPROXIMATE_COUNT_DISTINCT ( expression[, error-tolerance ] )
Parameters
- expression
- Value to be evaluated using any data type that supports equality comparison.
- error-tolerance
- Numeric value that represents the desired percentage of error tolerance, distributed around the value returned by this function. The smaller the error tolerance, the closer the approximation. - You can set - error-toleranceto a minimum value of 0.88. Vertica imposes no maximum restriction, but any value greater than 5 is implemented with 5% error tolerance.- If you omit this argument, OpenText™ Analytics Database uses an error tolerance of 1.25(%). 
Restrictions
APPROXIMATE_COUNT_DISTINCT and DISTINCT aggregates cannot be in the same query block.
Error tolerance
APPROXIMATE_COUNT_DISTINCT(x, error-tolerance) returns a value equal to COUNT(DISTINCT x), with an error that is lognormally distributed with standard deviation.
Parameter error-tolerance is optional. Supply this argument to specify the desired standard deviation. error-tolerance is defined as 2.17 standard deviations, which corresponds to a 97 percent confidence interval:
standard-deviation = error-tolerance / 2.17
For example:
- 
error-tolerance= 1Default setting, corresponds to a standard deviation 97 percent of the time, APPROXIMATE_COUNT_DISTINCT( x,5) returns a value between:- 
COUNT(DISTINCTx) * 0.99
- 
COUNT(DISTINCTx) * 1.01
 
- 
- 
error-tolerance= 597 percent of the time, APPROXIMATE_COUNT_DISTINCT(x)returns a value between:- 
COUNT(DISTINCTx) * 0.95
- 
COUNT(DISTINCTx) * 1.05
 
- 
A 99 percent confidence interval corresponds to 2.58 standard deviations. To set error-tolerance confidence level corresponding to 99 (instead of a 97) percent , multiply error-tolerance by 2.17 / 2.58 = 0.841.
For example, if you specify error-tolerance as 5 * 0.841 = 4.2, APPROXIMATE_COUNT_DISTINCT(x,4.2) returns values 99 percent of the time between:
- 
COUNT (DISTINCTx) * 0.95
- 
COUNT (DISTINCTx) * 1.05
Examples
Count the total number of distinct values in column product_key from table store.store_sales_fact:
=> SELECT COUNT(DISTINCT product_key) FROM store.store_sales_fact;
 COUNT
-------
 19982
(1 row)
Count the approximate number of distinct values in product_key with various error tolerances. The smaller the error tolerance, the closer the approximation:
=> SELECT APPROXIMATE_COUNT_DISTINCT(product_key,5) AS five_pct_accuracy,
   APPROXIMATE_COUNT_DISTINCT(product_key,1) AS one_pct_accuracy,
   APPROXIMATE_COUNT_DISTINCT(product_key,.88) AS point_eighteight_pct_accuracy
   FROM store.store_sales_fact;
 five_pct_accuracy | one_pct_accuracy | point_eighteight_pct_accuracy
-------------------+------------------+-------------------------------
             19431 |            19921 |                         19921
(1 row)