PERCENTILE_CONT [analytic]
An inverse distribution function where, for each row, PERCENTILE_CONT returns the value that would fall into the specified percentile among a set of values in each partition within a window. For example, if the argument to the function is 0.5, the result of the function is the median of the data set (50th percentile). PERCENTILE_CONT assumes a continuous distribution data model. NULL values are ignored.
PERCENTILE_CONT computes the percentile by first computing the row number where the percentile row would exist. For example:
row-number = 1 + percentile-value * (num-partition-rows -1)
If row-number
is a whole number (within an error of 0.00001), the percentile is the value of row row-number
.
Otherwise, Vertica interpolates the percentile value between the value of the CEILING(
row-number
)
row and the value of the FLOOR(
row-number
)
row. In other words, the percentile is calculated as follows:
( CEILING( row-number) - row-number ) * ( value of FLOOR(row-number) row )
+ ( row-number - FLOOR(row-number) ) * ( value of CEILING(row-number) row)
Note
If the percentile value is 0.5, PERCENTILE_CONT returns the same result set as the function MEDIAN.Behavior type
ImmutableSyntax
PERCENTILE_CONT ( percentile ) WITHIN GROUP ( ORDER BY expression [ ASC | DESC ] ) OVER ( [ window-partition-clause ] )
Parameters
percentile
- Percentile value, a FLOAT constant that ranges from 0 to 1 (inclusive).
WITHIN GROUP (ORDER BY
expression
)
- Specifies how to sort data within each group. ORDER BY takes only one column/expression that must be INTEGER, FLOAT, INTERVAL, or NUMERIC data type. NULL values are discarded.
The
WITHIN GROUP(ORDER BY)
clause does not guarantee the order of the SQL result. To order the final result , use the SQL ORDER BY clause set. ASC | DESC
- Specifies the ordering sequence as ascending (default) or descending.
Specifying ASC or DESC in the
WITHIN GROUP
clause affects results as long as thepercentile
is not0.5
. OVER()
- See Analytic Functions
Examples
This query computes the median annual income per group for the first 300 customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income)
OVER (PARTITION BY customer_state) AS PERCENTILE_CONT
FROM customer_dimension WHERE customer_state IN ('DC','WI') AND customer_key < 300
ORDER BY customer_state, customer_key;
customer_state | customer_key | annual_income | PERCENTILE_CONT
----------------+--------------+---------------+-----------------
DC | 52 | 168312 | 483266.5
DC | 118 | 798221 | 483266.5
WI | 62 | 283043 | 377691
WI | 139 | 472339 | 377691
(4 rows)
This query computes the median annual income per group for all customers in Wisconsin and the District of Columbia.
=> SELECT customer_state, customer_key, annual_income, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY annual_income)
OVER (PARTITION BY customer_state) AS PERCENTILE_CONT
FROM customer_dimension WHERE customer_state IN ('DC','WI') ORDER BY customer_state, customer_key;
customer_state | customer_key | annual_income | PERCENTILE_CONT
----------------+--------------+---------------+-----------------
DC | 52 | 168312 | 483266.5
DC | 118 | 798221 | 483266.5
DC | 622 | 220782 | 555088
DC | 951 | 178453 | 555088
DC | 972 | 961582 | 555088
DC | 1286 | 760445 | 555088
DC | 1434 | 44836 | 555088
...
WI | 62 | 283043 | 377691
WI | 139 | 472339 | 377691
WI | 359 | 42242 | 517717
WI | 364 | 867543 | 517717
WI | 403 | 509031 | 517717
WI | 455 | 32000 | 517717
WI | 485 | 373129 | 517717
...
(1353 rows)