SUMMARIZE_CATCOL
Returns a statistical summary of categorical data input, in three columns:.
Returns a statistical summary of categorical data input, in three columns:
-
CATEGORY: Categorical levels, of the same SQL data type as the summarized column
-
COUNT: The number of category levels, of type INTEGER
-
PERCENT: Represents category percentage, of type FLOAT
Syntax
SUMMARIZE_CATCOL (target-column
[ USING PARAMETERS TOPK = topk-value [, WITH_TOTALCOUNT = show-total] ] )
OVER()
Arguments
target-column
- The name of the input column to summarize, one of the following data types:
-
BOOLEAN
-
FLOAT
-
INTEGER
-
DATE
-
CHAR/VARCHAR
-
Parameters
TOPK
- Integer, specifies how many of the most frequent rows to include in the output.
WITH_TOTALCOUNT
- A Boolean value that specifies whether the table contains a heading row that displays the total number of rows displayed in the target column, and a percent equal to 100.
Default:
true
Examples
This example shows the categorical summary for the current_salary
column in the salary_data
table. The output of the query shows the column category, count, and percent. The first column gives the categorical levels, with the same SQL data type as the input column, the second column gives a count of that value, and the third column gives a percentage.
=> SELECT SUMMARIZE_CATCOL (current_salary USING PARAMETERS TOPK = 5) OVER() FROM salary_data;
CATEGORY | COUNT | PERCENT
---------+-------+---------
| 1000 | 100
39004 | 2 | 0.2
35321 | 1 | 0.1
36313 | 1 | 0.1
36538 | 1 | 0.1
36562 | 1 | 0.1
(6 rows)