SUMMARIZE_CATCOL

Returns a statistical summary of categorical data input, in three columns:.

Returns a statistical summary of categorical data input, in three columns:

  • CATEGORY: Categorical levels, of the same SQL data type as the summarized column

  • COUNT: The number of category levels, of type INTEGER

  • PERCENT: Represents category percentage, of type FLOAT

Syntax

SUMMARIZE_CATCOL (target-column
        [ USING PARAMETERS TOPK = topk-value [, WITH_TOTALCOUNT = show-total] ] )
OVER()

Arguments

target-column
The name of the input column to summarize, one of the following data types:
  • BOOLEAN

  • FLOAT

  • INTEGER

  • DATE

  • CHAR/VARCHAR

Parameters

TOPK
Integer, specifies how many of the most frequent rows to include in the output.
WITH_TOTALCOUNT
A Boolean value that specifies whether the table contains a heading row that displays the total number of rows displayed in the target column, and a percent equal to 100.

Default:true

Examples

This example shows the categorical summary for the current_salary column in the salary_data table. The output of the query shows the column category, count, and percent. The first column gives the categorical levels, with the same SQL data type as the input column, the second column gives a count of that value, and the third column gives a percentage.

=> SELECT SUMMARIZE_CATCOL (current_salary USING PARAMETERS TOPK = 5) OVER() FROM salary_data;
CATEGORY | COUNT | PERCENT
---------+-------+---------
         |  1000 |     100
   39004 |     2 |     0.2
   35321 |     1 |     0.1
   36313 |     1 |     0.1
   36538 |     1 |     0.1
   36562 |     1 |     0.1
(6 rows)