WIDTH_BUCKET
Constructs equiwidth histograms, in which the histogram range is divided into intervals (buckets) of identical sizes. In addition, values below the low bucket return 0, and values above the high bucket return bucket-count
+1. Returns an integer value.
Behavior type
ImmutableSyntax
WIDTH_BUCKET ( expression, hist-min, hist-max, bucket-count )
Arguments
expression
- The expression for which the histogram is created. This expression must resolve to a numeric or datetime value or a value that can be implicitly converted to a numeric or datetime value. If *
expression
*evaluates to null, then the *expression
*returns null. hist-min
- Resolves to the low boundary of
bucket-count
, a non-null numeric or datetime value. hist-max
- Resolves to the high boundary of
bucket-count
, a non-null numeric or datetime value. bucket-count
- Resolves to an INTEGER constant that indicates the number of buckets.
Notes
-
WIDTH_BUCKET divides a data set into buckets of equal width. For example, Age = 0–20, 20–40, 40–60, 60–80. This is known as an equiwidth histogram.
-
When using WIDTH_BUCKET pay attention to the minimum and maximum boundary values. Each bucket contains values equal to or greater than the base value of that bucket, so that age ranges of 0–20, 20–40, and so on, are actually 0–19.99 and 20–39.999.
-
WIDTH_BUCKET accepts the following data types: (FLOAT and/or INTEGER), (TIMESTAMP and/or DATE and/or TIMESTAMPTZ), or (INTERVAL and/or TIME).
Examples
The following example returns five possible values and has three buckets: 0 [Up to 100), 1 [100–300), 2 [300–500), 3 [500–700), and 4 [700 and up):
SELECT product_description, product_cost, WIDTH_BUCKET(product_cost, 100, 700, 3);
The following example creates a nine-bucket histogram on the annual_income column for customers in Connecticut who are female doctors. The results return the bucket number to an Income
column, divided into eleven buckets, including an underflow and an overflow. Note that if customers had annual incomes greater than the maximum value, they would be assigned to an overflow bucket, 10:
SELECT customer_name, annual_income, WIDTH_BUCKET (annual_income, 100000, 1000000, 9) AS "Income"
FROM public.customer_dimension WHERE customer_state='CT'
AND title='Dr.' AND customer_gender='Female' AND household_id < '1000'
ORDER BY "Income";
In the following result set, the reason there is a bucket 0 is because buckets are numbered from 1 to bucket_count
. Anything less than the given value of hist_min
goes in bucket 0, and anything greater than the given value of hist_max
goes in the bucket bucket_count+1
. In this example, bucket 9 is empty, and there is no overflow. The value 12,283 is less than 100,000, so it goes into the underflow bucket.
customer_name | annual_income | Income
--------------------+---------------+--------
Joanna A. Nguyen | 12283 | 0
Amy I. Nguyen | 109806 | 1
Juanita L. Taylor | 219002 | 2
Carla E. Brown | 240872 | 2
Kim U. Overstreet | 284011 | 2
Tiffany N. Reyes | 323213 | 3
Rebecca V. Martin | 324493 | 3
Betty . Roy | 476055 | 4
Midori B. Young | 462587 | 4
Martha T. Brown | 687810 | 6
Julie D. Miller | 616509 | 6
Julie Y. Nielson | 894910 | 8
Sarah B. Weaver | 896260 | 8
Jessica C. Nielson | 861066 | 8
(14 rows)