WIDTH_BUCKET
Constructs equiwidth histograms, in which the histogram range is divided into intervals (buckets) of identical sizes. In addition, values below the low bucket return 0, and values above the high bucket return bucket_count +1. Returns an integer value.
Behavior type
ImmutableSyntax
WIDTH_BUCKET ( expression, hist_min, hist_max, bucket_count )
Parameters
expression
- The expression for which the histogram is created. This expression must evaluate to a numeric or datetime value or to a value that can be implicitly converted to a numeric or datetime value. If
expression
evaluates to null, then theexpression
returns null. hist_min
- An expression that resolves to the low boundary of bucket 1. Must also evaluate to numeric or datetime values and cannot evaluate to null.
hist_max
- An expression that resolves to the high boundary of bucket bucket_count. Must also evaluate to a numeric or datetime value and cannot evaluate to null.
bucket_count
- An expression that resolves to a constant, indicating the number of buckets. This expression always evaluates to a positive INTEGER.
Notes
-
WIDTH_BUCKET divides a data set into buckets of equal width. For example, Age = 0–20, 20–40, 40–60, 60–80. This is known as an equiwidth histogram.
-
When using WIDTH_BUCKET pay attention to the minimum and maximum boundary values. Each bucket contains values equal to or greater than the base value of that bucket, so that age ranges of 0–20, 20–40, and so on, are actually 0–19.99 and 20–39.999.
-
WIDTH_BUCKET accepts the following data types: (FLOAT and/or INTEGER), (TIMESTAMP and/or DATE and/or TIMESTAMPTZ), or (INTERVAL and/or TIME).
Examples
The following example returns five possible values and has three buckets: 0 [Up to 100), 1 [100–300), 2 [300–500), 3 [500–700), and 4 [700 and up):
SELECT product_description, product_cost, WIDTH_BUCKET(product_cost, 100, 700, 3);
The following example creates a nine-bucket histogram on the annual_income column for customers in Connecticut who are female doctors. The results return the bucket number to an “Income” column, divided into eleven buckets, including an underflow and an overflow. Note that if customers had an annual incomes greater than the maximum value, they would be assigned to an overflow bucket, 10:
SELECT customer_name, annual_income, WIDTH_BUCKET (annual_income, 100000, 1000000, 9) AS "Income"
FROM public.customer_dimension WHERE customer_state='CT'
AND title='Dr.' AND customer_gender='Female' AND household_id < '1000'
ORDER BY "Income";
In the following result set, the reason there is a bucket 0 is because buckets are numbered from 1 to bucket_count
. Anything less than the given value of hist_min
goes in bucket 0, and anything greater than the given value of hist_max
goes in the bucket bucket_count+1
. In this example, bucket 9 is empty, and there is no overflow. The value 12,283 is less than 100,000, so it goes into the underflow bucket.
customer_name | annual_income | Income
--------------------+---------------+--------
Joanna A. Nguyen | 12283 | 0
Amy I. Nguyen | 109806 | 1
Juanita L. Taylor | 219002 | 2
Carla E. Brown | 240872 | 2
Kim U. Overstreet | 284011 | 2
Tiffany N. Reyes | 323213 | 3
Rebecca V. Martin | 324493 | 3
Betty . Roy | 476055 | 4
Midori B. Young | 462587 | 4
Martha T. Brown | 687810 | 6
Julie D. Miller | 616509 | 6
Julie Y. Nielson | 894910 | 8
Sarah B. Weaver | 896260 | 8
Jessica C. Nielson | 861066 | 8
(14 rows)