WIDTH_BUCKET

Constructs equiwidth histograms, in which the histogram range is divided into intervals (buckets) of identical sizes.

Constructs equiwidth histograms, in which the histogram range is divided into intervals (buckets) of identical sizes. In addition, values below the low bucket return 0, and values above the high bucket return bucket_count +1. Returns an integer value.

Behavior type

Immutable

Syntax

WIDTH_BUCKET ( expression, hist_min, hist_max, bucket_count )

Parameters

expression
The expression for which the histogram is created. This expression must evaluate to a numeric or datetime value or to a value that can be implicitly converted to a numeric or datetime value. If expression evaluates to null, then the expression returns null.
hist_min
An expression that resolves to the low boundary of bucket 1. Must also evaluate to numeric or datetime values and cannot evaluate to null.
hist_max
An expression that resolves to the high boundary of bucket bucket_count. Must also evaluate to a numeric or datetime value and cannot evaluate to null.
bucket_count
An expression that resolves to a constant, indicating the number of buckets. This expression always evaluates to a positive INTEGER.

Notes

  • WIDTH_BUCKET divides a data set into buckets of equal width. For example, Age = 0–20, 20–40, 40–60, 60–80. This is known as an equiwidth histogram.

  • When using WIDTH_BUCKET pay attention to the minimum and maximum boundary values. Each bucket contains values equal to or greater than the base value of that bucket, so that age ranges of 0–20, 20–40, and so on, are actually 0–19.99 and 20–39.999.

  • WIDTH_BUCKET accepts the following data types: (FLOAT and/or INTEGER), (TIMESTAMP and/or DATE and/or TIMESTAMPTZ), or (INTERVAL and/or TIME).

Examples

The following example returns five possible values and has three buckets: 0 [Up to 100), 1 [100–300), 2 [300–500), 3 [500–700), and 4 [700 and up):

SELECT product_description, product_cost, WIDTH_BUCKET(product_cost, 100, 700, 3);

The following example creates a nine-bucket histogram on the annual_income column for customers in Connecticut who are female doctors. The results return the bucket number to an “Income” column, divided into eleven buckets, including an underflow and an overflow. Note that if customers had an annual incomes greater than the maximum value, they would be assigned to an overflow bucket, 10:

SELECT customer_name, annual_income, WIDTH_BUCKET (annual_income, 100000, 1000000, 9) AS "Income"
FROM public.customer_dimension WHERE customer_state='CT'
AND title='Dr.' AND customer_gender='Female' AND household_id < '1000'
ORDER BY "Income";

In the following result set, the reason there is a bucket 0 is because buckets are numbered from 1 to bucket_count. Anything less than the given value of hist_min goes in bucket 0, and anything greater than the given value of hist_max goes in the bucket bucket_count+1. In this example, bucket 9 is empty, and there is no overflow. The value 12,283 is less than 100,000, so it goes into the underflow bucket.

   customer_name    | annual_income | Income
--------------------+---------------+--------
 Joanna A. Nguyen   |         12283 |      0
 Amy I. Nguyen      |        109806 |      1
 Juanita L. Taylor  |        219002 |      2
 Carla E. Brown     |        240872 |      2
 Kim U. Overstreet  |        284011 |      2
 Tiffany N. Reyes   |        323213 |      3
 Rebecca V. Martin  |        324493 |      3
 Betty . Roy        |        476055 |      4
 Midori B. Young    |        462587 |      4
 Martha T. Brown    |        687810 |      6
 Julie D. Miller    |        616509 |      6
 Julie Y. Nielson   |        894910 |      8
 Sarah B. Weaver    |        896260 |      8
 Jessica C. Nielson |        861066 |      8
(14 rows)

See also