WIDTH_BUCKET

Constructs equiwidth histograms, in which the histogram range is divided into intervals (buckets) of identical sizes.

Constructs equiwidth histograms, in which the histogram range is divided into intervals (buckets) of identical sizes. In addition, values below the low bucket return 0, and values above the high bucket return bucket-count +1. Returns an integer value.

Behavior type

Immutable

Syntax

WIDTH_BUCKET ( expression, hist-min, hist-max, bucket-count )

Arguments

expression
The expression for which the histogram is created. This expression must resolve to a numeric or datetime value or a value that can be implicitly converted to a numeric or datetime value. If *expression *evaluates to null, then the *expression *returns null.
hist-min
Resolves to the low boundary of bucket-count, a non-null numeric or datetime value.
hist-max
Resolves to the high boundary of bucket-count, a non-null numeric or datetime value.
bucket-count
Resolves to an INTEGER constant that indicates the number of buckets.

Notes

  • WIDTH_BUCKET divides a data set into buckets of equal width. For example, Age = 0–20, 20–40, 40–60, 60–80. This is known as an equiwidth histogram.

  • When using WIDTH_BUCKET pay attention to the minimum and maximum boundary values. Each bucket contains values equal to or greater than the base value of that bucket, so that age ranges of 0–20, 20–40, and so on, are actually 0–19.99 and 20–39.999.

  • WIDTH_BUCKET accepts the following data types: (FLOAT and/or INTEGER), (TIMESTAMP and/or DATE and/or TIMESTAMPTZ), or (INTERVAL and/or TIME).

Examples

The following example returns five possible values and has three buckets: 0 [Up to 100), 1 [100–300), 2 [300–500), 3 [500–700), and 4 [700 and up):

SELECT product_description, product_cost, WIDTH_BUCKET(product_cost, 100, 700, 3);

The following example creates a nine-bucket histogram on the annual_income column for customers in Connecticut who are female doctors. The results return the bucket number to an Income column, divided into eleven buckets, including an underflow and an overflow. Note that if customers had annual incomes greater than the maximum value, they would be assigned to an overflow bucket, 10:

SELECT customer_name, annual_income, WIDTH_BUCKET (annual_income, 100000, 1000000, 9) AS "Income"
FROM public.customer_dimension WHERE customer_state='CT'
AND title='Dr.' AND customer_gender='Female' AND household_id < '1000'
ORDER BY "Income";

In the following result set, the reason there is a bucket 0 is because buckets are numbered from 1 to bucket_count. Anything less than the given value of hist_min goes in bucket 0, and anything greater than the given value of hist_max goes in the bucket bucket_count+1. In this example, bucket 9 is empty, and there is no overflow. The value 12,283 is less than 100,000, so it goes into the underflow bucket.

   customer_name    | annual_income | Income
--------------------+---------------+--------
 Joanna A. Nguyen   |         12283 |      0
 Amy I. Nguyen      |        109806 |      1
 Juanita L. Taylor  |        219002 |      2
 Carla E. Brown     |        240872 |      2
 Kim U. Overstreet  |        284011 |      2
 Tiffany N. Reyes   |        323213 |      3
 Rebecca V. Martin  |        324493 |      3
 Betty . Roy        |        476055 |      4
 Midori B. Young    |        462587 |      4
 Martha T. Brown    |        687810 |      6
 Julie D. Miller    |        616509 |      6
 Julie Y. Nielson   |        894910 |      8
 Sarah B. Weaver    |        896260 |      8
 Jessica C. Nielson |        861066 |      8
(14 rows)

See also