This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

GROUP BY clause

Use the GROUP BY clause with aggregate functions in a SELECT statement to collect data across multiple records.

Use the GROUP BY clause with aggregate functions in a SELECT statement to collect data across multiple records. Vertica groups the results into one or more sets of rows that match an expression.

The GROUP BY clause without aggregates is similar to using SELECT DISTINCT.

ROLLUP is an extension to the GROUP BY clause. ROLLUP performs subtotal aggregations.

Syntax

GROUP BY [/*+GBYTYPE(algorithm)*/] { expression | aggregate-expression }[,...]

Arguments

/*+GBYTYPE(algorithm)*/
Specifies which algorithm has precedence for implementing this GROUP BY clause, over the algorithm the Vertica query optimizer might otherwise choose. You can set algorithm to one of the following values:
  • HASH: GROUPBY HASH algorithm

  • PIPE: GROUPBY PIPELINED algorithm

For more information about both algorithms, see GROUP BY implementation options.

expression
Any expression, including constants and column references in the tables specified in the FROM clause. For example:
column,... column, (expression)
aggregate-expression
An ordered list of columns, expressions, CUBE, GROUPING SETS, or ROLLUP aggregates.

You can include CUBE and ROLLUP aggregates within a GROUPING SETS aggregate. CUBE and ROLLUP aggregates can result in a large amount of output. In that case, use GROUPING SETS to return only certain results.

You cannot include any aggregates within a CUBE or ROLLUP expression.

You can append multiple GROUPING SETS, CUBE, or ROLLUP aggregates in the same query. For example:

  
GROUP BY a,b,c,d, ROLLUP(a,b)
GROUP BY a,b,c,d, CUBE((a,b),c,d)
GROUP BY a,b,c,d, CUBE(a,b), ROLLUP (c,d)
GROUP BY ROLLUP(a), CUBE(b), GROUPING SETS(c)
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),CUBE(a,b))
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),())

Usage considerations

  • expression cannot include aggregate functions. However, you can use the GROUP BY clause with CUBE, GROUPING SETS, and ROLLUP to return summary values for each group.

  • When you create a GROUP BY clause, you must include all non-aggregated columns that appear in the SELECT list.

  • If the GROUP BY clause includes a WHERE clause, Vertica ignores all rows that do not satisfy the WHERE clause.

Examples

This example shows how to use the WHERE clause with GROUP BY. In this case, the example retrieves all employees whose last name begins with S, and ignores all rows that do not meet this criteria. The GROUP BY clause uses the ILIKE function to retrieve only last names beginning with S. The aggregate function SUM computes the total vacation days for each group.

=> SELECT employee_last_name, SUM(vacation_days)
   FROM employee_dimension
   WHERE employee_last_name ILIKE 'S%'
   GROUP BY employee_last_name;
 employee_last_name | SUM
--------------------+------
 Sanchez            | 2892
 Smith              | 2672
 Stein              | 2660
(3 rows)

The GROUP BY clause in the following example groups results by vendor region, and vendor region's biggest deal:


=> SELECT vendor_region, MAX(deal_size) AS "Biggest Deal"
   FROM vendor_dimension
   GROUP BY vendor_region;
 vendor_region | Biggest Deal
---------------+--------------
 East          |       990889
 MidWest       |       699163
 NorthWest     |        76101
 South         |       854136
 SouthWest     |       609807
 West          |       964005
(6 rows)

The following query modifies the previous one with a HAVING clause, which specifies to return only groups whose maximum deal size exceeds $900,000:

=> SELECT vendor_region, MAX(deal_size) as "Biggest Deal"
   FROM vendor_dimension
   GROUP BY vendor_region
   HAVING MAX(deal_size) > 900000;
 vendor_region | Biggest Deal
---------------+--------------
 East          |       990889
 West          |       964005
(2 rows)

You can use the GROUP BY clause with one-dimensional arrays of scalar types. In the following example, grants is an ARRAY[VARCHAR] and grant_values is an ARRAY[INT].

=> CREATE TABLE employees (id INT, department VARCHAR(50), grants ARRAY[VARCHAR], grant_values ARRAY[INT]);

=> COPY employees FROM STDIN;
42|Physics|[US-7376,DARPA-1567]|[65000,135000]
36|Physics|[US-7376,DARPA-1567]|[10000,25000]
33|Physics|[US-7376]|[30000]
36|Astronomy|[US-7376,DARPA-1567]|[5000,4000]
\.

=> SELECT department, grants, SUM(apply_sum(grant_values)) FROM employees GROUP BY grants, department;
 department |          grants          |  SUM
------------+--------------------------+--------
 Physics    | ["US-7376","DARPA-1567"] | 235000
 Astronomy  | ["US-7376","DARPA-1567"] |   9000
 Physics    | ["US-7376"]              |  30000
(3 rows)

The GROUP BY clause without aggregates is similar to using SELECT DISTINCT. For example, the following two queries return the same results:

=> SELECT DISTINCT household_id FROM customer_dimension;
=> SELECT household_id FROM customer_dimension GROUP BY household_id;

See also

1 - CUBE aggregate

Automatically performs all possible aggregations of the specified columns, as an extension to the GROUP BY clause.

Automatically performs all possible aggregations of the specified columns, as an extension to the GROUP BY clause.

You can use the ROLLUP clause with three grouping functions:

Syntax

GROUP BY group-expression[,...]

Parameters

group-expression
One or both of the following:
  • An expression that is not an aggregate or a grouping function that includes constants and column references in FROM-specified tables. For example:

    column1, (column2+1), column3+column4
    
  • A multilevel expression, one of the following:

    • ROLLUP

    • CUBE

    • GROUPING SETS

Restrictions

  • GROUP BY CUBE does not order data. If you want to sort data, use the ORDER BY clause. The ORDER BY clause must come after the GROUP BY clause.
  • You can use CUBE inside a GROUPING SETS expression, but not inside a ROLLUP expression or another CUBE expression.

Levels of CUBE aggregation

If n is the number of grouping columns, CUBE creates 2n levels of aggregations. For example:

CUBE (A, B, C) creates all possible groupings, resulting in eight groups:

  • (A, B, C)
  • (A, B)
  • (A, C)
  • (B, C)
  • (A)
  • (B)
  • (C)
  • ()

If you increase the number of CUBE columns, the number of CUBE groupings increases exponentially. The CUBE query may be resource intensive and produce combinations that are not of interest. In that case, consider using theGROUPING SETS aggregate, which allows you to choose specific groupings.

Examples

Using CUBE to return all groupings

Suppose you have a table that contains information about family expenses for books and electricity:

=> SELECT * FROM expenses ORDER BY Category, Year;
 Year |  Category   | Amount
------+-------------+--------
2005  | Books       |  39.98
2007  | Books       |  29.99
2008  | Books       |  29.99
2005  | Electricity | 109.99
2006  | Electricity | 109.99
2007  | Electricity | 229.98

To aggregate the data by both Category and Year using the CUBE aggregate:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY CUBE(Category, Year) ORDER BY 1, 2, GROUPING_ID();
 Category    | Year |  SUM
-------------+------+--------
 Books       | 2005 |  39.98
 Books       | 2007 |  29.99
 Books       | 2008 |  29.99
 Books       |      |  99.96
 Electricity | 2005 | 109.99
 Electricity | 2006 | 109.99
 Electricity | 2007 | 229.98
 Electricity |      | 449.96
             | 2005 | 149.97
             | 2006 | 109.99
             | 2007 | 259.97
             | 2008 |  29.99
             |      | 549.92

The results include subtotals for each category and year, and a grand total ($549.92).

Using CUBE with the HAVING clause

This example shows how you can restrict the GROUP BY results, use the HAVING clause with the CUBE aggregate. This query returns only the category totals and the full total:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY CUBE(Category,Year) HAVING GROUPING(Year)=1;
 Category    | Year |  SUM
-------------+------+--------
 Books       |      |  99.96
 Electricity |      | 449.96
             |      | 549.92

The next query returns only the aggregations for the two categories for each year. The GROUPING ID function specifies to omit the grand total ($549.92):

=> SELECT Category, Year, SUM (Amount) FROM expenses
   GROUP BY CUBE(Category,Year) HAVING GROUPING_ID(Category,Year)<2
   ORDER BY 1, 2, GROUPING_ID();
 Category   | Year |  SUM
------------+------+--------
 Books      | 2005 |  39.98
 Books      | 2007 |  29.99
 Books      | 2008 |  29.99
 Books      |      |  99.96
 Electrical | 2005 | 109.99
 Electrical | 2006 | 109.99
 Electrical | 2007 | 229.98
 Electrical |      | 449.96

See also

2 - GROUPING SETS aggregate

The GROUPING SETS aggregate is an extension to the GROUP BY clause that automatically performs subtotal aggregations on groupings that you specify.

The GROUPING SETS aggregate is an extension to the GROUP BY clause that automatically performs subtotal aggregations on groupings that you specify.

You can use the GROUPING SETS clause with three grouping functions:

To sort data, use the ORDER BY clause. The ORDER BY clause must follow the GROUP BY clause.

Syntax

GROUP BY group-expression[,...]

Parameters

group-expression
One or both of the following:
  • An expression that is not an aggregate or a grouping function that includes constants and column references in FROM-specified tables. For example:

    column1, (column2+1), column3+column4

  • A multilevel expression, one of the following:

    • ROLLUP

    • CUBE

    • GROUPING SETS

Defining the groupings

GROUPING SETS allows you to specify exactly which groupings you want in the results. You can also concatenate the groupings as follows:

The following example clauses result in the groupings shown.

This clause... Defines groupings...
...GROUP BY GROUPING SETS(A,B,C,D)... (A), (B), (C), (D)
...GROUP BY GROUPING SETS((A),(B),(C),(D))... (A), (B), (C), (D)
...GROUP BY GROUPING SETS((A,B,C,D))... (A, B, C, D)
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(C,D)... (A, C), (B, C), (A, D), (B, C)
...GROUP BY GROUPING SETS((A,B)),GROUPING SETS(C,D)... (A, B, C), (A, B, D)
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(ROLLUP(C,D))... (A,B), (A,B,C), (A,B,C,D)
...GROUP BY A,B,C,GROUPING SETS(ROLLUP(C, D))...

(A, B, C, D), (A, B, C), (A, B, C)

The clause contains two groups (A, B, C). In the HAVING clause, use the GROUP_ID function as a predicate, to eliminate the second grouping.

Example: selecting groupings

This example shows how to select only those groupings you want. Suppose you want to aggregate on columns only, and you do not need the grand total. The first query omits the total. In the second query, you add () to the GROUPING SETS list to get the total. Use the ORDER BY clause to sort the results by grouping:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY GROUPING SETS((Category, Year), (Year))
   ORDER BY 1, 2, GROUPING_ID();
  Category  | Year |  SUM
------------+------+--------
 Books      | 2005 |  39.98
 Books      | 2007 |  29.99
 Books      | 2008 |  29.99
 Electrical | 2005 | 109.99
 Electrical | 2006 | 109.99
 Electrical | 2007 | 229.98
            | 2005 | 149.97
            | 2006 | 109.99
            | 2007 | 259.97
            | 2008 |  29.99
=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY GROUPING SETS((Category, Year), (Year), ())
   ORDER BY 1, 2, GROUPING_ID();
  Category  | Year |  SUM
------------+------+--------
 Books      | 2005 |  39.98
 Books      | 2007 |  29.99
 Books      | 2008 |  29.99
 Electrical | 2005 | 109.99
 Electrical | 2006 | 109.99
 Electrical | 2007 | 229.98
            | 2005 | 149.97
            | 2006 | 109.99
            | 2007 | 259.97
            | 2008 |  29.99
            |      | 549.92

See also

3 - ROLLUP aggregate

Automatically performs subtotal aggregations as an extension to the GROUP BY clause.

Automatically performs subtotal aggregations as an extension to the GROUP BY clause. ROLLUP performs these aggregations across multiple dimensions, at different levels, within a single SQL query.

You can use the ROLLUP clause with three grouping functions:

Syntax

ROLLUP grouping-expression[,...]

Parameters

group-expression
One or both of the following:
  • An expression that is not an aggregate or a grouping function that includes constants and column references in FROM-specified tables. For example:

    column1, (column2+1), column3+column4

  • A multilevel expression, one of the following:

    • ROLLUP

    • CUBE

    • GROUPING SETS

Restrictions

GROUP BY ROLLUP does not sort results. To sort data, an ORDER BY clause must follow the GROUP BY clause.

Levels of aggregation

If n is the number of grouping columns, ROLLUP creates n+1 levels of subtotals and grand total. Because ROLLUP removes the right-most column at each step, specify column order carefully.

Suppose that ROLLUP(A, B, C) creates four groups:

  • (A, B, C)

  • (A, B)

  • (A)

  • ()

Because ROLLUP removes the right-most column at each step, there are no groups for (A, C) and (B, C).

If you enclose two or more columns in parentheses, GROUP BY treats them as a single entity. For example:

  • ROLLUP(A, B, C) creates four groups:

    
    (A, B, C)
    (A, B)
    (A)
    ()
    
  • ROLLUP((A, B), C) treats (A, B) as a single entity and creates three groups:

    (A, B, C)
    (A, B)
    ()
    

Example: aggregating the full data set

The following example shows how to use the GROUP BY clause to determine family expenses for electricity and books over several years. The SUM aggregate function computes the total amount of money spent in each category per year.

Suppose you have a table that contains information about family expenses for books and electricity:

=> SELECT * FROM expenses ORDER BY Category, Year;
 Year |  Category   | Amount
------+-------------+--------
2005  | Books       |  39.98
2007  | Books       |  29.99
2008  | Books       |  29.99
2005  | Electricity | 109.99
2006  | Electricity | 109.99
2007  | Electricity | 229.98

For the expenses table, ROLLUP computes the subtotals in each category between 2005–2007:

  • Books: $99.96

  • Electricity: $449.96

  • Grand total: $549.92.

Use the ORDER BY clause to sort the results:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY ROLLUP(Category, Year) ORDER BY 1,2, GROUPING_ID();
 Category    | Year |  SUM
-------------+------+--------
 Books       | 2005 |  39.98
 Books       | 2007 |  29.99
 Books       | 2008 |  29.99
 Books       |      |  99.96
 Electricity | 2005 | 109.99
 Electricity | 2006 | 109.99
 Electricity | 2007 | 229.98
 Electricity |      | 449.96
             |      | 549.92

Example: using ROLLUP with the HAVING clause

This example shows how to use the HAVING clause with ROLLUP to restrict the GROUP BY results. The following query produces only those ROLLUP categories where year is subtotaled, based on the expression in the GROUPING function:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY ROLLUP(Category,Year) HAVING GROUPING(Year)=1
   ORDER BY 1, 2, GROUPING_ID();
 Category    | Year |  SUM
-------------+------+--------
 Books       |      |  99.96
 Electricity |      | 449.96
             |      | 549.92

The next example rolls up on (Category, Year), but not on the full results. The GROUPING_ID function specifies to aggregate less than three levels:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY ROLLUP(Category,Year) HAVING GROUPING_ID(Category,Year)<3
   ORDER BY 1, 2, GROUPING_ID();
 Category    | Year |  SUM
-------------+------+--------
 Books       | 2005 |  39.98
 Books       | 2007 |  29.99
 Books       | 2008 |  29.99
 Books       |      |  99.96
 Electricity | 2005 | 109.99
 Electricity | 2006 | 109.99
 Electricity | 2007 | 229.98
 Electricity |      | 449.96

See also