This is the multipage printable view of this section.
Click here to print.
Return to the regular view of this page.
GROUP BY clause
Use the GROUP BY clause with aggregate functions in a SELECT statement to collect data across multiple records.
Use the GROUP BY
clause with aggregate functions in a SELECT
statement to collect data across multiple records. Vertica groups the results into one or more sets of rows that match an expression.
The GROUP BY
clause without aggregates is similar to using SELECT DISTINCT
.
ROLLUP
is an extension to the GROUP BY
clause. ROLLUP
performs subtotal aggregations.
Syntax
GROUP BY [/*+GBYTYPE(algorithm)*/] { expression  aggregateexpression }[,...]
Arguments

/*+GBYTYPE(algorithm)*/
 Specifies which algorithm has precedence for implementing this
GROUP BY
clause, over the algorithm the Vertica query optimizer might otherwise choose. You can set algorithm
to one of the following values:
For more information about both algorithms, see GROUP BY implementation options.
expression
 Any expression, including constants and column references in the tables specified in the FROM clause. For example:
column,... column, (expression)
aggregateexpression
 An ordered list of columns, expressions,
CUBE, GROUPING SETS
, or ROLLUP
aggregates.
You can include CUBE
and ROLLUP
aggregates within a GROUPING SETS
aggregate. CUBE
and ROLLUP
aggregates can result in a large amount of output. In that case, use GROUPING SETS
to return only certain results.
You cannot include any aggregates within a CUBE
or ROLLUP
expression.
You can append multiple GROUPING SETS
, CUBE
, or ROLLUP
aggregates in the same query. For example:
GROUP BY a,b,c,d, ROLLUP(a,b)
GROUP BY a,b,c,d, CUBE((a,b),c,d)
GROUP BY a,b,c,d, CUBE(a,b), ROLLUP (c,d)
GROUP BY ROLLUP(a), CUBE(b), GROUPING SETS(c)
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),CUBE(a,b))
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),())
Usage considerations

expression cannot include aggregate functions. However, you can use the GROUP BY clause with CUBE, GROUPING SETS, and ROLLUP
to return summary values for each group.

When you create a GROUP BY clause, you must include all nonaggregated columns that appear in the SELECT
list.

If the GROUP BY
clause includes a WHERE
clause, Vertica ignores all rows that do not satisfy the WHERE
clause.
Examples
This example shows how to use the WHERE
clause with GROUP BY
. In this case, the example retrieves all employees whose last name begins with S, and ignores all rows that do not meet this criteria. The GROUP BY
clause uses the ILIKE
function to retrieve only last names beginning with S. The aggregate function SUM
computes the total vacation days for each group.
=> SELECT employee_last_name, SUM(vacation_days)
FROM employee_dimension
WHERE employee_last_name ILIKE 'S%'
GROUP BY employee_last_name;
employee_last_name  SUM
+
Sanchez  2892
Smith  2672
Stein  2660
(3 rows)
The GROUP BY
clause in the following example groups results by vendor region, and vendor region's biggest deal:
=> SELECT vendor_region, MAX(deal_size) AS "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region;
vendor_region  Biggest Deal
+
East  990889
MidWest  699163
NorthWest  76101
South  854136
SouthWest  609807
West  964005
(6 rows)
The following query modifies the previous one with a HAVING
clause, which specifies to return only groups whose maximum deal size exceeds $900,000:
=> SELECT vendor_region, MAX(deal_size) as "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region
HAVING MAX(deal_size) > 900000;
vendor_region  Biggest Deal
+
East  990889
West  964005
(2 rows)
You can use the GROUP BY
clause with onedimensional arrays of scalar types. In the following example, grants is an ARRAY[VARCHAR] and grant_values is an ARRAY[INT].
=> SELECT department, grants, SUM(apply_sum(grant_values))
FROM employees
GROUP BY grants, department;
department  grants  SUM
++
Physics  ["US7376","DARPA1567"]  235000
Astronomy  ["US7376","DARPA1567"]  9000
Physics  ["US7376"]  30000
(3 rows)
The GROUP BY
clause without aggregates is similar to using SELECT DISTINCT
. For example, the following two queries return the same results:
=> SELECT DISTINCT household_id FROM customer_dimension;
=> SELECT household_id FROM customer_dimension GROUP BY household_id;
See also
1  CUBE aggregate
Automatically performs all possible aggregations of the specified columns, as an extension to the GROUP BY clause.
Automatically performs all possible aggregations of the specified columns, as an extension to the GROUP BY clause.
You can use the ROLLUP clause with three grouping functions:
Syntax
GROUP BY groupexpression[,...]
Parameters
groupexpression
``
 One or both of the following:

An expression that is not an aggregate or a grouping function that includes constants and column references in FROM
specified tables. For example:
column1, (column2+1), column3+column4

A multilevel expression, one of the following:

ROLLUP

CUBE

GROUPING SETS
Restrictions
 GROUP BY CUBE does not order data. If you want to sort data, use the ORDER BY clause. The ORDER BY clause must come after the GROUP BY clause.
 You can use CUBE inside a GROUPING SETS expression, but not inside a ROLLUP expression or another CUBE expression.
Levels of CUBE aggregation
If n is the number of grouping columns, CUBE creates 2^{n} levels of aggregations. For example:
CUBE (A, B, C) creates all possible groupings, resulting in eight groups:
 (A, B, C)
 (A, B)
 (A, C)
 (B, C)
 (A)
 (B)
 (C)
 ()
If you increase the number of CUBE columns, the number of CUBE groupings increases exponentially. The CUBE query may be resource intensive and produce combinations that are not of interest. In that case, consider using theGROUPING SETS aggregate, which allows you to choose specific groupings.
Examples
Using CUBE to return all groupings
Suppose you have a table that contains information about family expenses for books and electricity:
=> SELECT * FROM expenses ORDER BY Category, Year;
Year  Category  Amount
++
2005  Books  39.98
2007  Books  29.99
2008  Books  29.99
2005  Electricity  109.99
2006  Electricity  109.99
2007  Electricity  229.98
To aggregate the data by both Category and Year using the CUBE aggregate:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY CUBE(Category, Year) ORDER BY 1, 2, GROUPING_ID();
Category  Year  SUM
++
Books  2005  39.98
Books  2007  29.99
Books  2008  29.99
Books   99.96
Electricity  2005  109.99
Electricity  2006  109.99
Electricity  2007  229.98
Electricity   449.96
 2005  149.97
 2006  109.99
 2007  259.97
 2008  29.99
  549.92
The results include subtotals for each category and year, and a grand total ($549.92).
Using CUBE with the HAVING clause
This example shows how you can restrict the GROUP BY results, use the HAVING clause with the CUBE aggregate. This query returns only the category totals and the full total:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY CUBE(Category,Year) HAVING GROUPING(Year)=1;
Category  Year  SUM
++
Books   99.96
Electricity   449.96
  549.92
The next query returns only the aggregations for the two categories for each year. The GROUPING ID function specifies to omit the grand total ($549.92):
=> SELECT Category, Year, SUM (Amount) FROM expenses
GROUP BY CUBE(Category,Year) HAVING GROUPING_ID(Category,Year)<2
ORDER BY 1, 2, GROUPING_ID();
Category  Year  SUM
++
Books  2005  39.98
Books  2007  29.99
Books  2008  29.99
Books   99.96
Electrical  2005  109.99
Electrical  2006  109.99
Electrical  2007  229.98
Electrical   449.96
See also
2  GROUPING SETS aggregate
The GROUPING SETS aggregate is an extension to the GROUP BY clause that automatically performs subtotal aggregations on groupings that you specify.
The GROUPING SETS
aggregate is an extension to the
GROUP BY
clause that automatically performs subtotal aggregations on groupings that you specify.
You can use the GROUPING SETS
clause with three grouping functions:
To sort data, use the
ORDER BY
clause. The ORDER BY
clause must follow the GROUP BY
clause.
Syntax
GROUP BY groupexpression[,...]
Parameters
groupexpression
``
 One or both of the following:

An expression that is not an aggregate or a grouping function that includes constants and column references in FROM
specified tables. For example:
column1, (column2+1), column3+column4

A multilevel expression, one of the following:

ROLLUP

CUBE

GROUPING SETS
Defining the groupings
GROUPING SETS allows you to specify exactly which groupings you want in the results. You can also concatenate the groupings as follows:
The following example clauses result in the groupings shown.
This clause... 
Defines groupings... 
...GROUP BY GROUPING SETS(A,B,C,D)... 
(A), (B), (C), (D) 
...GROUP BY GROUPING SETS((A),(B),(C),(D))... 
(A), (B), (C), (D) 
...GROUP BY GROUPING SETS((A,B,C,D))... 
(A, B, C, D) 
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(C,D)... 
(A, C), (B, C), (A, D), (B, C) 
...GROUP BY GROUPING SETS((A,B)),GROUPING SETS(C,D)... 
(A, B, C), (A, B, D) 
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(ROLLUP(C,D))... 
(A,B), (A,B,C), (A,B,C,D) 
...GROUP BY A,B,C,GROUPING SETS(ROLLUP(C, D))... 
(A, B, C, D), (A, B, C), (A, B, C)
The clause contains two groups (A, B, C). In the HAVING clause, use the GROUP_ID function as a predicate, to eliminate the second grouping.

Example: selecting groupings
This example shows how to select only those groupings you want. Suppose you want to aggregate on columns only, and you do not need the grand total. The first query omits the total. In the second query, you add () to the GROUPING SETS list to get the total. Use the ORDER BY clause to sort the results by grouping:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY GROUPING SETS((Category, Year), (Year))
ORDER BY 1, 2, GROUPING_ID();
Category  Year  SUM
++
Books  2005  39.98
Books  2007  29.99
Books  2008  29.99
Electrical  2005  109.99
Electrical  2006  109.99
Electrical  2007  229.98
 2005  149.97
 2006  109.99
 2007  259.97
 2008  29.99
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY GROUPING SETS((Category, Year), (Year), ())
ORDER BY 1, 2, GROUPING_ID();
Category  Year  SUM
++
Books  2005  39.98
Books  2007  29.99
Books  2008  29.99
Electrical  2005  109.99
Electrical  2006  109.99
Electrical  2007  229.98
 2005  149.97
 2006  109.99
 2007  259.97
 2008  29.99
  549.92
See also
3  ROLLUP aggregate
Automatically performs subtotal aggregations as an extension to the GROUP BY clause.
Automatically performs subtotal aggregations as an extension to the GROUP BY clause. ROLLUP
performs these aggregations across multiple dimensions, at different levels, within a single SQL query.
You can use the ROLLUP
clause with three grouping functions:
Syntax
ROLLUP
groupingexpression
[,...]
Parameters
groupexpression
 One or both of the following:

An expression that is not an aggregate or a grouping function that includes constants and column references in FROM
specified tables. For example:
column1, (column2+1), column3+column4

A multilevel expression, one of the following:

ROLLUP

CUBE

GROUPING SETS
Restrictions
GROUP BY ROLLUP
does not sort results. To sort data, an ORDER BY
clause must follow the GROUP BY
clause.
Levels of aggregation
If n is the number of grouping columns, ROLLUP
creates n+1 levels of subtotals and grand total. Because ROLLUP
removes the rightmost column at each step, specify column order carefully.
Suppose that ROLLUP(A, B, C)
creates four groups:
Because ROLLUP
removes the rightmost column at each step, there are no groups for (A, C)
and (B, C)
.
If you enclose two or more columns in parentheses, GROUP BY
treats them as a single entity. For example:

ROLLUP(A, B, C)
creates four groups:
(A, B, C)
(A, B)
(A)
()

ROLLUP((A, B), C)
treats (A, B)
as a single entity and creates three groups:
(A, B, C)
(A, B)
()
Example: aggregating the full data set
The following example shows how to use the GROUP BY
clause to determine family expenses for electricity and books over several years. The SUM
aggregate function computes the total amount of money spent in each category per year.
Suppose you have a table that contains information about family expenses for books and electricity:
=> SELECT * FROM expenses ORDER BY Category, Year;
Year  Category  Amount
++
2005  Books  39.98
2007  Books  29.99
2008  Books  29.99
2005  Electricity  109.99
2006  Electricity  109.99
2007  Electricity  229.98
For the expenses
table, ROLLUP
computes the subtotals in each category between 2005–2007:

Books: $99.96

Electricity: $449.96

Grand total: $549.92.
Use the ORDER BY
clause to sort the results:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category, Year) ORDER BY 1,2, GROUPING_ID();
Category  Year  SUM
++
Books  2005  39.98
Books  2007  29.99
Books  2008  29.99
Books   99.96
Electricity  2005  109.99
Electricity  2006  109.99
Electricity  2007  229.98
Electricity   449.96
  549.92
Example: using ROLLUP with the HAVING clause
This example shows how to use the HAVING clause with ROLLUP
to restrict the GROUP BY
results. The following query produces only those ROLLUP
categories where year
is subtotaled, based on the expression in the GROUPING
function:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category,Year) HAVING GROUPING(Year)=1
ORDER BY 1, 2, GROUPING_ID();
Category  Year  SUM
++
Books   99.96
Electricity   449.96
  549.92
The next example rolls up on (Category
, Year
), but not on the full results. The GROUPING_ID
function specifies to aggregate less than three levels:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category,Year) HAVING GROUPING_ID(Category,Year)<3
ORDER BY 1, 2, GROUPING_ID();
Category  Year  SUM
++
Books  2005  39.98
Books  2007  29.99
Books  2008  29.99
Books   99.96
Electricity  2005  109.99
Electricity  2006  109.99
Electricity  2007  229.98
Electricity   449.96
See also