This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
GROUP BY clause
Use the GROUP BY clause with aggregate functions in a SELECT statement to collect data across multiple records.
Use the GROUP BY
clause with aggregate functions in a SELECT
statement to collect data across multiple records. Vertica groups the results into one or more sets of rows that match an expression.
The GROUP BY
clause without aggregates is similar to using SELECT DISTINCT
.
ROLLUP
is an extension to the GROUP BY
clause. ROLLUP
performs subtotal aggregations.
Syntax
GROUP BY [/*+GBYTYPE(algorithm)*/] { expression | aggregate-expression }[,...]
Arguments
/*+
GBYTYPE
(
algorithm
)*/
- Specifies which algorithm has precedence for implementing this
GROUP BY
clause, over the algorithm the Vertica query optimizer might otherwise choose. You can set algorithm
to one of the following values:
For more information about both algorithms, see GROUP BY implementation options.
expression
- Any expression, including constants and column references in the tables specified in the FROM clause. For example:
column,... column, (expression)
aggregate-expression
- An ordered list of columns, expressions,
CUBE, GROUPING SETS
, or ROLLUP
aggregates.
You can include CUBE
and ROLLUP
aggregates within a GROUPING SETS
aggregate. CUBE
and ROLLUP
aggregates can result in a large amount of output. In that case, use GROUPING SETS
to return only certain results.
You cannot include any aggregates within a CUBE
or ROLLUP
expression.
You can append multiple GROUPING SETS
, CUBE
, or ROLLUP
aggregates in the same query. For example:
GROUP BY a,b,c,d, ROLLUP(a,b)
GROUP BY a,b,c,d, CUBE((a,b),c,d)
GROUP BY a,b,c,d, CUBE(a,b), ROLLUP (c,d)
GROUP BY ROLLUP(a), CUBE(b), GROUPING SETS(c)
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),CUBE(a,b))
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),())
Usage considerations
-
expression cannot include aggregate functions. However, you can use the GROUP BY clause with CUBE, GROUPING SETS, and ROLLUP
to return summary values for each group.
-
When you create a GROUP BY clause, you must include all non-aggregated columns that appear in the SELECT
list.
-
If the GROUP BY
clause includes a WHERE
clause, Vertica ignores all rows that do not satisfy the WHERE
clause.
Examples
This example shows how to use the WHERE
clause with GROUP BY
. In this case, the example retrieves all employees whose last name begins with S, and ignores all rows that do not meet this criteria. The GROUP BY
clause uses the ILIKE
function to retrieve only last names beginning with S. The aggregate function SUM
computes the total vacation days for each group.
=> SELECT employee_last_name, SUM(vacation_days)
FROM employee_dimension
WHERE employee_last_name ILIKE 'S%'
GROUP BY employee_last_name;
employee_last_name | SUM
--------------------+------
Sanchez | 2892
Smith | 2672
Stein | 2660
(3 rows)
The GROUP BY
clause in the following example groups results by vendor region, and vendor region's biggest deal:
=> SELECT vendor_region, MAX(deal_size) AS "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region;
vendor_region | Biggest Deal
---------------+--------------
East | 990889
MidWest | 699163
NorthWest | 76101
South | 854136
SouthWest | 609807
West | 964005
(6 rows)
The following query modifies the previous one with a HAVING
clause, which specifies to return only groups whose maximum deal size exceeds $900,000:
=> SELECT vendor_region, MAX(deal_size) as "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region
HAVING MAX(deal_size) > 900000;
vendor_region | Biggest Deal
---------------+--------------
East | 990889
West | 964005
(2 rows)
You can use the GROUP BY
clause with one-dimensional arrays of scalar types. In the following example, grants is an ARRAY[VARCHAR] and grant_values is an ARRAY[INT].
=> CREATE TABLE employees (id INT, department VARCHAR(50), grants ARRAY[VARCHAR], grant_values ARRAY[INT]);
=> COPY employees FROM STDIN;
42|Physics|[US-7376,DARPA-1567]|[65000,135000]
36|Physics|[US-7376,DARPA-1567]|[10000,25000]
33|Physics|[US-7376]|[30000]
36|Astronomy|[US-7376,DARPA-1567]|[5000,4000]
\.
=> SELECT department, grants, SUM(apply_sum(grant_values)) FROM employees GROUP BY grants, department;
department | grants | SUM
------------+--------------------------+--------
Physics | ["US-7376","DARPA-1567"] | 235000
Astronomy | ["US-7376","DARPA-1567"] | 9000
Physics | ["US-7376"] | 30000
(3 rows)
The GROUP BY
clause without aggregates is similar to using SELECT DISTINCT
. For example, the following two queries return the same results:
=> SELECT DISTINCT household_id FROM customer_dimension;
=> SELECT household_id FROM customer_dimension GROUP BY household_id;
See also
1 - CUBE aggregate
Automatically performs all possible aggregations of the specified columns, as an extension to the GROUP BY clause.
Automatically performs all possible aggregations of the specified columns, as an extension to the GROUP BY clause.
You can use the ROLLUP clause with three grouping functions:
Syntax
GROUP BY group-expression[,...]
Parameters
group-expression
- One or both of the following:
-
An expression that is not an aggregate or a grouping function that includes constants and column references in FROM
-specified tables. For example:
column1, (column2+1), column3+column4
-
A multilevel expression, one of the following:
-
ROLLUP
-
CUBE
-
GROUPING SETS
Restrictions
- GROUP BY CUBE does not order data. If you want to sort data, use the ORDER BY clause. The ORDER BY clause must come after the GROUP BY clause.
- You can use CUBE inside a GROUPING SETS expression, but not inside a ROLLUP expression or another CUBE expression.
Levels of CUBE aggregation
If n is the number of grouping columns, CUBE creates 2n levels of aggregations. For example:
CUBE (A, B, C) creates all possible groupings, resulting in eight groups:
- (A, B, C)
- (A, B)
- (A, C)
- (B, C)
- (A)
- (B)
- (C)
- ()
If you increase the number of CUBE columns, the number of CUBE groupings increases exponentially. The CUBE query may be resource intensive and produce combinations that are not of interest. In that case, consider using theGROUPING SETS aggregate, which allows you to choose specific groupings.
Examples
Using CUBE to return all groupings
Suppose you have a table that contains information about family expenses for books and electricity:
=> SELECT * FROM expenses ORDER BY Category, Year;
Year | Category | Amount
------+-------------+--------
2005 | Books | 39.98
2007 | Books | 29.99
2008 | Books | 29.99
2005 | Electricity | 109.99
2006 | Electricity | 109.99
2007 | Electricity | 229.98
To aggregate the data by both Category and Year using the CUBE aggregate:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY CUBE(Category, Year) ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electricity | 2005 | 109.99
Electricity | 2006 | 109.99
Electricity | 2007 | 229.98
Electricity | | 449.96
| 2005 | 149.97
| 2006 | 109.99
| 2007 | 259.97
| 2008 | 29.99
| | 549.92
The results include subtotals for each category and year, and a grand total ($549.92).
Using CUBE with the HAVING clause
This example shows how you can restrict the GROUP BY results, use the HAVING clause with the CUBE aggregate. This query returns only the category totals and the full total:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY CUBE(Category,Year) HAVING GROUPING(Year)=1;
Category | Year | SUM
-------------+------+--------
Books | | 99.96
Electricity | | 449.96
| | 549.92
The next query returns only the aggregations for the two categories for each year. The GROUPING ID function specifies to omit the grand total ($549.92):
=> SELECT Category, Year, SUM (Amount) FROM expenses
GROUP BY CUBE(Category,Year) HAVING GROUPING_ID(Category,Year)<2
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electrical | 2005 | 109.99
Electrical | 2006 | 109.99
Electrical | 2007 | 229.98
Electrical | | 449.96
See also
2 - GROUPING SETS aggregate
The GROUPING SETS aggregate is an extension to the GROUP BY clause that automatically performs subtotal aggregations on groupings that you specify.
The GROUPING SETS
aggregate is an extension to the
GROUP BY
clause that automatically performs subtotal aggregations on groupings that you specify.
You can use the GROUPING SETS
clause with three grouping functions:
To sort data, use the
ORDER BY
clause. The ORDER BY
clause must follow the GROUP BY
clause.
Syntax
GROUP BY group-expression[,...]
Parameters
group-expression
- One or both of the following:
-
An expression that is not an aggregate or a grouping function that includes constants and column references in FROM
-specified tables. For example:
column1, (column2+1), column3+column4
-
A multilevel expression, one of the following:
-
ROLLUP
-
CUBE
-
GROUPING SETS
Defining the groupings
GROUPING SETS allows you to specify exactly which groupings you want in the results. You can also concatenate the groupings as follows:
The following example clauses result in the groupings shown.
This clause... |
Defines groupings... |
...GROUP BY GROUPING SETS(A,B,C,D)... |
(A), (B), (C), (D) |
...GROUP BY GROUPING SETS((A),(B),(C),(D))... |
(A), (B), (C), (D) |
...GROUP BY GROUPING SETS((A,B,C,D))... |
(A, B, C, D) |
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(C,D)... |
(A, C), (B, C), (A, D), (B, C) |
...GROUP BY GROUPING SETS((A,B)),GROUPING SETS(C,D)... |
(A, B, C), (A, B, D) |
...GROUP BY GROUPING SETS(A,B),GROUPING SETS(ROLLUP(C,D))... |
(A,B), (A,B,C), (A,B,C,D) |
...GROUP BY A,B,C,GROUPING SETS(ROLLUP(C, D))... |
(A, B, C, D), (A, B, C), (A, B, C)
The clause contains two groups (A, B, C). In the HAVING clause, use the GROUP_ID function as a predicate, to eliminate the second grouping.
|
Example: selecting groupings
This example shows how to select only those groupings you want. Suppose you want to aggregate on columns only, and you do not need the grand total. The first query omits the total. In the second query, you add () to the GROUPING SETS list to get the total. Use the ORDER BY clause to sort the results by grouping:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY GROUPING SETS((Category, Year), (Year))
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Electrical | 2005 | 109.99
Electrical | 2006 | 109.99
Electrical | 2007 | 229.98
| 2005 | 149.97
| 2006 | 109.99
| 2007 | 259.97
| 2008 | 29.99
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY GROUPING SETS((Category, Year), (Year), ())
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Electrical | 2005 | 109.99
Electrical | 2006 | 109.99
Electrical | 2007 | 229.98
| 2005 | 149.97
| 2006 | 109.99
| 2007 | 259.97
| 2008 | 29.99
| | 549.92
See also
3 - ROLLUP aggregate
Automatically performs subtotal aggregations as an extension to the GROUP BY clause.
Automatically performs subtotal aggregations as an extension to the GROUP BY clause. ROLLUP
performs these aggregations across multiple dimensions, at different levels, within a single SQL query.
You can use the ROLLUP
clause with three grouping functions:
Syntax
ROLLUP
grouping-expression
[,...]
Parameters
group-expression
- One or both of the following:
-
An expression that is not an aggregate or a grouping function that includes constants and column references in FROM
-specified tables. For example:
column1, (column2+1), column3+column4
-
A multilevel expression, one of the following:
-
ROLLUP
-
CUBE
-
GROUPING SETS
Restrictions
GROUP BY ROLLUP
does not sort results. To sort data, an ORDER BY
clause must follow the GROUP BY
clause.
Levels of aggregation
If n is the number of grouping columns, ROLLUP
creates n+1 levels of subtotals and grand total. Because ROLLUP
removes the right-most column at each step, specify column order carefully.
Suppose that ROLLUP(A, B, C)
creates four groups:
Because ROLLUP
removes the right-most column at each step, there are no groups for (A, C)
and (B, C)
.
If you enclose two or more columns in parentheses, GROUP BY
treats them as a single entity. For example:
-
ROLLUP(A, B, C)
creates four groups:
(A, B, C)
(A, B)
(A)
()
-
ROLLUP((A, B), C)
treats (A, B)
as a single entity and creates three groups:
(A, B, C)
(A, B)
()
Example: aggregating the full data set
The following example shows how to use the GROUP BY
clause to determine family expenses for electricity and books over several years. The SUM
aggregate function computes the total amount of money spent in each category per year.
Suppose you have a table that contains information about family expenses for books and electricity:
=> SELECT * FROM expenses ORDER BY Category, Year;
Year | Category | Amount
------+-------------+--------
2005 | Books | 39.98
2007 | Books | 29.99
2008 | Books | 29.99
2005 | Electricity | 109.99
2006 | Electricity | 109.99
2007 | Electricity | 229.98
For the expenses
table, ROLLUP
computes the subtotals in each category between 2005–2007:
-
Books: $99.96
-
Electricity: $449.96
-
Grand total: $549.92.
Use the ORDER BY
clause to sort the results:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category, Year) ORDER BY 1,2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electricity | 2005 | 109.99
Electricity | 2006 | 109.99
Electricity | 2007 | 229.98
Electricity | | 449.96
| | 549.92
Example: using ROLLUP with the HAVING clause
This example shows how to use the HAVING clause with ROLLUP
to restrict the GROUP BY
results. The following query produces only those ROLLUP
categories where year
is subtotaled, based on the expression in the GROUPING
function:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category,Year) HAVING GROUPING(Year)=1
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | | 99.96
Electricity | | 449.96
| | 549.92
The next example rolls up on (Category
, Year
), but not on the full results. The GROUPING_ID
function specifies to aggregate less than three levels:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category,Year) HAVING GROUPING_ID(Category,Year)<3
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electricity | 2005 | 109.99
Electricity | 2006 | 109.99
Electricity | 2007 | 229.98
Electricity | | 449.96
See also