GROUP BY clause
Use the GROUP BY
clause with aggregate functions in a SELECT
statement to collect data across multiple records. Vertica groups the results into one or more sets of rows that match an expression.
The GROUP BY
clause without aggregates is similar to using SELECT DISTINCT
.
ROLLUP
is an extension to the GROUP BY
clause. ROLLUP
performs subtotal aggregations.
Syntax
GROUP BY [/*+GBYTYPE(algorithm)*/] { expression | aggregate-expression }[,...]
Arguments
-
/*+GBYTYPE(algorithm)*/
- Specifies which algorithm has precedence for implementing this
GROUP BY
clause, over the algorithm the Vertica query optimizer might otherwise choose. You can setalgorithm
to one of the following values:-
HASH
:GROUPBY HASH
algorithm -
PIPE
:GROUPBY PIPELINED
algorithm
For more information about both algorithms, see GROUP BY implementation options.
-
expression
- Any expression, including constants and column references in the tables specified in the FROM clause. For example:
column,... column, (expression)
aggregate-expression
- An ordered list of columns, expressions,
CUBE, GROUPING SETS
, orROLLUP
aggregates.You can include
CUBE
andROLLUP
aggregates within aGROUPING SETS
aggregate.CUBE
andROLLUP
aggregates can result in a large amount of output. In that case, useGROUPING SETS
to return only certain results.You cannot include any aggregates within a
CUBE
orROLLUP
expression.You can append multiple
GROUPING SETS
,CUBE
, orROLLUP
aggregates in the same query. For example:GROUP BY a,b,c,d, ROLLUP(a,b) GROUP BY a,b,c,d, CUBE((a,b),c,d) GROUP BY a,b,c,d, CUBE(a,b), ROLLUP (c,d) GROUP BY ROLLUP(a), CUBE(b), GROUPING SETS(c) GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),CUBE(a,b)) GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),())
Usage considerations
-
expression cannot include aggregate functions. However, you can use the GROUP BY clause with CUBE, GROUPING SETS, and
ROLLUP
to return summary values for each group. -
When you create a GROUP BY clause, you must include all non-aggregated columns that appear in the
SELECT
list. -
If the
GROUP BY
clause includes aWHERE
clause, Vertica ignores all rows that do not satisfy theWHERE
clause.
Examples
This example shows how to use the WHERE
clause with GROUP BY
. In this case, the example retrieves all employees whose last name begins with S, and ignores all rows that do not meet this criteria. The GROUP BY
clause uses the ILIKE
function to retrieve only last names beginning with S. The aggregate function SUM
computes the total vacation days for each group.
=> SELECT employee_last_name, SUM(vacation_days)
FROM employee_dimension
WHERE employee_last_name ILIKE 'S%'
GROUP BY employee_last_name;
employee_last_name | SUM
--------------------+------
Sanchez | 2892
Smith | 2672
Stein | 2660
(3 rows)
The GROUP BY
clause in the following example groups results by vendor region, and vendor region's biggest deal:
=> SELECT vendor_region, MAX(deal_size) AS "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region;
vendor_region | Biggest Deal
---------------+--------------
East | 990889
MidWest | 699163
NorthWest | 76101
South | 854136
SouthWest | 609807
West | 964005
(6 rows)
The following query modifies the previous one with a HAVING
clause, which specifies to return only groups whose maximum deal size exceeds $900,000:
=> SELECT vendor_region, MAX(deal_size) as "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region
HAVING MAX(deal_size) > 900000;
vendor_region | Biggest Deal
---------------+--------------
East | 990889
West | 964005
(2 rows)
You can use the GROUP BY
clause with one-dimensional arrays of scalar types. In the following example, grants is an ARRAY[VARCHAR] and grant_values is an ARRAY[INT].
=> SELECT department, grants, SUM(apply_sum(grant_values))
FROM employees
GROUP BY grants, department;
department | grants | SUM
------------+--------------------------+--------
Physics | ["US-7376","DARPA-1567"] | 235000
Astronomy | ["US-7376","DARPA-1567"] | 9000
Physics | ["US-7376"] | 30000
(3 rows)
The GROUP BY
clause without aggregates is similar to using SELECT DISTINCT
. For example, the following two queries return the same results:
=> SELECT DISTINCT household_id FROM customer_dimension;
=> SELECT household_id FROM customer_dimension GROUP BY household_id;