GROUPBY HASH input is not sorted by the group columns, so Vertica builds a hash table on those group columns in order to process the aggregates and group by expressions.
GROUPBY PIPELINED requires that inputs be presorted on the columns specified in the group, which means that Vertica need only retain data in the current group in memory. GROUPBY PIPELINED operations are preferred because they are generally faster and require less memory than GROUPBY HASH. GROUPBY PIPELINED is especially useful for queries that process large numbers of high-cardinality group by columns or DISTINCT aggregates.
If possible, the query optimizer chooses the faster algorithm GROUPBY PIPELINED over GROUPBY HASH.
Here's an example of how GROUPBY HASH operations look in EXPLAIN output.
Here's an example of how GROUPBY HASH operations look in EXPLAIN output.
=> EXPLAIN SELECT COUNT(DISTINCT annual_income)
FROM customer_dimension
WHERE customer_region='NorthWest';
The output shows that the optimizer chose the less efficient GROUPBY HASH path, which means the projection was not presorted on the annual_income column. If such a projection is available, the optimizer would choose the GROUPBY PIPELINED algorithm.