这是本节的多页打印视图。 点击此处打印.

返回本页常规视图.

GROUP BY 路径

GROUP BY 操作具有两种算法:

  • GROUPBY HASH 输入不按组列排序,因此 Vertica 用这些组列构建一个哈希表,以便处理聚合和 Group By 表达式。

  • GROUPBY PIPELINED 需要在组中指定的列上对输入进行预分类,这意味着 Vertica 仅需要在内存中保留当前组中的数据。GROUPBY PIPELINED 操作是首选算法,因为它们通常比 GROUPBY HASH 更快并且需要的内存更少。GROUPBY PIPELINED 对按列或 DISTINCT 聚合处理大量高基数组的查询尤为有用。

如果可能,查询优化器会选择更快的算法 GROUPBY PIPELINED 而非 GROUPBY HASH。

1 - GROUPBY HASH 查询计划

下面是一个关于 GROUPBY HASH 操作在 EXPLAIN 输出中的样式的示例。

=> EXPLAIN SELECT COUNT(DISTINCT annual_income)
     FROM customer_dimension
     WHERE customer_region='NorthWest';

该输出显示优化器选择了效率更低的 GROUPBY HASH 方法,这意味着投影未在 annual_income 列上预分类。如果此类投影可用,优化器将选择 GROUPBY PIPELINED 算法。

Access Path:
+-GROUPBY NOTHING [Cost: 256, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
|  Aggregates: count(DISTINCT customer_dimension.annual_income)
| +---> GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 253, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| |      Group By: customer_dimension.annual_income
| | +---> STORAGE ACCESS for customer_dimension [Cost: 227, Rows: 50K (NO STATISTICS)] (PATH ID: 3)
| | |      Projection: public.customer_dimension_super
| | |      Materialize: customer_dimension.annual_income
| | |      Filter: (customer_dimension.customer_region = 'NorthWest'
...

2 - GROUPBY PIPELINED 查询计划

如果投影已按 customer_gender 列排序,优化器将选择更快的 GROUPBY PIPELINED 操作:

 => EXPLAIN SELECT COUNT(distinct customer_gender) from customer_dimension;
Access Path:
 +-GROUPBY NOTHING [Cost: 22, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(DISTINCT customer_dimension.customer_gender)
 |  Execute on: Query Initiator
 | +---> GROUPBY PIPELINED [Cost: 20, Rows: 10K] (PATH ID: 2)
 | |      Group By: customer_dimension.customer_gender
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 17, Rows: 50K (3 RLE)] (PATH ID: 3)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: customer_dimension.customer_gender
 | | |      Execute on: Query Initiator

同样,使用相等谓词(如在以下查询中)将保留 GROUPBY PIPELINED

=> EXPLAIN SELECT COUNT(DISTINCT annual_income)    FROM customer_dimension
   WHERE customer_gender = 'Female';

 Access Path: +-GROUPBY NOTHING [Cost: 161, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(DISTINCT customer_dimension.annual_income)
 | +---> GROUPBY PIPELINED [Cost: 158, Rows: 10K] (PATH ID: 2)
 | |      Group By: customer_dimension.annual_income
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 144, Rows: 47K] (PATH ID: 3)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: customer_dimension.annual_income
 | | |      Filter: (customer_dimension.customer_gender = 'Female')