这是本节的多页打印视图。
点击此处打印.
返回本页常规视图.
GROUP BY 路径
GROUP BY 操作具有两种算法:
-
GROUPBY HASH 输入不按组列排序,因此 Vertica 用这些组列构建一个哈希表,以便处理聚合和 Group By 表达式。
-
GROUPBY PIPELINED 需要在组中指定的列上对输入进行预分类,这意味着 Vertica 仅需要在内存中保留当前组中的数据。GROUPBY PIPELINED 操作是首选算法,因为它们通常比 GROUPBY HASH 更快并且需要的内存更少。GROUPBY PIPELINED 对按列或 DISTINCT
聚合处理大量高基数组的查询尤为有用。
如果可能,查询优化器会选择更快的算法 GROUPBY PIPELINED 而非 GROUPBY HASH。
1 - GROUPBY HASH 查询计划
下面是一个关于 GROUPBY HASH
操作在 EXPLAIN
输出中的样式的示例。
=> EXPLAIN SELECT COUNT(DISTINCT annual_income)
FROM customer_dimension
WHERE customer_region='NorthWest';
该输出显示优化器选择了效率更低的 GROUPBY HASH
方法,这意味着投影未在 annual_income
列上预分类。如果此类投影可用,优化器将选择 GROUPBY PIPELINED
算法。
Access Path:
+-GROUPBY NOTHING [Cost: 256, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
| Aggregates: count(DISTINCT customer_dimension.annual_income)
| +---> GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 253, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Group By: customer_dimension.annual_income
| | +---> STORAGE ACCESS for customer_dimension [Cost: 227, Rows: 50K (NO STATISTICS)] (PATH ID: 3)
| | | Projection: public.customer_dimension_super
| | | Materialize: customer_dimension.annual_income
| | | Filter: (customer_dimension.customer_region = 'NorthWest'
...
2 - GROUPBY PIPELINED 查询计划
如果投影已按 customer_gender
列排序,优化器将选择更快的 GROUPBY PIPELINED
操作:
=> EXPLAIN SELECT COUNT(distinct customer_gender) from customer_dimension;
Access Path:
+-GROUPBY NOTHING [Cost: 22, Rows: 1] (PATH ID: 1)
| Aggregates: count(DISTINCT customer_dimension.customer_gender)
| Execute on: Query Initiator
| +---> GROUPBY PIPELINED [Cost: 20, Rows: 10K] (PATH ID: 2)
| | Group By: customer_dimension.customer_gender
| | Execute on: Query Initiator
| | +---> STORAGE ACCESS for customer_dimension [Cost: 17, Rows: 50K (3 RLE)] (PATH ID: 3)
| | | Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
| | | Materialize: customer_dimension.customer_gender
| | | Execute on: Query Initiator
同样,使用相等谓词(如在以下查询中)将保留 GROUPBY PIPELINED
:
=> EXPLAIN SELECT COUNT(DISTINCT annual_income) FROM customer_dimension
WHERE customer_gender = 'Female';
Access Path: +-GROUPBY NOTHING [Cost: 161, Rows: 1] (PATH ID: 1)
| Aggregates: count(DISTINCT customer_dimension.annual_income)
| +---> GROUPBY PIPELINED [Cost: 158, Rows: 10K] (PATH ID: 2)
| | Group By: customer_dimension.annual_income
| | +---> STORAGE ACCESS for customer_dimension [Cost: 144, Rows: 47K] (PATH ID: 3)
| | | Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
| | | Materialize: customer_dimension.annual_income
| | | Filter: (customer_dimension.customer_gender = 'Female')
提示
如果 EXPLAIN
报告 GROUPBY HASH
,请修改投影设计,强制其使用 GROUPBY PIPELINED
。