Data Redistribution 路径
优化器可通过两种方式重新分布联接数据:
-
广播
-
重新分段
广播
广播操作会将中间结果的完整副本发送到群集中的所有节点。在以下情况下,可对联接使用广播。
-
一个表与其他表相比很小(通常指内部表)。
-
Vertica 可避免其他大型上游重新分段操作。
-
外部联接或子查询的语义需要复制联接一侧的内容。
例如:
=> EXPLAIN SELECT * FROM T1 LEFT JOIN T2 ON T1.a > T2.y;
Access Path:
+-JOIN HASH [LeftOuter] [Cost: 40K, Rows: 10K (NO STATISTICS)] (PATH ID: 1) Inner (BROADCAST)
| Join Filter: (T1.a > T2.y)
| Materialize at Output: T1.b
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for T1 [Cost: 151, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Projection: public.T1_b0
| | Materialize: T1.a
| | Execute on: All Nodes
| +-- Inner -> STORAGE ACCESS for T2 [Cost: 302, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
| | Projection: public.T2_b0
| | Materialize: T2.x, T2.y
| | Execute on: All Nodes
重新分段
重新分段操作会获取现有投影或中间关系,并在所有群集节点之间均匀地对数据进行重新分段。在重新分段操作结束时,输入关系中的每一行仅位于一个节点上。如果本地联接数据尚未进行分段,则重新分段将是对 Vertica 中的分布式联接最常使用的操作。有关更多详细信息,请参阅相同分段。
例如:
=> CREATE TABLE T1 (a INT, b INT) SEGMENTED BY HASH(a) ALL NODES;
=> CREATE TABLE T2 (x INT, y INT) SEGMENTED BY HASH(x) ALL NODES;
=> EXPLAIN SELECT * FROM T1 JOIN T2 ON T1.a = T2.y;
------------------------------ QUERY PLAN DESCRIPTION: ------------------------------
Access Path:
+-JOIN HASH [Cost: 639, Rows: 10K (NO STATISTICS)] (PATH ID: 1) Inner (RESEGMENT)
| Join Cond: (T1.a = T2.y)
| Materialize at Output: T1.b
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for T1 [Cost: 151, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Projection: public.T1_b0
| | Materialize: T1.a
| | Execute on: All Nodes
| +-- Inner -> STORAGE ACCESS for T2 [Cost: 302, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
| | Projection: public.T2_b0
| | Materialize: T2.x, T2.y
| | Execute on: All Nodes