Data Redistribution 路径

优化器可通过两种方式重新分布联接数据:

  • 广播

  • 重新分段

广播

广播操作会将中间结果的完整副本发送到群集中的所有节点。在以下情况下,可对联接使用广播。

  • 一个表与其他表相比很小(通常指内部表)。

  • Vertica 可避免其他大型上游重新分段操作。

  • 外部联接或子查询的语义需要复制联接一侧的内容。

例如:

=> EXPLAIN SELECT * FROM T1 LEFT JOIN T2 ON T1.a > T2.y;
 Access Path:
 +-JOIN HASH [LeftOuter] [Cost: 40K, Rows: 10K (NO STATISTICS)] (PATH ID: 1) Inner (BROADCAST)
 |  Join Filter: (T1.a > T2.y)
 |  Materialize at Output: T1.b
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for T1 [Cost: 151, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.T1_b0
 | |      Materialize: T1.a
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for T2 [Cost: 302, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.T2_b0
 | |      Materialize: T2.x, T2.y
 | |      Execute on: All Nodes

重新分段

重新分段操作会获取现有投影或中间关系,并在所有群集节点之间均匀地对数据进行重新分段。在重新分段操作结束时,输入关系中的每一行仅位于一个节点上。如果本地联接数据尚未进行分段,则重新分段将是对 Vertica 中的分布式联接最常使用的操作。有关更多详细信息,请参阅相同分段

例如:

=> CREATE TABLE T1 (a INT, b INT) SEGMENTED BY HASH(a) ALL NODES;
=> CREATE TABLE T2 (x INT, y INT) SEGMENTED BY HASH(x) ALL NODES;
=> EXPLAIN SELECT * FROM T1 JOIN T2 ON T1.a = T2.y;

 ------------------------------ QUERY PLAN DESCRIPTION: ------------------------------
 Access Path:
 +-JOIN HASH [Cost: 639, Rows: 10K (NO STATISTICS)] (PATH ID: 1) Inner (RESEGMENT)
 |  Join Cond: (T1.a = T2.y)
 |  Materialize at Output: T1.b
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for T1 [Cost: 151, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.T1_b0
 | |      Materialize: T1.a
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for T2 [Cost: 302, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.T2_b0
 | |      Materialize: T2.x, T2.y
 | |      Execute on: All Nodes