Join path

Just like a join query, which references two or more tables, the Join step in a query plan has two input branches:.

Just like a join query, which references two or more tables, the Join step in a query plan has two input branches:

  • The left input, which is the outer table of the join

  • The right input, which is the inner table of the join

In the following query, the T1 table is the left input because it is on the left side of the JOIN keyword, and the T2 table is the right input, because it is on the right side of the JOIN keyword:

SELECT * FROM T1 JOIN T2 ON T1.x = T2.x;

Outer versus inner join

Query performance is better if the small table is used as the inner input to the join. The query optimizer automatically reorders the inputs to joins to ensure that this is the case unless the join in question is an outer join.

The following example shows a query and its plan for a left outer join:

=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> LEFT OUTER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [LeftOuter] [Cost: 4K, Rows: 5M] (PATH ID: 1)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Materialize at Output: OSI.sale_date_key
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

The following example shows a query and its plan for a full outer join:

=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> FULL OUTER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [FullOuter] [Cost: 18K, Rows: 5M] (PATH ID: 1) Outer (RESEGMENT) Inner (FILTER)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.sale_date_key, OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

Hash and merge joins

Vertica has two join algorithms to choose from: merge join and hash join. The optimizer automatically chooses the most appropriate algorithm, given the query and projections in a system.

For the following query, the optimizer chooses a hash join.

=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [Cost: 4K, Rows: 5M] (PATH ID: 1)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Materialize at Output: OSI.sale_date_key
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

In the next example, the optimizer chooses a merge join. The optimizer's first pass performs a merge join because the inputs are presorted, and then it performs a hash join.

=> EXPLAIN SELECT count(*) FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key
-> INNER JOIN product_dimension PD ON PD.product_key = OSI.product_key;
 Access Path:
 +-GROUPBY NOTHING [Cost: 8K, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(*)
 |  Execute on: All Nodes
 | +---> JOIN HASH [Cost: 7K, Rows: 5M] (PATH ID: 2)
 | |      Join Cond: (PD.product_key = OSI.product_key)
 | |      Materialize at Input: OSI.product_key
 | |      Execute on: All Nodes
 | | +-- Outer -> JOIN MERGEJOIN(inputs presorted) [Cost: 4K, Rows: 5M] (PATH ID: 3)
 | | |      Join Cond: (CD.customer_key = OSI.customer_key)
 | | |      Execute on: All Nodes
 | | | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 4)
 | | | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | | | |      Materialize: OSI.customer_key
 | | | |      Execute on: All Nodes
 | | | +-- Inner -> STORAGE ACCESS for CD [Cost: 132, Rows: 50K] (PATH ID: 5)
 | | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | | |      Materialize: CD.customer_key
 | | | |      Execute on: All Nodes
 | | +-- Inner -> STORAGE ACCESS for PD [Cost: 152, Rows: 60K] (PATH ID: 6)
 | | |      Projection: public.product_dimension_DBD_2_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: PD.product_key
 | | |      Execute on: All Nodes

Inequality joins

Vertica processes joins with equality predicates very efficiently. The query plan shows equality join predicates as join condition (Join Cond).

=> EXPLAIN SELECT CD.annual_income, OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD
-> ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [Cost: 4K, Rows: 5M] (PATH ID: 1)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Materialize at Output: OSI.sale_date_key
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

However, inequality joins are treated like cross joins and can run less efficiently, which you can see by the change in cost between the two queries:

=> EXPLAIN SELECT CD.annual_income, OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD
-> ON CD.customer_key < OSI.customer_key;
 Access Path:
 +-JOIN HASH [Cost: 98M, Rows: 5M] (PATH ID: 1)
 |  Join Filter: (CD.customer_key < OSI.customer_key)
 |  Materialize at Output: CD.annual_income
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for CD [Cost: 132, Rows: 50K] (PATH ID: 2)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 3)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.sale_date_key, OSI.customer_key
 | |      Execute on: All Nodes

Event series joins

Event series joins are denoted by the INTERPOLATED path.

=> EXPLAIN SELECT * FROM hTicks h FULL OUTER JOIN aTicks a -> ON (h.time INTERPOLATE PREVIOUS
 Access Path:
 +-JOIN  (INTERPOLATED) [FullOuter] [Cost: 31, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
   Outer (SORT ON JOIN KEY) Inner (SORT ON JOIN KEY)
 |  Join Cond: (h."time" = a."time")
 |  Execute on: Query Initiator
 | +-- Outer -> STORAGE ACCESS for h [Cost: 15, Rows: 4 (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.hTicks_node0004
 | |      Materialize: h.stock, h."time", h.price
 | |      Execute on: Query Initiator
 | +-- Inner -> STORAGE ACCESS for a [Cost: 15, Rows: 4 (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.aTicks_node0004
 | |      Materialize: a.stock, a."time", a.price
 | |      Execute on: Query Initiator