Join path
Just like a join query, which references two or more tables, the Join
step in a query plan has two input branches:
-
The left input, which is the outer table of the join
-
The right input, which is the inner table of the join
In the following query, the T1
table is the left input because it is on the left side of the JOIN keyword, and the T2
table is the right input, because it is on the right side of the JOIN keyword:
SELECT * FROM T1 JOIN T2 ON T1.x = T2.x;
Outer versus inner join
Query performance is better if the small table is used as the inner input to the join. The query optimizer automatically reorders the inputs to joins to ensure that this is the case unless the join in question is an outer join.
Note
If the configuration parameterEnableForceOuter
is set to 1, you can control join inputs for specific tables through
ALTER TABLE..FORCE OUTER
. For details, see Controlling join inputs.
The following example shows a query and its plan for a left outer join:
=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> LEFT OUTER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
Access Path:
+-JOIN HASH [LeftOuter] [Cost: 4K, Rows: 5M] (PATH ID: 1)
| Join Cond: (CD.customer_key = OSI.customer_key)
| Materialize at Output: OSI.sale_date_key
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
| | Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
| | Materialize: OSI.customer_key
| | Execute on: All Nodes
| +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
| | Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
| | Materialize: CD.annual_income, CD.customer_key
| | Execute on: All Nodes
The following example shows a query and its plan for a full outer join:
=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> FULL OUTER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
Access Path:
+-JOIN HASH [FullOuter] [Cost: 18K, Rows: 5M] (PATH ID: 1) Outer (RESEGMENT) Inner (FILTER)
| Join Cond: (CD.customer_key = OSI.customer_key)
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
| | Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
| | Materialize: OSI.sale_date_key, OSI.customer_key
| | Execute on: All Nodes
| +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
| | Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
| | Materialize: CD.annual_income, CD.customer_key
| | Execute on: All Nodes
Hash and merge joins
Vertica has two join algorithms to choose from: merge join and hash join. The optimizer automatically chooses the most appropriate algorithm, given the query and projections in a system.
For the following query, the optimizer chooses a hash join.
=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
Access Path:
+-JOIN HASH [Cost: 4K, Rows: 5M] (PATH ID: 1)
| Join Cond: (CD.customer_key = OSI.customer_key)
| Materialize at Output: OSI.sale_date_key
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
| | Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
| | Materialize: OSI.customer_key
| | Execute on: All Nodes
| +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
| | Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
| | Materialize: CD.annual_income, CD.customer_key
| | Execute on: All Nodes
Tip
If you get a hash join when you are expecting a merge join, it means that at least one of the projections is not sorted on the join column (for example,customer_key
in the preceding query). To facilitate a merge join, you might need to create different projections that are sorted on the join columns.
In the next example, the optimizer chooses a merge join. The optimizer's first pass performs a merge join because the inputs are presorted, and then it performs a hash join.
=> EXPLAIN SELECT count(*) FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key
-> INNER JOIN product_dimension PD ON PD.product_key = OSI.product_key;
Access Path:
+-GROUPBY NOTHING [Cost: 8K, Rows: 1] (PATH ID: 1)
| Aggregates: count(*)
| Execute on: All Nodes
| +---> JOIN HASH [Cost: 7K, Rows: 5M] (PATH ID: 2)
| | Join Cond: (PD.product_key = OSI.product_key)
| | Materialize at Input: OSI.product_key
| | Execute on: All Nodes
| | +-- Outer -> JOIN MERGEJOIN(inputs presorted) [Cost: 4K, Rows: 5M] (PATH ID: 3)
| | | Join Cond: (CD.customer_key = OSI.customer_key)
| | | Execute on: All Nodes
| | | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 4)
| | | | Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
| | | | Materialize: OSI.customer_key
| | | | Execute on: All Nodes
| | | +-- Inner -> STORAGE ACCESS for CD [Cost: 132, Rows: 50K] (PATH ID: 5)
| | | | Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
| | | | Materialize: CD.customer_key
| | | | Execute on: All Nodes
| | +-- Inner -> STORAGE ACCESS for PD [Cost: 152, Rows: 60K] (PATH ID: 6)
| | | Projection: public.product_dimension_DBD_2_rep_vmartdb_design_vmartdb_design_node0001
| | | Materialize: PD.product_key
| | | Execute on: All Nodes
Inequality joins
Vertica processes joins with equality predicates very efficiently. The query plan shows equality join predicates as join condition (Join Cond
).
=> EXPLAIN SELECT CD.annual_income, OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD
-> ON CD.customer_key = OSI.customer_key;
Access Path:
+-JOIN HASH [Cost: 4K, Rows: 5M] (PATH ID: 1)
| Join Cond: (CD.customer_key = OSI.customer_key)
| Materialize at Output: OSI.sale_date_key
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
| | Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
| | Materialize: OSI.customer_key
| | Execute on: All Nodes
| +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
| | Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
| | Materialize: CD.annual_income, CD.customer_key
| | Execute on: All Nodes
However, inequality joins are treated like cross joins and can run less efficiently, which you can see by the change in cost between the two queries:
=> EXPLAIN SELECT CD.annual_income, OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD
-> ON CD.customer_key < OSI.customer_key;
Access Path:
+-JOIN HASH [Cost: 98M, Rows: 5M] (PATH ID: 1)
| Join Filter: (CD.customer_key < OSI.customer_key)
| Materialize at Output: CD.annual_income
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for CD [Cost: 132, Rows: 50K] (PATH ID: 2)
| | Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
| | Materialize: CD.customer_key
| | Execute on: All Nodes
| +-- Inner -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 3)
| | Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
| | Materialize: OSI.sale_date_key, OSI.customer_key
| | Execute on: All Nodes
Event series joins
Event series joins are denoted by the INTERPOLATED
path.
=> EXPLAIN SELECT * FROM hTicks h FULL OUTER JOIN aTicks a -> ON (h.time INTERPOLATE PREVIOUS
Access Path:
+-JOIN (INTERPOLATED) [FullOuter] [Cost: 31, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
Outer (SORT ON JOIN KEY) Inner (SORT ON JOIN KEY)
| Join Cond: (h."time" = a."time")
| Execute on: Query Initiator
| +-- Outer -> STORAGE ACCESS for h [Cost: 15, Rows: 4 (NO STATISTICS)] (PATH ID: 2)
| | Projection: public.hTicks_node0004
| | Materialize: h.stock, h."time", h.price
| | Execute on: Query Initiator
| +-- Inner -> STORAGE ACCESS for a [Cost: 15, Rows: 4 (NO STATISTICS)] (PATH ID: 3)
| | Projection: public.aTicks_node0004
| | Materialize: a.stock, a."time", a.price
| | Execute on: Query Initiator