This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Half join and cross join semantics
The Vertica optimizer uses several keywords in directed queries to recreate cross join and half join subqueries.
The Vertica optimizer uses several keywords in directed queries to recreate cross join and half join subqueries. It also supports an additional set of keywords to express complex cross joins and half joins. You can also use these keywords in queries that you execute directly in vsql.
Caution
These keywords do not conform with standard SQL; they are intended for use only by the Vertica optimizer.
For details, see the following topics:
1 - Half-join subquery semantics
The Vertica optimizer uses several keywords in directed queries to recreate half-join subqueries with certain search operators, such as ANY or NOT IN:.
The Vertica optimizer uses several keywords in directed queries to recreate half-join subqueries with certain search operators, such as ANY or NOT IN.
SEMI JOIN
Recreates a query that contains a subquery preceded by an
IN
,
EXIST
, or
ANY
operator and executes a semi-join.
Input query
SELECT product_description FROM product_dimension
WHERE product_dimension.product_key IN (SELECT qty_in_stock from inventory_fact);
Query plan
QUERY PLAN DESCRIPTION:
------------------------------
explain SELECT product_description FROM product_dimension WHERE product_dimension.product_key IN (SELECT qty_in_stock from inventory_fact);
Access Path:
+-JOIN HASH [Semi] [Cost: 1K, Rows: 30K] (PATH ID: 1) Outer (FILTER) Inner (RESEGMENT)
| Join Cond: (product_dimension.product_key = VAL(2))
| Materialize at Output: product_dimension.product_description
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
| | Projection: public.product_dimension
| | Materialize: product_dimension.product_key
| | Execute on: All Nodes
| | Runtime Filter: (SIP1(HashJoin): product_dimension.product_key)
| +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
| | | Projection: public.inventory_fact_b0
| | | Materialize: inventory_fact.qty_in_stock
| | | Execute on: All Nodes
Optimizer-generated annotated query
SELECT /*+ syntactic_join */ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
SEMI JOIN /*+Distrib(F,R),JType(H)*/ (SELECT inventory_fact.qty_in_stock AS qty_in_stock
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock))
NULLAWARE ANTI JOIN
Recreates a query that contains a subquery preceded by a
NOT IN
or !=ALL
operator, and executes a null-aware anti-join.
Input query
SELECT product_description FROM product_dimension
WHERE product_dimension.product_key NOT IN (SELECT qty_in_stock from inventory_fact);
Query plan
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT product_description FROM product_dimension WHERE product_dimension.product_key not IN (SELECT qty_in_sto
ck from inventory_fact);
Access Path:
+-JOIN HASH [Anti][NotInAnti] [Cost: 7K, Rows: 30K] (PATH ID: 1) Inner (BROADCAST)
| Join Cond: (product_dimension.product_key = VAL(2))
| Materialize at Output: product_dimension.product_description
| Execute on: Query Initiator
| +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
| | Projection: public.product_dimension_DBD_2_rep_VMartDesign
| | Materialize: product_dimension.product_key
| | Execute on: Query Initiator
| +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
| | | Projection: public.inventory_fact_DBD_9_seg_VMartDesign_b0
| | | Materialize: inventory_fact.qty_in_stock
| | | Execute on: All Nodes
Optimizer-generated annotated query
SELECT /*+ syntactic_join */ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
NULLAWARE ANTI JOIN /*+Distrib(L,B),JType(H)*/ (SELECT inventory_fact.qty_in_stock AS qty_in_stock
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock))
SEMIALL JOIN
Recreates a query that contains a subquery preceded by an
ALL
operator, and executes a semi-all join.
Input query
SELECT product_key, product_description FROM product_dimension
WHERE product_dimension.product_key > ALL (SELECT product_key from inventory_fact);
Query plan
QUERY PLAN DESCRIPTION:
------------------------------
explain SELECT product_key, product_description FROM product_dimension WHERE product_dimension.product_key > ALL (SELECT product_key from inventory_fact);
Access Path:
+-JOIN HASH [Semi][All] [Cost: 7M, Rows: 30K] (PATH ID: 1) Outer (FILTER) Inner (BROADCAST)
| Join Filter: (product_dimension.product_key > VAL(2))
| Materialize at Output: product_dimension.product_description
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
| | Projection: public.product_dimension
| | Materialize: product_dimension.product_key
| | Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
| | | Projection: public.inventory_fact_b0
| | | Materialize: inventory_fact.product_key
| | | Execute on: All Nodes
Optimizer-generated annotated query
SELECT /*+ syntactic_join */ product_dimension.product_key AS product_key, product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
SEMIALL JOIN /*+Distrib(F,B),JType(H)*/ (SELECT inventory_fact.product_key AS product_key FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key > subQ_1.product_key))
ANTI JOIN
Recreates a query that contains a subquery preceded by a
NOT EXISTS
operator, and executes an anti-join.
Input query
SELECT product_key, product_description FROM product_dimension
WHERE NOT EXISTS (SELECT inventory_fact.product_key FROM inventory_fact
WHERE inventory_fact.product_key=product_dimension.product_key);
Query plan
QUERY PLAN DESCRIPTION:
------------------------------
explain SELECT product_key, product_description FROM product_dimension WHERE NOT EXISTS (SELECT inventory_fact.product_
key FROM inventory_fact WHERE inventory_fact.product_key=product_dimension.product_key);
Access Path:
+-JOIN HASH [Anti] [Cost: 703, Rows: 30K] (PATH ID: 1) Outer (FILTER)
| Join Cond: (VAL(1) = product_dimension.product_key)
| Materialize at Output: product_dimension.product_description
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
| | Projection: public.product_dimension_DBD_2_rep_VMartDesign
| | Materialize: product_dimension.product_key
| | Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
| | | Projection: public.inventory_fact_DBD_9_seg_VMartDesign_b0
| | | Materialize: inventory_fact.product_key
| | | Execute on: All Nodes
Optimizer-generated annotated query
SELECT /*+ syntactic_join */ product_dimension.product_key AS product_key, product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
ANTI JOIN /*+Distrib(F,L),JType(H)*/ (SELECT inventory_fact.product_key AS "inventory_fact.product_key"
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (subQ_1."inventory_fact.product_key" = product_dimension.product_key))
2 - Complex join semantics
The optimizer uses a set of keywords to express complex cross joins and half joins.
The optimizer uses a set of keywords to express complex cross joins and half joins. All complex joins are indicated by the keyword COMPLEX
, which is inserted before the keyword JOIN
—for example, CROSS COMPLEX JOIN
. Semantics for complex half joins have an additional requirement, which is detailed below.
Complex cross join
Vertica uses the keyword phrase CROSS COMPLEX JOIN
to describe all complex cross joins. For example:
Input query
SELECT
(SELECT max(sales_quantity) FROM store.store_sales_fact) *
(SELECT max(sales_quantity) FROM online_sales.online_sales_fact);
Query plan
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT
(SELECT max(sales_quantity) FROM store.store_sales_fact) *
(SELECT max(sales_quantity) FROM online_sales.online_sales_fact);
Access Path:
+-JOIN (CROSS JOIN) [Cost: 4K, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
| Execute on: Query Initiator
| +-- Outer -> JOIN (CROSS JOIN) [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
| | Execute on: Query Initiator
| | +-- Outer -> STORAGE ACCESS for dual [Cost: 10, Rows: 1] (PATH ID: 3)
| | | Projection: v_catalog.dual_p
| | | Materialize: dual.dummy
| | | Execute on: Query Initiator
| | +-- Inner -> SELECT [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 4)
| | | Execute on: Query Initiator
| | | +---> GROUPBY NOTHING [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 5)
| | | | Aggregates: max(store_sales_fact.sales_quantity)
| | | | Execute on: All Nodes
| | | | +---> STORAGE ACCESS for store_sales_fact [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 6)
| | | | | Projection: store.store_sales_fact_super
| | | | | Materialize: store_sales_fact.sales_quantity
| | | | | Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 7)
| | Execute on: Query Initiator
| | +---> GROUPBY NOTHING [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 8)
| | | Aggregates: max(online_sales_fact.sales_quantity)
| | | Execute on: All Nodes
| | | +---> STORAGE ACCESS for online_sales_fact [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 9)
| | | | Projection: online_sales.online_sales_fact_super
| | | | Materialize: online_sales_fact.sales_quantity
| | | | Execute on: All Nodes
Optimizer-generated annotated query
The following annotated query returns the same results as the input query shown earlier. As with all optimizer-generated annotated queries, you can execute this query directly in vsql, either as written or with modifications:
SELECT /*+syntactic_join,verbatim*/ (subQ_1.max * subQ_2.max) AS "?column?"
FROM ((v_catalog.dual AS dual CROSS COMPLEX JOIN /*+Distrib(L,L),JType(H)*/
(SELECT max(store_sales_fact.sales_quantity) AS max
FROM store.store_sales_fact AS store_sales_fact/*+projs('store.store_sales_fact')*/) AS subQ_1 )
CROSS COMPLEX JOIN /*+Distrib(L,L),JType(H)*/ (SELECT max(online_sales_fact.sales_quantity) AS max
FROM online_sales.online_sales_fact AS online_sales_fact/*+projs('online_sales.online_sales_fact')*/) AS subQ_2 )
Complex half join
Complex half joins are expressed by one of the following keywords:
An additional requirement applies to all complex half joins: each subquery's SELECT
list ends with a dummy column (labeled as false
) that invokes the Vertica meta-function complex_join_marker()
. As the subquery processes each row, complex_join_marker()
returns true
or false
to indicate the row's inclusion or exclusion from the result set. The result set returns with this flag to the outer query, which can use the flag from this and other subqueries to filter its own result set.
For example, the query optimizer rewrites the following input query as a NULLAWARE ANTI COMPLEX JOIN
. The join returns all rows from the subquery with their complex_join_marker()
flag set to the appropriate Boolean value.
Input query
SELECT product_dimension.product_description FROM public.product_dimension
WHERE (NOT (product_dimension.product_key NOT IN (SELECT inventory_fact.qty_in_stock FROM public.inventory_fact)));
Query plan
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT product_dimension.product_description FROM public.product_dimension
WHERE (NOT (product_dimension.product_key NOT IN (SELECT inventory_fact.qty_in_stock FROM public.inventory_fact)));
Access Path:
+-JOIN HASH [Anti][NotInAnti] [Cost: 3K, Rows: 30K] (PATH ID: 1) Inner (BROADCAST)
| Join Cond: (product_dimension.product_key = VAL(2))
| Materialize at Output: product_dimension.product_description
| Filter: (NOT VAL(2))
| Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 56, Rows: 60K] (PATH ID: 2)
| | Projection: public.product_dimension_super
| | Materialize: product_dimension.product_key
| | Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
| | | Projection: public.inventory_fact_super
| | | Materialize: inventory_fact.qty_in_stock
| | | Execute on: All Nodes
Optimizer-generated annotated query
The following annotated query returns the same results as the input query shown earlier. As with all optimizer-generated annotated queries, you can execute this query directly in vsql, either as written or with modifications. For example, you can control the outer query's output by modifying how its predicate evaluates the flag subQ_1."false"
.
SELECT /*+syntactic_join,verbatim*/ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
NULLAWARE ANTI COMPLEX JOIN /*+Distrib(L,B),JType(H)*/
(SELECT inventory_fact.qty_in_stock AS qty_in_stock, complex_join_marker() AS "false"
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock)) WHERE (NOT subQ_1."false")