Complex join semantics

The optimizer uses a set of keywords to express complex cross joins and half joins.

The optimizer uses a set of keywords to express complex cross joins and half joins. All complex joins are indicated by the keyword COMPLEX, which is inserted before the keyword JOIN—for example, CROSS COMPLEX JOIN. Semantics for complex half joins have an additional requirement, which is detailed below.

Complex cross join

Vertica uses the keyword phrase CROSS COMPLEX JOIN to describe all complex cross joins. For example:

Input query

SELECT
  (SELECT max(sales_quantity) FROM store.store_sales_fact) *
  (SELECT max(sales_quantity) FROM online_sales.online_sales_fact);

Query plan


 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT
   (SELECT max(sales_quantity) FROM store.store_sales_fact) *
   (SELECT max(sales_quantity) FROM online_sales.online_sales_fact);

 Access Path:
 +-JOIN  (CROSS JOIN) [Cost: 4K, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
 |  Execute on: Query Initiator
 | +-- Outer -> JOIN  (CROSS JOIN) [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
 | |      Execute on: Query Initiator
 | | +-- Outer -> STORAGE ACCESS for dual [Cost: 10, Rows: 1] (PATH ID: 3)
 | | |      Projection: v_catalog.dual_p
 | | |      Materialize: dual.dummy
 | | |      Execute on: Query Initiator
 | | +-- Inner -> SELECT [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 4)
 | | |      Execute on: Query Initiator
 | | | +---> GROUPBY NOTHING [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 5)
 | | | |      Aggregates: max(store_sales_fact.sales_quantity)
 | | | |      Execute on: All Nodes
 | | | | +---> STORAGE ACCESS for store_sales_fact [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 6)
 | | | | |      Projection: store.store_sales_fact_super
 | | | | |      Materialize: store_sales_fact.sales_quantity
 | | | | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 7)
 | |      Execute on: Query Initiator
 | | +---> GROUPBY NOTHING [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 8)
 | | |      Aggregates: max(online_sales_fact.sales_quantity)
 | | |      Execute on: All Nodes
 | | | +---> STORAGE ACCESS for online_sales_fact [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 9)
 | | | |      Projection: online_sales.online_sales_fact_super
 | | | |      Materialize: online_sales_fact.sales_quantity
 | | | |      Execute on: All Nodes

Optimizer-generated annotated query

The following annotated query returns the same results as the input query shown earlier. As with all optimizer-generated annotated queries, you can execute this query directly in vsql, either as written or with modifications:

SELECT /*+syntactic_join,verbatim*/ (subQ_1.max * subQ_2.max) AS "?column?"
FROM ((v_catalog.dual AS dual CROSS COMPLEX JOIN /*+Distrib(L,L),JType(H)*/
(SELECT max(store_sales_fact.sales_quantity) AS max
FROM store.store_sales_fact AS store_sales_fact/*+projs('store.store_sales_fact')*/) AS subQ_1 )
CROSS COMPLEX JOIN /*+Distrib(L,L),JType(H)*/ (SELECT max(online_sales_fact.sales_quantity) AS max
FROM online_sales.online_sales_fact AS online_sales_fact/*+projs('online_sales.online_sales_fact')*/) AS subQ_2 )

Complex half join

Complex half joins are expressed by one of the following keywords:

  • SEMI COMPLEX JOIN

  • NULLAWARE ANTI COMPLEX JOIN

  • SEMIALL COMPLEX JOIN

  • ANTI COMPLEX JOIN

An additional requirement applies to all complex half joins: each subquery's SELECT list ends with a dummy column (labeled as false) that invokes the Vertica meta-function complex_join_marker(). As the subquery processes each row, complex_join_marker() returns true or false to indicate the row's inclusion or exclusion from the result set. The result set returns with this flag to the outer query, which can use the flag from this and other subqueries to filter its own result set.

For example, the query optimizer rewrites the following input query as a NULLAWARE ANTI COMPLEX JOIN. The join returns all rows from the subquery with their complex_join_marker() flag set to the appropriate Boolean value.

Input query

SELECT product_dimension.product_description FROM public.product_dimension
   WHERE (NOT (product_dimension.product_key NOT IN (SELECT inventory_fact.qty_in_stock FROM public.inventory_fact)));

Query plan

 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT product_dimension.product_description FROM public.product_dimension
   WHERE (NOT (product_dimension.product_key NOT IN (SELECT inventory_fact.qty_in_stock FROM public.inventory_fact)));

 Access Path:
 +-JOIN HASH [Anti][NotInAnti] [Cost: 3K, Rows: 30K] (PATH ID: 1) Inner (BROADCAST)
 |  Join Cond: (product_dimension.product_key = VAL(2))
 |  Materialize at Output: product_dimension.product_description
 |  Filter: (NOT VAL(2))
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 56, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension_super
 | |      Materialize: product_dimension.product_key
 | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_super
 | | |      Materialize: inventory_fact.qty_in_stock
 | | |      Execute on: All Nodes

Optimizer-generated annotated query

The following annotated query returns the same results as the input query shown earlier. As with all optimizer-generated annotated queries, you can execute this query directly in vsql, either as written or with modifications. For example, you can control the outer query's output by modifying how its predicate evaluates the flag subQ_1."false".

SELECT /*+syntactic_join,verbatim*/ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
NULLAWARE ANTI COMPLEX JOIN /*+Distrib(L,B),JType(H)*/
   (SELECT inventory_fact.qty_in_stock AS qty_in_stock, complex_join_marker() AS "false"
    FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock)) WHERE (NOT subQ_1."false")