Controlling join inputs

By default, the optimizer uses its own internal logic to determine whether to join one table to another as an inner or outer input.

By default, the optimizer uses its own internal logic to determine whether to join one table to another as an inner or outer input. Occasionally, the optimizer might choose the larger table as the inner input to a join. Doing so can incur performance and concurrency issues.

If the configuration parameter configuration parameter EnableForceOuter is set to 1, you can control join inputs for specific tables through ALTER TABLE..FORCE OUTER. The FORCE OUTER option modifies a table's force_outer setting in the system table TABLES. When implementing a join, Vertica compares the force_outer settings of the participating tables:

  • If table settings are unequal, Vertica uses them to set the join inputs:

    • A table with a low force_outer setting relative to other tables is joined to them as an inner input.

    • A table with a high force_outer setting relative to other tables is joined to them as an outer input.

  • If all table settings are equal, Vertica ignores them and assembles the join on its own.

The force_outer column is initially set to 5 for all newly-defined tables. You can use ALTER TABLE..FORCE OUTER to reset force_outer to a value equal to or greater than 0. For example, you might change the force_outer settings of tables abc and xyz to 3 and 8, respectively:

=> ALTER TABLE abc FORCE OUTER 3;
=> ALTER TABLE xyz FORCE OUTER 8;

Given these settings, the optimizer joins abc as the inner input to any table with a force_outer value greater than 3. The optimizer joins xyz as the outer input to any table with a force_outer value less than 8.

Projection inheritance

When you query a projection directly, it inherits the force_outer setting of its anchor table. The query then uses this setting when joined to another projection.

Enabling forced join inputs

The configuration parameter EnableForceOuter determines whether Vertica uses a table's force_outer value to implement a join. By default, this parameter is set to 0, and forced join inputs are disabled. You can enable forced join inputs at session and database scopes, through ALTER SESSION and ALTER DATABASE, respectively:

=> ALTER SESSION SET EnableForceOuter = { 0 | 1 };
=> ALTER DATABASE db-name SET EnableForceOuter =  { 0 | 1 };

If EnableForceOuter is set to 0, ALTER TABLE..FORCE OUTER statements return with this warning:

Viewing forced join inputs

EXPLAIN-generated query plans indicate whether the configuration parameter EnableForceOuter is on. A join query might include tables whose force_outer settings are less or greater than the default value of 5. In this case, the query plan includes a Force outer level field for the relevant join inputs.

For example, the following query joins tables store.store_sales and public.products, where both tables have the same force_outer setting (5). EnableForceOuter is on, as indicated in the generated query plan:

=> EXPLAIN SELECT s.store_key, p.product_description, s.sales_quantity, s.sale_date
   FROM store.store_sales s JOIN public.products p ON s.product_key=p.product_key
   WHERE s.sale_date='2014-12-01' ORDER BY s.store_key, s.sale_date;

 EnableForceOuter is on
 Access Path:
 +-SORT [Cost: 7K, Rows: 100K (NO STATISTICS)] (PATH ID: 1)
 |  Order: sales.store_key ASC, sales.sale_date ASC
 |  Execute on: All Nodes
 | +---> JOIN HASH [Cost: 5K, Rows: 100K (NO STATISTICS)] (PATH ID: 2) Outer (BROADCAST)(LOCAL ROUND ROBIN)
 | |      Join Cond: (sales.product_key = products.product_key)
 | |      Execute on: All Nodes
 | | +-- Outer -> STORAGE ACCESS for sales [Cost: 2K, Rows: 100K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_b0
 | | |      Materialize: sales.sale_date, sales.store_key, sales.product_key, sales.sales_quantity
 | | |      Filter: (sales.sale_date = '2014-12-01'::date)
 | | |      Execute on: All Nodes
 | | +-- Inner -> STORAGE ACCESS for products [Cost: 177, Rows: 60K (NO STATISTICS)] (PATH ID: 4)
 | | |      Projection: public.products_b0
 | | |      Materialize: products.product_key, products.product_description
 | | |      Execute on: All Nodes

The following ALTER TABLE statement resets the force_outer setting of public.products to 1:

=> ALTER TABLE public.products FORCE OUTER 1;
ALTER TABLE

The regenerated query plan for the same join now includes a Force outer level field and specifies public.products as the inner input:

=> EXPLAIN SELECT s.store_key, p.product_description, s.sales_quantity, s.sale_date
   FROM store.store_sales s JOIN public.products p ON s.product_key=p.product_key
   WHERE s.sale_date='2014-12-01' ORDER BY s.store_key, s.sale_date;

 EnableForceOuter is on
 Access Path:
 +-SORT [Cost: 7K, Rows: 100K (NO STATISTICS)] (PATH ID: 1)
 |  Order: sales.store_key ASC, sales.sale_date ASC
 |  Execute on: All Nodes
 | +---> JOIN HASH [Cost: 5K, Rows: 100K (NO STATISTICS)] (PATH ID: 2) Outer (BROADCAST)(LOCAL ROUND ROBIN)
 | |      Join Cond: (sales.product_key = products.product_key)
 | |      Execute on: All Nodes
 | | +-- Outer -> STORAGE ACCESS for sales [Cost: 2K, Rows: 100K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_b0
 | | |      Materialize: sales.sale_date, sales.store_key, sales.product_key, sales.sales_quantity
 | | |      Filter: (sales.sale_date = '2014-12-01'::date)
 | | |      Execute on: All Nodes
 | | +-- Inner -> STORAGE ACCESS for products [Cost: 177, Rows: 60K (NO STATISTICS)] (PATH ID: 4)
 | | |      Projection: public.products_b0
 | | |      Force outer level: 1
 | | |      Materialize: products.product_key, products.product_description
 | | |      Execute on: All Nodes

If you change the force_outer setting of public.products to 8, Vertica creates a different query plan that specifies public.products as the outer input:

=> ALTER TABLE public.products FORCE OUTER 8;
ALTER TABLE

=> EXPLAIN SELECT s.store_key, p.product_description, s.sales_quantity, s.sale_date
   FROM store.store_sales s JOIN public.products p ON s.product_key=p.product_key
   WHERE s.sale_date='2014-12-01' ORDER BY s.store_key, s.sale_date;


EnableForceOuter is on
 Access Path:
 +-SORT [Cost: 7K, Rows: 100K (NO STATISTICS)] (PATH ID: 1)
 |  Order: sales.store_key ASC, sales.sale_date ASC
 |  Execute on: All Nodes
 | +---> JOIN HASH [Cost: 5K, Rows: 100K (NO STATISTICS)] (PATH ID: 2) Inner (BROADCAST)
 | |      Join Cond: (sales.product_key = products.product_key)
 | |      Materialize at Output: products.product_description
 | |      Execute on: All Nodes
 | | +-- Outer -> STORAGE ACCESS for products [Cost: 20, Rows: 60K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: public.products_b0
 | | |      Force outer level: 8
 | | |      Materialize: products.product_key
 | | |      Execute on: All Nodes
 | | |      Runtime Filter: (SIP1(HashJoin): products.product_key)
 | | +-- Inner -> STORAGE ACCESS for sales [Cost: 2K, Rows: 100K (NO STATISTICS)] (PATH ID: 4)
 | | |      Projection: store.store_sales_b0
 | | |      Materialize: sales.sale_date, sales.store_key, sales.product_key, sales.sales_quantity
 | | |      Filter: (sales.sale_date = '2014-12-01'::date)
 | | |      Execute on: All Nodes

Restrictions

Vertica ignores force_outer settings when it performs the following operations:

  • Outer joins: Vertica generally respects OUTER JOIN clauses regardless of the force_outer settings of the joined tables.

  • MERGE statement joins.

  • Queries that include the SYNTACTIC_JOIN hint.

  • Half-join queries such as SEMI JOIN.

  • Joins to subqueries, where the subquery is always processed as having a force_outer setting of 5 regardless of the force_outer settings of the tables that are joined in the subquery. This setting determines a subquery's designation as inner or outer input relative to other join inputs. If two subqueries are joined, the optimizer determines which one is the inner input, and which one the outer.