Controlling join inputs
By default, the optimizer uses its own internal logic to determine whether to join one table to another as an inner or outer input. Occasionally, the optimizer might choose the larger table as the inner input to a join. Doing so can incur performance and concurrency issues.
If the configuration parameter configuration parameter EnableForceOuter
is set to 1, you can control join inputs for specific tables through
ALTER TABLE..FORCE OUTER
. The FORCE OUTER
option modifies a table's force_outer
setting in the system table
TABLES
. When implementing a join, Vertica compares the force_outer
settings of the participating tables:
-
If table settings are unequal, Vertica uses them to set the join inputs:
-
A table with a low
force_outer
setting relative to other tables is joined to them as an inner input. -
A table with a high
force_outer
setting relative to other tables is joined to them as an outer input.
-
-
If all table settings are equal, Vertica ignores them and assembles the join on its own.
The force_outer
column is initially set to 5 for all newly-defined tables. You can use
ALTER TABLE..FORCE OUTER
to reset force_outer
to a value equal to or greater than 0. For example, you might change the force_outer
settings of tables abc
and xyz
to 3 and 8, respectively:
=> ALTER TABLE abc FORCE OUTER 3;
=> ALTER TABLE xyz FORCE OUTER 8;
Given these settings, the optimizer joins abc
as the inner input to any table with a force_outer
value greater than 3. The optimizer joins xyz
as the outer input to any table with a force_outer
value less than 8.
Projection inheritance
When you query a projection directly, it inherits the force_outer
setting of its anchor table. The query then uses this setting when joined to another projection.
Enabling forced join inputs
The configuration parameter EnableForceOuter
determines whether Vertica uses a table's force_outer
value to implement a join. By default, this parameter is set to 0, and forced join inputs are disabled. You can enable forced join inputs at session and database scopes, through
ALTER SESSION
and
ALTER DATABASE
, respectively:
=> ALTER SESSION SET EnableForceOuter = { 0 | 1 };
=> ALTER DATABASE db-name SET EnableForceOuter = { 0 | 1 };
If EnableForceOuter
is set to 0, ALTER TABLE..FORCE OUTER
statements return with this warning:
Note
WARNING 0: Set configuration parameter EnableForceOuter for the current session or the database in order to use
force_outer value
Viewing forced join inputs
EXPLAIN
-generated query plans indicate whether the configuration parameter EnableForceOuter
is on. A join query might include tables whose force_outer
settings are less or greater than the default value of 5. In this case, the query plan includes a Force outer level
field for the relevant join inputs.
For example, the following query joins tables store.store_sales
and public.products
, where both tables have the same force_outer
setting (5). EnableForceOuter
is on, as indicated in the generated query plan:
=> EXPLAIN SELECT s.store_key, p.product_description, s.sales_quantity, s.sale_date
FROM store.store_sales s JOIN public.products p ON s.product_key=p.product_key
WHERE s.sale_date='2014-12-01' ORDER BY s.store_key, s.sale_date;
EnableForceOuter is on
Access Path:
+-SORT [Cost: 7K, Rows: 100K (NO STATISTICS)] (PATH ID: 1)
| Order: sales.store_key ASC, sales.sale_date ASC
| Execute on: All Nodes
| +---> JOIN HASH [Cost: 5K, Rows: 100K (NO STATISTICS)] (PATH ID: 2) Outer (BROADCAST)(LOCAL ROUND ROBIN)
| | Join Cond: (sales.product_key = products.product_key)
| | Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for sales [Cost: 2K, Rows: 100K (NO STATISTICS)] (PATH ID: 3)
| | | Projection: store.store_sales_b0
| | | Materialize: sales.sale_date, sales.store_key, sales.product_key, sales.sales_quantity
| | | Filter: (sales.sale_date = '2014-12-01'::date)
| | | Execute on: All Nodes
| | +-- Inner -> STORAGE ACCESS for products [Cost: 177, Rows: 60K (NO STATISTICS)] (PATH ID: 4)
| | | Projection: public.products_b0
| | | Materialize: products.product_key, products.product_description
| | | Execute on: All Nodes
The following ALTER TABLE
statement resets the force_outer
setting of public.products
to 1:
=> ALTER TABLE public.products FORCE OUTER 1;
ALTER TABLE
The regenerated query plan for the same join now includes a Force outer level
field and specifies public.products
as the inner input:
=> EXPLAIN SELECT s.store_key, p.product_description, s.sales_quantity, s.sale_date
FROM store.store_sales s JOIN public.products p ON s.product_key=p.product_key
WHERE s.sale_date='2014-12-01' ORDER BY s.store_key, s.sale_date;
EnableForceOuter is on
Access Path:
+-SORT [Cost: 7K, Rows: 100K (NO STATISTICS)] (PATH ID: 1)
| Order: sales.store_key ASC, sales.sale_date ASC
| Execute on: All Nodes
| +---> JOIN HASH [Cost: 5K, Rows: 100K (NO STATISTICS)] (PATH ID: 2) Outer (BROADCAST)(LOCAL ROUND ROBIN)
| | Join Cond: (sales.product_key = products.product_key)
| | Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for sales [Cost: 2K, Rows: 100K (NO STATISTICS)] (PATH ID: 3)
| | | Projection: store.store_sales_b0
| | | Materialize: sales.sale_date, sales.store_key, sales.product_key, sales.sales_quantity
| | | Filter: (sales.sale_date = '2014-12-01'::date)
| | | Execute on: All Nodes
| | +-- Inner -> STORAGE ACCESS for products [Cost: 177, Rows: 60K (NO STATISTICS)] (PATH ID: 4)
| | | Projection: public.products_b0
| | | Force outer level: 1
| | | Materialize: products.product_key, products.product_description
| | | Execute on: All Nodes
If you change the force_outer
setting of public.products
to 8, Vertica creates a different query plan that specifies public.products
as the outer input:
=> ALTER TABLE public.products FORCE OUTER 8;
ALTER TABLE
=> EXPLAIN SELECT s.store_key, p.product_description, s.sales_quantity, s.sale_date
FROM store.store_sales s JOIN public.products p ON s.product_key=p.product_key
WHERE s.sale_date='2014-12-01' ORDER BY s.store_key, s.sale_date;
EnableForceOuter is on
Access Path:
+-SORT [Cost: 7K, Rows: 100K (NO STATISTICS)] (PATH ID: 1)
| Order: sales.store_key ASC, sales.sale_date ASC
| Execute on: All Nodes
| +---> JOIN HASH [Cost: 5K, Rows: 100K (NO STATISTICS)] (PATH ID: 2) Inner (BROADCAST)
| | Join Cond: (sales.product_key = products.product_key)
| | Materialize at Output: products.product_description
| | Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for products [Cost: 20, Rows: 60K (NO STATISTICS)] (PATH ID: 3)
| | | Projection: public.products_b0
| | | Force outer level: 8
| | | Materialize: products.product_key
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP1(HashJoin): products.product_key)
| | +-- Inner -> STORAGE ACCESS for sales [Cost: 2K, Rows: 100K (NO STATISTICS)] (PATH ID: 4)
| | | Projection: store.store_sales_b0
| | | Materialize: sales.sale_date, sales.store_key, sales.product_key, sales.sales_quantity
| | | Filter: (sales.sale_date = '2014-12-01'::date)
| | | Execute on: All Nodes
Restrictions
Vertica ignores force_outer
settings when it performs the following operations:
-
Outer joins: Vertica generally respects
OUTER JOIN
clauses regardless of theforce_outer
settings of the joined tables. -
MERGE
statement joins. -
Queries that include the
SYNTACTIC_JOIN
hint. -
Half-join queries such as
SEMI JOIN
. -
Joins to subqueries, where the subquery is always processed as having a
force_outer
setting of 5 regardless of theforce_outer
settings of the tables that are joined in the subquery. This setting determines a subquery's designation as inner or outer input relative to other join inputs. If two subqueries are joined, the optimizer determines which one is the inner input, and which one the outer.