This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Directed queries

Directed queries encapsulate information that the optimizer can use to create a query plan.

Directed queries encapsulate information that the optimizer can use to create a query plan. Directed queries can serve the following goals:

  • Preserve current query plans before a scheduled upgrade. In most instances, queries perform more efficiently after a Vertica upgrade. In the few cases where this is not so, you can use directed queries that you created before upgrading, to recreate query plans from the earlier version.
  • Enable you to create query plans that improve optimizer performance. Occasionally, you might want to influence the optimizer to make better choices in executing a given query. For example, you can choose a different projection, or force a different join order. In this case, you can use a directed query to create a query plan that preempts any plan that the optimizer might otherwise create.
  • Redirect an input query to a query that uses different semantics—for example, map a join query to a SELECT statement that queries a flattened table.

Directed query components

A directed query pairs two components:

  • Input query: A query that triggers use of this directed query when it is active.
  • Annotated query: A SQL statement with embedded optimizer hints, which instruct the optimizer how to create a query plan for the specified input query. These hints specify important query plan elements, such as join order and projection choices.

Vertica provides two methods for creating directed queries:

  • The optimizer can generate an annotated query from a given input query and pair the two as a directed query.
  • You can write your own annotated query and pair it with an input query.

For a description of both methods, see Creating directed queries.

1 - Creating directed queries

CREATE DIRECTED QUERY associates an input query with a query annotated with optimizer hints.

CREATE DIRECTED QUERY associates an input query with a query annotated with optimizer hints. It stores the association under a unique identifier.

CREATE DIRECTED QUERY has two variants:

  • CREATE DIRECTED QUERY OPTIMIZER directs the query optimizer to generate annotated SQL from the specified input query. The annotated query contains hints that the optimizer can use to recreate its current query plan for that input query.
  • CREATE DIRECTED QUERY CUSTOM specifies an annotated query supplied by the user. Vertica associates the annotated query with the input query specified by the last SAVE QUERY statement.

In both cases, Vertica associates the annotated query and input query, and registers their association in the system table DIRECTED_QUERIES under query_name.

The two approaches can be used together: you can use the annotated SQL that the optimizer creates as the basis for creating your own (custom) directed queries.

1.1 - Optimizer-generated directed queries

CREATE DIRECTED QUERY OPTIMIZER passes an input query to the optimizer, which generates an annotated query from its own query plan.

CREATE DIRECTED QUERY OPTIMIZER passes an input query to the optimizer, which generates an annotated query from its own query plan. It then pairs the input and annotated queries and saves them as a directed query. This directed query can be used to handle other queries that are identical except for the predicate strings on which query results are filtered.

You can use optimizer-generated directed queries to capture query plans before you upgrade. Doing so can be especially useful if you detect diminished performance of a given query after the upgrade. In this case, you can use the corresponding directed query to recreate an earlier query plan, and compare its performance to the plan generated by the current optimizer.

You can also create multiple optimizer-generated directed queries from the most frequently executed queries, by invoking the meta-function SAVE_PLANS. For details, see Bulk-Creation of Directed Queries.

Example

The following SQL statements create and activate the directed query findEmployeesCityJobTitle_OPT:


=> CREATE DIRECTED QUERY OPTIMIZER 'findEmployeesCityJobTitle_OPT'
     SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city='Boston' and job_title='Cashier' ORDER BY employee_last_name, employee_first_name;
CREATE DIRECTED QUERY

=> ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
ACTIVATE DIRECTED QUERY

After this directed query plan is activated, the optimizer uses it to generate a query plan for all subsequent invocations of this input query, and others like it. You can view the optimizer-generated annotated query by calling GET DIRECTED QUERY or querying system table DIRECTED_QUERIES:

=> SELECT input_query, annotated_query FROM V_CATALOG.DIRECTED_QUERIES
      WHERE query_name = 'findEmployeesCityJobTitle_OPT';
-[ RECORD 1 ]---+----------------------------------------------------------------------------
input_query     | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension
WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query | SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC

In this case, the annotated query includes the following hints:

  • /*+verbatim*/ specifies to execute the annotated query exactly as written and produce a query plan accordingly.
  • /*+projs('public.Emp_Dimension')*/ directs the optimizer to create a query plan that uses the projection public.Emp_Dimension.
  • /*+:v(n)*/ (alias of /*+IGNORECONST(n)*/) is included several times in the annotated and input queries. These hints qualify two constants in the query predicates: Boston and Cashier. Each :v hint has an integer argument n that pairs corresponding constants in the input and annotated query queries: *+:v(1)*/ for Boston, and /*+:v(2)*/ for Cashier. The hints tell the optimizer to disregard these constants when it decides whether to apply this directed query to other input queries that are similar. Thus, ignore constant hints can let you use the same directed query for different input queries.

The following query uses different values for the columns employee_city and job_title, but is otherwise identical to the original input query of directed query EmployeesCityJobTitle_OPT:

=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;

If the directed query EmployeesCityJobTitle_OPT is active, the optimizer can use it for this query:

=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension
      WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------
 EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;

 The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
 SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name
 FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
 WHERE ((employee_dimension.employee_city = 'San Francisco'::varchar(13)) AND (employee_dimension.job_title = 'Branch Manager'::varchar(14)))
 ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

 Access Path:
 +-SORT [Cost: 222, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: employee_dimension.employee_last_name ASC, employee_dimension.employee_first_name ASC
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for employee_dimension [Cost: 60, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.employee_dimension_super
 | |      Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
 | |      Filter: (employee_dimension.employee_city = 'San Francisco')
 | |      Filter: (employee_dimension.job_title = 'Branch Manager')
 | |      Execute on: All Nodes
 ...

Bulk-creation of directed queries

The meta-function SAVE_PLANS lets you create multiple optimizer-generated directed queries from the most frequently executed queries. SAVE_PLANS works as follows:

  1. Iterates over all queries in the data collector table dc_requests_issued and selects the most-frequently requested queries, up to the maximum specified by its query-budget argument. If the meta-function's since-date argument is also set, then SAVE_PLANS iterates only over queries that were issued on or after the specified date.

    As SAVE_PLANS iterates over dc_requests_issued, it tests queries against various restrictions. In general, directed queries support only SELECT statements as input. Within this broad requirement, input queries are subject to other restrictions.

  2. Calls CREATE DIRECTED QUERY OPTIMIZER on all qualifying input queries, which creates a directed query for each unique input query as described above.

  3. Saves metadata on the new set of directed queries to system table DIRECTED_QUERIES, where all directed queries of that set share the same SAVE_PLANS_VERSION integer. This integer is computed from the highest SAVE_PLANS_VERSION + 1.

You can later use SAVE_PLANS_VERSION identifiers to bulk activate, deactivate, and drop directed queries. For example:

=> SELECT save_plans (40);
                                                 save_plans
-------------------------------------------------------------------------------------------------------------
 9 directed query supported queries out of 40 most frequently run queries were saved under the save_plans_version 3.

To view the saved queries, run:

        SELECT * FROM directed_queries WHERE save_plans_version = '3';

To drop the saved queries, run:

        DROP DIRECTED QUERY WHERE save_plans_version = '3';
(1 row)

=> SELECT input_query::VARCHAR(60) FROM directed_queries WHERE save_plans_version = 3 AND input_query ILIKE '%line_search%';
                         input_query
--------------------------------------------------------------
 SELECT public.line_search_logistic2(udtf1.deviance, udtf1.G
 SELECT public.line_search_logistic2(udtf1.deviance, udtf1.G
(2 rows)

=> ACTIVATE DIRECTED QUERY WHERE save_plans_version = 3 AND input_query ILIKE '%line_search%';
ACTIVATE DIRECTED QUERY
=> SELECT query_name, input_query::VARCHAR(60), is_active FROM directed_queries WHERE save_plans_version = 3 AND input_query ILIKE '%line_search%';
       query_name       |                         input_query                          | is_active
------------------------+--------------------------------------------------------------+-----------
 save_plans_nolabel_3_3 | SELECT public.line_search_logistic2(udtf1.deviance, udtf1.G  | t
 save_plans_nolabel_6_3 | SELECT public.line_search_logistic2(udtf1.deviance, udtf1.G  | t
(2 rows)

1.2 - Custom directed queries

CREATE DIRECTED QUERY CUSTOM specifies an annotated query and pairs it to an input query previously saved by SAVE QUERY.

CREATE DIRECTED QUERY CUSTOM specifies an annotated query and pairs it to an input query previously saved by SAVE QUERY. You must issue both statements in the same user session.

For example, you might want a query to use a specific projection:

  1. Specify the query with SAVE QUERY:

    => SAVE QUERY SELECT employee_first_name, employee_last_name FROM employee_dimension
        WHERE employee_city='Boston' AND job_title='Cashier';
    SAVE QUERY
    
  2. Create a custom directed query with CREATE DIRECTED QUERY CUSTOM, which specifies an annotated query and associates it with the saved query. The annotated query includes a /*+projs*/ hint, which instructs the optimizer to use the projection public.emp_dimension_unseg when users call the saved query:

    
    => CREATE DIRECTED QUERY CUSTOM 'findBostonCashiers_CUSTOM'
       SELECT employee_first_name, employee_last_name
       FROM employee_dimension /*+Projs('public.emp_dimension_unseg')*/
       WHERE employee_city='Boston' AND job_title='Cashier';
    CREATE DIRECTED QUERY
    
  3. Activate the directed query:

    => ACTIVATE DIRECTED QUERY findBostonCashiers_CUSTOM;
    ACTIVATE DIRECTED QUERY
    
  4. After activation, the optimizer uses this directed query to generate a query plan for all subsequent invocations of its input query. The following EXPLAIN output verifies the optimizer's use of this directed query and the projection it specifies:

    
    => EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension
       WHERE employee_city='Boston' AND job_title='Cashier';
    
    QUERY PLAN
    ------------------------------
    QUERY PLAN DESCRIPTION:
    ------------------------------
    EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension where employee_city='Boston' AND job_title='Cashier';
    
     The following active directed query(query name: findBostonCashiers_CUSTOM) is being executed:
     SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name
      FROM public.employee_dimension/*+Projs('public.emp_dimension_unseg')*/
      WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6)) AND (employee_dimension.job_title = 'Cashier'::varchar(7)))
    
     Access Path:
     +-STORAGE ACCESS for employee_dimension [Cost: 158, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
     |  Projection: public.emp_dimension_unseg
     |  Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
     |  Filter: (employee_dimension.employee_city = 'Boston')
     |  Filter: (employee_dimension.job_title = 'Cashier')
     |  Execute on: Query Initiator
    

See also

Rewriting Join Queries

1.3 - Using optimizer-generated and custom directed queries together

You can use the annotated SQL that the optimizer creates as the basis for creating your own custom directed queries.

You can use the annotated SQL that the optimizer creates as the basis for creating your own custom directed queries. This approach can be especially useful in evaluating the plan that the optimizer creates to handle a given query, and testing plan modifications.

For example, you might want to modify how the optimizer implements the following query:

=> SELECT COUNT(customer_name) Total, customer_region Region
    FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
    JOIN product_dimension p ON s.product_key = p.product_key
    WHERE p.category_description ilike '%Medical%'
      AND p.product_description ilike '%antibiotics%'
      AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
    GROUP BY customer_region;

When you run EXPLAIN on this query, you discover that the optimizer uses projection customers_proj_age for the customer_dimension table. This projection is sorted on column customer_age. Consequently, the optimizer hash-joins the tables store_sales and customer_dimension on customer_key.

After analyzing customer_dimension table data, you observe that most customers are under 30, so it makes more sense to use projection customer_proj_id for the customer_dimension table, which is sorted on customer_key:

You can create a directed query that encapsulates this change as follows:

  1. Obtain optimizer-generated annotations on the query with EXPLAIN ANNOTATED:

    => \o annotatedQuery
    => EXPLAIN ANNOTATED SELECT COUNT(customer_name) Total, customer_region Region
         FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
         JOIN product_dimension p ON s.product_key = p.product_key
         WHERE p.category_description ilike '%Medical%'
           AND p.product_description ilike '%antibiotics%'
           AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
         GROUP BY customer_region;
    => \o
    => \! cat annotatedQuery
    ...
    SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
     FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
        JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customers_proj_age')*/
          ON (c.customer_key = s.customer_key))
        JOIN /*+Distrib(L,B),JType(M)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
          ON (s.product_key = p.product_key))
     WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
         AND (c.customer_age <= 30)
         AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
         AND (p.product_description ~~* '%antibiotics%'::varchar(13))
     GROUP BY  /*+GByType(Hash)*/ 2
    (4 rows)
    
  2. Modify the annotated query:

    
    SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
     FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
        JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customer_proj_id')*/
          ON (c.customer_key = s.customer_key))
        JOIN /*+Distrib(L,B),JType(H)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
          ON (s.product_key = p.product_key))
     WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
         AND (c.customer_age <= 30)
         AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
         AND (p.product_description ~~* '%antibiotics%'::varchar(13))
     GROUP BY  /*+GByType(Hash)*/ 2
    
  3. Use the modified annotated query to create the desired directed query:

    • Save the desired input query with SAVE QUERY:

      
      => SAVE QUERY SELECT COUNT(customer_name) Total, customer_region Region
          FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
          JOIN product_dimension p ON s.product_key = p.product_key
          WHERE p.category_description ilike '%Medical%'
            AND p.product_description ilike '%antibiotics%'
            AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
          GROUP BY customer_region;
      
    • Create a custom directed query that associates the saved input query with the modified annotated query:

      
      => CREATE DIRECTED QUERY CUSTOM 'getCustomersUnder31'
         SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
       FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
          JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customer_proj_id')*/
            ON (c.customer_key = s.customer_key))
          JOIN /*+Distrib(L,B),JType(H)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
            ON (s.product_key = p.product_key))
       WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
           AND (c.customer_age <= 30)
           AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
           AND (p.product_description ~~* '%antibiotics%'::varchar(13))
       GROUP BY  /*+GByType(Hash)*/ 2;
      CREATE DIRECTED QUERY
      
  4. Activate this directed query:

    => ACTIVATE DIRECTED QUERY getCustomersUnder31;
    ACTIVATE DIRECTED QUERY
    

When the optimizer processes a query that matches this directed query's input query, it uses the directed query's annotated query to generate a query plan:

=> EXPLAIN SELECT COUNT(customer_name) Total, customer_region Region
     FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
     JOIN product_dimension p ON s.product_key = p.product_key
     WHERE p.category_description ilike '%Medical%'
       AND p.product_description ilike '%antibiotics%'
       AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
     GROUP BY customer_region;

 The following active directed query(query name: getCustomersUnder31) is being executed:
...

2 - Setting hints in annotated queries

The hints in a directed query's annotated query provide the optimizer with instructions how to execute an input query.

The hints in a directed query's annotated query provide the optimizer with instructions how to execute an input query. Annotated queries support the following hints:

Other hints in annotated queries such as DIRECT or LABEL have no effect.

You can use hints in a vsql query the same as in an annotated query, with two exceptions: :v (IGNORECONSTANT) and VERBATIM.

3 - Ignoring constants in directed queries

Optimizer-generated directed queries generally include one or more :v (alias of IGNORECONSTANT) hints, which mark predicate string constants that you want the optimizer to ignore when it decides whether to use a directed query for a given input query.

Optimizer-generated directed queries generally include one or more :v (alias of IGNORECONSTANT) hints, which mark predicate string constants that you want the optimizer to ignore when it decides whether to use a directed query for a given input query. :v hints enable multiple queries to use the same directed query, provided the queries are identical in all respects except their predicate strings.

For example, the following two queries are identical , except for the string constants Boston|San Francisco and Cashier|Branch Manager, which are specified for columns employee_city and job_title, respectively:


=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city='Boston' and job_title ='Cashier' ORDER BY employee_last_name, employee_first_name;

=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;

In this case, an optimizer-generated directed query that you create from one query can be used for both:


=> CREATE DIRECTED QUERY OPTIMIZER 'findEmployeesCityJobTitle_OPT'
     SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city='Boston' and job_title='Cashier' ORDER BY employee_last_name, employee_first_name;
CREATE DIRECTED QUERY

=> ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
ACTIVATE DIRECTED QUERY

The directed query's input and annotated queries both include :v hints:

=> SELECT input_query, annotated_query FROM V_CATALOG.DIRECTED_QUERIES
      WHERE query_name = 'findEmployeesCityJobTitle_OPT';
-[ RECORD 1 ]---+----------------------------------------------------------------------------
input_query     | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension
WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query | SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC

The hint arguments in the input and annotated queries pair two predicate constants:

  • `/*+:v(1)*/` pairs input and annotated query settings for employee_city.

  • /*+:v(2)*/ pairs input and annotated query settings for job_title.

The :v hints tell the optimizer to ignore values for these two columns when it decides whether it can use this directed query for a given input query.

For example, the following query uses different values for employee_city and job_title, but is otherwise identical to the query used to create the directed query EmployeesCityJobTitle_OPT:

=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;

If the directed query EmployeesCityJobTitle_OPT is active, the optimizer can use it in its query plan for this query:

=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension
      WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------
 EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;

 The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
 SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name
 FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
 WHERE ((employee_dimension.employee_city = 'San Francisco'::varchar(13)) AND (employee_dimension.job_title = 'Branch Manager'::varchar(14)))
 ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

 Access Path:
 +-SORT [Cost: 222, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: employee_dimension.employee_last_name ASC, employee_dimension.employee_first_name ASC
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for employee_dimension [Cost: 60, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.employee_dimension_super
 | |      Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
 | |      Filter: (employee_dimension.employee_city = 'San Francisco')
 | |      Filter: (employee_dimension.job_title = 'Branch Manager')
 | |      Execute on: All Nodes
 ...

Mapping one-to-many :v hints

The examples shown so far demonstrate one-to-one pairings of :v hints. You can also use :v hints to map one input constant to multiple constants in an annotated query. This approach can be especially useful when you want to provide the optimizer with explicit instructions how to execute a query that joins tables.

For example, the following query joins two tables:

SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;

In this case, the optimizer can infer that S.a and T.b have the same value and implements the join accordingly:

<a name="simpleJoinExample"></a>=> CREATE DIRECTED QUERY OPTIMIZER simpleJoin SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;
CREATE DIRECTED QUERY
=> SELECT input_query, annotated_query FROM directed_queries WHERE query_name = 'simpleJoin';
-[ RECORD 1 ]---+---------------------------------------------------------------------------------------------------------------------
input_query     | SELECT S.a, T.b FROM (public.S JOIN public.T ON ((S.a = T.b))) WHERE (S.a = 8 /*+:v(1)*/)
annotated_query | SELECT /*+syntactic_join,verbatim*/ S.a AS a, T.b AS b
FROM (public.S AS S/*+projs('public.S')*/ JOIN /*+Distrib(L,L),JType(M)*/ public.T AS T/*+projs('public.T')*/  ON (S.a = T.b))
WHERE (S.a = 8 /*+:v(1)*/) AND (T.b = 8 /*+:v(1)*/)
(1 row)

=> ACTIVATE DIRECTED QUERY simpleJoin;
ACTIVATED DIRECTED QUERY

Now, given the following input query:

SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3;

the optimizer disregards the join predicate constants and uses the directed query simpleJoin in its query plan:

 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3;

 The following active directed query(query name: simpleJoin) is being executed:
 SELECT /*+syntactic_join,verbatim*/  S.a, T.b FROM (public.S S/*+projs('public.S')*/ JOIN /*+Distrib('L', 'L'), JType('
M')*/public.T T/*+projs('public.T')*/ ON ((S.a = T.b))) WHERE ((S.a = 3) AND (T.b = 3))

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 21, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (S.a = T.b)
 |  Execute on: Query Initiator
 | +-- Outer -> STORAGE ACCESS for S [Cost: 12, Rows: 4 (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.S_b0
 | |      Materialize: S.a
 | |      Filter: (S.a = 3)
 | |      Execute on: Query Initiator
 | |      Runtime Filter: (SIP1(MergeJoin): S.a)
 | +-- Inner -> STORAGE ACCESS for T [Cost: 8, Rows: 3 (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.T_b0
 | |      Materialize: T.b
 | |      Filter: (T.b = 3)
 | |      Execute on: Query Initiator
 ...

Conserving predicate constants in directed queries

By default, optimizer-generated directed queries set :v hints on predicate constants. You can override this behavior by marking predicate constants that must not be ignored with :c hints. For example, the following statement creates a directed query that can be used only for input queries where the join predicate constant is the same as in the original input query—8:

=> CREATE DIRECTED QUERY OPTIMIZER simpleJoin_KeepPredicateConstant SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8 /*+:c*/;
CREATE DIRECTED QUERY
=> ACTIVATE DIRECTED QUERY simpleJoin_KeepPredicateConstant;

The following query on system table DIRECTED_QUERIES compares directed queries simpleJoin(created in an earlier example) and simpleJoin_KeepPredicateConstant. Unlike simpleJoin, the join predicate of the input and annotated queries for simpleJoin_KeepPredicateConstant omit :v hints:



=> SELECT query_name, input_query, annotated_query FROM directed_queries WHERE query_name ILIKE'%simpleJoin%';
-[ RECORD 1 ]---+
query_name      | simpleJoin
input_query     | SELECT S.a, T.b FROM (public.S JOIN public.T ON ((S.a = T.b))) WHERE (S.a = 8 /*+:v(1)*/)
annotated_query | SELECT /*+syntactic_join,verbatim*/ S.a AS a, T.b AS b
FROM (public.S AS S/*+projs('public.S')*/ JOIN /*+Distrib(L,L),JType(M)*/ public.T AS T/*+projs('public.T')*/  ON (S.a = T.b))
WHERE (S.a = 8 /*+:v(1)*/) AND (T.b = 8 /*+:v(1)*/)
-[ RECORD 2 ]---+
query_name      | simpleJoin_KeepPredicateConstant
input_query     | SELECT S.a, T.b FROM (public.S JOIN public.T ON ((S.a = T.b))) WHERE (S.a = 8)
annotated_query | SELECT /*+syntactic_join,verbatim*/ S.a AS a, T.b AS b
FROM (public.S AS S/*+projs('public.S')*/ JOIN /*+Distrib(L,L),JType(M)*/ public.T AS T/*+projs('public.T')*/  ON (S.a = T.b))
WHERE (S.a = 8) AND (T.b = 8)

If you deactivate directed query simpleJoin (which would otherwise have precedence over simpleJoin_KeepPredicateConstant because it was created earlier), Vertica only applies simpleJoin_KeepPredicateConstant on input queries where the join predicate constant is the same as in the original input query. For example, compare the following two query plans:

=> EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;

 The following active directed query(query name: simpleJoin_KeepPredicateConstant) is being executed:
 SELECT /*+syntactic_join,verbatim*/  S.a, T.b FROM (public.S S/*+projs('public.S')*/ JOIN /*+Distrib('L', 'L'), JType('
M')*/public.T T/*+projs('public.T')*/ ON ((S.a = T.b))) WHERE ((S.a = 8) AND (T.b = 8))

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 21, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (S.a = T.b)
 ...
=> EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3;

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 21, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (S.a = T.b)
 ...

4 - Rewriting queries

You can use directed queries to change the semantics of a given query—that is, substitute one query for another.

You can use directed queries to change the semantics of a given query—that is, substitute one query for another. This can be especially important when you have little or no control over the content and format of input queries that your Vertica database processes. You can map these queries to directed queries that rewrite the original input for optimal execution.

The following sections describe two use cases:

Rewriting join queries

Many of your input queries join multiple tables. You've determined that in many cases, it would be more efficient to denormalize much of your data in several flattened tables and query those tables directly. You cannot revise the input queries themselves. However, you can use directed queries to redirect join queries to the flattened table data.

For example, the following query aggregates regional sales of wine products by joining three tables in the VMart database:

=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
    FROM store.store_sales_fact SF
    JOIN store.store_dimension SD ON SF.store_key=SD.store_key
    JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
  WHERE P.product_description ILIKE '%wine%'
  GROUP BY ROLLUP (SD.store_region, SD.store_city)
  ORDER BY Region,Total DESC;

You can consolidate the joined table data in a single flattened table, and query this table instead. By doing so, you can access the same data faster. You can create the flattened table with the following DDL statements:

=> CREATE TABLE store.store_sales_wide AS SELECT * FROM store.store_sales_fact;
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_name VARCHAR(64)
     SET USING (SELECT store_name FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_city varchar(64)
    SET USING (SELECT store_city FROM store.store_dimension  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_state char(2)
    SET USING (SELECT store_state char FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key)
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_region varchar(64)
    SET USING (SELECT store_region FROM store.store_dimension  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD column product_description VARCHAR(128)
    SET USING (SELECT product_description FROM public.product_dimension
    WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);
=> ALTER TABLE store.store_sales_wide ADD COLUMN sku_number char(32)
    SET USING (SELECT sku_number char FROM product_dimension
    WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);

=> SELECT REFRESH_COLUMNS ('store.store_sales_wide','', 'rebuild');

After creating this table and refreshing its SET USING columns, you can rewrite the earlier query as follows:

=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
    FROM store.store_sales_fact SF
    JOIN store.store_dimension SD ON SF.store_key=SD.store_key
    JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
  WHERE P.product_description ILIKE '%wine%'
  GROUP BY ROLLUP (SD.store_region, SD.store_city)
  ORDER BY Region,Total DESC;
  Region   |       City       |  Total
-----------+------------------+---------
 East      |                  | 1679788
 East      | Boston           |  138494
 East      | Elizabeth        |  138071
 East      | Sterling Heights |  137719
 East      | Allentown        |  137122
 East      | New Haven        |  117751
 East      | Lowell           |  102670
 East      | Washington       |   84595
 East      | Charlotte        |   83255
 East      | Waterbury        |   81516
 East      | Erie             |   80784
 East      | Stamford         |   59935
 East      | Hartford         |   59843
 East      | Baltimore        |   55873
 East      | Clarksville      |   54117
 East      | Nashville        |   53757
 East      | Manchester       |   53290
 East      | Columbia         |   52799
 East      | Memphis          |   52648
 East      | Philadelphia     |   29711
 East      | Portsmouth       |   29316
 East      | New York         |   27033
 East      | Cambridge        |   26111
 East      | Alexandria       |   23378
 MidWest   |                  | 1073224
 MidWest   | Lansing          |  145616
 MidWest   | Livonia          |  129349
--More--

Querying the flattened table is more efficient; however, you still must account for input queries that continue to use the earlier join syntax. You can do so by creating a custom directed query, which redirects these input queries to syntax that targets the flattened table:

  1. Save the input query:

    => SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
        FROM store.store_sales_fact SF
        JOIN store.store_dimension SD ON SF.store_key=SD.store_key
        JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
      WHERE P.product_description ILIKE '%wine%'
      GROUP BY ROLLUP (SD.store_region, SD.store_city)
      ORDER BY Region,Total DESC;
    SAVE QUERY
    
  2. Map the saved query to a directed query with the desired syntax, and activate the directed query:

    => CREATE DIRECTED QUERY CUSTOM 'RegionalSalesWine'
        SELECT store_region AS Region,
          store_city AS City,
          SUM(gross_profit_dollar_amount) AS Total
        FROM store.store_sales_wide
        WHERE product_description ILIKE '%wine%'
        GROUP BY ROLLUP (region, city)
        ORDER BY Region,Total DESC;
    CREATE DIRECTED QUERY
    
    => ACTIVATE DIRECTED QUERY RegionalSalesWine;
    ACTIVATE DIRECTED QUERY
    

When directed query RegionalSalesWine is active, the query optimizer maps all queries that match the original input format to the directed query, as shown in the following query plan:

=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
     FROM store.store_sales_fact SF
     JOIN store.store_dimension SD ON SF.store_key=SD.store_key
     JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
   WHERE P.product_description ILIKE '%wine%'
   GROUP BY ROLLUP (SD.store_region, SD.store_city)
   ORDER BY Region,Total DESC;
 ...
 The following active directed query(query name: RegionalSalesWine) is being executed:
 SELECT store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
  FROM store.store_sales_wide WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6))
  GROUP BY GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region),())
  ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC

 Access Path:
 +-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
 | |      Group By: store_sales_wide.store_region, store_sales_wide.store_city
 | |      Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_wide_b0
 | | |      Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
 | | |      Filter: (store_sales_wide.product_description ~~* '%wine%')
 | | |      Execute on: All Nodes

To compare the costs of executing the directed query and executing the original input query, deactivate the directed query and use EXPLAIN on the original input query. The optimizer reverts to creating a plan for the input query that incurs significantly greater cost—188K versus 2K:

=> DEACTIVATE DIRECTED QUERY RegionalSalesWine;
DEACTIVATE DIRECTED QUERY
=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
     FROM store.store_sales_fact SF
     JOIN store.store_dimension SD ON SF.store_key=SD.store_key
     JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
   WHERE P.product_description ILIKE '%wine%'
   GROUP BY ROLLUP (SD.store_region, SD.store_city)
   ORDER BY Region,Total DESC;
 ...
  Access Path:
 +-SORT [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: SD.store_region ASC, sum(SF.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(SF.gross_profit_dollar_amount)
 | |      Group By: SD.store_region, SD.store_city
 | |      Grouping Sets: (SD.store_region, SD.store_city, <SVAR>), (SD.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> JOIN HASH [Cost: 12K, Rows: 5M (NO STATISTICS)] (PATH ID: 3) Inner (BROADCAST)
 | | |      Join Cond: (concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))
 | | |      Materialize at Input: SF.product_key, SF.product_version
 | | |      Materialize at Output: SF.gross_profit_dollar_amount
 | | |      Execute on: All Nodes
 | | | +-- Outer -> JOIN HASH [Cost: 2K, Rows: 5M (NO STATISTICS)] (PATH ID: 4) Inner (BROADCAST)
 | | | |      Join Cond: (SF.store_key = SD.store_key)
 | | | |      Execute on: All Nodes
 | | | | +-- Outer -> STORAGE ACCESS for SF [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 5)
 | | | | |      Projection: store.store_sales_fact_super
 | | | | |      Materialize: SF.store_key
 | | | | |      Execute on: All Nodes
 | | | | |      Runtime Filters: (SIP2(HashJoin): SF.store_key), (SIP1(HashJoin): concat((SF.product_key)::varchar, (SF.product_version)::varchar))
 | | | | +-- Inner -> STORAGE ACCESS for SD [Cost: 13, Rows: 250 (NO STATISTICS)] (PATH ID: 6)
 | | | | |      Projection: store.store_dimension_super
 | | | | |      Materialize: SD.store_key, SD.store_city, SD.store_region
 | | | | |      Execute on: All Nodes
 | | | +-- Inner -> STORAGE ACCESS for P [Cost: 201, Rows: 60K (NO STATISTICS)] (PATH ID: 7)
 | | | |      Projection: public.product_dimension_super
 | | | |      Materialize: P.product_key, P.product_version
 | | | |      Filter: (P.product_description ~~* '%wine%')
 | | | |      Execute on: All Nodes

Creating query templates

You can use directed queries to implement multiple queries that are identical except for the predicate strings on which query results are filtered. For example, directed query RegionalSalesWine only handles input queries that filter on product_description values that contain the string wine. You can create a modified version of this directed query that matches the syntax of multiple input queries, which differ only in their input values—for example, tuna.

Create this query template in the following steps:

  1. Create two optimizer-generated directed queries:

    • From the original query on the joined tables:

      => CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProducts_JoinTables
           SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
             FROM store.store_sales_fact SF
             JOIN store.store_dimension SD ON SF.store_key=SD.store_key
             JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
             WHERE P.product_description ILIKE '%wine%'
           GROUP BY ROLLUP (SD.store_region, SD.store_city)
           ORDER BY Region,Total DESC;
      CREATE DIRECTED QUERY
      
    • From the query on the flattened table:

      => CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProduct
          SELECT store_region AS Region, store_city AS City, SUM(gross_profit_dollar_amount) AS Total
          FROM store.store_sales_wide
          WHERE product_description ILIKE '%wine%'
          GROUP BY ROLLUP (region, city)
          ORDER BY Region,Total DESC;
      CREATE DIRECTED QUERY
      
  2. Query system table DIRECTED_QUERIES and copy the input query for directed query RegionalSalesProducts_JoinTables:

    SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
    
  3. Use the copied input query with SAVE QUERY:

    SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, sum(SF.gross_profit_dollar_amount) AS Total
      FROM ((store.store_sales_fact SF
      JOIN store.store_dimension SD ON ((SF.store_key = SD.store_key)))
      JOIN public.product_dimension P ON ((concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))))
      WHERE (P.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/)
      GROUP BY GROUPING SETS((SD.store_region, SD.store_city), (SD.store_region), ())
      ORDER BY SD.store_region, sum(SF.gross_profit_dollar_amount) DESC
    (1 row)
    
  4. Query system table DIRECTED_QUERIES and copy the annotated query for directed query RegionalSalesProducts_FlatTables:

    SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
    
  5. Use the copied annotated query to create a custom directed query:

    => CREATE DIRECTED QUERY CUSTOM RegionalSalesProduct  SELECT /*+verbatim*/ store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
         FROM store.store_sales_wide AS store_sales_wide/*+projs('store.store_sales_wide')*/
         WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/)
         GROUP BY  /*+GByType(Hash)*/ GROUPING SETS((1, 2), (1), ())
         ORDER BY 1 ASC, 3 DESC;
    CREATE DIRECTED QUERY
    
  6. Activate the directed query:

    ACTIVATE DIRECTED QUERY RegionalSalesProduct;
    

After activating this directed query, Vertica can use it for input queries that match the template, differing only in the predicate value for product_description:


=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
         FROM store.store_sales_fact SF
         JOIN store.store_dimension SD ON SF.store_key=SD.store_key
         JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
       WHERE P.product_description ILIKE '%tuna%'
       GROUP BY ROLLUP (SD.store_region, SD.store_city)
       ORDER BY Region,Total DESC;
 ...
 The following active directed query(query name: RegionalSalesProduct) is being executed:
 SELECT /*+verbatim*/  store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
  FROM store.store_sales_wide store_sales_wide/*+projs('store.store_sales_wide')*/
  WHERE (store_sales_wide.product_description ~~* '%tuna%'::varchar(6))
  GROUP BY /*+GByType(Hash)*/  GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region), ())
  ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC

 Access Path:
 +-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
 | |      Group By: store_sales_wide.store_region, store_sales_wide.store_city
 | |      Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_wide_b0
 | | |      Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
 | | |      Filter: (store_sales_wide.product_description ~~* '%tuna%')
 | | |      Execute on: All Nodes

When you execute this query, it returns with the following results:

  Region   |       City       |  Total
-----------+------------------+---------
 East      |                  | 1564582
 East      | Elizabeth        |  131328
 East      | Allentown        |  129704
 East      | Boston           |  128188
 East      | Sterling Heights |  125877
 East      | Lowell           |  112133
 East      | New Haven        |  101161
 East      | Waterbury        |   85401
 East      | Washington       |   76127
 East      | Erie             |   73002
 East      | Charlotte        |   67850
 East      | Memphis          |   53650
 East      | Clarksville      |   53416
 East      | Hartford         |   52583
 East      | Columbia         |   51950
 East      | Nashville        |   50031
 East      | Manchester       |   48607
 East      | Baltimore        |   48108
 East      | Stamford         |   47302
 East      | New York         |   30840
 East      | Portsmouth       |   26485
 East      | Alexandria       |   26391
 East      | Philadelphia     |   23092
 East      | Cambridge        |   21356
 MidWest   |                  |  980209
 MidWest   | Lansing          |  130044
 MidWest   | Livonia          |  118740
--More--

5 - Managing directed queries

Vertica provides a number of ways to manage directed queries:.

Vertica provides a number of ways to manage directed queries:

5.1 - Getting directed queries

You can obtain catalog information about directed queries in two ways:.

You can obtain catalog information about directed queries in two ways:

GET DIRECTED QUERY

GET DIRECTED QUERY queries the system table DIRECTED_QUERIES on the specified input query. It returns details of all directed queries that map to the input query.

The following GET DIRECTED QUERY statement returns the directed query that maps to the following input query, findEmployeesCityJobTitle_OPT:

=> GET DIRECTED QUERY SELECT employee_first_name, employee_last_name from employee_dimension where employee_city='Boston' AND job_title='Cashier' order by employee_last_name, employee_first_name;
-[ RECORD 1 ]---+
query_name      | findEmployeesCityJobTitle_OPT
is_active       | t
vertica_version | Vertica Analytic Database v11.0.1-20210815
comment         | Optimizer-generated directed query
creation_date   | 2021-08-20 14:53:42.323963
annotated_query | SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/ WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

DIRECTED_QUERIES

You can query the system table DIRECTED_QUERIES directly. For example:

=> SELECT query_name, is_active FROM V_CATALOG.DIRECTED_QUERIES WHERE query_name ILIKE '%findEmployeesCityJobTitle%';
          query_name           | is_active
-------------------------------+-----------
 findEmployeesCityJobTitle_OPT | t
(1 row)

5.2 - Identifying active directed queries

Multiple directed queries can map to the same input query.

Multiple directed queries can map to the same input query. The is_active column in system table DIRECTED_QUERIES identifies directed queries that are active. If multiple directed queries are active for the same input query, the optimizer uses the first one to be created. In that case, you can use EXPLAIN to identify which directed query is active.

The following query on DIRECTED_QUERIES finds all active directed queries where the input query contains the name of the queried table employee_dimension:

=> SELECT * FROM directed_queries WHERE input_query ILIKE ('%employee_dimension%') AND is_active='t';
-[ RECORD 1 ]------+
query_name         | findEmployeesCityJobTitle_OPT
is_active          | t
vertica_version    | Vertica Analytic Database v11.0.1-20210815
comment            | Optimizer-generated directed query
save_plans_version | 0
username           | dbadmin
creation_date      | 2021-08-20 14:53:42.323963
since_date         |
input_query        | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query    | SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/ WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

If you run EXPLAIN on the input query, it returns with a query plan that confirms use of findEmployeesCityJobTitle_OPT as the active directed query:

=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='Boston' AND job_title ='Cashier' ORDER BY employee_last_name, employee_first_name;

 The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
 SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/ WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6)) AND (employee_dimension.job_title = 'Cashier'::varchar(7))) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

5.3 - Activating and deactivating directed queries

The optimizer uses only directed queries that are active.

The optimizer uses only directed queries that are active. If multiple directed queries share the same input query, the optimizer uses the first one to be created.

You activate and deactivate directed queries with ACTIVATE DIRECTED QUERY and DEACTIVATE DIRECTED QUERY, respectively. For example, the following ACTIVATE DIRECTED QUERY statement deactivates RegionalSalesProducts_JoinTables and activates RegionalSalesProduct:


=> DEACTIVATE DIRECTED QUERY RegionalSalesProducts_JoinTables;
DEACTIVATE DIRECTED QUERY;
=> ACTIVATE DIRECTED QUERY RegionalSalesProduct;
ACTIVATE DIRECTED QUERY;

ACTIVATE DIRECTED QUERY and DEACTIVATE DIRECTED QUERY can also activate and deactivate multiple directed queries that are filtered from system table DIRECTED_QUERIES. In the following example, DEACTIVATE DIRECTED QUERY deactivates all directed queries with the same save_plans_version identifier:

=> DEACTIVATE DIRECTED QUERY WHERE save_plans_version = 21;

Vertica uses the active directed query for a given query across all sessions until it is explicitly deactivated by DEACTIVATE DIRECTED QUERY or removed from storage by DROP DIRECTED QUERY. If a directed query is active at the time of database shutdown, Vertica automatically reactivates it when you restart the database.

After a direct query is deactivated, the query optimizer handles subsequent invocations of the input query by using another directed query, if one is available. Otherwise, it generates its own query plan.

5.4 - Exporting directed queries from the catalog

You can also export query plans as directed queries to an external SQL file.

Before upgrading to a new version of Vertica, you can export directed queries for those queries whose optimal performance is critical to your system:

  1. Use EXPORT_CATALOG with the argument DIRECTED_QUERIES to export from the database catalog all current directed queries and their current activation status:

    => SELECT EXPORT_CATALOG('../../export_directedqueries', 'DIRECTED_QUERIES');
    EXPORT_CATALOG
    -------------------------------------
    Catalog data exported successfully
    
  2. EXPORT_CATALOG creates a script to recreate the directed queries, as in the following example:

    SAVE QUERY SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name;
    CREATE DIRECTED QUERY CUSTOM findEmployeesCityJobTitle_OPT COMMENT 'Optimizer-generated directed query' OPTVER 'Vertica Analytic Database v11.0.1-20210815' PSDATE '2021-08-20 14:53:42.323963' SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name
    FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
    WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
    ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name;
    ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
    
  3. After the upgrade is complete, remove each directed query from the database catalog with DROP DIRECTED QUERY. Alternatively, edit the export script and insert a DROP DIRECTED QUERY statement before each CREATE DIRECTED QUERY statement. For example, you might modify the script generated earlier with the changes shown in bold:

    SAVE QUERY SELECT employee_dimension.employee_first_name, ... ;
    DROP DIRECTED QUERY findEmployeesCityJobTitle_OPT;
    CREATE DIRECTED QUERY CUSTOM findEmployeesCityJobTitle_OPT COMMENT 'Optimizer-generated ... ;
    ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
    
  4. When you run this script, Vertica recreates the directed queries and restores their activation status:

    => \i /home/dbadmin/export_directedqueries
    SAVE QUERY
    DROP DIRECTED QUERY
    CREATE DIRECTED QUERY
    ACTIVATE DIRECTED QUERY
    

5.5 - Dropping directed queries

DROP DIRECTED QUERY removes the specified directed query from the database catalog.

DROP DIRECTED QUERY removes the specified directed query from the database catalog. If the directed query is active, Vertica deactivates it before removal.

For example:

=> DROP DIRECTED QUERY findBostonCashiers_CUSTOM;
DROP DIRECTED QUERY

DROP DIRECTED QUERY can also drop multiple directed queries that are filtered from system table DIRECTED_QUERIES. In the following example, DROP DIRECTED QUERY drops all directed queries with the same save_plans_version identifier:

=> DROP DIRECTED QUERY WHERE save_plans_version = 21;

6 - Batch query plan export

Before upgrading to a new Vertica version, you might want to use directed queries to save query plans for possible reuse in the new database.

Before upgrading to a new Vertica version, you might want to use directed queries to save query plans for possible reuse in the new database. You cannot predict which query plans are likely candidates for reuse, so you probably want to save query plans for many, or all, database queries. However, you run hundreds of queries each day. Saving query plans for each one to the database catalog through repetitive calls to CREATE DIRECTED QUERY is impractical. Moreover, doing so can significantly increase catalog size and possibly impact performance.

In this case, you can bypass the database catalog and batch export query plans as directed queries to an external SQL file. By offloading query plan storage, you can save any number of query plans from the current database without impacting catalog size and performance. After the upgrade, you can decide which query plans you wish to retain in the new database, and selectively import the corresponding directed queries.

Vertica provides a set of meta-functions that support this approach:

  • EXPORT_DIRECTED_QUERIES generates query plans from a set of input queries, and writes SQL for creating directed queries that encapsulate those plans.

  • IMPORT_DIRECTED_QUERIES imports directed queries to the database catalog from a SQL file that was generated by EXPORT_DIRECTED_QUERIES.

6.1 - Exporting directed queries

You can batch export any number of query plans as directed queries to an external SQL file, as follows:.

You can batch export any number of query plans as directed queries to an external SQL file, as follows:

  1. Create a SQL file that contains the input queries whose query plans you wish to save. See Input Format below.

  2. Call the meta-function EXPORT_DIRECTED_QUERIES on that SQL file. The meta-function takes two arguments:

    • Input queries file.

    • Optionally, the name of an output file. EXPORT_DIRECTED_QUERIES writes SQL for creating directed queries to this file. If you supply an empty string, Vertica writes the SQL to standard output.

For example, the following EXPORT_DIRECTED_QUERIES statement specifies input file inputQueries and output file outputQueries:

=> SELECT EXPORT_DIRECTED_QUERIES('/home/dbadmin/inputQueries','/home/dbadmin/outputQueries');
                               EXPORT_DIRECTED_QUERIES
---------------------------------------------------------------------------------------------
 1 queries successfully exported.
Queries exported to /home/dbadmin/outputQueries.

(1 row)

Input file

The input file that you supply to EXPORT_DIRECTED_QUERIES contains one or more input queries. For each input query, you can optionally specify two fields that are used in the generated directed query:

  • DirQueryName provides the directed query's unique identifier, a string that conforms to conventions described in Identifiers.

  • DirQueryComment specifies a quote-delimited string, up to 128 characters.

You format each input query as follows:

--DirQueryName=query-name
--DirQueryComment='comment'
input-query

For example, a file can specify one input query as follows:

/* Query: findEmployeesCityJobTitle_OPT */
/* Comment: This query finds all employees of a given city and job title, ordered by employee name */
SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name, employee_dimension.job_title FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6)) ORDER BY employee_dimension.job_title;

Output file

EXPORT_DIRECTED_QUERIES generates SQL for creating directed queries, and writes the SQL to the specified file or to standard output. In both cases, output conforms to the following format:

/* Query: directed-query-name */
/* Comment: directed-query-comment */
SAVE QUERY input-query;
CREATE DIRECTED QUERY CUSTOM 'directed-query-name'
COMMENT 'directed-query-comment'
OPTVER 'vertica-release-num'
PSDATE 'timestamp'
annotated-query

For example, given the previous input, Vertica writes the following output to /home/dbadmin/outputQueries:

/* Query: findEmployeesCityJobTitle_OPT */
/* Comment: This query finds all employees of a given city and job title, ordered by employee name */
SAVE QUERY SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name;
CREATE DIRECTED QUERY CUSTOM 'findEmployeesCityJobTitle_OPT'
COMMENT 'This query finds all employees of a given city and job title, ordered by employee name'
OPTVER 'Vertica Analytic Database v11.1.0-20220102'
PSDATE '2022-01-06 13:45:17.430254'
SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name
FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC;

If a given input query omits DirQueryName and DirQueryComment fields, EXPORT_DIRECTED_QUERIES automatically generates the following output:

  • /* Query: Autoname:timestamp.n */, where n is a zero-based integer index that ensures uniqueness among auto-generated names with the same timestamp.

  • /* Comment: Optimizer-generated directed query */

For example, the following input file contains one SELECT statement, and omits the DirQueryName and DirQueryComment fields:


SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name
FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6))
ORDER BY employee_dimension.job_title;

Given this file, EXPORT_DIRECTED_QUERIES returns with a warning about the missing input fields, which it also writes to an error file:

> SELECT EXPORT_DIRECTED_QUERIES('/home/dbadmin/inputQueries2','/home/dbadmin/outputQueries3');
                                              EXPORT_DIRECTED_QUERIES
--------------------------------------------------------------------------------------------------------------
 1 queries successfully exported.
1 warning message was generated.
Queries exported to /home/dbadmin/outputQueries3.
See error report, /home/dbadmin/outputQueries3.err for details.
(1 row)

The output file contains the following content:

/* Query: Autoname:2022-01-06 14:11:23.071559.0 */
/* Comment: Optimizer-generated directed query */
SAVE QUERY SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) ORDER BY employee_dimension.job_title;
CREATE DIRECTED QUERY CUSTOM 'Autoname:2022-01-06 14:11:23.071559.0'
COMMENT 'Optimizer-generated directed query'
OPTVER 'Vertica Analytic Database v11.1.0-20220102'
PSDATE '2022-01-06 14:11:23.071559'
SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name
FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/)
ORDER BY employee_dimension.job_title ASC;

Error file

If any errors or warnings occur during EXPORT_DIRECTED_QUERIES execution, it returns with a message like this one:

1 queries successfully exported.
1 warning message was generated.
Queries exported to /home/dbadmin/outputQueries.
See error report, /home/dbadmin/outputQueries.err for details.

EXPORT_DIRECTED_QUERIES writes all errors and warnings to a file that it creates on the same path as the output file, and uses the output file's base name.

In the previous example, the output filename is /home/dbadmin/outputQueries, so EXPORT_DIRECTED_QUERIES writes errors to /home/dbadmin/outputQueries.err.

The error file can capture a number of errors and warnings, such as all instances where EXPORT_DIRECTED_QUERIES was unable to create a directed query. In the following example, the error file contains a warning that no name field was supplied for the specified input query, and records the name that was auto-generated for it:


----------------------------------------------------------------------------------------------------
WARNING: Name field not supplied. Using auto-generated name: 'Autoname:2016-10-13 09:44:33.527548.0'
Input Query: SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name, employee_dimension.job_title FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6)) ORDER BY employee_dimension.job_title;
END WARNING

6.2 - Importing directed queries

After you determine which exported query plans you wish to use in the current database, you import them with IMPORT_DIRECTED_QUERIES.

After you determine which exported query plans you wish to use in the current database, you import them with IMPORT_DIRECTED_QUERIES. You supply this function with the name of the export file that you created with EXPORT_DIRECTED_QUERIES, and the names of directed queries you wish to import. For example:


=> SELECT IMPORT_DIRECTED_QUERIES('/home/dbadmin/outputQueries','FindEmployeesBoston');
                                    IMPORT_DIRECTED_QUERIES
------------------------------------------------------------------------------------------
 1 directed queries successfully imported.
To activate a query named 'my_query1':
=> ACTIVATE DIRECTED QUERY 'my_query1';

(1 row)

After importing the desired directed queries, you must activate them with ACTIVATE DIRECTED QUERY before you can use them to create query plans.

7 - Half join and cross join semantics

The Vertica optimizer uses several keywords in directed queries to recreate cross join and half join subqueries.

The Vertica optimizer uses several keywords in directed queries to recreate cross join and half join subqueries. It also supports an additional set of keywords to express complex cross joins and half joins. You can also use these keywords in queries that you execute directly in vsql.

For details, see the following topics:

7.1 - Half-join subquery semantics

The Vertica optimizer uses several keywords in directed queries to recreate half-join subqueries with certain search operators, such as ANY or NOT IN:.

The Vertica optimizer uses several keywords in directed queries to recreate half-join subqueries with certain search operators, such as ANY or NOT IN.

SEMI JOIN

Recreates a query that contains a subquery preceded by an IN, EXISTS, or ANY operator and executes a semi-join.

Input query

SELECT product_description FROM product_dimension
  WHERE product_dimension.product_key IN (SELECT qty_in_stock from inventory_fact);

Query plan


QUERY PLAN DESCRIPTION:
------------------------------

explain SELECT product_description FROM product_dimension WHERE product_dimension.product_key IN (SELECT qty_in_stock from inventory_fact);

 Access Path:
 +-JOIN HASH [Semi] [Cost: 1K, Rows: 30K] (PATH ID: 1) Outer (FILTER) Inner (RESEGMENT)
 |  Join Cond: (product_dimension.product_key = VAL(2))
 |  Materialize at Output: product_dimension.product_description
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension
 | |      Materialize: product_dimension.product_key
 | |      Execute on: All Nodes
 | |      Runtime Filter: (SIP1(HashJoin): product_dimension.product_key)
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_b0
 | | |      Materialize: inventory_fact.qty_in_stock
 | | |      Execute on: All Nodes

Optimizer-generated annotated query

SELECT /*+ syntactic_join */ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/ 
SEMI JOIN /*+Distrib(F,R),JType(H)*/ (SELECT inventory_fact.qty_in_stock AS qty_in_stock
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock))

NULLAWARE ANTI JOIN

Recreates a query that contains a subquery preceded by a NOT IN or !=ALL operator, and executes a null-aware anti-join.

Input query

SELECT product_description FROM product_dimension
WHERE product_dimension.product_key NOT IN (SELECT qty_in_stock from inventory_fact);

Query plan

 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT product_description FROM product_dimension WHERE product_dimension.product_key not IN (SELECT qty_in_sto
ck from inventory_fact);

 Access Path:
 +-JOIN HASH [Anti][NotInAnti] [Cost: 7K, Rows: 30K] (PATH ID: 1) Inner (BROADCAST)
 |  Join Cond: (product_dimension.product_key = VAL(2))
 |  Materialize at Output: product_dimension.product_description
 |  Execute on: Query Initiator
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension_DBD_2_rep_VMartDesign
 | |      Materialize: product_dimension.product_key
 | |      Execute on: Query Initiator
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_DBD_9_seg_VMartDesign_b0
 | | |      Materialize: inventory_fact.qty_in_stock
 | | |      Execute on: All Nodes

Optimizer-generated annotated query

SELECT /*+ syntactic_join */ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/ 
NULLAWARE ANTI JOIN /*+Distrib(L,B),JType(H)*/ (SELECT inventory_fact.qty_in_stock AS qty_in_stock
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock))

SEMIALL JOIN

Recreates a query that contains a subquery preceded by an ALL operator, and executes a semi-all join.

Input query

SELECT product_key, product_description FROM product_dimension
  WHERE product_dimension.product_key > ALL (SELECT product_key from inventory_fact);

Query plan


QUERY PLAN DESCRIPTION:
------------------------------

explain SELECT product_key, product_description FROM product_dimension WHERE product_dimension.product_key > ALL (SELECT product_key from inventory_fact);

Access Path:
+-JOIN HASH [Semi][All] [Cost: 7M, Rows: 30K] (PATH ID: 1) Outer (FILTER) Inner (BROADCAST)
|  Join Filter: (product_dimension.product_key > VAL(2))
|  Materialize at Output: product_dimension.product_description
|  Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
| |      Projection: public.product_dimension
| |      Materialize: product_dimension.product_key
| |      Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
| |      Execute on: All Nodes
| | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
| | |      Projection: public.inventory_fact_b0
| | |      Materialize: inventory_fact.product_key
| | |      Execute on: All Nodes

Optimizer-generated annotated query


SELECT /*+ syntactic_join */ product_dimension.product_key AS product_key, product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/ 
SEMIALL JOIN /*+Distrib(F,B),JType(H)*/ (SELECT inventory_fact.product_key AS product_key FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key > subQ_1.product_key))

ANTI JOIN

Recreates a query that contains a subquery preceded by a NOT EXISTS operator, and executes an anti-join.

Input query

SELECT product_key, product_description FROM product_dimension
WHERE NOT EXISTS (SELECT inventory_fact.product_key FROM inventory_fact
WHERE inventory_fact.product_key=product_dimension.product_key);

Query plan


 QUERY PLAN DESCRIPTION:
 ------------------------------

 explain SELECT product_key, product_description FROM product_dimension WHERE NOT EXISTS (SELECT inventory_fact.product_
key FROM inventory_fact WHERE inventory_fact.product_key=product_dimension.product_key);

 Access Path:
 +-JOIN HASH [Anti] [Cost: 703, Rows: 30K] (PATH ID: 1) Outer (FILTER)
 |  Join Cond: (VAL(1) = product_dimension.product_key)
 |  Materialize at Output: product_dimension.product_description
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension_DBD_2_rep_VMartDesign
 | |      Materialize: product_dimension.product_key
 | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_DBD_9_seg_VMartDesign_b0
 | | |      Materialize: inventory_fact.product_key
 | | |      Execute on: All Nodes

Optimizer-generated annotated query

SELECT /*+ syntactic_join */ product_dimension.product_key AS product_key, product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
ANTI JOIN /*+Distrib(F,L),JType(H)*/ (SELECT inventory_fact.product_key AS "inventory_fact.product_key"
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (subQ_1."inventory_fact.product_key" = product_dimension.product_key))

7.2 - Complex join semantics

The optimizer uses a set of keywords to express complex cross joins and half joins.

The optimizer uses a set of keywords to express complex cross joins and half joins. All complex joins are indicated by the keyword COMPLEX, which is inserted before the keyword JOIN—for example, CROSS COMPLEX JOIN. Semantics for complex half joins have an additional requirement, which is detailed below.

Complex cross join

Vertica uses the keyword phrase CROSS COMPLEX JOIN to describe all complex cross joins. For example:

Input query

SELECT
  (SELECT max(sales_quantity) FROM store.store_sales_fact) *
  (SELECT max(sales_quantity) FROM online_sales.online_sales_fact);

Query plan


 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT
   (SELECT max(sales_quantity) FROM store.store_sales_fact) *
   (SELECT max(sales_quantity) FROM online_sales.online_sales_fact);

 Access Path:
 +-JOIN  (CROSS JOIN) [Cost: 4K, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
 |  Execute on: Query Initiator
 | +-- Outer -> JOIN  (CROSS JOIN) [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
 | |      Execute on: Query Initiator
 | | +-- Outer -> STORAGE ACCESS for dual [Cost: 10, Rows: 1] (PATH ID: 3)
 | | |      Projection: v_catalog.dual_p
 | | |      Materialize: dual.dummy
 | | |      Execute on: Query Initiator
 | | +-- Inner -> SELECT [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 4)
 | | |      Execute on: Query Initiator
 | | | +---> GROUPBY NOTHING [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 5)
 | | | |      Aggregates: max(store_sales_fact.sales_quantity)
 | | | |      Execute on: All Nodes
 | | | | +---> STORAGE ACCESS for store_sales_fact [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 6)
 | | | | |      Projection: store.store_sales_fact_super
 | | | | |      Materialize: store_sales_fact.sales_quantity
 | | | | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 7)
 | |      Execute on: Query Initiator
 | | +---> GROUPBY NOTHING [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 8)
 | | |      Aggregates: max(online_sales_fact.sales_quantity)
 | | |      Execute on: All Nodes
 | | | +---> STORAGE ACCESS for online_sales_fact [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 9)
 | | | |      Projection: online_sales.online_sales_fact_super
 | | | |      Materialize: online_sales_fact.sales_quantity
 | | | |      Execute on: All Nodes

Optimizer-generated annotated query

The following annotated query returns the same results as the input query shown earlier. As with all optimizer-generated annotated queries, you can execute this query directly in vsql, either as written or with modifications:

SELECT /*+syntactic_join,verbatim*/ (subQ_1.max * subQ_2.max) AS "?column?"
FROM ((v_catalog.dual AS dual CROSS COMPLEX JOIN /*+Distrib(L,L),JType(H)*/
(SELECT max(store_sales_fact.sales_quantity) AS max
FROM store.store_sales_fact AS store_sales_fact/*+projs('store.store_sales_fact')*/) AS subQ_1 )
CROSS COMPLEX JOIN /*+Distrib(L,L),JType(H)*/ (SELECT max(online_sales_fact.sales_quantity) AS max
FROM online_sales.online_sales_fact AS online_sales_fact/*+projs('online_sales.online_sales_fact')*/) AS subQ_2 )

Complex half join

Complex half joins are expressed by one of the following keywords:

  • SEMI COMPLEX JOIN

  • NULLAWARE ANTI COMPLEX JOIN

  • SEMIALL COMPLEX JOIN

  • ANTI COMPLEX JOIN

An additional requirement applies to all complex half joins: each subquery's SELECT list ends with a dummy column (labeled as false) that invokes the Vertica meta-function complex_join_marker(). As the subquery processes each row, complex_join_marker() returns true or false to indicate the row's inclusion or exclusion from the result set. The result set returns with this flag to the outer query, which can use the flag from this and other subqueries to filter its own result set.

For example, the query optimizer rewrites the following input query as a NULLAWARE ANTI COMPLEX JOIN. The join returns all rows from the subquery with their complex_join_marker() flag set to the appropriate Boolean value.

Input query

SELECT product_dimension.product_description FROM public.product_dimension
   WHERE (NOT (product_dimension.product_key NOT IN (SELECT inventory_fact.qty_in_stock FROM public.inventory_fact)));

Query plan

 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT product_dimension.product_description FROM public.product_dimension
   WHERE (NOT (product_dimension.product_key NOT IN (SELECT inventory_fact.qty_in_stock FROM public.inventory_fact)));

 Access Path:
 +-JOIN HASH [Anti][NotInAnti] [Cost: 3K, Rows: 30K] (PATH ID: 1) Inner (BROADCAST)
 |  Join Cond: (product_dimension.product_key = VAL(2))
 |  Materialize at Output: product_dimension.product_description
 |  Filter: (NOT VAL(2))
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 56, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension_super
 | |      Materialize: product_dimension.product_key
 | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_super
 | | |      Materialize: inventory_fact.qty_in_stock
 | | |      Execute on: All Nodes

Optimizer-generated annotated query

The following annotated query returns the same results as the input query shown earlier. As with all optimizer-generated annotated queries, you can execute this query directly in vsql, either as written or with modifications. For example, you can control the outer query's output by modifying how its predicate evaluates the flag subQ_1."false".

SELECT /*+syntactic_join,verbatim*/ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
NULLAWARE ANTI COMPLEX JOIN /*+Distrib(L,B),JType(H)*/
   (SELECT inventory_fact.qty_in_stock AS qty_in_stock, complex_join_marker() AS "false"
    FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock)) WHERE (NOT subQ_1."false")

8 - Directed query restrictions

In general, directed queries support only SELECT statements as input.

In general, directed queries support only SELECT statements as input. Within that broad restriction, a number of specific restrictions apply to input queries. Vertica handles all restrictions through optimizer-generated warnings. The sections below divide these restrictions into several categories.

Tables, views, and projections

Input queries cannot query the following objects:

  • Tables without projections

  • Tables with access policies

  • System and data collector tables, except explicit and implicit references to V_CATALOG.DUAL

  • Views

  • Projections

Functions

Input queries cannot include the following functions:

Language elements

Input queries cannot include the following:

Data types

Input queries cannot include the following data types: