This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Creating directed queries

The two approaches can be used together: you can use the annotated SQL that the optimizer creates as the basis for creating your own (custom) directed queries.

CREATE DIRECTED QUERY associates an input query with a query annotated with optimizer hints. It stores the association under a unique identifier. CREATE DIRECTED QUERY has two variants:

  • CREATE DIRECTED QUERY OPTIMIZER directs the query optimizer to generate annotated SQL from the specified input query. The annotated query contains hints that the optimizer can use to recreate its current query plan for that input query.

  • CREATE DIRECTED QUERY CUSTOM specifies an annotated query supplied by the user. Vertica associates the annotated query with the input query specified by the last SAVE QUERY statement.

In both cases, Vertica associates the annotated query and input query, and registers their association in the system table DIRECTED_QUERIES under query_name.

The two approaches can be used together: you can use the annotated SQL that the optimizer creates as the basis for creating your own (custom) directed queries.

1 - Optimizer-generated directed queries

CREATE DIRECTED QUERY OPTIMIZER passes an input query to the optimizer, which generates an annotated query from its own query plan.

CREATE DIRECTED QUERY OPTIMIZER passes an input query to the optimizer, which generates an annotated query from its own query plan. It then pairs the input and annotated queries and saves them as a directed query. This directed query can be used to handle other queries that are identical except for the predicate strings on which query results are filtered.

You can use optimized-generated directed queries to capture query plans before you upgrade. Doing so can be especially useful if you detect diminished performance of a given query after the upgrade. In this case, you can use the corresponding directed query to recreate an earlier query plan, and compare its performance to the plan generated by the current optimizer.

Example

The following SQL statements create and activate the directed query findEmployeesCityJobTitle_OPT:


=> CREATE DIRECTED QUERY OPTIMIZER 'findEmployeesCityJobTitle_OPT'
     SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city='Boston' and job_title='Cashier' ORDER BY employee_last_name, employee_first_name;
CREATE DIRECTED QUERY

=> ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
ACTIVATE DIRECTED QUERY

After this directed query plan is activated, the optimizer uses it to generate a query plan for all subsequent invocations of this input query, and others like it. You can view the optimizer-generated annotated query by either calling GET DIRECTED QUERY or querying system table DIRECTED_QUERIES:

=> SELECT input_query, annotated_query FROM V_CATALOG.DIRECTED_QUERIES WHERE query_name = 'findEmployeesCityJobTitle_OPT';
-[ RECORD 1 ]---+----------------------------------------------------------------------------
input_query     | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension
WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query | SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC

The annotated query includes the following hints:

  • /*+verbatim*/ specifies to execute the annotated query exactly as written and produce a query plan accordingly.
  • /*+projs('public.Emp_Dimension')*/ directs the optimizer to create a query plan that uses the projection public.Emp_Dimension.
  • /*+:v(n)*/ (alias of /*+IGNORECONST(n)*/) is included several times in the annotated and input queries. These hints qualify two constants in the query predicates: Boston and Cashier. Each :v hint has an integer argument n that pairs corresponding constants in the input and annotated query queries: *+:v(1)*/ for Boston, and /*+:v(2)*/ for Cashier. The hints tell the optimizer to disregard these constants when it decides whether to apply this directed query to other input queries that are similar. Thus, ignore constant hints can let you use the same directed query for different input queries.

The following query uses different values for the columns employee_city and job_title, but is otherwise identical to the original input query of directed query EmployeesCityJobTitle_OPT:

=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;

If the directed query EmployeesCityJobTitle_OPT is active, the optimizer can use it for this query:

=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------
 EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;

 The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
 SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
 WHERE ((employee_dimension.employee_city = 'San Francisco'::varchar(13)) AND (employee_dimension.job_title = 'Branch Manager'::varchar(14)))
 ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

 Access Path:
 +-SORT [Cost: 222, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: employee_dimension.employee_last_name ASC, employee_dimension.employee_first_name ASC
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for employee_dimension [Cost: 60, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.employee_dimension_super
 | |      Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
 | |      Filter: (employee_dimension.employee_city = 'San Francisco')
 | |      Filter: (employee_dimension.job_title = 'Branch Manager')
 | |      Execute on: All Nodes
 ...

2 - Custom directed queries

CREATE DIRECTED QUERY CUSTOM specifies an annotated query and pairs it to an input query previously saved by SAVE QUERY.

CREATE DIRECTED QUERY CUSTOM specifies an annotated query and pairs it to an input query previously saved by SAVE QUERY. You must issue both statements in the same user session.

For example, you might want a query to use a specific projection:

  1. Specify the query with SAVE QUERY:

    => SAVE QUERY SELECT employee_first_name, employee_last_name FROM employee_dimension
        WHERE employee_city='Boston' AND job_title='Cashier';
    SAVE QUERY
    
  2. Create a custom directed query with CREATE DIRECTED QUERY CUSTOM, which specifies an annotated query and associates it with the saved query. The annotated query includes a /*+projs*/ hint, which instructs the optimizer to use the projection public.emp_dimension_unseg when users call the saved query:

    
    => CREATE DIRECTED QUERY CUSTOM 'findBostonCashiers_CUSTOM'
       SELECT employee_first_name, employee_last_name
       FROM employee_dimension /*+Projs('public.emp_dimension_unseg')*/
       WHERE employee_city='Boston' AND job_title='Cashier';
    CREATE DIRECTED QUERY
    
  3. Activate the directed query:

    => ACTIVATE DIRECTED QUERY findBostonCashiers_CUSTOM;
    ACTIVATE DIRECTED QUERY
    
  4. After activation, the optimizer uses this directed query to generate a query plan for all subsequent invocations of its input query. The following EXPLAIN output verifies the optimizer's use of this directed query and the projection it specifies:

    
    => EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension
       WHERE employee_city='Boston' AND job_title='Cashier';
    
    QUERY PLAN
    ------------------------------
    QUERY PLAN DESCRIPTION:
    ------------------------------
    EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension where employee_city='Boston' AND job_title='Cashier';
    
     The following active directed query(query name: findBostonCashiers_CUSTOM) is being executed:
     SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name
      FROM public.employee_dimension/*+Projs('public.emp_dimension_unseg')*/
      WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6)) AND (employee_dimension.job_title = 'Cashier'::varchar(7)))
    
     Access Path:
     +-STORAGE ACCESS for employee_dimension [Cost: 158, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
     |  Projection: public.emp_dimension_unseg
     |  Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
     |  Filter: (employee_dimension.employee_city = 'Boston')
     |  Filter: (employee_dimension.job_title = 'Cashier')
     |  Execute on: Query Initiator
    

See also

Rewriting Join Queries

3 - Using optimizer-generated and custom directed queries together

You can use the annotated SQL that the optimizer creates as the basis for creating your own custom directed queries.

You can use the annotated SQL that the optimizer creates as the basis for creating your own custom directed queries. This approach can be especially useful in evaluating the plan that the optimizer creates to handle a given query, and testing plan modifications.

For example, you might want to modify how the optimizer implements the following query:

=> SELECT COUNT(customer_name) Total, customer_region Region
    FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
    JOIN product_dimension p ON s.product_key = p.product_key
    WHERE p.category_description ilike '%Medical%'
      AND p.product_description ilike '%antibiotics%'
      AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
    GROUP BY customer_region;

When you run EXPLAIN on this query, you discover that the optimizer uses projection customers_proj_age for the customer_dimension table. This projection is sorted on column customer_age. Consequently, the optimizer hash-joins the tables store_sales and customer_dimension on customer_key.

After analyzing customer_dimension table data, you observe that most customers are under 30, so it makes more sense to use projection customer_proj_id for the customer_dimension table, which is sorted on customer_key:

You can create a directed query that encapsulates this change as follows:

  1. Obtain optimizer-generated annotations on the query with EXPLAIN ANNOTATED:

    => \o annotatedQuery
    => EXPLAIN ANNOTATED SELECT COUNT(customer_name) Total, customer_region Region
         FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
         JOIN product_dimension p ON s.product_key = p.product_key
         WHERE p.category_description ilike '%Medical%'
           AND p.product_description ilike '%antibiotics%'
           AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
         GROUP BY customer_region;
    => \o
    => \! cat annotatedQuery
    ...
    SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
     FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
        JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customers_proj_age')*/
          ON (c.customer_key = s.customer_key))
        JOIN /*+Distrib(L,B),JType(M)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
          ON (s.product_key = p.product_key))
     WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
         AND (c.customer_age <= 30)
         AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
         AND (p.product_description ~~* '%antibiotics%'::varchar(13))
     GROUP BY  /*+GByType(Hash)*/ 2
    (4 rows)
    
  2. Modify the annotated query:

    
    SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
     FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
        JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customer_proj_id')*/
          ON (c.customer_key = s.customer_key))
        JOIN /*+Distrib(L,B),JType(H)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
          ON (s.product_key = p.product_key))
     WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
         AND (c.customer_age <= 30)
         AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
         AND (p.product_description ~~* '%antibiotics%'::varchar(13))
     GROUP BY  /*+GByType(Hash)*/ 2
    
  3. Use the modified annotated query to create the desired directed query:

    • Save the desired input query with SAVE QUERY:

      
      => SAVE QUERY SELECT COUNT(customer_name) Total, customer_region Region
          FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
          JOIN product_dimension p ON s.product_key = p.product_key
          WHERE p.category_description ilike '%Medical%'
            AND p.product_description ilike '%antibiotics%'
            AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
          GROUP BY customer_region;
      
    • Create a custom directed query that associates the saved input query with the modified annotated query:

      
      => CREATE DIRECTED QUERY CUSTOM 'getCustomersUnder31'
         SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
       FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
          JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customer_proj_id')*/
            ON (c.customer_key = s.customer_key))
          JOIN /*+Distrib(L,B),JType(H)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
            ON (s.product_key = p.product_key))
       WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
           AND (c.customer_age <= 30)
           AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
           AND (p.product_description ~~* '%antibiotics%'::varchar(13))
       GROUP BY  /*+GByType(Hash)*/ 2;
      CREATE DIRECTED QUERY
      
  4. Activate this directed query:

    => ACTIVATE DIRECTED QUERY getCustomersUnder31;
    ACTIVATE DIRECTED QUERY
    

When the optimizer processes a query that matches this directed query's input query, it uses the directed query's annotated query to generate a query plan:

=> EXPLAIN SELECT COUNT(customer_name) Total, customer_region Region
     FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
     JOIN product_dimension p ON s.product_key = p.product_key
     WHERE p.category_description ilike '%Medical%'
       AND p.product_description ilike '%antibiotics%'
       AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
     GROUP BY customer_region;

 The following active directed query(query name: getCustomersUnder31) is being executed:
...