This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Creating directed queries
CREATE DIRECTED QUERY associates an input query with a query annotated with optimizer hints.
CREATE DIRECTED QUERY associates an input query with a query annotated with optimizer hints. It stores the association under a unique identifier.
CREATE DIRECTED QUERY has two variants:
- CREATE DIRECTED QUERY OPTIMIZER directs the query optimizer to generate annotated SQL from the specified input query. The annotated query contains hints that the optimizer can use to recreate its current query plan for that input query.
- CREATE DIRECTED QUERY CUSTOM specifies an annotated query supplied by the user. Vertica associates the annotated query with the input query specified by the last SAVE QUERY statement.
In both cases, Vertica associates the annotated query and input query, and registers their association in the system table DIRECTED_QUERIES under query_name
.
The two approaches can be used together: you can use the annotated SQL that the optimizer creates as the basis for creating your own (custom) directed queries.
1 - Optimizer-generated directed queries
CREATE DIRECTED QUERY OPTIMIZER passes an input query to the optimizer, which generates an annotated query from its own query plan.
CREATE DIRECTED QUERY OPTIMIZER passes an input query to the optimizer, which generates an annotated query from its own query plan. It then pairs the input and annotated queries and saves them as a directed query. This directed query can be used to handle other queries that are identical except for the predicate strings on which query results are filtered.
You can use optimizer-generated directed queries to capture query plans before you upgrade. Doing so can be especially useful if you detect diminished performance of a given query after the upgrade. In this case, you can use the corresponding directed query to recreate an earlier query plan, and compare its performance to the plan generated by the current optimizer.
You can also create multiple optimizer-generated directed queries from the most frequently executed queries, by invoking the meta-function SAVE_PLANS. For details, see Bulk-Creation of Directed Queries.
Example
The following SQL statements create and activate the directed query findEmployeesCityJobTitle_OPT
:
=> CREATE DIRECTED QUERY OPTIMIZER 'findEmployeesCityJobTitle_OPT'
SELECT employee_first_name, employee_last_name FROM public.employee_dimension
WHERE employee_city='Boston' and job_title='Cashier' ORDER BY employee_last_name, employee_first_name;
CREATE DIRECTED QUERY
=> ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
ACTIVATE DIRECTED QUERY
After this directed query plan is activated, the optimizer uses it to generate a query plan for all subsequent invocations of this input query, and others like it. You can view the optimizer-generated annotated query by calling GET DIRECTED QUERY or querying system table DIRECTED_QUERIES:
=> SELECT input_query, annotated_query FROM V_CATALOG.DIRECTED_QUERIES
WHERE query_name = 'findEmployeesCityJobTitle_OPT';
-[ RECORD 1 ]---+----------------------------------------------------------------------------
input_query | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension
WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query | SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC
In this case, the annotated query includes the following hints:
-
/*+verbatim*/
specifies to execute the annotated query exactly as written and produce a query plan accordingly.
-
/*+projs('public.Emp_Dimension')*/
directs the optimizer to create a query plan that uses the projection public.Emp_Dimension
.
/*+:v(
n
)*/
(alias of /*+IGNORECONST(
n
)*/
) is included several times in the annotated and input queries. These hints qualify two constants in the query predicates: Boston
and Cashier
. Each :v
hint has an integer argument n
that pairs corresponding constants in the input and annotated query queries: *+:v(1)*/
for Boston
, and /*+:v(2)*/
for Cashier
. The hints tell the optimizer to disregard these constants when it decides whether to apply this directed query to other input queries that are similar. Thus, ignore constant hints can let you use the same directed query for different input queries.
The following query uses different values for the columns employee_city
and job_title
, but is otherwise identical to the original input query of directed query EmployeesCityJobTitle_OPT
:
=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;
If the directed query EmployeesCityJobTitle_OPT
is active, the optimizer can use it for this query:
=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension
WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;
...
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;
The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
SELECT /*+verbatim*/ employee_dimension.employee_first_name, employee_dimension.employee_last_name
FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
WHERE ((employee_dimension.employee_city = 'San Francisco'::varchar(13)) AND (employee_dimension.job_title = 'Branch Manager'::varchar(14)))
ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
Access Path:
+-SORT [Cost: 222, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Order: employee_dimension.employee_last_name ASC, employee_dimension.employee_first_name ASC
| Execute on: All Nodes
| +---> STORAGE ACCESS for employee_dimension [Cost: 60, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Projection: public.employee_dimension_super
| | Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
| | Filter: (employee_dimension.employee_city = 'San Francisco')
| | Filter: (employee_dimension.job_title = 'Branch Manager')
| | Execute on: All Nodes
...
Bulk-creation of directed queries
The meta-function SAVE_PLANS lets you create multiple optimizer-generated directed queries from the most frequently executed queries. SAVE_PLANS works as follows:
-
Iterates over all queries in the data collector table dc_requests_issued
and selects the most-frequently requested queries, up to the maximum specified by its query-budget
argument. If the meta-function's since-date
argument is also set, then SAVE_PLANS iterates only over queries that were issued on or after the specified date.
As SAVE_PLANS iterates over dc_requests_issued
, it tests queries against various restrictions. In general, directed queries support only SELECT statements as input. Within this broad requirement, input queries are subject to other restrictions.
-
Calls CREATE DIRECTED QUERY OPTIMIZER on all qualifying input queries, which creates a directed query for each unique input query as described above.
-
Saves metadata on the new set of directed queries to system table DIRECTED_QUERIES, where all directed queries of that set share the same SAVE_PLANS_VERSION
integer. This integer is computed from the highest SAVE_PLANS_VERSION
+ 1.
You can later use SAVE_PLANS_VERSION
identifiers to bulk activate, deactivate, and drop directed queries. For example:
=> SELECT save_plans (40);
save_plans
-------------------------------------------------------------------------------------------------------------
9 directed query supported queries out of 40 most frequently run queries were saved under the save_plans_version 3.
To view the saved queries, run:
SELECT * FROM directed_queries WHERE save_plans_version = '3';
To drop the saved queries, run:
DROP DIRECTED QUERY WHERE save_plans_version = '3';
(1 row)
=> SELECT input_query::VARCHAR(60) FROM directed_queries WHERE save_plans_version = 3 AND input_query ILIKE '%line_search%';
input_query
--------------------------------------------------------------
SELECT public.line_search_logistic2(udtf1.deviance, udtf1.G
SELECT public.line_search_logistic2(udtf1.deviance, udtf1.G
(2 rows)
=> ACTIVATE DIRECTED QUERY WHERE save_plans_version = 3 AND input_query ILIKE '%line_search%';
ACTIVATE DIRECTED QUERY
=> SELECT query_name, input_query::VARCHAR(60), is_active FROM directed_queries WHERE save_plans_version = 3 AND input_query ILIKE '%line_search%';
query_name | input_query | is_active
------------------------+--------------------------------------------------------------+-----------
save_plans_nolabel_3_3 | SELECT public.line_search_logistic2(udtf1.deviance, udtf1.G | t
save_plans_nolabel_6_3 | SELECT public.line_search_logistic2(udtf1.deviance, udtf1.G | t
(2 rows)
Note
query_name
values are concatenated from the following strings:
save_plans_query-label_query-number_save-plans-version
where:
query-label
is a LABEL hint embedded in the input query associated with this directed query. If theinput query contains no label, then this string is set to nolabel
.
query-number
is an integer in a continuous sequence between 0 and budget-query
, which uniquely identifies this directed query from others in the same SAVE_PLANS-generated set.
- [save-plans-version](/en/sql-reference/system-tables/v-catalog-schema/directed-queries/#SAVE_PLANS_VERSION) identifies the set of directed queries to which this directed query belongs.
2 - Custom directed queries
CREATE DIRECTED QUERY CUSTOM specifies an annotated query and pairs it to an input query previously saved by SAVE QUERY.
CREATE DIRECTED QUERY CUSTOM specifies an annotated query and pairs it to an input query previously saved by SAVE QUERY. You must issue both statements in the same user session.
For example, you might want a query to use a specific projection:
-
Specify the query with SAVE QUERY:
=> SAVE QUERY SELECT employee_first_name, employee_last_name FROM employee_dimension
WHERE employee_city='Boston' AND job_title='Cashier';
SAVE QUERY
Note
The input query that you supply to SAVE QUERY only supports the
:v hint.
-
Create a custom directed query with CREATE DIRECTED QUERY CUSTOM, which specifies an annotated query and associates it with the saved query. The annotated query includes a /*+projs*/
hint, which instructs the optimizer to use the projection public.emp_dimension_unseg
when users call the saved query:
=> CREATE DIRECTED QUERY CUSTOM 'findBostonCashiers_CUSTOM'
SELECT employee_first_name, employee_last_name
FROM employee_dimension /*+Projs('public.emp_dimension_unseg')*/
WHERE employee_city='Boston' AND job_title='Cashier';
CREATE DIRECTED QUERY
Caution
Vertica associates a saved query and annotated query without checking whether the input and annotated queries are compatible. Be careful to sequence SAVE QUERY and CREATE DIRECTED QUERY CUSTOM statements so the saved and directed queries are correctly matched.
-
Activate the directed query:
=> ACTIVATE DIRECTED QUERY findBostonCashiers_CUSTOM;
ACTIVATE DIRECTED QUERY
-
After activation, the optimizer uses this directed query to generate a query plan for all subsequent invocations of its input query. The following EXPLAIN output verifies the optimizer's use of this directed query and the projection it specifies:
=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension
WHERE employee_city='Boston' AND job_title='Cashier';
QUERY PLAN
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension where employee_city='Boston' AND job_title='Cashier';
The following active directed query(query name: findBostonCashiers_CUSTOM) is being executed:
SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name
FROM public.employee_dimension/*+Projs('public.emp_dimension_unseg')*/
WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6)) AND (employee_dimension.job_title = 'Cashier'::varchar(7)))
Access Path:
+-STORAGE ACCESS for employee_dimension [Cost: 158, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Projection: public.emp_dimension_unseg
| Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
| Filter: (employee_dimension.employee_city = 'Boston')
| Filter: (employee_dimension.job_title = 'Cashier')
| Execute on: Query Initiator
See also
Rewriting Join Queries
3 - Using optimizer-generated and custom directed queries together
You can use the annotated SQL that the optimizer creates as the basis for creating your own custom directed queries.
You can use the annotated SQL that the optimizer creates as the basis for creating your own custom directed queries. This approach can be especially useful in evaluating the plan that the optimizer creates to handle a given query, and testing plan modifications.
For example, you might want to modify how the optimizer implements the following query:
=> SELECT COUNT(customer_name) Total, customer_region Region
FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
JOIN product_dimension p ON s.product_key = p.product_key
WHERE p.category_description ilike '%Medical%'
AND p.product_description ilike '%antibiotics%'
AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
GROUP BY customer_region;
When you run EXPLAIN on this query, you discover that the optimizer uses projection customers_proj_age
for the customer_dimension
table. This projection is sorted on column customer_age
. Consequently, the optimizer hash-joins the tables store_sales
and customer_dimension
on customer_key
.
After analyzing customer_dimension
table data, you observe that most customers are under 30, so it makes more sense to use projection customer_proj_id
for the customer_dimension
table, which is sorted on customer_key
:
You can create a directed query that encapsulates this change as follows:
-
Obtain optimizer-generated annotations on the query with EXPLAIN ANNOTATED:
=> \o annotatedQuery
=> EXPLAIN ANNOTATED SELECT COUNT(customer_name) Total, customer_region Region
FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
JOIN product_dimension p ON s.product_key = p.product_key
WHERE p.category_description ilike '%Medical%'
AND p.product_description ilike '%antibiotics%'
AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
GROUP BY customer_region;
=> \o
=> \! cat annotatedQuery
...
SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customers_proj_age')*/
ON (c.customer_key = s.customer_key))
JOIN /*+Distrib(L,B),JType(M)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
ON (s.product_key = p.product_key))
WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
AND (c.customer_age <= 30)
AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
AND (p.product_description ~~* '%antibiotics%'::varchar(13))
GROUP BY /*+GByType(Hash)*/ 2
(4 rows)
-
Modify the annotated query:
SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customer_proj_id')*/
ON (c.customer_key = s.customer_key))
JOIN /*+Distrib(L,B),JType(H)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
ON (s.product_key = p.product_key))
WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
AND (c.customer_age <= 30)
AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
AND (p.product_description ~~* '%antibiotics%'::varchar(13))
GROUP BY /*+GByType(Hash)*/ 2
-
Use the modified annotated query to create the desired directed query:
-
Save the desired input query with SAVE QUERY:
=> SAVE QUERY SELECT COUNT(customer_name) Total, customer_region Region
FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
JOIN product_dimension p ON s.product_key = p.product_key
WHERE p.category_description ilike '%Medical%'
AND p.product_description ilike '%antibiotics%'
AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
GROUP BY customer_region;
-
Create a custom directed query that associates the saved input query with the modified annotated query:
=> CREATE DIRECTED QUERY CUSTOM 'getCustomersUnder31'
SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customer_proj_id')*/
ON (c.customer_key = s.customer_key))
JOIN /*+Distrib(L,B),JType(H)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
ON (s.product_key = p.product_key))
WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
AND (c.customer_age <= 30)
AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
AND (p.product_description ~~* '%antibiotics%'::varchar(13))
GROUP BY /*+GByType(Hash)*/ 2;
CREATE DIRECTED QUERY
-
Activate this directed query:
=> ACTIVATE DIRECTED QUERY getCustomersUnder31;
ACTIVATE DIRECTED QUERY
When the optimizer processes a query that matches this directed query's input query, it uses the directed query's annotated query to generate a query plan:
=> EXPLAIN SELECT COUNT(customer_name) Total, customer_region Region
FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
JOIN product_dimension p ON s.product_key = p.product_key
WHERE p.category_description ilike '%Medical%'
AND p.product_description ilike '%antibiotics%'
AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
GROUP BY customer_region;
The following active directed query(query name: getCustomersUnder31) is being executed:
...