Rewriting queries
You can use directed queries to change the semantics of a given query—that is, substitute one query for another. This can be especially important when you have little or no control over the content and format of input queries that your Vertica database processes. You can map these queries to directed queries that rewrite the original input for optimal execution.
The following sections describe two use cases:
Rewriting join queries
Many of your input queries join multiple tables. You've determined that in many cases, it would be more efficient to denormalize much of your data in several flattened tables and query those tables directly. You cannot revise the input queries themselves. However, you can use directed queries to redirect join queries to the flattened table data.
For example, the following query aggregates regional sales of wine products by joining three tables in the VMart database:
=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
FROM store.store_sales_fact SF
JOIN store.store_dimension SD ON SF.store_key=SD.store_key
JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
WHERE P.product_description ILIKE '%wine%'
GROUP BY ROLLUP (SD.store_region, SD.store_city)
ORDER BY Region,Total DESC;
You can consolidate the joined table data in a single flattened table, and query this table instead. By doing so, you can access the same data faster. You can create the flattened table with the following DDL statements:
=> CREATE TABLE store.store_sales_wide AS SELECT * FROM store.store_sales_fact;
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_name VARCHAR(64)
SET USING (SELECT store_name FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_city varchar(64)
SET USING (SELECT store_city FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_state char(2)
SET USING (SELECT store_state char FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key)
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_region varchar(64)
SET USING (SELECT store_region FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD column product_description VARCHAR(128)
SET USING (SELECT product_description FROM public.product_dimension
WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);
=> ALTER TABLE store.store_sales_wide ADD COLUMN sku_number char(32)
SET USING (SELECT sku_number char FROM product_dimension
WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);
=> SELECT REFRESH_COLUMNS ('store.store_sales_wide','', 'rebuild');
After creating this table and refreshing its SET USING
columns, you can rewrite the earlier query as follows:
=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
FROM store.store_sales_fact SF
JOIN store.store_dimension SD ON SF.store_key=SD.store_key
JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
WHERE P.product_description ILIKE '%wine%'
GROUP BY ROLLUP (SD.store_region, SD.store_city)
ORDER BY Region,Total DESC;
Region | City | Total
-----------+------------------+---------
East | | 1679788
East | Boston | 138494
East | Elizabeth | 138071
East | Sterling Heights | 137719
East | Allentown | 137122
East | New Haven | 117751
East | Lowell | 102670
East | Washington | 84595
East | Charlotte | 83255
East | Waterbury | 81516
East | Erie | 80784
East | Stamford | 59935
East | Hartford | 59843
East | Baltimore | 55873
East | Clarksville | 54117
East | Nashville | 53757
East | Manchester | 53290
East | Columbia | 52799
East | Memphis | 52648
East | Philadelphia | 29711
East | Portsmouth | 29316
East | New York | 27033
East | Cambridge | 26111
East | Alexandria | 23378
MidWest | | 1073224
MidWest | Lansing | 145616
MidWest | Livonia | 129349
--More--
Querying the flattened table is more efficient; however, you still must account for input queries that continue to use the earlier join syntax. You can do so by creating a custom directed query, which redirects these input queries to syntax that targets the flattened table:
-
=> SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total FROM store.store_sales_fact SF JOIN store.store_dimension SD ON SF.store_key=SD.store_key JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version WHERE P.product_description ILIKE '%wine%' GROUP BY ROLLUP (SD.store_region, SD.store_city) ORDER BY Region,Total DESC; SAVE QUERY
-
Map the saved query to a directed query with the desired syntax, and activate the directed query:
=> CREATE DIRECTED QUERY CUSTOM 'RegionalSalesWine' SELECT store_region AS Region, store_city AS City, SUM(gross_profit_dollar_amount) AS Total FROM store.store_sales_wide WHERE product_description ILIKE '%wine%' GROUP BY ROLLUP (region, city) ORDER BY Region,Total DESC; CREATE DIRECTED QUERY => ACTIVATE DIRECTED QUERY RegionalSalesWine; ACTIVATE DIRECTED QUERY
When directed query RegionalSalesWine
is active, the query optimizer maps all queries that match the original input format to the directed query, as shown in the following query plan:
=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
FROM store.store_sales_fact SF
JOIN store.store_dimension SD ON SF.store_key=SD.store_key
JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
WHERE P.product_description ILIKE '%wine%'
GROUP BY ROLLUP (SD.store_region, SD.store_city)
ORDER BY Region,Total DESC;
...
The following active directed query(query name: RegionalSalesWine) is being executed:
SELECT store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
FROM store.store_sales_wide WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6))
GROUP BY GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region),())
ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC
Access Path:
+-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
| Execute on: All Nodes
| +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
| | Group By: store_sales_wide.store_region, store_sales_wide.store_city
| | Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
| | | Projection: store.store_sales_wide_b0
| | | Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
| | | Filter: (store_sales_wide.product_description ~~* '%wine%')
| | | Execute on: All Nodes
To compare the costs of executing the directed query and executing the original input query, deactivate the directed query and use EXPLAIN on the original input query. The optimizer reverts to creating a plan for the input query that incurs significantly greater cost—188K versus 2K:
=> DEACTIVATE DIRECTED QUERY RegionalSalesWine;
DEACTIVATE DIRECTED QUERY
=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
FROM store.store_sales_fact SF
JOIN store.store_dimension SD ON SF.store_key=SD.store_key
JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
WHERE P.product_description ILIKE '%wine%'
GROUP BY ROLLUP (SD.store_region, SD.store_city)
ORDER BY Region,Total DESC;
...
Access Path:
+-SORT [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Order: SD.store_region ASC, sum(SF.gross_profit_dollar_amount) DESC
| Execute on: All Nodes
| +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Aggregates: sum(SF.gross_profit_dollar_amount)
| | Group By: SD.store_region, SD.store_city
| | Grouping Sets: (SD.store_region, SD.store_city, <SVAR>), (SD.store_region, <SVAR>), (<SVAR>)
| | Execute on: All Nodes
| | +---> JOIN HASH [Cost: 12K, Rows: 5M (NO STATISTICS)] (PATH ID: 3) Inner (BROADCAST)
| | | Join Cond: (concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))
| | | Materialize at Input: SF.product_key, SF.product_version
| | | Materialize at Output: SF.gross_profit_dollar_amount
| | | Execute on: All Nodes
| | | +-- Outer -> JOIN HASH [Cost: 2K, Rows: 5M (NO STATISTICS)] (PATH ID: 4) Inner (BROADCAST)
| | | | Join Cond: (SF.store_key = SD.store_key)
| | | | Execute on: All Nodes
| | | | +-- Outer -> STORAGE ACCESS for SF [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 5)
| | | | | Projection: store.store_sales_fact_super
| | | | | Materialize: SF.store_key
| | | | | Execute on: All Nodes
| | | | | Runtime Filters: (SIP2(HashJoin): SF.store_key), (SIP1(HashJoin): concat((SF.product_key)::varchar, (SF.product_version)::varchar))
| | | | +-- Inner -> STORAGE ACCESS for SD [Cost: 13, Rows: 250 (NO STATISTICS)] (PATH ID: 6)
| | | | | Projection: store.store_dimension_super
| | | | | Materialize: SD.store_key, SD.store_city, SD.store_region
| | | | | Execute on: All Nodes
| | | +-- Inner -> STORAGE ACCESS for P [Cost: 201, Rows: 60K (NO STATISTICS)] (PATH ID: 7)
| | | | Projection: public.product_dimension_super
| | | | Materialize: P.product_key, P.product_version
| | | | Filter: (P.product_description ~~* '%wine%')
| | | | Execute on: All Nodes
Creating query templates
You can use directed queries to implement multiple queries that are identical except for the predicate strings on which query results are filtered. For example, directed query RegionalSalesWine
only handles input queries that filter on product_description
values that contain the string wine
. You can create a modified version of this directed query that matches the syntax of multiple input queries, which differ only in their input values—for example, tuna
.
Create this query template in the following steps:
-
Create two optimizer-generated directed queries:
-
From the original query on the joined tables:
=> CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProducts_JoinTables SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total FROM store.store_sales_fact SF JOIN store.store_dimension SD ON SF.store_key=SD.store_key JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version WHERE P.product_description ILIKE '%wine%' GROUP BY ROLLUP (SD.store_region, SD.store_city) ORDER BY Region,Total DESC; CREATE DIRECTED QUERY
-
From the query on the flattened table:
=> CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProduct SELECT store_region AS Region, store_city AS City, SUM(gross_profit_dollar_amount) AS Total FROM store.store_sales_wide WHERE product_description ILIKE '%wine%' GROUP BY ROLLUP (region, city) ORDER BY Region,Total DESC; CREATE DIRECTED QUERY
-
-
Query system table DIRECTED_QUERIES and copy the input query for directed query
RegionalSalesProducts_JoinTables
:SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
-
Use the copied input query with SAVE QUERY:
SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, sum(SF.gross_profit_dollar_amount) AS Total FROM ((store.store_sales_fact SF JOIN store.store_dimension SD ON ((SF.store_key = SD.store_key))) JOIN public.product_dimension P ON ((concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar)))) WHERE (P.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/) GROUP BY GROUPING SETS((SD.store_region, SD.store_city), (SD.store_region), ()) ORDER BY SD.store_region, sum(SF.gross_profit_dollar_amount) DESC (1 row)
-
Query system table DIRECTED_QUERIES and copy the annotated query for directed query
RegionalSalesProducts_FlatTables
:SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
-
Use the copied annotated query to create a custom directed query:
=> CREATE DIRECTED QUERY CUSTOM RegionalSalesProduct SELECT /*+verbatim*/ store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total FROM store.store_sales_wide AS store_sales_wide/*+projs('store.store_sales_wide')*/ WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/) GROUP BY /*+GByType(Hash)*/ GROUPING SETS((1, 2), (1), ()) ORDER BY 1 ASC, 3 DESC; CREATE DIRECTED QUERY
-
Activate the directed query:
ACTIVATE DIRECTED QUERY RegionalSalesProduct;
After activating this directed query, Vertica can use it for input queries that match the template, differing only in the predicate value for product_description
:
=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
FROM store.store_sales_fact SF
JOIN store.store_dimension SD ON SF.store_key=SD.store_key
JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
WHERE P.product_description ILIKE '%tuna%'
GROUP BY ROLLUP (SD.store_region, SD.store_city)
ORDER BY Region,Total DESC;
...
The following active directed query(query name: RegionalSalesProduct) is being executed:
SELECT /*+verbatim*/ store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
FROM store.store_sales_wide store_sales_wide/*+projs('store.store_sales_wide')*/
WHERE (store_sales_wide.product_description ~~* '%tuna%'::varchar(6))
GROUP BY /*+GByType(Hash)*/ GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region), ())
ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC
Access Path:
+-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
| Execute on: All Nodes
| +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
| | Group By: store_sales_wide.store_region, store_sales_wide.store_city
| | Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
| | | Projection: store.store_sales_wide_b0
| | | Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
| | | Filter: (store_sales_wide.product_description ~~* '%tuna%')
| | | Execute on: All Nodes
When you execute this query, it returns with the following results:
Region | City | Total
-----------+------------------+---------
East | | 1564582
East | Elizabeth | 131328
East | Allentown | 129704
East | Boston | 128188
East | Sterling Heights | 125877
East | Lowell | 112133
East | New Haven | 101161
East | Waterbury | 85401
East | Washington | 76127
East | Erie | 73002
East | Charlotte | 67850
East | Memphis | 53650
East | Clarksville | 53416
East | Hartford | 52583
East | Columbia | 51950
East | Nashville | 50031
East | Manchester | 48607
East | Baltimore | 48108
East | Stamford | 47302
East | New York | 30840
East | Portsmouth | 26485
East | Alexandria | 26391
East | Philadelphia | 23092
East | Cambridge | 21356
MidWest | | 980209
MidWest | Lansing | 130044
MidWest | Livonia | 118740
--More--