重写查询
您可以使用定向查询来更改给定查询的语义,即用一个查询替换另一个查询。当您对 Vertica 数据库处理的输入查询的内容和格式没有或几乎没有控制权时,这一点尤其重要。您可以将这些查询映射到定向查询,从而重写原始输入以实现最佳执行效果。
以下部分介绍了两个用例:
重写联接查询
许多输入查询联接多个表。您已经确定,在许多情况下,更高效的做法是对多个扁平表中的大量数据进行反向标准化并直接查询这些表。您不能修改输入查询本身。但是,您可以使用定向查询将联接查询重定向到扁平表数据。
例如,以下查询通过联接 VMart 数据库中的三个表来聚合葡萄酒产品的区域销售额:
=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
FROM store.store_sales_fact SF
JOIN store.store_dimension SD ON SF.store_key=SD.store_key
JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
WHERE P.product_description ILIKE '%wine%'
GROUP BY ROLLUP (SD.store_region, SD.store_city)
ORDER BY Region,Total DESC;
您可以将联接的表数据合并到一个扁平表中,然后改为查询该表。通过这样做,您可以更快地访问同一数据。您可以使用以下 DDL 语句创建扁平表:
=> CREATE TABLE store.store_sales_wide AS SELECT * FROM store.store_sales_fact;
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_name VARCHAR(64)
SET USING (SELECT store_name FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_city varchar(64)
SET USING (SELECT store_city FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_state char(2)
SET USING (SELECT store_state char FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key)
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_region varchar(64)
SET USING (SELECT store_region FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD column product_description VARCHAR(128)
SET USING (SELECT product_description FROM public.product_dimension
WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);
=> ALTER TABLE store.store_sales_wide ADD COLUMN sku_number char(32)
SET USING (SELECT sku_number char FROM product_dimension
WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);
=> SELECT REFRESH_COLUMNS ('store.store_sales_wide','', 'rebuild');
创建此表并刷新其 SET USING
列后,可以重写前面的查询,如下所示:
=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
FROM store.store_sales_fact SF
JOIN store.store_dimension SD ON SF.store_key=SD.store_key
JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
WHERE P.product_description ILIKE '%wine%'
GROUP BY ROLLUP (SD.store_region, SD.store_city)
ORDER BY Region,Total DESC;
Region | City | Total
-----------+------------------+---------
East | | 1679788
East | Boston | 138494
East | Elizabeth | 138071
East | Sterling Heights | 137719
East | Allentown | 137122
East | New Haven | 117751
East | Lowell | 102670
East | Washington | 84595
East | Charlotte | 83255
East | Waterbury | 81516
East | Erie | 80784
East | Stamford | 59935
East | Hartford | 59843
East | Baltimore | 55873
East | Clarksville | 54117
East | Nashville | 53757
East | Manchester | 53290
East | Columbia | 52799
East | Memphis | 52648
East | Philadelphia | 29711
East | Portsmouth | 29316
East | New York | 27033
East | Cambridge | 26111
East | Alexandria | 23378
MidWest | | 1073224
MidWest | Lansing | 145616
MidWest | Livonia | 129349
--More--
查询扁平表更高效;但是,您仍然必须考虑继续使用早期联接语法的输入查询。这可以通过创建自定义定向查询来完成,这会将这些输入查询重定向到以扁平表为目标的语法:
-
=> SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total FROM store.store_sales_fact SF JOIN store.store_dimension SD ON SF.store_key=SD.store_key JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version WHERE P.product_description ILIKE '%wine%' GROUP BY ROLLUP (SD.store_region, SD.store_city) ORDER BY Region,Total DESC; SAVE QUERY
-
将保存的查询映射到具有所需语法的定向查询,然后激活定向查询:
=> CREATE DIRECTED QUERY CUSTOM 'RegionalSalesWine' SELECT store_region AS Region, store_city AS City, SUM(gross_profit_dollar_amount) AS Total FROM store.store_sales_wide WHERE product_description ILIKE '%wine%' GROUP BY ROLLUP (region, city) ORDER BY Region,Total DESC; CREATE DIRECTED QUERY => ACTIVATE DIRECTED QUERY RegionalSalesWine; ACTIVATE DIRECTED QUERY
当定向查询 RegionalSalesWine
处于活动状态时,查询优化器会将所有与原始输入格式相匹配的查询映射到定向查询,如下面的查询计划所示:
=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
FROM store.store_sales_fact SF
JOIN store.store_dimension SD ON SF.store_key=SD.store_key
JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
WHERE P.product_description ILIKE '%wine%'
GROUP BY ROLLUP (SD.store_region, SD.store_city)
ORDER BY Region,Total DESC;
...
The following active directed query(query name: RegionalSalesWine) is being executed:
SELECT store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
FROM store.store_sales_wide WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6))
GROUP BY GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region),())
ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC
Access Path:
+-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
| Execute on: All Nodes
| +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
| | Group By: store_sales_wide.store_region, store_sales_wide.store_city
| | Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
| | | Projection: store.store_sales_wide_b0
| | | Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
| | | Filter: (store_sales_wide.product_description ~~* '%wine%')
| | | Execute on: All Nodes
要对定向查询和原始输入查询的执行成本进行比较,请停用定向查询并对原始输入查询使用 EXPLAIN。优化器恢复到为输入查询创建计划,这会产生高得多的成本 - 188K 相对于 2K:
=> DEACTIVATE DIRECTED QUERY RegionalSalesWine;
DEACTIVATE DIRECTED QUERY
=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
FROM store.store_sales_fact SF
JOIN store.store_dimension SD ON SF.store_key=SD.store_key
JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
WHERE P.product_description ILIKE '%wine%'
GROUP BY ROLLUP (SD.store_region, SD.store_city)
ORDER BY Region,Total DESC;
...
Access Path:
+-SORT [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Order: SD.store_region ASC, sum(SF.gross_profit_dollar_amount) DESC
| Execute on: All Nodes
| +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Aggregates: sum(SF.gross_profit_dollar_amount)
| | Group By: SD.store_region, SD.store_city
| | Grouping Sets: (SD.store_region, SD.store_city, <SVAR>), (SD.store_region, <SVAR>), (<SVAR>)
| | Execute on: All Nodes
| | +---> JOIN HASH [Cost: 12K, Rows: 5M (NO STATISTICS)] (PATH ID: 3) Inner (BROADCAST)
| | | Join Cond: (concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))
| | | Materialize at Input: SF.product_key, SF.product_version
| | | Materialize at Output: SF.gross_profit_dollar_amount
| | | Execute on: All Nodes
| | | +-- Outer -> JOIN HASH [Cost: 2K, Rows: 5M (NO STATISTICS)] (PATH ID: 4) Inner (BROADCAST)
| | | | Join Cond: (SF.store_key = SD.store_key)
| | | | Execute on: All Nodes
| | | | +-- Outer -> STORAGE ACCESS for SF [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 5)
| | | | | Projection: store.store_sales_fact_super
| | | | | Materialize: SF.store_key
| | | | | Execute on: All Nodes
| | | | | Runtime Filters: (SIP2(HashJoin): SF.store_key), (SIP1(HashJoin): concat((SF.product_key)::varchar, (SF.product_version)::varchar))
| | | | +-- Inner -> STORAGE ACCESS for SD [Cost: 13, Rows: 250 (NO STATISTICS)] (PATH ID: 6)
| | | | | Projection: store.store_dimension_super
| | | | | Materialize: SD.store_key, SD.store_city, SD.store_region
| | | | | Execute on: All Nodes
| | | +-- Inner -> STORAGE ACCESS for P [Cost: 201, Rows: 60K (NO STATISTICS)] (PATH ID: 7)
| | | | Projection: public.product_dimension_super
| | | | Materialize: P.product_key, P.product_version
| | | | Filter: (P.product_description ~~* '%wine%')
| | | | Execute on: All Nodes
创建查询模板
您可以使用定向查询来实施除了用于筛选查询结果的谓词字符串以外都完全相同的多个查询。例如,定向查询 RegionalSalesWine
仅处理对包含字符串 wine
的 product_description
值进行筛选的输入查询。您可以创建此定向查询的修改版本,使其匹配多个输入查询的语法,这些输入查询仅在输入值(例如 tuna
)上有所不同。
按以下步骤创建此查询模板:
-
创建两个优化器生成的定向查询:
-
从对联接表的原始查询:
=> CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProducts_JoinTables SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total FROM store.store_sales_fact SF JOIN store.store_dimension SD ON SF.store_key=SD.store_key JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version WHERE P.product_description ILIKE '%wine%' GROUP BY ROLLUP (SD.store_region, SD.store_city) ORDER BY Region,Total DESC; CREATE DIRECTED QUERY
-
从对扁平表的查询:
=> CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProduct SELECT store_region AS Region, store_city AS City, SUM(gross_profit_dollar_amount) AS Total FROM store.store_sales_wide WHERE product_description ILIKE '%wine%' GROUP BY ROLLUP (region, city) ORDER BY Region,Total DESC; CREATE DIRECTED QUERY
-
-
查询系统表 DIRECTED_QUERIES 并复制定向查询
RegionalSalesProducts_JoinTables
的输入查询:SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
-
通过 SAVE QUERY 使用复制的输入查询:
SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, sum(SF.gross_profit_dollar_amount) AS Total FROM ((store.store_sales_fact SF JOIN store.store_dimension SD ON ((SF.store_key = SD.store_key))) JOIN public.product_dimension P ON ((concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar)))) WHERE (P.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/) GROUP BY GROUPING SETS((SD.store_region, SD.store_city), (SD.store_region), ()) ORDER BY SD.store_region, sum(SF.gross_profit_dollar_amount) DESC (1 row)
-
查询系统表 DIRECTED_QUERIES 并复制定向查询
RegionalSalesProducts_FlatTables
的带注释查询:SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
-
使用复制的带注释查询创建自定义定向查询:
=> CREATE DIRECTED QUERY CUSTOM RegionalSalesProduct SELECT /*+verbatim*/ store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total FROM store.store_sales_wide AS store_sales_wide/*+projs('store.store_sales_wide')*/ WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/) GROUP BY /*+GByType(Hash)*/ GROUPING SETS((1, 2), (1), ()) ORDER BY 1 ASC, 3 DESC; CREATE DIRECTED QUERY
-
激活此定向查询:
ACTIVATE DIRECTED QUERY RegionalSalesProduct;
激活此定向查询后,Vertica 可以将其用于和模板相匹配的输入查询中,不同之处仅在于 product_description
的谓词值:
=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
FROM store.store_sales_fact SF
JOIN store.store_dimension SD ON SF.store_key=SD.store_key
JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
WHERE P.product_description ILIKE '%tuna%'
GROUP BY ROLLUP (SD.store_region, SD.store_city)
ORDER BY Region,Total DESC;
...
The following active directed query(query name: RegionalSalesProduct) is being executed:
SELECT /*+verbatim*/ store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
FROM store.store_sales_wide store_sales_wide/*+projs('store.store_sales_wide')*/
WHERE (store_sales_wide.product_description ~~* '%tuna%'::varchar(6))
GROUP BY /*+GByType(Hash)*/ GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region), ())
ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC
Access Path:
+-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
| Execute on: All Nodes
| +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
| | Group By: store_sales_wide.store_region, store_sales_wide.store_city
| | Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
| | | Projection: store.store_sales_wide_b0
| | | Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
| | | Filter: (store_sales_wide.product_description ~~* '%tuna%')
| | | Execute on: All Nodes
当您执行此查询时,它会返回以下结果:
Region | City | Total
-----------+------------------+---------
East | | 1564582
East | Elizabeth | 131328
East | Allentown | 129704
East | Boston | 128188
East | Sterling Heights | 125877
East | Lowell | 112133
East | New Haven | 101161
East | Waterbury | 85401
East | Washington | 76127
East | Erie | 73002
East | Charlotte | 67850
East | Memphis | 53650
East | Clarksville | 53416
East | Hartford | 52583
East | Columbia | 51950
East | Nashville | 50031
East | Manchester | 48607
East | Baltimore | 48108
East | Stamford | 47302
East | New York | 30840
East | Portsmouth | 26485
East | Alexandria | 26391
East | Philadelphia | 23092
East | Cambridge | 21356
MidWest | | 980209
MidWest | Lansing | 130044
MidWest | Livonia | 118740
--More--