重写查询

您可以使用定向查询来更改给定查询的语义,即用一个查询替换另一个查询。当您对 Vertica 数据库处理的输入查询的内容和格式没有或几乎没有控制权时,这一点尤其重要。您可以将这些查询映射到定向查询,从而重写原始输入以实现最佳执行效果。

以下部分介绍了两个用例:

重写联接查询

许多输入查询联接多个表。您已经确定,在许多情况下,更高效的做法是对多个扁平表中的大量数据进行反向标准化并直接查询这些表。您不能修改输入查询本身。但是,您可以使用定向查询将联接查询重定向到扁平表数据。

例如,以下查询通过联接 VMart 数据库中的三个表来聚合葡萄酒产品的区域销售额:

=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
    FROM store.store_sales_fact SF
    JOIN store.store_dimension SD ON SF.store_key=SD.store_key
    JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
  WHERE P.product_description ILIKE '%wine%'
  GROUP BY ROLLUP (SD.store_region, SD.store_city)
  ORDER BY Region,Total DESC;

您可以将联接的表数据合并到一个扁平表中,然后改为查询该表。通过这样做,您可以更快地访问同一数据。您可以使用以下 DDL 语句创建扁平表:

=> CREATE TABLE store.store_sales_wide AS SELECT * FROM store.store_sales_fact;
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_name VARCHAR(64)
     SET USING (SELECT store_name FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_city varchar(64)
    SET USING (SELECT store_city FROM store.store_dimension  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_state char(2)
    SET USING (SELECT store_state char FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key)
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_region varchar(64)
    SET USING (SELECT store_region FROM store.store_dimension  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD column product_description VARCHAR(128)
    SET USING (SELECT product_description FROM public.product_dimension
    WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);
=> ALTER TABLE store.store_sales_wide ADD COLUMN sku_number char(32)
    SET USING (SELECT sku_number char FROM product_dimension
    WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);

=> SELECT REFRESH_COLUMNS ('store.store_sales_wide','', 'rebuild');

创建此表并刷新其 SET USING 列后,可以重写前面的查询,如下所示:

=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
    FROM store.store_sales_fact SF
    JOIN store.store_dimension SD ON SF.store_key=SD.store_key
    JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
  WHERE P.product_description ILIKE '%wine%'
  GROUP BY ROLLUP (SD.store_region, SD.store_city)
  ORDER BY Region,Total DESC;
  Region   |       City       |  Total
-----------+------------------+---------
 East      |                  | 1679788
 East      | Boston           |  138494
 East      | Elizabeth        |  138071
 East      | Sterling Heights |  137719
 East      | Allentown        |  137122
 East      | New Haven        |  117751
 East      | Lowell           |  102670
 East      | Washington       |   84595
 East      | Charlotte        |   83255
 East      | Waterbury        |   81516
 East      | Erie             |   80784
 East      | Stamford         |   59935
 East      | Hartford         |   59843
 East      | Baltimore        |   55873
 East      | Clarksville      |   54117
 East      | Nashville        |   53757
 East      | Manchester       |   53290
 East      | Columbia         |   52799
 East      | Memphis          |   52648
 East      | Philadelphia     |   29711
 East      | Portsmouth       |   29316
 East      | New York         |   27033
 East      | Cambridge        |   26111
 East      | Alexandria       |   23378
 MidWest   |                  | 1073224
 MidWest   | Lansing          |  145616
 MidWest   | Livonia          |  129349
--More--

查询扁平表更高效;但是,您仍然必须考虑继续使用早期联接语法的输入查询。这可以通过创建自定义定向查询来完成,这会将这些输入查询重定向到以扁平表为目标的语法:

  1. 保存输入查询

    => SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
        FROM store.store_sales_fact SF
        JOIN store.store_dimension SD ON SF.store_key=SD.store_key
        JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
      WHERE P.product_description ILIKE '%wine%'
      GROUP BY ROLLUP (SD.store_region, SD.store_city)
      ORDER BY Region,Total DESC;
    SAVE QUERY
    
  2. 将保存的查询映射到具有所需语法的定向查询,然后激活定向查询:

    => CREATE DIRECTED QUERY CUSTOM 'RegionalSalesWine'
        SELECT store_region AS Region,
          store_city AS City,
          SUM(gross_profit_dollar_amount) AS Total
        FROM store.store_sales_wide
        WHERE product_description ILIKE '%wine%'
        GROUP BY ROLLUP (region, city)
        ORDER BY Region,Total DESC;
    CREATE DIRECTED QUERY
    
    => ACTIVATE DIRECTED QUERY RegionalSalesWine;
    ACTIVATE DIRECTED QUERY
    

当定向查询 RegionalSalesWine 处于活动状态时,查询优化器会将所有与原始输入格式相匹配的查询映射到定向查询,如下面的查询计划所示:

=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
     FROM store.store_sales_fact SF
     JOIN store.store_dimension SD ON SF.store_key=SD.store_key
     JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
   WHERE P.product_description ILIKE '%wine%'
   GROUP BY ROLLUP (SD.store_region, SD.store_city)
   ORDER BY Region,Total DESC;
 ...
 The following active directed query(query name: RegionalSalesWine) is being executed:
 SELECT store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
  FROM store.store_sales_wide WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6))
  GROUP BY GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region),())
  ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC

 Access Path:
 +-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
 | |      Group By: store_sales_wide.store_region, store_sales_wide.store_city
 | |      Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_wide_b0
 | | |      Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
 | | |      Filter: (store_sales_wide.product_description ~~* '%wine%')
 | | |      Execute on: All Nodes

要对定向查询和原始输入查询的执行成本进行比较,请停用定向查询并对原始输入查询使用 EXPLAIN。优化器恢复到为输入查询创建计划,这会产生高得多的成本 - 188K 相对于 2K:

=> DEACTIVATE DIRECTED QUERY RegionalSalesWine;
DEACTIVATE DIRECTED QUERY
=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
     FROM store.store_sales_fact SF
     JOIN store.store_dimension SD ON SF.store_key=SD.store_key
     JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
   WHERE P.product_description ILIKE '%wine%'
   GROUP BY ROLLUP (SD.store_region, SD.store_city)
   ORDER BY Region,Total DESC;
 ...
  Access Path:
 +-SORT [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: SD.store_region ASC, sum(SF.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(SF.gross_profit_dollar_amount)
 | |      Group By: SD.store_region, SD.store_city
 | |      Grouping Sets: (SD.store_region, SD.store_city, <SVAR>), (SD.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> JOIN HASH [Cost: 12K, Rows: 5M (NO STATISTICS)] (PATH ID: 3) Inner (BROADCAST)
 | | |      Join Cond: (concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))
 | | |      Materialize at Input: SF.product_key, SF.product_version
 | | |      Materialize at Output: SF.gross_profit_dollar_amount
 | | |      Execute on: All Nodes
 | | | +-- Outer -> JOIN HASH [Cost: 2K, Rows: 5M (NO STATISTICS)] (PATH ID: 4) Inner (BROADCAST)
 | | | |      Join Cond: (SF.store_key = SD.store_key)
 | | | |      Execute on: All Nodes
 | | | | +-- Outer -> STORAGE ACCESS for SF [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 5)
 | | | | |      Projection: store.store_sales_fact_super
 | | | | |      Materialize: SF.store_key
 | | | | |      Execute on: All Nodes
 | | | | |      Runtime Filters: (SIP2(HashJoin): SF.store_key), (SIP1(HashJoin): concat((SF.product_key)::varchar, (SF.product_version)::varchar))
 | | | | +-- Inner -> STORAGE ACCESS for SD [Cost: 13, Rows: 250 (NO STATISTICS)] (PATH ID: 6)
 | | | | |      Projection: store.store_dimension_super
 | | | | |      Materialize: SD.store_key, SD.store_city, SD.store_region
 | | | | |      Execute on: All Nodes
 | | | +-- Inner -> STORAGE ACCESS for P [Cost: 201, Rows: 60K (NO STATISTICS)] (PATH ID: 7)
 | | | |      Projection: public.product_dimension_super
 | | | |      Materialize: P.product_key, P.product_version
 | | | |      Filter: (P.product_description ~~* '%wine%')
 | | | |      Execute on: All Nodes

创建查询模板

您可以使用定向查询来实施除了用于筛选查询结果的谓词字符串以外都完全相同的多个查询。例如,定向查询 RegionalSalesWine 仅处理对包含字符串 wineproduct_description 值进行筛选的输入查询。您可以创建此定向查询的修改版本,使其匹配多个输入查询的语法,这些输入查询仅在输入值(例如 tuna)上有所不同。

按以下步骤创建此查询模板:

  1. 创建两个优化器生成的定向查询

    • 从对联接表的原始查询:

      => CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProducts_JoinTables
           SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
             FROM store.store_sales_fact SF
             JOIN store.store_dimension SD ON SF.store_key=SD.store_key
             JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
             WHERE P.product_description ILIKE '%wine%'
           GROUP BY ROLLUP (SD.store_region, SD.store_city)
           ORDER BY Region,Total DESC;
      CREATE DIRECTED QUERY
      
    • 从对扁平表的查询:

      => CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProduct
          SELECT store_region AS Region, store_city AS City, SUM(gross_profit_dollar_amount) AS Total
          FROM store.store_sales_wide
          WHERE product_description ILIKE '%wine%'
          GROUP BY ROLLUP (region, city)
          ORDER BY Region,Total DESC;
      CREATE DIRECTED QUERY
      
  2. 查询系统表 DIRECTED_QUERIES 并复制定向查询 RegionalSalesProducts_JoinTables 的输入查询:

    SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
    
  3. 通过 SAVE QUERY 使用复制的输入查询:

    SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, sum(SF.gross_profit_dollar_amount) AS Total
      FROM ((store.store_sales_fact SF
      JOIN store.store_dimension SD ON ((SF.store_key = SD.store_key)))
      JOIN public.product_dimension P ON ((concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))))
      WHERE (P.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/)
      GROUP BY GROUPING SETS((SD.store_region, SD.store_city), (SD.store_region), ())
      ORDER BY SD.store_region, sum(SF.gross_profit_dollar_amount) DESC
    (1 row)
    
  4. 查询系统表 DIRECTED_QUERIES 并复制定向查询 RegionalSalesProducts_FlatTables 的带注释查询:

    SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
    
  5. 使用复制的带注释查询创建自定义定向查询

    => CREATE DIRECTED QUERY CUSTOM RegionalSalesProduct  SELECT /*+verbatim*/ store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
         FROM store.store_sales_wide AS store_sales_wide/*+projs('store.store_sales_wide')*/
         WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/)
         GROUP BY  /*+GByType(Hash)*/ GROUPING SETS((1, 2), (1), ())
         ORDER BY 1 ASC, 3 DESC;
    CREATE DIRECTED QUERY
    
  6. 激活此定向查询:

    ACTIVATE DIRECTED QUERY RegionalSalesProduct;
    

激活此定向查询后,Vertica 可以将其用于和模板相匹配的输入查询中,不同之处仅在于 product_description 的谓词值:


=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
         FROM store.store_sales_fact SF
         JOIN store.store_dimension SD ON SF.store_key=SD.store_key
         JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
       WHERE P.product_description ILIKE '%tuna%'
       GROUP BY ROLLUP (SD.store_region, SD.store_city)
       ORDER BY Region,Total DESC;
 ...
 The following active directed query(query name: RegionalSalesProduct) is being executed:
 SELECT /*+verbatim*/  store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
  FROM store.store_sales_wide store_sales_wide/*+projs('store.store_sales_wide')*/
  WHERE (store_sales_wide.product_description ~~* '%tuna%'::varchar(6))
  GROUP BY /*+GByType(Hash)*/  GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region), ())
  ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC

 Access Path:
 +-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
 | |      Group By: store_sales_wide.store_region, store_sales_wide.store_city
 | |      Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_wide_b0
 | | |      Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
 | | |      Filter: (store_sales_wide.product_description ~~* '%tuna%')
 | | |      Execute on: All Nodes

当您执行此查询时,它会返回以下结果:

  Region   |       City       |  Total
-----------+------------------+---------
 East      |                  | 1564582
 East      | Elizabeth        |  131328
 East      | Allentown        |  129704
 East      | Boston           |  128188
 East      | Sterling Heights |  125877
 East      | Lowell           |  112133
 East      | New Haven        |  101161
 East      | Waterbury        |   85401
 East      | Washington       |   76127
 East      | Erie             |   73002
 East      | Charlotte        |   67850
 East      | Memphis          |   53650
 East      | Clarksville      |   53416
 East      | Hartford         |   52583
 East      | Columbia         |   51950
 East      | Nashville        |   50031
 East      | Manchester       |   48607
 East      | Baltimore        |   48108
 East      | Stamford         |   47302
 East      | New York         |   30840
 East      | Portsmouth       |   26485
 East      | Alexandria       |   26391
 East      | Philadelphia     |   23092
 East      | Cambridge        |   21356
 MidWest   |                  |  980209
 MidWest   | Lansing          |  130044
 MidWest   | Livonia          |  118740
--More--