这是本节的多页打印视图。 点击此处打印.

返回本页常规视图.

定向查询

定向查询封装优化器可用于创建查询计划的信息。定向查询可以实现以下目标:

  • 在计划的升级前保留当前查询计划。大多数情况下,升级 Vertica 后,查询的执行效率会变得更高。在少数并非如此的情况下,您可以使用升级之前所创建的定向查询,以便重新创建早期版本的查询计划。

  • 使您能够创建改善优化工具性能的查询计划。有时,您可能希望影响优化器,使其在执行给定查询时做出更好的选择。例如,您可以选择不同的投影,或强制执行不同的联接顺序。在这种情况下,可使用定向查询来创建一个查询计划,替代优化器可能另外创建的计划。

  • 将输入查询重定向到使用不同语义的查询 — 例如,将联接查询映射到用来查询扁平表的 SELECT 语句

定向查询组件

定向查询为两个组件配对:

  • 输入查询:在此定向查询处于活动状态时,触发使用它的查询。

  • 带注释的查询:具有嵌入式优化器提示的 SQL 语句,其中的提示指导优化器如何为指定的输入查询创建查询计划。这些提示指定重要的查询计划元素,如联接顺序和投影选择。

Vertica 提供两种创建定向查询的方法:

  • 优化器可以从给定输入查询创建带注释的查询,并将这两个查询配对为定向查询。

  • 您可以编写自己的带注释查询并将其与输入查询配对。

有关这两种方法的描述,请参阅创建定向查询

1 - 创建定向查询

CREATE DIRECTED QUERY 将输入查询与使用优化器提示进行注释的查询相关联。它将此关联存储在一个唯一标识符下。CREATE DIRECTED QUERY 有两种变体:

  • CREATE DIRECTED QUERY OPTIMIZER 指示查询优化器从指定的输入查询生成带注释的 SQL。带注释的查询包含一些提示,优化器可使用这些提示为输入查询重新创建其当前查询计划。

  • CREATE DIRECTED QUERY CUSTOM 指定用户提供的带注释的查询。Vertica 将带注释的查询与最后一个 SAVE QUERY 语句指定的输入查询相关联。

在这两种情形中,Vertica 都会将带注释的查询与输入查询相关联,并将其关联注册在 query_name 下的系统表 DIRECTED_QUERIES 中。

这两种方法可以一起使用:您可以使用优化器创建的带注释 SQL 作为创建自己的(自定义)定向查询的基础。

1.1 - 优化器生成的定向查询

CREATE DIRECTED QUERY OPTIMIZER 将输入查询传递给优化器,优化器从自己的查询计划生成带注释的查询。然后,它将输入查询和带注释的查询配对并将其保存为定向查询。此定向查询可用于处理其他除了作为查询结果筛选依据的谓词字符串以外完全相同的查询。

可在升级之前使用优化器生成的定向查询来捕获查询计划。如果在升级后检测到给定查询的性能下降,这样做将会非常有用。在这种情况下,可使用相应的定向查询重新创建早期查询计划,并将其性能与当前优化器生成的计划相对比。

示例

以下 SQL 语句创建和激活定向查询 findEmployeesCityJobTitle_OPT


=> CREATE DIRECTED QUERY OPTIMIZER 'findEmployeesCityJobTitle_OPT'
     SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city='Boston' and job_title='Cashier' ORDER BY employee_last_name, employee_first_name;
CREATE DIRECTED QUERY

=> ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
ACTIVATE DIRECTED QUERY

在激活此定向查询计划后,优化器会使用它为此输入查询的所有后续调用以及其他类似调用生成查询计划。可通过调用 GET DIRECTED QUERY 或查询系统表 DIRECTED_QUERIES 来查看优化器生成的带注释查询:

=> SELECT input_query, annotated_query FROM V_CATALOG.DIRECTED_QUERIES WHERE query_name = 'findEmployeesCityJobTitle_OPT';
-[ RECORD 1 ]---+----------------------------------------------------------------------------
input_query     | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension
WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query | SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC

带注释的查询包括以下提示:

  • /*+verbatim*/ 指定完全按编写执行带注释的查询并相应地生成查询计划。
  • /*+projs('public.Emp_Dimension')*/ 指示优化器创建使用投影 public.Emp_Dimension 的查询计划。
  • /*+:v({{< codevar >}}n{{< /codevar >}})*/ (alias of /*+IGNORECONST(n)*/) 多次包含在带注释查询和输入查询中。这些提示限定查询谓词中的两个常量: Boston Cashier. Each :v 提示有一个整数实参 n,该实参将输入查询和带注释查询中的对应常量进行配对: *+:v(1)*/ (用于 Boston)和 /*+:v(2)*/(用于 Cashier)。这些提示告诉优化器在决定是否将此定向查询应用于其他类似的输入查询时忽略这些常量。因此,忽略常量提示可让您对不同的输入查询使用相同的定向查询。

以下查询对 employee_cityjob_title 列使用不同的值,但在其他方面与定向查询 EmployeesCityJobTitle_OPT 的原始输入查询相同:

=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;

如果定向查询 EmployeesCityJobTitle_OPT 处于活动状态,则优化器可以将其用于此查询:

=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------
 EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;

 The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
 SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
 WHERE ((employee_dimension.employee_city = 'San Francisco'::varchar(13)) AND (employee_dimension.job_title = 'Branch Manager'::varchar(14)))
 ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

 Access Path:
 +-SORT [Cost: 222, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: employee_dimension.employee_last_name ASC, employee_dimension.employee_first_name ASC
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for employee_dimension [Cost: 60, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.employee_dimension_super
 | |      Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
 | |      Filter: (employee_dimension.employee_city = 'San Francisco')
 | |      Filter: (employee_dimension.job_title = 'Branch Manager')
 | |      Execute on: All Nodes
 ...

1.2 - 自定义定向查询

CREATE DIRECTED QUERY CUSTOM 指定带注释的查询并将其与 SAVE QUERY 先前保存的输入查询配对。必须在同一用户会话中发出这两个语句。

例如,您可能希望某查询使用特定投影:

  1. 使用 SAVE QUERY 指定查询:

    => SAVE QUERY SELECT employee_first_name, employee_last_name FROM employee_dimension
        WHERE employee_city='Boston' AND job_title='Cashier';
    SAVE QUERY
    
  2. 使用 CREATE DIRECTED QUERY CUSTOM 创建自定义定向查询,它指定带注释的查询并将其与保存的查询相关联。带注释的查询包含一个 /*+projs*/ 提示,它指示优化器在用户调用保存的查询时使用投影 public.emp_dimension_unseg

    
    => CREATE DIRECTED QUERY CUSTOM 'findBostonCashiers_CUSTOM'
       SELECT employee_first_name, employee_last_name
       FROM employee_dimension /*+Projs('public.emp_dimension_unseg')*/
       WHERE employee_city='Boston' AND job_title='Cashier';
    CREATE DIRECTED QUERY
    
  3. 激活此定向查询:

    => ACTIVATE DIRECTED QUERY findBostonCashiers_CUSTOM;
    ACTIVATE DIRECTED QUERY
    
  4. 激活后,优化器将使用此定向查询为其输入查询的所有后续调用生成查询计划。下面的 EXPLAIN 输出验证优化器使用此定向查询的情况及其指定的投影:

    
    => EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension
       WHERE employee_city='Boston' AND job_title='Cashier';
    
    QUERY PLAN
    ------------------------------
    QUERY PLAN DESCRIPTION:
    ------------------------------
    EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension where employee_city='Boston' AND job_title='Cashier';
    
     The following active directed query(query name: findBostonCashiers_CUSTOM) is being executed:
     SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name
      FROM public.employee_dimension/*+Projs('public.emp_dimension_unseg')*/
      WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6)) AND (employee_dimension.job_title = 'Cashier'::varchar(7)))
    
     Access Path:
     +-STORAGE ACCESS for employee_dimension [Cost: 158, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
     |  Projection: public.emp_dimension_unseg
     |  Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
     |  Filter: (employee_dimension.employee_city = 'Boston')
     |  Filter: (employee_dimension.job_title = 'Cashier')
     |  Execute on: Query Initiator
    

另请参阅

重写联接查询

1.3 - 结合使用优化器生成的定向查询和自定义定向查询

可以根据优化器所创建的带注释 SQL 来创建自己的自定义定向查询。当评估优化器创建用于处理给定查询的计划以及测试计划修改内容时,此方法特别有用。

例如,您可能希望修改优化器对以下查询的实施方式:

=> SELECT COUNT(customer_name) Total, customer_region Region
    FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
    JOIN product_dimension p ON s.product_key = p.product_key
    WHERE p.category_description ilike '%Medical%'
      AND p.product_description ilike '%antibiotics%'
      AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
    GROUP BY customer_region;

对此查询运行 EXPLAIN 时,您会发现优化器将投影 customers_proj_age 用于 customer_dimension 表。此投影按列 customer_age 进行排序。因此,优化器将根据 customer_key 对表 store_salescustomer_dimension 进行哈希联接。

分析 customer_dimension 表数据后,您观察到大部分客户的年龄不到 30 岁,因此对按 customer_key 排序的 customer_dimension 表使用投影 customer_proj_id 更有意义:

您可以创建一个用于封装此更改的定向查询,如下所示:

  1. 使用 EXPLAIN ANNOTATED 获取由优化器生成的查询注释:

    => \o annotatedQuery
    => EXPLAIN ANNOTATED SELECT COUNT(customer_name) Total, customer_region Region
         FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
         JOIN product_dimension p ON s.product_key = p.product_key
         WHERE p.category_description ilike '%Medical%'
           AND p.product_description ilike '%antibiotics%'
           AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
         GROUP BY customer_region;
    => \o
    => \! cat annotatedQuery
    ...
    SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
     FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
        JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customers_proj_age')*/
          ON (c.customer_key = s.customer_key))
        JOIN /*+Distrib(L,B),JType(M)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
          ON (s.product_key = p.product_key))
     WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
         AND (c.customer_age <= 30)
         AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
         AND (p.product_description ~~* '%antibiotics%'::varchar(13))
     GROUP BY  /*+GByType(Hash)*/ 2
    (4 rows)
    
  2. 修改带注释的查询:

    
    SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
     FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
        JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customer_proj_id')*/
          ON (c.customer_key = s.customer_key))
        JOIN /*+Distrib(L,B),JType(H)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
          ON (s.product_key = p.product_key))
     WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
         AND (c.customer_age <= 30)
         AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
         AND (p.product_description ~~* '%antibiotics%'::varchar(13))
     GROUP BY  /*+GByType(Hash)*/ 2
    
  3. 使用修改后的带注释查询来创建所需的定向查询:

    • 使用 SAVE QUERY 保存所需的输入查询:

      
      => SAVE QUERY SELECT COUNT(customer_name) Total, customer_region Region
          FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
          JOIN product_dimension p ON s.product_key = p.product_key
          WHERE p.category_description ilike '%Medical%'
            AND p.product_description ilike '%antibiotics%'
            AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
          GROUP BY customer_region;
      
    • 创建一个自定义定向查询,将保存的输入查询与修改后的带注释查询相关联:

      
      => CREATE DIRECTED QUERY CUSTOM 'getCustomersUnder31'
         SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
       FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
          JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customer_proj_id')*/
            ON (c.customer_key = s.customer_key))
          JOIN /*+Distrib(L,B),JType(H)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
            ON (s.product_key = p.product_key))
       WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
           AND (c.customer_age <= 30)
           AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
           AND (p.product_description ~~* '%antibiotics%'::varchar(13))
       GROUP BY  /*+GByType(Hash)*/ 2;
      CREATE DIRECTED QUERY
      
  4. 激活此定向查询:

    => ACTIVATE DIRECTED QUERY getCustomersUnder31;
    ACTIVATE DIRECTED QUERY
    

当优化器处理与此定向查询的输入查询相匹配的查询时,它会使用此定向查询的带注释查询来生成查询计划:

=> EXPLAIN SELECT COUNT(customer_name) Total, customer_region Region
     FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
     JOIN product_dimension p ON s.product_key = p.product_key
     WHERE p.category_description ilike '%Medical%'
       AND p.product_description ilike '%antibiotics%'
       AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
     GROUP BY customer_region;

 The following active directed query(query name: getCustomersUnder31) is being executed:
...

2 - 设置带注释查询中的提示

定向查询的带注释查询中的提示为优化器提供关于如何执行输入查询的说明。带注释查询支持以下提示:

  • 联接提示 用于指定联接顺序、联接类型和联接数据分布:SYNTACTIC_JOINDISTRIBJTYPEUTYPE

  • 表提示 用于指定在查询计划中包括和排除哪些投影:PROJSSKIP_PROJS

  • :v 及其别名 IGNORECONSTANT 标记谓词字符串常量,当优化器决定是否对给定输入查询使用定向查询时,您希望优化器忽略这些常量。有关详细信息,请参阅忽略定向查询中的常量

  • :c 提示标记不得忽略的谓词常量。

  • VERBATIM 完全按编写的那样强制执行带注释查询。

带注释查询中的其他提示(如 DIRECT 或 LABEL)不起作用。

您可以在 vsql 查询中使用与带注释查询中相同的提示,只有两个例外情况: :v (IGNORECONSTANT) 和 VERBATIM

3 - 忽略定向查询中的常量

优化器生成的定向查询通常包括一个或多个 :vIGNORECONSTANT 的别名)提示,这些提示标记谓词字符串常量,当优化器决定是否对给定输入查询使用定向查询时,您希望优化器忽略这些常量。 :v 提示使多个查询能够使用同一个定向查询,但前提是这些查询在除了其谓词字符串以外的所有其他方面都相同。

例如,在以下两个查询中,除了分别为 employee_cityjob_title 列指定的字符串常量 Boston|San FranciscoCashier|Branch Manager 外,所有其他方面都相同:


=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city='Boston' and job_title ='Cashier' ORDER BY employee_last_name, employee_first_name;

=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;

在这种情况下,您从一个查询创建的、由优化器生成的定向查询可用于这两个查询:


=> CREATE DIRECTED QUERY OPTIMIZER 'findEmployeesCityJobTitle_OPT'
     SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city='Boston' and job_title='Cashier' ORDER BY employee_last_name, employee_first_name;
CREATE DIRECTED QUERY

=> ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
ACTIVATE DIRECTED QUERY

定向查询的输入查询和带注释查询都包含 :v 提示:

=> SELECT input_query, annotated_query FROM V_CATALOG.DIRECTED_QUERIES WHERE query_name = 'findEmployeesCityJobTitle_OPT';
-[ RECORD 1 ]---+----------------------------------------------------------------------------
input_query     | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension
WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query | SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC

输入查询和带注释查询中的提示实参将两个谓词常量配对:

  • /*+:v(1)*/employee_city 的输入查询和带注释查询设置配对。

  • /*+:v(2)*/job_title 的输入查询和带注释查询设置配对。

:v 提示告诉优化器在决定是否可以将此定向查询用于给定输入查询时忽略这两列的值。

例如,以下查询虽然对 employee_cityjob_title 使用不同的值,但在其他方面与用于创建定向查询 EmployeesCityJobTitle_OPT 的查询相同:

=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;

如果定向查询 EmployeesCityJobTitle_OPT 处于活动状态,则优化器可以将该定向查询用在以下查询的查询计划中:

=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------
 EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;

 The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
 SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
 WHERE ((employee_dimension.employee_city = 'San Francisco'::varchar(13)) AND (employee_dimension.job_title = 'Branch Manager'::varchar(14)))
 ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

 Access Path:
 +-SORT [Cost: 222, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: employee_dimension.employee_last_name ASC, employee_dimension.employee_first_name ASC
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for employee_dimension [Cost: 60, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.employee_dimension_super
 | |      Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
 | |      Filter: (employee_dimension.employee_city = 'San Francisco')
 | |      Filter: (employee_dimension.job_title = 'Branch Manager')
 | |      Execute on: All Nodes
 ...

映射一对多 :v 提示

目前所显示的示例演示了 :v 提示的一对一配对。您也可以使用 :v 提示将一个输入常数映射到带注释查询中的多个常数。当您想为优化器提供关于如何执行查询(用来联接表)的显式说明时,这种方法特别有用。

例如,以下查询联接了两个表:

SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;

在这种情况下,优化器可以推断 S.aT.b 具有相同的值,并相应地实施联接。

<a name="simpleJoinExample"></a>=> CREATE DIRECTED QUERY OPTIMIZER simpleJoin SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;
CREATE DIRECTED QUERY
=> SELECT input_query, annotated_query FROM directed_queries WHERE query_name = 'simpleJoin';
-[ RECORD 1 ]---+---------------------------------------------------------------------------------------------------------------------
input_query     | SELECT S.a, T.b FROM (public.S JOIN public.T ON ((S.a = T.b))) WHERE (S.a = 8 /*+:v(1)*/)
annotated_query | SELECT /*+syntactic_join,verbatim*/ S.a AS a, T.b AS b
FROM (public.S AS S/*+projs('public.S')*/ JOIN /*+Distrib(L,L),JType(M)*/ public.T AS T/*+projs('public.T')*/  ON (S.a = T.b))
WHERE (S.a = 8 /*+:v(1)*/) AND (T.b = 8 /*+:v(1)*/)
(1 row)

=> ACTIVATE DIRECTED QUERY simpleJoin;
ACTIVATED DIRECTED QUERY

现在,以下列输入查询为例:

SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3;

优化器忽略联接谓词常量并使用其查询计划中的定向查询 simpleJoin

 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3;

 The following active directed query(query name: simpleJoin) is being executed:
 SELECT /*+syntactic_join,verbatim*/  S.a, T.b FROM (public.S S/*+projs('public.S')*/ JOIN /*+Distrib('L', 'L'), JType('
M')*/public.T T/*+projs('public.T')*/ ON ((S.a = T.b))) WHERE ((S.a = 3) AND (T.b = 3))

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 21, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (S.a = T.b)
 |  Execute on: Query Initiator
 | +-- Outer -> STORAGE ACCESS for S [Cost: 12, Rows: 4 (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.S_b0
 | |      Materialize: S.a
 | |      Filter: (S.a = 3)
 | |      Execute on: Query Initiator
 | |      Runtime Filter: (SIP1(MergeJoin): S.a)
 | +-- Inner -> STORAGE ACCESS for T [Cost: 8, Rows: 3 (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.T_b0
 | |      Materialize: T.b
 | |      Filter: (T.b = 3)
 | |      Execute on: Query Initiator
 ...

在定向查询中保留谓词常量

默认情况下,优化器生成的定向查询在谓词常量上设置 :v 提示。您可以覆盖此行为,方法是使用 :c 提示来标记不得忽略的谓词常量。例如,以下语句创建一个定向查询,该定向查询只能用于联接谓词常量 8 与原始输入查询中相同的输入查询:

=> CREATE DIRECTED QUERY OPTIMIZER simpleJoin_KeepPredicateConstant SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8 /*+:c*/;
CREATE DIRECTED QUERY
=> ACTIVATE DIRECTED QUERY simpleJoin_KeepPredicateConstant;

以下对系统表 DIRECTED_QUERIES 的查询对定向查询 simpleJoin(在前面的示例中创建)和 simpleJoin_KeepPredicateConstant 进行了比较。与 simpleJoin 不同,simpleJoin_KeepPredicateConstant 的输入查询和带注释查询的联接谓词省略了 :v 提示:



=> SELECT query_name, input_query, annotated_query FROM directed_queries WHERE query_name ILIKE'%simpleJoin%';
-[ RECORD 1 ]---+
query_name      | simpleJoin
input_query     | SELECT S.a, T.b FROM (public.S JOIN public.T ON ((S.a = T.b))) WHERE (S.a = 8 /*+:v(1)*/)
annotated_query | SELECT /*+syntactic_join,verbatim*/ S.a AS a, T.b AS b
FROM (public.S AS S/*+projs('public.S')*/ JOIN /*+Distrib(L,L),JType(M)*/ public.T AS T/*+projs('public.T')*/  ON (S.a = T.b))
WHERE (S.a = 8 /*+:v(1)*/) AND (T.b = 8 /*+:v(1)*/)
-[ RECORD 2 ]---+
query_name      | simpleJoin_KeepPredicateConstant
input_query     | SELECT S.a, T.b FROM (public.S JOIN public.T ON ((S.a = T.b))) WHERE (S.a = 8)
annotated_query | SELECT /*+syntactic_join,verbatim*/ S.a AS a, T.b AS b
FROM (public.S AS S/*+projs('public.S')*/ JOIN /*+Distrib(L,L),JType(M)*/ public.T AS T/*+projs('public.T')*/  ON (S.a = T.b))
WHERE (S.a = 8) AND (T.b = 8)

如果停用定向查询 simpleJoin(否则该定向查询会优先于 simpleJoin_KeepPredicateConstant,因为前者创建的早),Vertica 仅将 simpleJoin_KeepPredicateConstant 应用于输入查询,这里联接谓词常量与原始输入查询中相同。例如,比较以下两个查询计划:

=> EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;

 The following active directed query(query name: simpleJoin_KeepPredicateConstant) is being executed:
 SELECT /*+syntactic_join,verbatim*/  S.a, T.b FROM (public.S S/*+projs('public.S')*/ JOIN /*+Distrib('L', 'L'), JType('
M')*/public.T T/*+projs('public.T')*/ ON ((S.a = T.b))) WHERE ((S.a = 8) AND (T.b = 8))

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 21, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (S.a = T.b)
 ...
=> EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3;

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 21, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (S.a = T.b)
 ...

4 - 重写查询

您可以使用定向查询来更改给定查询的语义,即用一个查询替换另一个查询。当您对 Vertica 数据库处理的输入查询的内容和格式没有或几乎没有控制权时,这一点尤其重要。您可以将这些查询映射到定向查询,从而重写原始输入以实现最佳执行效果。

以下部分介绍了两个用例:

重写联接查询

许多输入查询联接多个表。您已经确定,在许多情况下,更高效的做法是对多个扁平表中的大量数据进行反向标准化并直接查询这些表。您不能修改输入查询本身。但是,您可以使用定向查询将联接查询重定向到扁平表数据。

例如,以下查询通过联接 VMart 数据库中的三个表来聚合葡萄酒产品的区域销售额:

=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
    FROM store.store_sales_fact SF
    JOIN store.store_dimension SD ON SF.store_key=SD.store_key
    JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
  WHERE P.product_description ILIKE '%wine%'
  GROUP BY ROLLUP (SD.store_region, SD.store_city)
  ORDER BY Region,Total DESC;

您可以将联接的表数据合并到一个扁平表中,然后改为查询该表。通过这样做,您可以更快地访问同一数据。您可以使用以下 DDL 语句创建扁平表:

=> CREATE TABLE store.store_sales_wide AS SELECT * FROM store.store_sales_fact;
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_name VARCHAR(64)
     SET USING (SELECT store_name FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_city varchar(64)
    SET USING (SELECT store_city FROM store.store_dimension  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_state char(2)
    SET USING (SELECT store_state char FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key)
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_region varchar(64)
    SET USING (SELECT store_region FROM store.store_dimension  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD column product_description VARCHAR(128)
    SET USING (SELECT product_description FROM public.product_dimension
    WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);
=> ALTER TABLE store.store_sales_wide ADD COLUMN sku_number char(32)
    SET USING (SELECT sku_number char FROM product_dimension
    WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);

=> SELECT REFRESH_COLUMNS ('store.store_sales_wide','', 'rebuild');

创建此表并刷新其 SET USING 列后,可以重写前面的查询,如下所示:

=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
    FROM store.store_sales_fact SF
    JOIN store.store_dimension SD ON SF.store_key=SD.store_key
    JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
  WHERE P.product_description ILIKE '%wine%'
  GROUP BY ROLLUP (SD.store_region, SD.store_city)
  ORDER BY Region,Total DESC;
  Region   |       City       |  Total
-----------+------------------+---------
 East      |                  | 1679788
 East      | Boston           |  138494
 East      | Elizabeth        |  138071
 East      | Sterling Heights |  137719
 East      | Allentown        |  137122
 East      | New Haven        |  117751
 East      | Lowell           |  102670
 East      | Washington       |   84595
 East      | Charlotte        |   83255
 East      | Waterbury        |   81516
 East      | Erie             |   80784
 East      | Stamford         |   59935
 East      | Hartford         |   59843
 East      | Baltimore        |   55873
 East      | Clarksville      |   54117
 East      | Nashville        |   53757
 East      | Manchester       |   53290
 East      | Columbia         |   52799
 East      | Memphis          |   52648
 East      | Philadelphia     |   29711
 East      | Portsmouth       |   29316
 East      | New York         |   27033
 East      | Cambridge        |   26111
 East      | Alexandria       |   23378
 MidWest   |                  | 1073224
 MidWest   | Lansing          |  145616
 MidWest   | Livonia          |  129349
--More--

查询扁平表更高效;但是,您仍然必须考虑继续使用早期联接语法的输入查询。这可以通过创建自定义定向查询来完成,这会将这些输入查询重定向到以扁平表为目标的语法:

  1. 保存输入查询

    => SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
        FROM store.store_sales_fact SF
        JOIN store.store_dimension SD ON SF.store_key=SD.store_key
        JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
      WHERE P.product_description ILIKE '%wine%'
      GROUP BY ROLLUP (SD.store_region, SD.store_city)
      ORDER BY Region,Total DESC;
    SAVE QUERY
    
  2. 将保存的查询映射到具有所需语法的定向查询,然后激活定向查询:

    => CREATE DIRECTED QUERY CUSTOM 'RegionalSalesWine'
        SELECT store_region AS Region,
          store_city AS City,
          SUM(gross_profit_dollar_amount) AS Total
        FROM store.store_sales_wide
        WHERE product_description ILIKE '%wine%'
        GROUP BY ROLLUP (region, city)
        ORDER BY Region,Total DESC;
    CREATE DIRECTED QUERY
    
    => ACTIVATE DIRECTED QUERY RegionalSalesWine;
    ACTIVATE DIRECTED QUERY
    

当定向查询 RegionalSalesWine 处于活动状态时,查询优化器会将所有与原始输入格式相匹配的查询映射到定向查询,如下面的查询计划所示:

=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
     FROM store.store_sales_fact SF
     JOIN store.store_dimension SD ON SF.store_key=SD.store_key
     JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
   WHERE P.product_description ILIKE '%wine%'
   GROUP BY ROLLUP (SD.store_region, SD.store_city)
   ORDER BY Region,Total DESC;
 ...
 The following active directed query(query name: RegionalSalesWine) is being executed:
 SELECT store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
  FROM store.store_sales_wide WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6))
  GROUP BY GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region),())
  ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC

 Access Path:
 +-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
 | |      Group By: store_sales_wide.store_region, store_sales_wide.store_city
 | |      Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_wide_b0
 | | |      Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
 | | |      Filter: (store_sales_wide.product_description ~~* '%wine%')
 | | |      Execute on: All Nodes

要对定向查询和原始输入查询的执行成本进行比较,请停用定向查询并对原始输入查询使用 EXPLAIN。优化器恢复到为输入查询创建计划,这会产生高得多的成本 - 188K 相对于 2K:

=> DEACTIVATE DIRECTED QUERY RegionalSalesWine;
DEACTIVATE DIRECTED QUERY
=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
     FROM store.store_sales_fact SF
     JOIN store.store_dimension SD ON SF.store_key=SD.store_key
     JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
   WHERE P.product_description ILIKE '%wine%'
   GROUP BY ROLLUP (SD.store_region, SD.store_city)
   ORDER BY Region,Total DESC;
 ...
  Access Path:
 +-SORT [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: SD.store_region ASC, sum(SF.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(SF.gross_profit_dollar_amount)
 | |      Group By: SD.store_region, SD.store_city
 | |      Grouping Sets: (SD.store_region, SD.store_city, <SVAR>), (SD.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> JOIN HASH [Cost: 12K, Rows: 5M (NO STATISTICS)] (PATH ID: 3) Inner (BROADCAST)
 | | |      Join Cond: (concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))
 | | |      Materialize at Input: SF.product_key, SF.product_version
 | | |      Materialize at Output: SF.gross_profit_dollar_amount
 | | |      Execute on: All Nodes
 | | | +-- Outer -> JOIN HASH [Cost: 2K, Rows: 5M (NO STATISTICS)] (PATH ID: 4) Inner (BROADCAST)
 | | | |      Join Cond: (SF.store_key = SD.store_key)
 | | | |      Execute on: All Nodes
 | | | | +-- Outer -> STORAGE ACCESS for SF [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 5)
 | | | | |      Projection: store.store_sales_fact_super
 | | | | |      Materialize: SF.store_key
 | | | | |      Execute on: All Nodes
 | | | | |      Runtime Filters: (SIP2(HashJoin): SF.store_key), (SIP1(HashJoin): concat((SF.product_key)::varchar, (SF.product_version)::varchar))
 | | | | +-- Inner -> STORAGE ACCESS for SD [Cost: 13, Rows: 250 (NO STATISTICS)] (PATH ID: 6)
 | | | | |      Projection: store.store_dimension_super
 | | | | |      Materialize: SD.store_key, SD.store_city, SD.store_region
 | | | | |      Execute on: All Nodes
 | | | +-- Inner -> STORAGE ACCESS for P [Cost: 201, Rows: 60K (NO STATISTICS)] (PATH ID: 7)
 | | | |      Projection: public.product_dimension_super
 | | | |      Materialize: P.product_key, P.product_version
 | | | |      Filter: (P.product_description ~~* '%wine%')
 | | | |      Execute on: All Nodes

创建查询模板

您可以使用定向查询来实施除了用于筛选查询结果的谓词字符串以外都完全相同的多个查询。例如,定向查询 RegionalSalesWine 仅处理对包含字符串 wineproduct_description 值进行筛选的输入查询。您可以创建此定向查询的修改版本,使其匹配多个输入查询的语法,这些输入查询仅在输入值(例如 tuna)上有所不同。

按以下步骤创建此查询模板:

  1. 创建两个优化器生成的定向查询

    • 从对联接表的原始查询:

      => CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProducts_JoinTables
           SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
             FROM store.store_sales_fact SF
             JOIN store.store_dimension SD ON SF.store_key=SD.store_key
             JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
             WHERE P.product_description ILIKE '%wine%'
           GROUP BY ROLLUP (SD.store_region, SD.store_city)
           ORDER BY Region,Total DESC;
      CREATE DIRECTED QUERY
      
    • 从对扁平表的查询:

      => CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProduct
          SELECT store_region AS Region, store_city AS City, SUM(gross_profit_dollar_amount) AS Total
          FROM store.store_sales_wide
          WHERE product_description ILIKE '%wine%'
          GROUP BY ROLLUP (region, city)
          ORDER BY Region,Total DESC;
      CREATE DIRECTED QUERY
      
  2. 查询系统表 DIRECTED_QUERIES 并复制定向查询 RegionalSalesProducts_JoinTables 的输入查询:

    SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
    
  3. 通过 SAVE QUERY 使用复制的输入查询:

    SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, sum(SF.gross_profit_dollar_amount) AS Total
      FROM ((store.store_sales_fact SF
      JOIN store.store_dimension SD ON ((SF.store_key = SD.store_key)))
      JOIN public.product_dimension P ON ((concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))))
      WHERE (P.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/)
      GROUP BY GROUPING SETS((SD.store_region, SD.store_city), (SD.store_region), ())
      ORDER BY SD.store_region, sum(SF.gross_profit_dollar_amount) DESC
    (1 row)
    
  4. 查询系统表 DIRECTED_QUERIES 并复制定向查询 RegionalSalesProducts_FlatTables 的带注释查询:

    SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
    
  5. 使用复制的带注释查询创建自定义定向查询

    => CREATE DIRECTED QUERY CUSTOM RegionalSalesProduct  SELECT /*+verbatim*/ store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
         FROM store.store_sales_wide AS store_sales_wide/*+projs('store.store_sales_wide')*/
         WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/)
         GROUP BY  /*+GByType(Hash)*/ GROUPING SETS((1, 2), (1), ())
         ORDER BY 1 ASC, 3 DESC;
    CREATE DIRECTED QUERY
    
  6. 激活此定向查询:

    ACTIVATE DIRECTED QUERY RegionalSalesProduct;
    

激活此定向查询后,Vertica 可以将其用于和模板相匹配的输入查询中,不同之处仅在于 product_description 的谓词值:


=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
         FROM store.store_sales_fact SF
         JOIN store.store_dimension SD ON SF.store_key=SD.store_key
         JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
       WHERE P.product_description ILIKE '%tuna%'
       GROUP BY ROLLUP (SD.store_region, SD.store_city)
       ORDER BY Region,Total DESC;
 ...
 The following active directed query(query name: RegionalSalesProduct) is being executed:
 SELECT /*+verbatim*/  store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
  FROM store.store_sales_wide store_sales_wide/*+projs('store.store_sales_wide')*/
  WHERE (store_sales_wide.product_description ~~* '%tuna%'::varchar(6))
  GROUP BY /*+GByType(Hash)*/  GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region), ())
  ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC

 Access Path:
 +-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
 | |      Group By: store_sales_wide.store_region, store_sales_wide.store_city
 | |      Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_wide_b0
 | | |      Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
 | | |      Filter: (store_sales_wide.product_description ~~* '%tuna%')
 | | |      Execute on: All Nodes

当您执行此查询时,它会返回以下结果:

  Region   |       City       |  Total
-----------+------------------+---------
 East      |                  | 1564582
 East      | Elizabeth        |  131328
 East      | Allentown        |  129704
 East      | Boston           |  128188
 East      | Sterling Heights |  125877
 East      | Lowell           |  112133
 East      | New Haven        |  101161
 East      | Waterbury        |   85401
 East      | Washington       |   76127
 East      | Erie             |   73002
 East      | Charlotte        |   67850
 East      | Memphis          |   53650
 East      | Clarksville      |   53416
 East      | Hartford         |   52583
 East      | Columbia         |   51950
 East      | Nashville        |   50031
 East      | Manchester       |   48607
 East      | Baltimore        |   48108
 East      | Stamford         |   47302
 East      | New York         |   30840
 East      | Portsmouth       |   26485
 East      | Alexandria       |   26391
 East      | Philadelphia     |   23092
 East      | Cambridge        |   21356
 MidWest   |                  |  980209
 MidWest   | Lansing          |  130044
 MidWest   | Livonia          |  118740
--More--

5.1 - 获取定向查询

可按以下两种方式获取有关定向查询的编录信息:

GET DIRECTED QUERY

GET DIRECTED QUERY 在指定输入查询上查询系统表 DIRECTED_QUERIES。它返回映射到输入查询的所有定向查询的详细信息。

以下 GET DIRECTED QUERY 语句返回映射到以下输入查询 findEmployeesCityJobTitle_OPT 的定向查询:

=> GET DIRECTED QUERY SELECT employee_first_name, employee_last_name from employee_dimension where employee_city='Boston' AND job_title='Cashier' order by employee_last_name, employee_first_name;
-[ RECORD 1 ]---+
query_name      | findEmployeesCityJobTitle_OPT
is_active       | t
vertica_version | Vertica Analytic Database v11.0.1-20210815
comment         | Optimizer-generated directed query
creation_date   | 2021-08-20 14:53:42.323963
annotated_query | SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/ WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

DIRECTED_QUERIES

可以直接查询系统表 DIRECTED_QUERIES。例如:

=> SELECT query_name, is_active FROM V_CATALOG.DIRECTED_QUERIES WHERE query_name ILIKE '%findEmployeesCityJobTitle%';
          query_name           | is_active
-------------------------------+-----------
 findEmployeesCityJobTitle_OPT | t
(1 row)

5.2 - 识别处于活动状态的定向查询

多个定向查询可以映射到同一输入查询。系统表 DIRECTED_QUERIES 中的 is_active 列标识处于活动状态的定向查询。如果多个定向查询对于同一输入查询均处于活动状态,优化器则使用要创建的第一个定向查询。在这种情况下,可使用 EXPLAIN 识别哪个定向查询处于活动状态。

DIRECTED_QUERIES 上的以下查询查找所有处于活动状态的定向查询,且其中的输入查询包含被查询表 employee_dimension 的名称:

=> SELECT * FROM directed_queries WHERE input_query ILIKE ('%employee_dimension%') AND is_active='t';
-[ RECORD 1 ]------+
query_name         | findEmployeesCityJobTitle_OPT
is_active          | t
vertica_version    | Vertica Analytic Database v11.0.1-20210815
comment            | Optimizer-generated directed query
save_plans_version | 0
username           | dbadmin
creation_date      | 2021-08-20 14:53:42.323963
since_date         |
input_query        | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query    | SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/ WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

如果对输入查询运行 EXPLAIN,它将返回一个确认使用 findEmployeesCityJobTitle_OPT 作为活动定向查询的查询计划:

=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='Boston' AND job_title ='Cashier' ORDER BY employee_last_name, employee_first_name;

 The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
 SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/ WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6)) AND (employee_dimension.job_title = 'Cashier'::varchar(7))) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

5.3 - 激活和停用定向查询

优化器仅使用处于活动状态的定向查询。如果多个定向查询共享同一输入查询,优化器将使用要创建的第一个查询。

定向查询分别使用 ACTIVATE DIRECTED QUERYDEACTIVATE DIRECTED QUERY 来激活和停用。例如,以下 ACTIVATE DIRECTED QUERY 语句停用 RegionalSalesProducts_JoinTables 并激活 RegionalSalesProduct


=> DEACTIVATE DIRECTED QUERY RegionalSalesProducts_JoinTables;
DEACTIVATE DIRECTED QUERY;
=> ACTIVATE DIRECTED QUERY RegionalSalesProduct;
ACTIVATE DIRECTED QUERY;

Vertica 为所有会话中的给定查询使用处于活动状态的定向查询,除非它被 DEACTIVATE DIRECTED QUERY 明确停用或被 DROP DIRECTED QUERY 从存储空间中移除。如果在数据库关闭时定向查询处于活动状态,当您重新启动数据库时,Vertica 会自动将其重新激活。

定向查询停用后,查询优化器通过使用另一个定向查询(如果存在这样的定向查询)处理输入查询的后续调用。否则,它会生成自己的查询计划。

5.4 - 从编录中导出定向查询

升级到新版本 Vertica 之前,您可以导出定向查询,从而将其用于实现最佳性能对系统而言至关重要的查询。

  1. 配合使用 EXPORT_CATALOG 和实参 DIRECTED_QUERIES 从数据库编录导出所有当前定向查询及其当前激活状态:

    => SELECT EXPORT_CATALOG('../../export_directedqueries', 'DIRECTED_QUERIES');
    EXPORT_CATALOG
    -------------------------------------
    Catalog data exported successfully
    
  2. EXPORT_CATALOG 创建一个脚本以重新创建定向查询,如以下示例所示:

    SAVE QUERY SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name;
    CREATE DIRECTED QUERY CUSTOM findEmployeesCityJobTitle_OPT COMMENT 'Optimizer-generated directed query' OPTVER 'Vertica Analytic Database v11.0.1-20210815' PSDATE '2021-08-20 14:53:42.323963' SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name
    FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
    WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
    ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name;
    ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
    
  3. 升级完成后,使用 DROP DIRECTED QUERY 从数据库编录中移除每个定向查询。或者,编辑导出脚本并在每个 CREATE DIRECTED QUERY 语句之前插入 DROP DIRECTED QUERY 语句。例如,您可以使用粗体显示的更改来修改先前生成的脚本:

    SAVE QUERY SELECT employee_dimension.employee_first_name, ... ;
    DROP DIRECTED QUERY findEmployeesCityJobTitle_OPT;
    CREATE DIRECTED QUERY CUSTOM findEmployeesCityJobTitle_OPT COMMENT 'Optimizer-generated ... ;
    ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
    
  4. 当您运行此脚本时,Vertica 将重新创建定向查询并还原其激活状态:

    => \i /home/dbadmin/export_directedqueries
    SAVE QUERY
    DROP DIRECTED QUERY
    CREATE DIRECTED QUERY
    ACTIVATE DIRECTED QUERY
    

5.5 - 删除定向查询

DROP DIRECTED QUERY 从数据库编录中移除指定的定向查询。如果该定向查询处于活动状态,Vertica 会在移除前将其停用。

例如:

=> DROP DIRECTED QUERY findBostonCashiers_CUSTOM;
DROP DIRECTED QUERY

6 - 批量导出查询计划

在升级到新 Vertica 版本之前,您可能希望使用定向查询来保存查询计划,以便可能在新数据库中重新使用。您无法预测哪些查询计划可能会重新使用,因此可能要保存很多(或全部)数据库查询的查询计划。但是,每天运行的查询有数百个。通过重复调用 CREATE DIRECTED QUERY 将每个查询计划保存到数据库编录是不切实际的。此外,这样做会显著增加编录大小,可能还会影响性能。

在此情况下,可绕过数据库编录并将查询计划作为定向查询批量导出到外部 SQL 文件。通过为查询计划存储卸载,可以保存当前数据库中任意数量的查询计划,而不影响编录大小和性能。升级之后,您可以决定在新数据库中保留哪些查询计划,并有选择地导入相应的定向查询。

Vertica 提供一组支持此方法的元函数:

  • EXPORT_DIRECTED_QUERIES 从一组输入查询生成查询计划,并编写用于创建封装这些计划的定向查询的 SQL。

  • IMPORT_DIRECTED_QUERIES 将定向查询从由 EXPORT_DIRECTED_QUERIES 生成的 SQL 文件导入数据库编录中。

6.1 - 导出定向查询

可以将任意数量的查询计划作为定向查询批量导出到外部 SQL 文件中,如下所示:

  1. 创建一个 SQL 文件,其中包含要保存其查询计划的输入查询。请参阅下文的输入格式

  2. 对该 SQL 文件调用元函数 EXPORT_DIRECTED_QUERIES。此元函数具有两个实参:

    • 输入查询文件

    • (可选)输出文件的名称。EXPORT_DIRECTED_QUERIES 将用于创建定向查询的 SQL 写入此文件。如果您提供的是空字符串,则 Vertica 会将 SQL 写入标准输出。

例如,以下 EXPORT_DIRECTED_QUERIES 语句指定了输入文件 inputQueries 和输出文件 outputQueries

=> SELECT EXPORT_DIRECTED_QUERIES('/home/dbadmin/inputQueries','/home/dbadmin/outputQueries');
                               EXPORT_DIRECTED_QUERIES
---------------------------------------------------------------------------------------------
 1 queries successfully exported.
Queries exported to /home/dbadmin/outputQueries.

(1 row)

输入文件

您提供给 EXPORT_DIRECTED_QUERIES 的输入文件包含一个或多个输入查询。对于每个输入查询,您可以选择指定两个字段,这两个字段将用于生成的定向查询中:

  • DirQueryName 提供了定向查询的唯一标识符,即一个符合标识符中所述惯例的字符串。

  • DirQueryComment 指定了一个最多 128 个字符并以引号分隔的字符串。

您可以按如下方式设置每个输入查询的格式:

--DirQueryName=query-name
--DirQueryComment='comment'
input-query

例如,一个文件可以指定一个输入查询,如下所示:

/* Query: findEmployeesCityJobTitle_OPT */
/* Comment: This query finds all employees of a given city and job title, ordered by employee name */
SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name, employee_dimension.job_title FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6)) ORDER BY employee_dimension.job_title;

输出文件

EXPORT_DIRECTED_QUERIES 会将生成用于创建定向查询的 SQL,并将此 SQL 写入到指定文件或标准输出中。对于这两种情况,输出都遵循以下格式:

/* Query: directed-query-name */
/* Comment: directed-query-comment */
SAVE QUERY input-query;
CREATE DIRECTED QUERY CUSTOM 'directed-query-name'
COMMENT 'directed-query-comment'
OPTVER 'vertica-release-num'
PSDATE 'timestamp'
annotated-query

例如,给定前面的输入时,Vertica 会将以下输出写入 /home/dbadmin/outputQueries

/* Query: findEmployeesCityJobTitle_OPT */
/* Comment: This query finds all employees of a given city and job title, ordered by employee name */
SAVE QUERY SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name;
CREATE DIRECTED QUERY CUSTOM 'findEmployeesCityJobTitle_OPT'
COMMENT 'This query finds all employees of a given city and job title, ordered by employee name'
OPTVER 'Vertica Analytic Database v11.1.0-20220102'
PSDATE '2022-01-06 13:45:17.430254'
SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name
FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC;

如果给定的输入查询省略了 DirQueryNameDirQueryComment 字段,则 EXPORT_DIRECTED_QUERIES 会自动生成以下输出:

  • /* Query: Autoname:时间戳 .n */,其中 n 为基于零的整数索引,用于确保具有相同时间戳的自动生成名称具有唯一性。

  • /* Comment: Optimizer-generated directed query */

例如,以下输入文件包含一个 SELECT 语句,并且省略 DirQueryNameDirQueryComment 字段:


SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name
FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6))
ORDER BY employee_dimension.job_title;

给定这个文件,EXPORT_DIRECTED_QUERIES 会返回一个关于缺少输入字段的警告,它还会将该警告写入错误文件

> SELECT EXPORT_DIRECTED_QUERIES('/home/dbadmin/inputQueries2','/home/dbadmin/outputQueries3');
                                              EXPORT_DIRECTED_QUERIES
--------------------------------------------------------------------------------------------------------------
 1 queries successfully exported.
1 warning message was generated.
Queries exported to /home/dbadmin/outputQueries3.
See error report, /home/dbadmin/outputQueries3.err for details.
(1 row)

输出文件包含以下内容:

/* Query: Autoname:2022-01-06 14:11:23.071559.0 */
/* Comment: Optimizer-generated directed query */
SAVE QUERY SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) ORDER BY employee_dimension.job_title;
CREATE DIRECTED QUERY CUSTOM 'Autoname:2022-01-06 14:11:23.071559.0'
COMMENT 'Optimizer-generated directed query'
OPTVER 'Vertica Analytic Database v11.1.0-20220102'
PSDATE '2022-01-06 14:11:23.071559'
SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name
FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/)
ORDER BY employee_dimension.job_title ASC;

错误文件

如果在执行 EXPORT_DIRECTED_QUERIES 期间发生任何错误或警告,它会返回一条类似于以下内容的消息:

1 queries successfully exported.
1 warning message was generated.
Queries exported to /home/dbadmin/outputQueries.
See error report, /home/dbadmin/outputQueries.err for details.

EXPORT_DIRECTED_QUERIES 会将所有错误或警告都写入到一个在与输出文件相同的路径下所创建的文件中,并使用输出文件的基名。

在上一示例中,输出文件名为 /home/dbadmin/outputQueries,因此 EXPORT_DIRECTED_QUERIES 会将错误写入 /home/dbadmin/outputQueries.err

错误文件可以捕获许多错误和警告,例如,可以捕获 EXPORT_DIRECTED_QUERIES 无法创建定向查询的所有实例。在以下示例中,错误文件包含一个警告,指示没有为指定的输入查询提供名称字段,并且还记录了自动为该查询生成的名称:


----------------------------------------------------------------------------------------------------
WARNING: Name field not supplied. Using auto-generated name: 'Autoname:2016-10-13 09:44:33.527548.0'
Input Query: SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name, employee_dimension.job_title FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6)) ORDER BY employee_dimension.job_title;
END WARNING

6.2 - 导入定向查询

确定您希望在当前数据库中使用的已导出查询计划后,使用 IMPORT_DIRECTED_QUERIES 将其导入。您为此函数提供您使用 EXPORT_DIRECTED_QUERIES 创建的导出文件的名称以及您希望导入的定向查询的名称。例如:


=> SELECT IMPORT_DIRECTED_QUERIES('/home/dbadmin/outputQueries','FindEmployeesBoston');
                                    IMPORT_DIRECTED_QUERIES
------------------------------------------------------------------------------------------
 1 directed queries successfully imported.
To activate a query named 'my_query1':
=> ACTIVATE DIRECTED QUERY 'my_query1';

(1 row)

导入所需的定向查询后,必须使用 ACTIVATE DIRECTED QUERY 将其激活,然后才能使用它们创建查询计划。

7 - 半联接和交叉联接语义

Vertica 优化器在定向查询中使用多个关键字来重新创建交叉联接和半联接子查询。它还支持使用一组额外的关键字来表示复杂交叉联接和半联接。您还可以在直接从 vsql 中执行的查询中使用这些关键字。

有关详细信息,请参阅以下主题:

7.1 - 半联接子查询语义

Vertica 优化器在定向查询中使用多个关键字来重新创建与特定搜索运算符的半联接子查询,例如 ANYNOT IN

  • [SEMI JOIN](#SEMI)
  • [NULLAWARE ANTI JOIN](#NULLAWARE)
  • [SEMIALL JOIN](#SEMIALL)
  • [ANTI JOIN](#ANTI)

SEMI JOIN

重新创建用来执行 semi-join 的查询,其中包含以 INEXISTANY 运算符开头的子查询。

输入查询

SELECT product_description FROM product_dimension
  WHERE product_dimension.product_key IN (SELECT qty_in_stock from inventory_fact);

查询计划


QUERY PLAN DESCRIPTION:
------------------------------

explain SELECT product_description FROM product_dimension WHERE product_dimension.product_key IN (SELECT qty_in_stock from inventory_fact);

 Access Path:
 +-JOIN HASH [Semi] [Cost: 1K, Rows: 30K] (PATH ID: 1) Outer (FILTER) Inner (RESEGMENT)
 |  Join Cond: (product_dimension.product_key = VAL(2))
 |  Materialize at Output: product_dimension.product_description
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension
 | |      Materialize: product_dimension.product_key
 | |      Execute on: All Nodes
 | |      Runtime Filter: (SIP1(HashJoin): product_dimension.product_key)
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_b0
 | | |      Materialize: inventory_fact.qty_in_stock
 | | |      Execute on: All Nodes

优化器生成的带注释查询

SELECT /*+ syntactic_join */ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/ 
SEMI JOIN /*+Distrib(F,R),JType(H)*/ (SELECT inventory_fact.qty_in_stock AS qty_in_stock
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock))

NULLAWARE ANTI JOIN

重新创建用来执行 null-aware anti-join 的查询,其中包含以 NOT IN!=ALL 运算符开头的子查询。

输入查询

SELECT product_description FROM product_dimension
WHERE product_dimension.product_key NOT IN (SELECT qty_in_stock from inventory_fact);

查询计划

 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT product_description FROM product_dimension WHERE product_dimension.product_key not IN (SELECT qty_in_sto
ck from inventory_fact);

 Access Path:
 +-JOIN HASH [Anti][NotInAnti] [Cost: 7K, Rows: 30K] (PATH ID: 1) Inner (BROADCAST)
 |  Join Cond: (product_dimension.product_key = VAL(2))
 |  Materialize at Output: product_dimension.product_description
 |  Execute on: Query Initiator
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension_DBD_2_rep_VMartDesign
 | |      Materialize: product_dimension.product_key
 | |      Execute on: Query Initiator
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_DBD_9_seg_VMartDesign_b0
 | | |      Materialize: inventory_fact.qty_in_stock
 | | |      Execute on: All Nodes

优化器生成的带注释查询

SELECT /*+ syntactic_join */ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/ 
NULLAWARE ANTI JOIN /*+Distrib(L,B),JType(H)*/ (SELECT inventory_fact.qty_in_stock AS qty_in_stock
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock))

SEMIALL JOIN

重新创建用来执行 semi-all join 的查询,其中包含以 ALL 运算符开头的子查询。

输入查询

SELECT product_key, product_description FROM product_dimension
  WHERE product_dimension.product_key > ALL (SELECT product_key from inventory_fact);

查询计划


QUERY PLAN DESCRIPTION:
------------------------------

explain SELECT product_key, product_description FROM product_dimension WHERE product_dimension.product_key > ALL (SELECT product_key from inventory_fact);

Access Path:
+-JOIN HASH [Semi][All] [Cost: 7M, Rows: 30K] (PATH ID: 1) Outer (FILTER) Inner (BROADCAST)
|  Join Filter: (product_dimension.product_key > VAL(2))
|  Materialize at Output: product_dimension.product_description
|  Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
| |      Projection: public.product_dimension
| |      Materialize: product_dimension.product_key
| |      Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
| |      Execute on: All Nodes
| | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
| | |      Projection: public.inventory_fact_b0
| | |      Materialize: inventory_fact.product_key
| | |      Execute on: All Nodes

优化器生成的带注释查询


SELECT /*+ syntactic_join */ product_dimension.product_key AS product_key, product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/ 
SEMIALL JOIN /*+Distrib(F,B),JType(H)*/ (SELECT inventory_fact.product_key AS product_key FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key > subQ_1.product_key))

ANTI JOIN

重新创建用来执行 anti-join 的查询,其中包含以 NOT EXISTS 运算符开头的子查询。

输入查询

SELECT product_key, product_description FROM product_dimension
WHERE NOT EXISTS (SELECT inventory_fact.product_key FROM inventory_fact
WHERE inventory_fact.product_key=product_dimension.product_key);

查询计划


 QUERY PLAN DESCRIPTION:
 ------------------------------

 explain SELECT product_key, product_description FROM product_dimension WHERE NOT EXISTS (SELECT inventory_fact.product_
key FROM inventory_fact WHERE inventory_fact.product_key=product_dimension.product_key);

 Access Path:
 +-JOIN HASH [Anti] [Cost: 703, Rows: 30K] (PATH ID: 1) Outer (FILTER)
 |  Join Cond: (VAL(1) = product_dimension.product_key)
 |  Materialize at Output: product_dimension.product_description
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension_DBD_2_rep_VMartDesign
 | |      Materialize: product_dimension.product_key
 | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_DBD_9_seg_VMartDesign_b0
 | | |      Materialize: inventory_fact.product_key
 | | |      Execute on: All Nodes

优化器生成的带注释查询

SELECT /*+ syntactic_join */ product_dimension.product_key AS product_key, product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
ANTI JOIN /*+Distrib(F,L),JType(H)*/ (SELECT inventory_fact.product_key AS "inventory_fact.product_key"
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (subQ_1."inventory_fact.product_key" = product_dimension.product_key))

7.2 - 复杂联接语义

优化器使用一组关键字来表达复杂交叉联接半联接。所有复杂联接都由关键字 COMPLEX 指示,该关键字插入在关键字 JOIN 之前,例如 CROSS COMPLEX JOIN。复杂半联接的语义有一个额外的要求,在下面有详细说明。

复杂交叉联接

Vertica 使用关键字短语 CROSS COMPLEX JOIN 来描述所有的复杂交叉联接。例如:

输入查询

SELECT
  (SELECT max(sales_quantity) FROM store.store_sales_fact) *
  (SELECT max(sales_quantity) FROM online_sales.online_sales_fact);

查询计划


 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT
   (SELECT max(sales_quantity) FROM store.store_sales_fact) *
   (SELECT max(sales_quantity) FROM online_sales.online_sales_fact);

 Access Path:
 +-JOIN  (CROSS JOIN) [Cost: 4K, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
 |  Execute on: Query Initiator
 | +-- Outer -> JOIN  (CROSS JOIN) [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
 | |      Execute on: Query Initiator
 | | +-- Outer -> STORAGE ACCESS for dual [Cost: 10, Rows: 1] (PATH ID: 3)
 | | |      Projection: v_catalog.dual_p
 | | |      Materialize: dual.dummy
 | | |      Execute on: Query Initiator
 | | +-- Inner -> SELECT [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 4)
 | | |      Execute on: Query Initiator
 | | | +---> GROUPBY NOTHING [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 5)
 | | | |      Aggregates: max(store_sales_fact.sales_quantity)
 | | | |      Execute on: All Nodes
 | | | | +---> STORAGE ACCESS for store_sales_fact [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 6)
 | | | | |      Projection: store.store_sales_fact_super
 | | | | |      Materialize: store_sales_fact.sales_quantity
 | | | | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 7)
 | |      Execute on: Query Initiator
 | | +---> GROUPBY NOTHING [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 8)
 | | |      Aggregates: max(online_sales_fact.sales_quantity)
 | | |      Execute on: All Nodes
 | | | +---> STORAGE ACCESS for online_sales_fact [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 9)
 | | | |      Projection: online_sales.online_sales_fact_super
 | | | |      Materialize: online_sales_fact.sales_quantity
 | | | |      Execute on: All Nodes

优化器生成的带注释查询

以下带注释的查询返回与前面显示的输入查询相同的结果。与优化器生成的所有带注释查询一样,您可以直接在 vsql 中执行此查询(无论是按照编写还是修改后):

SELECT /*+syntactic_join,verbatim*/ (subQ_1.max * subQ_2.max) AS "?column?"
FROM ((v_catalog.dual AS dual CROSS COMPLEX JOIN /*+Distrib(L,L),JType(H)*/
(SELECT max(store_sales_fact.sales_quantity) AS max
FROM store.store_sales_fact AS store_sales_fact/*+projs('store.store_sales_fact')*/) AS subQ_1 )
CROSS COMPLEX JOIN /*+Distrib(L,L),JType(H)*/ (SELECT max(online_sales_fact.sales_quantity) AS max
FROM online_sales.online_sales_fact AS online_sales_fact/*+projs('online_sales.online_sales_fact')*/) AS subQ_2 )

复杂半联接

复杂半联接由以下关键字之一表达:

  • SEMI COMPLEX JOIN

  • NULLAWARE ANTI COMPLEX JOIN

  • SEMIALL COMPLEX JOIN

  • ANTI COMPLEX JOIN

额外要求适用于所有复杂半联接:每个子查询的 SELECT 列表都以调用 Vertica 元函数 complex_join_marker() 的虚拟列(标记为 false)结尾。当子查询处理每一行时,complex_join_marker() 返回 truefalse 以指示结果集中包含或排除该行。结果集与这个标志一起返回到外部查询,外部查询可以使用来自这个和其他子查询中的标志来筛选它自己的结果集。

例如,查询优化器将以下输入查询重写为 NULLAWARE ANTI COMPLEX JOIN。该联接返回子查询中其 complex_join_marker() 标志设置为适当布尔值的所有行。

输入查询

SELECT product_dimension.product_description FROM public.product_dimension
   WHERE (NOT (product_dimension.product_key NOT IN (SELECT inventory_fact.qty_in_stock FROM public.inventory_fact)));

查询计划

 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT product_dimension.product_description FROM public.product_dimension
   WHERE (NOT (product_dimension.product_key NOT IN (SELECT inventory_fact.qty_in_stock FROM public.inventory_fact)));

 Access Path:
 +-JOIN HASH [Anti][NotInAnti] [Cost: 3K, Rows: 30K] (PATH ID: 1) Inner (BROADCAST)
 |  Join Cond: (product_dimension.product_key = VAL(2))
 |  Materialize at Output: product_dimension.product_description
 |  Filter: (NOT VAL(2))
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 56, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension_super
 | |      Materialize: product_dimension.product_key
 | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_super
 | | |      Materialize: inventory_fact.qty_in_stock
 | | |      Execute on: All Nodes

优化器生成的带注释查询

以下带注释的查询返回与前面显示的输入查询相同的结果。与优化器生成的所有带注释查询一样,您可以直接在 vsql 中执行此查询(无论是按照编写还是修改后)。例如,您可以通过修改外部查询的谓词对 subQ_1."false" 标志的求值方式来控制外部查询的输出。

SELECT /*+syntactic_join,verbatim*/ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
NULLAWARE ANTI COMPLEX JOIN /*+Distrib(L,B),JType(H)*/
   (SELECT inventory_fact.qty_in_stock AS qty_in_stock, complex_join_marker() AS "false"
    FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock)) WHERE (NOT subQ_1."false")

8 - 限制

定向查询支持各种查询,但也有一些例外。Vertica 通过优化器生成的警告处理所有例外。以下部分将这些限制分为几类。

表和投影

存在以下限制:

  • 优化器生成的定向查询不支持引用系统表或数据收集器表的查询。例外情况:V_CATALOG.DUAL 的显式和隐式引用。

  • 优化器生成的定向查询不支持所含表内具有访问策略的查询。

  • 定向查询不支持不含投影的表。

函数

不支持包括以下函数的查询:

运算符和子句

不支持包括以下内容的查询:

  • WITH 子句(启用实体化时

  • 包括日期/时间字面量的查询,这些字面量引用当前时间,如 NOWYESTERDAY

数据类型

不支持包括以下数据类型的查询: