这是本节的多页打印视图。 点击此处打印.

返回本页常规视图.

管理查询

本部分包含以下主题:

  • 查询计划:介绍 Vertica 如何创建和使用查询计划,从而优化对 Vertica 数据库中信息的访问。

  • 定向查询:显示如何保存查询计划信息。

1 - 查询计划

提交查询时,查询优化器序会迅速选择要使用的投影,优化并计划查询执行,以及将 SQL 语句记录到其日志中。此计划会产生一个用来映射出查询所执行步骤的查询计划

查询计划是一系列类似步骤的 路径,Vertica 基于成本的查询优化器使用这些路径执行查询。Vertica 可以为给定查询生成不同的查询计划。对于每个查询计划,查询优化器都评估要查询的数据:行数、列统计,如不同值的数量(基数)、数据在各个节点上的分布。它还评估可用资源,如 CPU 和网络拓扑以及其他环境因素。查询优化器使用此信息制定多个潜在计划。然后它比较各个计划并选择一个,该计划通常为具有最低成本的计划。

优化器将查询计划细分为更小的本地计划并将它们分发到 执行程序节点

在查询计划的最后执行阶段,启动程序节点执行以下任务:

  • 通过分组操作将结果组合起来。

  • 将来自所有执行程序的多个已排序部分结果集合并起来。

  • 设置结果格式,以返回给客户端。

在执行查询之前,可以通过将查询嵌入到 EXPLAIN 语句中来查看其计划;也可以在管理控制台中查看它。

1.1 - 查看查询计划

可以用以下两种方式获得查询计划:

您还可以通过查询系统表 QUERY_PLAN_PROFILES,观察流经查询计划的实时数据流。有关详细信息,请参阅分析查询计划

EXPLAIN 输出选项

默认情况下,EXPLAIN 输出将查询计划表示为层次结构,其中的每个级别或 路径都表示优化器用于执行查询的一个数据库操作。 EXPLAIN 输出还会附加 DOT 语言源,因此您可以使用开源 Graphviz 工具以图形方式显示此输出。

EXPLAIN 支持生成详细JSON 输出的选项。还可以显示分配给每个节点的本地查询计划,它们共同组成整个(全局)查询计划。

EXPLAIN 还支持 ANNOTATED 选项。 EXPLAIN ANNOTATED 返回带有嵌入式优化器提示的查询,这些提示会封装此查询的查询计划。有关使用示例,请参阅结合使用优化器生成的定向查询和自定义定向查询

1.1.1 - EXPLAIN 生成的查询计划

EXPLAIN 返回用于执行指定查询的优化器查询计划。例如:

 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT customer_name, customer_state FROM customer_dimension WHERE customer_state IN ('MA','NH') AND customer_gender='Male' ORDER BY customer_name LIMIT 10;

 Access Path:
 +-SELECT  LIMIT 10 [Cost: 365, Rows: 10] (PATH ID: 0)
 |  Output Only: 10 tuples
 |  Execute on: Query Initiator
 | +---> SORT [TOPK] [Cost: 365, Rows: 544] (PATH ID: 1)
 | |      Order: customer_dimension.customer_name ASC
 | |      Output Only: 10 tuples
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 326, Rows: 544] (PATH ID: 2)
 | | |      Projection: public.customer_dimension_DBD_1_rep_VMartDesign_node0001
 | | |      Materialize: customer_dimension.customer_state, customer_dimension.customer_name
 | | |      Filter: (customer_dimension.customer_gender = 'Male')
 | | |      Filter: (customer_dimension.customer_state = ANY (ARRAY['MA', 'NH']))
 | | |      Execute on: Query Initiator
 | | |      Runtime Filter: (SIP1(TopK): customer_dimension.customer_name)

您可以使用 EXPLAIN 评估优化器做出的关于给定查询的选择。如果您认为查询性能未达到最佳效果,请通过 Database Designer 运行该查询。有关详细信息,请参阅增量设计缩短查询运行时间

1.1.2 - JSON 格式的查询计划

EXPLAIN JSON 以 JSON 格式返回查询计划。例如:


=> EXPLAIN JSON SELECT customer_name, customer_state FROM customer_dimension
     WHERE customer_state IN ('MA','NH') AND customer_gender='Male' ORDER BY customer_name LIMIT 10;
 ------------------------------
{
     "PARAMETERS" : {
         "QUERY_STRING" : "EXPLAIN JSON SELECT customer_name, customer_state FROM customer_dimension \n
         WHERE customer_state IN ('MA','NH') AND customer_gender='Male' ORDER BY customer_name LIMIT 10;"
     },
     "PLAN" : {
         "PATH_ID" : 0,
         "PATH_NAME" : "SELECT",
         "EXTRA" : " LIMIT 10",
         "COST" : 2114.000000,
         "ROWS" : 10.000000,
         "COST_STATUS" : "NO_STATISTICS",
         "TUPLE_LIMIT" : 10,
         "EXECUTE_NODE" : "Query Initiator",
         "INPUT" : {
             "PATH_ID" : 1,
             "PATH_NAME" : "SORT",
             "EXTRA" : "[TOPK]",
             "COST" : 2114.000000,
             "ROWS" : 49998.000000,
             "COST_STATUS" : "NO_STATISTICS",
             "ORDER" : ["customer_dimension.customer_name", "customer_dimension.customer_state"],
             "TUPLE_LIMIT" : 10,
             "EXECUTE_NODE" : "All Nodes",
             "INPUT" : {
                 "PATH_ID" : 2,
                 "PATH_NAME" : "STORAGE ACCESS",
                 "EXTRA" : "for customer_dimension",
                 "COST" : 252.000000,
                 "ROWS" : 49998.000000,
                 "COST_STATUS" : "NO_STATISTICS",
                 "TABLE" : "public.customer_dimension",
                 "PROJECTION" : "public.customer_dimension_b0",
                 "MATERIALIZE" : ["customer_dimension.customer_name", "customer_dimension.customer_state"],
                 "FILTER" : ["(customer_dimension.customer_state = ANY (ARRAY['MA', 'NH']))", "(customer_dimension.customer_gender = 'Male')"],
                 "EXECUTE_NODE" : "All Nodes"
          "SIP" : "Runtime Filter: (SIP1(TopK): customer_dimension.customer_name)"

             }
         }
     }
 }
(40 rows)

1.1.3 - 详细查询计划

可使用 VERBOSE 选项对 EXPLAIN 进行限定。此选项对默认和 JSON 输出有效,可增加所呈现的查询计划的详细信息量

例如,以下 EXPLAIN 语句指定生成详细输出。添加的信息将以粗体显示:


=> EXPLAIN VERBOSE SELECT customer_name, customer_state FROM customer_dimension
     WHERE customer_state IN ('MA','NH') AND customer_gender='Male' ORDER BY customer_name LIMIT 10;

QUERY PLAN DESCRIPTION:
 ------------------------------

 Opt Vertica Options
 --------------------
 PLAN_OUTPUT_SUPER_VERBOSE

 EXPLAIN VERBOSE SELECT customer_name, customer_state FROM customer_dimension
 WHERE customer_state IN ('MA','NH') AND customer_gender='Male'
 ORDER BY customer_name LIMIT 10;

Access Path:
+-SELECT  LIMIT 10 [Cost: 756.000000, Rows: 10.000000 Disk(B): 0.000000 CPU(B): 0.000000 Memory(B): 0.000000 Netwrk(B): 0.000000 Parallelism: 1.000000] [OutRowSz (B): 274](PATH ID: 0)
|  Output Only: 10 tuples
|  Execute on: Query Initiator
|  Sort Key: (customer_dimension.customer_name)
|  LDISTRIB_UNSEGMENTED
| +---> SORT [TOPK] [Cost: 756.000000, Rows: 9998.000000 Disk(B): 0.000000 CPU(B): 34274697.123457 Memory(B): 2739452.000000 Netwrk(B): 0.000000 Parallelism: 4.000000 (NO STATISTICS)] [OutRowSz (B): 274] (PATH ID: 1)
| |      Order: customer_dimension.customer_name ASC
| |      Output Only: 10 tuples
| |      Execute on: Query Initiator
| |      Sort Key: (customer_dimension.customer_name)
| |      LDISTRIB_UNSEGMENTED
| | +---> STORAGE ACCESS for customer_dimension [Cost: 513.000000, Rows: 9998.000000 Disk(B): 0.000000 CPU(B): 0.000000 Memory(B): 0.000000 Netwrk(B): 0.000000 Parallelism: 4.000000 (NO STATISTICS)] [OutRowSz (B): 274] (PATH ID: 2)
| | |      Column Cost Aspects: [ Disk(B): 7371817.156569 CPU(B): 4914708.578284 Memory(B): 2659466.004399 Netwrk(B): 0.000000 Parallelism: 4.000000 ]
| | |      Projection: public.customer_dimension_P1
| | |      Materialize: customer_dimension.customer_state, customer_dimension.customer_name
| | |      Filter: (customer_dimension.customer_gender = 'Male')/* sel=0.999800 ndv= 500 */
| | |      Filter: (customer_dimension.customer_state = ANY (ARRAY['MA', 'NH']))/* sel=0.999800 ndv= 500 */
| | |      Execute on: All Nodes
| | |      Runtime Filter: (SIP1(TopK): customer_dimension.customer_name)
| | |      Sort Key: (customer_dimension.household_id, customer_dimension.customer_key, customer_dimension.store_membership_card, customer_dimension.customer_type, customer_dimension.customer_region, customer_dimension.title, customer_dimension.number_of_children)
| | |      LDISTRIB_SEGMENTED

1.1.4 - 本地查询计划

EXPLAIN LOCAL (在多节点数据库中)显示分配给每一节点的本地查询计划,它们共同组成整个(全局)查询计划。如果您省略此选项,Vertica 将仅显示全局查询计划。本地查询计划仅以可在 Graphviz 中呈现的 DOT 语言源显示。

例如,以下 EXPLAIN 语句包括 LOCAL 选项:

=> EXPLAIN LOCAL SELECT store_name, store_city, store_state
   FROM store.store_dimension ORDER BY store_state ASC, store_city ASC;

该输出包括 GraphViz 源,用于描述分配给每个节点的本地查询计划。例如,在三节点数据库上,此语句的输出包括一个节点 (v_vmart_node0003) 的以下查询计划的 GraphViz 描述:

 -----------------------------------------------
 PLAN: v_vmart_node0003 (GraphViz Format)
 -----------------------------------------------
 digraph G {
 graph [rankdir=BT, label = "v_vmart_node0003\n", labelloc=t, labeljust=l ordering=out]
 0[label = "NewEENode \nOutBlk=[UncTuple(3)]", color = "green", shape = "box"];
 1[label = "Send\nSend to: v_vmart_node0001\nNet id: 1000\nMerge\n\nUnc: Char(2)\nUnc: Varchar(64)\nUnc: Varchar(64)", color = "green", shape = "box"];
 2[label = "Sort: (keys = A,A,N)\nUnc: Char(2)\nUnc: Varchar(64)\nUnc: Varchar(64)", color = "green", shape = "box"];
 3[label = "ExprEval: \n  store_dimension.store_state\n  store_dimension.store_city\n  store_dimension.store_name\nUnc: Char(2)\nUnc: Varchar(64)\nUnc: Varchar(64)
", color = "green", shape = "box"];
 4[label = "StorageUnionStep: store_dimension_p_b0\nUnc: Varchar(64)\nUnc: Varchar(64)\nUnc: Char(2)", color = "purple", shape = "box"];
 5[label = "ScanStep: store_dimension_p_b0\nstore_key (not emitted)\nstore_name\nstore_city\nstore_state\nUnc: Varchar(64)\nUnc: Varchar(64)\nUnc: Char(2)", color
= "brown", shape = "box"];
 1->0 [label = "0",color = "blue"];
 2->1 [label = "0",color = "blue"];
 3->2 [label = "0",color = "blue"];
 4->3 [label = "0",color = "blue"];
 5->4 [label = "0",color = "blue"];
 }

GraphViz 呈现的此输出如下所示:

1.2 - 查询计划成本预估

查询优化器根据成本预估选择查询计划,并使用来自各种来源的信息开发潜在计划和确定其相对成本。这些包括:

  • 表的行数

  • 列统计信息,包括:不同值(基数)的数量、最小/最大值、值分布和磁盘空间使用情况

  • 可能需要的 I/O 操作最少、CPU 最低、内存最少、网络使用最少的访问路径

  • 可用的合适投影

  • 联接选项:联接类型(合并与散列联接)、联接顺序

  • 查询谓词

  • 群集节点上的数据分段

许多重要的优化器决策均依赖于统计信息,查询优化器使用这些信息来确定执行查询的最终计划。因此,统计信息保持最新非常重要。没有合理的准确统计信息,优化器可能选择不理想的计划,从而导致查询性能受影响。

Vertica 提供有关查询计划中的统计信息的提示。请参阅查询计划统计信息

成本与执行运行时间

尽管成本与查询运行时间相关,但它们并不能 预估 出实际运行时间。例如,如果优化器确定计划 A 的成本为计划 B 的两倍,则计划 A 的运行时间可能会更长。但是,此成本预估并不一定表示计划 A 的运行时间是计划 B 的两倍。

此外,不同查询的计划成本不能直接相比。例如,如果查询 1 的计划 X 的估计成本高于查询 2 的计划 Y 的成本,计划 X 的运行时间不一定就比计划 Y 的运行时间长。

1.3 - 查询计划信息和结构

根据查询和数据库架构的情况,EXPLAIN 输出会包含以下信息:

  • 语句引用的表

  • 估计成本

  • 估计行基数

  • 路径 ID,一个用于链接至错误消息和分析计数器的整数,以便您可以更轻松地解决性能问题。有关详细信息,请参阅分析查询计划

  • 数据操作,如 SORTFILTERLIMITGROUP BY

  • 使用的投影

  • 关于统计的信息—例如,它们是当前的还是超出范围

  • 查询中为操作选择的算法,如 HASH/MERGEGROUPBY HASH/GROUPBY PIPELINED

  • 群集节点间的数据再分发(广播、分段)

示例

在下面的 EXPLAIN 输出中,优化器用三个步骤处理查询,每个步骤通过唯一路径 ID 标识:

  • 0:限制

  • 1:排序

  • 2:存储空间访问和筛选

1.3.1 - 查询计划统计信息

如果您查询统计信息不可用或已过时的表,优化器可能会选择不理想的查询计划。

可通过调用 ANALYZE_STATISTICS 来解决与表统计信息相关的许多问题。此函数可用于更新不同范围的统计信息:一个或多个表列、单个表或所有数据库表。

如果您更新统计信息后发现查询执行仍不理想,请通过 Database Designer 运行查询并选择增量设计作为设计类型。

有关更新数据库统计信息的详细信息,请参阅收集数据库统计信息

查询计划中的统计信息提示

查询计划可通过以下两个提示包含表统计信息相关信息: NO STATISTICSSTALE STATISTICS。例如,以下查询计划片段包括 NO STATISTICS,表示直方图不可用:

| | +-- Outer -> STORAGE ACCESS for fact [Cost: 604, Rows: 10K (NO STATISTICS)]

以下查询计划片段包括 STALE STATISTICS,表示谓词已超出直方图范围:

| | +-- Outer -> STORAGE ACCESS for fact [Cost: 35, Rows: 1 (STALE STATISTICS)]

1.3.2 - Cost 和 Row 路径

以下 EXPLAIN 输出显示 Cost 运算符:

 Access Path: +-SELECT  LIMIT 10 [Cost: 370, Rows: 10] (PATH ID: 0)
 |  Output Only: 10 tuples
 |  Execute on: Query Initiator
 | +---> SORT [Cost: 370, Rows: 544] (PATH ID: 1)
 | |      Order: customer_dimension.customer_name ASC
 | |      Output Only: 10 tuples
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 331, Rows: 544] (PATH ID: 2)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: customer_dimension.customer_state, customer_dimension.customer_name
 | | |      Filter: (customer_dimension.customer_gender = 'Male')
 | | |      Filter: (customer_dimension.customer_state = ANY (ARRAY['MA', 'NH']))
 | | |      Execute on: Query Initiator

Row 运算符是优化器估计查询将要返回的行数。数字后面的字母表示度量单位(K=千、M=百万、B=十亿、T=万亿),因此,以下查询的输出指示要返回的行数为 50 K

=> EXPLAIN SELECT customer_gender FROM customer_dimension;
 Access Path:
 +-STORAGE ACCESS for customer_dimension [Cost: 17, Rows: 50K (3 RLE)] (PATH ID: 1)
 |  Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 |  Materialize: customer_dimension.customer_gender
 |  Execute on: Query Initiator

STORAGE ACCESS 路径中对 (3 RLE) 的引用表明优化器估计存储访问运算符返回的行数为 50K。由于列为运行长度编码 (RLE),因此返回的 RLE 实际行数仅为三行:

  • 1 行用于女性

  • 1 行用于男性

  • 1 行用于表示未知 (NULL) 性别

1.3.3 - Projection 路径

通过查看文本输出中的 Projection 路径,您可以了解优化器为查询计划选择了哪些 投影

EXPLAIN SELECT
   customer_name,
   customer_state
 FROM customer_dimension
 WHERE customer_state in ('MA','NH')
 AND customer_gender = 'Male'
 ORDER BY customer_name
 LIMIT 10;
 Access Path:
 +-SELECT  LIMIT 10 [Cost: 370, Rows: 10] (PATH ID: 0)
 |  Output Only: 10 tuples
 |  Execute on: Query Initiator
 | +---> SORT [Cost: 370, Rows: 544] (PATH ID: 1)
 | |      Order: customer_dimension.customer_name ASC
 | |      Output Only: 10 tuples
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 331, Rows: 544] (PATH ID: 2)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmart_vmart_node0001
 | | |      Materialize: customer_dimension.customer_state, customer_dimension.customer_name
 | | |      Filter: (customer_dimension.customer_gender = 'Male')
 | | |      Filter: (customer_dimension.customer_state = ANY (ARRAY['MA', 'NH']))
 | | |      Execute on: Query Initiator

查询优化器会自动选取最佳投影,但如果没有合理准确的统计信息,则优化器为查询选择的投影或联接顺序可能会不太理想。有关详细信息,请参阅收集统计信息

Vertica 在为计划选择投影时会考虑以下几个方面:

  • 如何在查询中联接各个列

  • 如何对投影进行分组或排序

  • 是否应用 SQL 分析操作

  • 磁盘上投影存储中的所有列信息

当 Vertica 评估每个计划的可能性时,初始成本较高的投影可能会进入到最终计划之中,因为这些投影会减少联接开销。例如,可能会为某个查询提供许多可能的计划,优化器在从中作出选择之前会对这些计划进行评估。为提高效率,优化器会使用复杂的算法删除成本较高的中间部分计划片段。优化器知道中间计划片段可能最初看起来很糟糕(由于存储访问成本很高),但可能会因为它所允许的其他优化而产生出色的最终计划。

如果在统计信息已是最新的情况下查询仍然执行欠佳,请通过 Database Designer 运行该查询。有关详细信息,请参阅增量设计

提示

  • 要对已分段的不同投影进行测试,请在查询中按名称引用投影。

  • 要获得最佳性能,请将查询编写为将列按照投影列的排序方式进行排序。

另请参阅

1.3.4 - Join 路径

与引用两个或更多表的联接查询相似,查询计划中的 Join 步骤有两个输入分支:

  • 左输入,其为联接的外部表

  • 右输入,其为联接的内部表

在以下查询中,T1 表为左输入,因为它在 JOIN 关键字的左侧;T2 表为右输入,因为它在 JOIN 关键字的右侧:

SELECT * FROM T1 JOIN T2 ON T1.x = T2.x;

外部联接与内部联接

如果使用小型表作为联接的内部输入,则查询性能会更好。查询优化器会自动对联接的输入进行重新排序以确保这种情况,除非相关联接为外部联接。

以下示例显示左外部联接的查询及其计划:

=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> LEFT OUTER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [LeftOuter] [Cost: 4K, Rows: 5M] (PATH ID: 1)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Materialize at Output: OSI.sale_date_key
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

以下示例显示完整外部联接的查询及其计划:

=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> FULL OUTER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [FullOuter] [Cost: 18K, Rows: 5M] (PATH ID: 1) Outer (RESEGMENT) Inner (FILTER)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.sale_date_key, OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

散列联接和合并联接

Vertica 有两个联接算法可供选择:合并联接和散列联接。系统中给定查询和投影时,优化器会自动选择最适合的算法。

对于以下查询,优化器会选择散列联接。

=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [Cost: 4K, Rows: 5M] (PATH ID: 1)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Materialize at Output: OSI.sale_date_key
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

在下一个示例中,优化器会选择合并联接。优化器的首次传递会执行合并联接,因为输入已预先排序,然后它再执行散列联接。

=> EXPLAIN SELECT count(*) FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key
-> INNER JOIN product_dimension PD ON PD.product_key = OSI.product_key;
 Access Path:
 +-GROUPBY NOTHING [Cost: 8K, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(*)
 |  Execute on: All Nodes
 | +---> JOIN HASH [Cost: 7K, Rows: 5M] (PATH ID: 2)
 | |      Join Cond: (PD.product_key = OSI.product_key)
 | |      Materialize at Input: OSI.product_key
 | |      Execute on: All Nodes
 | | +-- Outer -> JOIN MERGEJOIN(inputs presorted) [Cost: 4K, Rows: 5M] (PATH ID: 3)
 | | |      Join Cond: (CD.customer_key = OSI.customer_key)
 | | |      Execute on: All Nodes
 | | | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 4)
 | | | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | | | |      Materialize: OSI.customer_key
 | | | |      Execute on: All Nodes
 | | | +-- Inner -> STORAGE ACCESS for CD [Cost: 132, Rows: 50K] (PATH ID: 5)
 | | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | | |      Materialize: CD.customer_key
 | | | |      Execute on: All Nodes
 | | +-- Inner -> STORAGE ACCESS for PD [Cost: 152, Rows: 60K] (PATH ID: 6)
 | | |      Projection: public.product_dimension_DBD_2_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: PD.product_key
 | | |      Execute on: All Nodes

不等于联接

Vertica 使用等于谓词处理联接的效率非常高。查询计划将等于联接谓词显示为联接条件 (Join Cond)。

=> EXPLAIN SELECT CD.annual_income, OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD
-> ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [Cost: 4K, Rows: 5M] (PATH ID: 1)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Materialize at Output: OSI.sale_date_key
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

但是,对不等于联接的处理方式与交叉联接类似,其运行效率较低,您可以通过查看两个查询之间的成本变化对此进行了解:

=> EXPLAIN SELECT CD.annual_income, OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD
-> ON CD.customer_key < OSI.customer_key;
 Access Path:
 +-JOIN HASH [Cost: 98M, Rows: 5M] (PATH ID: 1)
 |  Join Filter: (CD.customer_key < OSI.customer_key)
 |  Materialize at Output: CD.annual_income
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for CD [Cost: 132, Rows: 50K] (PATH ID: 2)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 3)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.sale_date_key, OSI.customer_key
 | |      Execute on: All Nodes

事件序列联接

事件序列联接由 INTERPOLATED 路径指示。

=> EXPLAIN SELECT * FROM hTicks h FULL OUTER JOIN aTicks a -> ON (h.time INTERPOLATE PREVIOUS
 Access Path:
 +-JOIN  (INTERPOLATED) [FullOuter] [Cost: 31, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
   Outer (SORT ON JOIN KEY) Inner (SORT ON JOIN KEY)
 |  Join Cond: (h."time" = a."time")
 |  Execute on: Query Initiator
 | +-- Outer -> STORAGE ACCESS for h [Cost: 15, Rows: 4 (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.hTicks_node0004
 | |      Materialize: h.stock, h."time", h.price
 | |      Execute on: Query Initiator
 | +-- Inner -> STORAGE ACCESS for a [Cost: 15, Rows: 4 (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.aTicks_node0004
 | |      Materialize: a.stock, a."time", a.price
 | |      Execute on: Query Initiator

1.3.5 - 路径 ID

PATH ID 是 Vertica 为查询计划内的每个操作(路径)所分配的唯一标识符。以下项共享同一标识符:

路径 ID 可帮助您追踪问题的根本原因。例如,如果查询返回联接错误,请在查询前加上 EXPLAIN 并在查询计划中查找 PATH ID n,以查看查询中的哪个联接有问题。

例如,下面的 EXPLAIN 输出显示优化器查询计划中每个路径的路径 ID:

=> EXPLAIN SELECT * FROM fact JOIN dim ON x=y JOIN ext on y=z;
 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 815, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 | Join Cond: (dim.y = ext.z)
 | Materialize at Output: fact.x
 | Execute on: All Nodes
 | +-- Outer -> JOIN MERGEJOIN(inputs presorted) [Cost: 408, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | | Join Cond: (fact.x = dim.y)
 | | Execute on: All Nodes
 | | +-- Outer -> STORAGE ACCESS for fact [Cost: 202, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | | | Projection: public.fact_super
 | | | Materialize: fact.x
 | | | Execute on: All Nodes
 | | +-- Inner -> STORAGE ACCESS for dim [Cost: 202, Rows: 10K (NO STATISTICS)] (PATH ID: 4)
 | | | Projection: public.dim_super
 | | | Materialize: dim.y
 | | | Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for ext [Cost: 202, Rows: 10K (NO STATISTICS)] (PATH ID: 5)
 | | Projection: public.ext_super
 | | Materialize: ext.z
 | | Execute on: All Nodes

1.3.6 - Filter 路径

Filter 步骤对单个表上的谓词求值。它接受一组行、清除其中的某些行(基于查询中提供的条件)并返回剩余行。例如,优化器可以筛选将要与其他重新分段的联接输入进行联接的联接输入的本地数据。

以下语句查询 customer_dimension 表,并使用 WHERE 子句仅筛选马萨诸塞州和新罕布什尔州的男性客户结果。

EXPLAIN SELECT
  CD.customer_name,
  CD.customer_state,
  AVG(CD.customer_age) AS avg_age,
  COUNT(*) AS count
FROM customer_dimension CD
WHERE CD.customer_state in ('MA','NH') AND CD.customer_gender = 'Male'
GROUP BY CD.customer_state, CD.customer_name;

查询计划输出如下:

 Access Path:
 +-GROUPBY HASH [Cost: 378, Rows: 544] (PATH ID: 1)
 |  Aggregates: sum_float(CD.customer_age), count(CD.customer_age), count(*)
 |  Group By: CD.customer_state, CD.customer_name
 |  Execute on: Query Initiator
 | +---> STORAGE ACCESS for CD [Cost: 372, Rows: 544] (PATH ID: 2)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.customer_state, CD.customer_name, CD.customer_age
 | |      Filter: (CD.customer_gender = 'Male')
 | |      Filter: (CD.customer_state = ANY (ARRAY['MA', 'NH']))
 | |      Execute on: Query Initiator

1.3.7 - GROUP BY 路径

GROUP BY 操作具有两种算法:

  • GROUPBY HASH 输入不按组列排序,因此 Vertica 用这些组列构建一个哈希表,以便处理聚合和 Group By 表达式。

  • GROUPBY PIPELINED 需要在组中指定的列上对输入进行预分类,这意味着 Vertica 仅需要在内存中保留当前组中的数据。GROUPBY PIPELINED 操作是首选算法,因为它们通常比 GROUPBY HASH 更快并且需要的内存更少。GROUPBY PIPELINED 对按列或 DISTINCT 聚合处理大量高基数组的查询尤为有用。

如果可能,查询优化器会选择更快的算法 GROUPBY PIPELINED 而非 GROUPBY HASH。

1.3.7.1 - GROUPBY HASH 查询计划

下面是一个关于 GROUPBY HASH 操作在 EXPLAIN 输出中的样式的示例。

=> EXPLAIN SELECT COUNT(DISTINCT annual_income)
     FROM customer_dimension
     WHERE customer_region='NorthWest';

该输出显示优化器选择了效率更低的 GROUPBY HASH 方法,这意味着投影未在 annual_income 列上预分类。如果此类投影可用,优化器将选择 GROUPBY PIPELINED 算法。

Access Path:
+-GROUPBY NOTHING [Cost: 256, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
|  Aggregates: count(DISTINCT customer_dimension.annual_income)
| +---> GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 253, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| |      Group By: customer_dimension.annual_income
| | +---> STORAGE ACCESS for customer_dimension [Cost: 227, Rows: 50K (NO STATISTICS)] (PATH ID: 3)
| | |      Projection: public.customer_dimension_super
| | |      Materialize: customer_dimension.annual_income
| | |      Filter: (customer_dimension.customer_region = 'NorthWest'
...

1.3.7.2 - GROUPBY PIPELINED 查询计划

如果投影已按 customer_gender 列排序,优化器将选择更快的 GROUPBY PIPELINED 操作:

 => EXPLAIN SELECT COUNT(distinct customer_gender) from customer_dimension;
Access Path:
 +-GROUPBY NOTHING [Cost: 22, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(DISTINCT customer_dimension.customer_gender)
 |  Execute on: Query Initiator
 | +---> GROUPBY PIPELINED [Cost: 20, Rows: 10K] (PATH ID: 2)
 | |      Group By: customer_dimension.customer_gender
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 17, Rows: 50K (3 RLE)] (PATH ID: 3)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: customer_dimension.customer_gender
 | | |      Execute on: Query Initiator

同样,使用相等谓词(如在以下查询中)将保留 GROUPBY PIPELINED

=> EXPLAIN SELECT COUNT(DISTINCT annual_income)    FROM customer_dimension
   WHERE customer_gender = 'Female';

 Access Path: +-GROUPBY NOTHING [Cost: 161, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(DISTINCT customer_dimension.annual_income)
 | +---> GROUPBY PIPELINED [Cost: 158, Rows: 10K] (PATH ID: 2)
 | |      Group By: customer_dimension.annual_income
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 144, Rows: 47K] (PATH ID: 3)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: customer_dimension.annual_income
 | | |      Filter: (customer_dimension.customer_gender = 'Female')

1.3.8 - Sort 路径

SORT 运算符根据指定的列列表来排序数据。EXPLAIN 输出指示排序表达式以及排列顺序是升序 (ASC) 还是降序 (DESC)。

例如,以下查询计划显示 SORT 运算符的列列表性质:

EXPLAIN SELECT
  CD.customer_name,
  CD.customer_state,
  AVG(CD.customer_age) AS avg_age,
  COUNT(*) AS count
FROM customer_dimension CD
WHERE CD.customer_state in ('MA','NH')
  AND CD.customer_gender = 'Male'
GROUP BY CD.customer_state, CD.customer_name
ORDER BY avg_age, customer_name;
 Access Path:
 +-SORT [Cost: 422, Rows: 544] (PATH ID: 1)
 |  Order: (<SVAR> / float8(<SVAR>)) ASC, CD.customer_name ASC
 |  Execute on: Query Initiator
 | +---> GROUPBY HASH [Cost: 378, Rows: 544] (PATH ID: 2)
 | |      Aggregates: sum_float(CD.customer_age), count(CD.customer_age), count(*)
 | |      Group By: CD.customer_state, CD.customer_name
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for CD [Cost: 372, Rows: 544] (PATH ID: 3)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmart_vmart_node0001
 | | |      Materialize: CD.customer_state, CD.customer_name, CD.customer_age
 | | |      Filter: (CD.customer_gender = 'Male')
 | | |      Filter: (CD.customer_state = ANY (ARRAY['MA', 'NH']))
 | | |      Execute on: Query Initiator

如果将排序顺序更改为降序,查询计划将发生变化:

EXPLAIN SELECT
  CD.customer_name,
  CD.customer_state,
  AVG(CD.customer_age) AS avg_age,
  COUNT(*) AS count
FROM customer_dimension CD
WHERE CD.customer_state in ('MA','NH')
  AND CD.customer_gender = 'Male'
GROUP BY CD.customer_state, CD.customer_name
ORDER BY avg_age DESC, customer_name;
 Access Path:
 +-SORT [Cost: 422, Rows: 544] (PATH ID: 1)
 |  Order: (<SVAR> / float8(<SVAR>)) DESC, CD.customer_name ASC
 |  Execute on: Query Initiator
 | +---> GROUPBY HASH [Cost: 378, Rows: 544] (PATH ID: 2)
 | |      Aggregates: sum_float(CD.customer_age), count(CD.customer_age), count(*)
 | |      Group By: CD.customer_state, CD.customer_name
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for CD [Cost: 372, Rows: 544] (PATH ID: 3)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmart_vmart_node0001
 | | |      Materialize: CD.customer_state, CD.customer_name, CD.customer_age
 | | |      Filter: (CD.customer_gender = 'Male')
 | | |      Filter: (CD.customer_state = ANY (ARRAY['MA', 'NH']))
 | | |      Execute on: Query Initiator

1.3.9 - Limit 路径

LIMIT 路径根据查询中的 LIMIT 子句限制结果行数。在具有数千行的查询中使用 LIMIT 子句可能会提高查询性能。

优化器在查询中尽可能向下推动 LIMIT 操作。查询中的一条 LIMIT 子句可以生成多个 Output Only 计划注释。

 => EXPLAIN SELECT COUNT(DISTINCT annual_income) FROM customer_dimension LIMIT 10;
Access Path:
 +-SELECT  LIMIT 10 [Cost: 161, Rows: 10] (PATH ID: 0)
 |  Output Only: 10 tuples
 | +---> GROUPBY NOTHING [Cost: 161, Rows: 1] (PATH ID: 1)
 | |      Aggregates: count(DISTINCT customer_dimension.annual_income)
 | |      Output Only: 10 tuples
 | | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 158, Rows: 10K] (PATH ID: 2)
 | | |      Group By: customer_dimension.annual_income
 | | | +---> STORAGE ACCESS for customer_dimension [Cost: 132, Rows: 50K] (PATH ID: 3)
 | | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | | |      Materialize: customer_dimension.annual_income

1.3.10 - Data Redistribution 路径

优化器可通过两种方式重新分布联接数据:

  • 广播

  • 重新分段

广播

广播操作会将中间结果的完整副本发送到群集中的所有节点。在以下情况下,可对联接使用广播。

  • 一个表与其他表相比很小(通常指内部表)。

  • Vertica 可避免其他大型上游重新分段操作。

  • 外部联接或子查询的语义需要复制联接一侧的内容。

例如:

=> EXPLAIN SELECT * FROM T1 LEFT JOIN T2 ON T1.a > T2.y;
 Access Path:
 +-JOIN HASH [LeftOuter] [Cost: 40K, Rows: 10K (NO STATISTICS)] (PATH ID: 1) Inner (BROADCAST)
 |  Join Filter: (T1.a > T2.y)
 |  Materialize at Output: T1.b
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for T1 [Cost: 151, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.T1_b0
 | |      Materialize: T1.a
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for T2 [Cost: 302, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.T2_b0
 | |      Materialize: T2.x, T2.y
 | |      Execute on: All Nodes

重新分段

重新分段操作会获取现有投影或中间关系,并在所有群集节点之间均匀地对数据进行重新分段。在重新分段操作结束时,输入关系中的每一行仅位于一个节点上。如果本地联接数据尚未进行分段,则重新分段将是对 Vertica 中的分布式联接最常使用的操作。有关更多详细信息,请参阅相同分段

例如:

=> CREATE TABLE T1 (a INT, b INT) SEGMENTED BY HASH(a) ALL NODES;
=> CREATE TABLE T2 (x INT, y INT) SEGMENTED BY HASH(x) ALL NODES;
=> EXPLAIN SELECT * FROM T1 JOIN T2 ON T1.a = T2.y;

 ------------------------------ QUERY PLAN DESCRIPTION: ------------------------------
 Access Path:
 +-JOIN HASH [Cost: 639, Rows: 10K (NO STATISTICS)] (PATH ID: 1) Inner (RESEGMENT)
 |  Join Cond: (T1.a = T2.y)
 |  Materialize at Output: T1.b
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for T1 [Cost: 151, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.T1_b0
 | |      Materialize: T1.a
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for T2 [Cost: 302, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.T2_b0
 | |      Materialize: T2.x, T2.y
 | |      Execute on: All Nodes

1.3.11 - 分析函数路径

Vertica 通过将同一查询中的多个 SQL-99 分析函数分组到 Analytic Group 区域,来尝试对这些分析函数进行优化。

对于每个分析组,Vertica 将根据需要执行数据分布式排序和重新分段。

可根据查询计划判断需要多少次排序和重新分段。

例如,以下查询计划显示 FIRST_VALUELAST_VALUE 函数在同一分析组中,因为它们的 OVER 子句相同。相反,ROW_NUMBER() 具有不同的 ORDER BY 子句,因此它在其他分析组中。由于这两个组共享同一 PARTITION BY deal_stage 子句,因此无需在组间对数据重新分段:

EXPLAIN SELECT
  first_value(deal_size) OVER (PARTITION BY deal_stage
  ORDER BY deal_size),
  last_value(deal_size) OVER (PARTITION BY deal_stage
  ORDER BY deal_size),
  row_number() OVER (PARTITION BY deal_stage
  ORDER BY largest_bill_amount)
  FROM customer_dimension;

 Access Path:
 +-ANALYTICAL [Cost: 1K, Rows: 50K] (PATH ID: 1)
 |  Analytic Group
 |   Functions: row_number()
 |   Group Sort: customer_dimension.deal_stage ASC, customer_dimension.largest_bill_amount ASC NULLS LAST
 |  Analytic Group
 |   Functions: first_value(), last_value()
 |   Group Filter: customer_dimension.deal_stage
 |   Group Sort: customer_dimension.deal_stage ASC, customer_dimension.deal_size ASC NULL LAST
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for customer_dimension [Cost: 263, Rows: 50K]
         (PATH ID: 2)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmart_vmart_node0001
 | |      Materialize: customer_dimension.largest_bill_amount,
          customer_dimension.deal_stage, customer_dimension.deal_size
 | |      Execute on: All Nodes

另请参阅

调用分析函数

1.3.12 - 节点故障信息

Vertica 在群集节点出现故障时对性能进行优化,方法是将故障节点的工作均匀分摊到整个群集的可用节点上。

当群集中的某个节点出现故障时,查询计划会识别将在哪个节点执行查询。为了帮助您快速识别大型群集上的故障节点,当正在运行的节点数少于或等于六个时,EXPLAIN 输出最多列出六个节点;当正在运行的节点数多于六个时,将仅列出故障节点。

下表提供了更多详细信息:

示例

在以下示例中,故障节点为 v_vmart_node0005,节点 v_vmart_node0006 将执行查询的此次运行。

=> EXPLAIN SELECT * FROM test;
QUERY PLAN
-----------------------------------------------------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT * FROM my1table;
Access Path:
+-STORAGE ACCESS for my1table [Cost: 10, Rows: 2] (PATH ID: 1)
| Projection: public.my1table_b0
| Materialize: my1table.c1, my1table.c2
| Execute on: All Except v_vmart_node0005
+-STORAGE ACCESS for my1table (REPLACEMENT FOR DOWN NODE) [Cost: 66, Rows: 2]
| Projection: public.my1table_b1
| Materialize: my1table.c1, my1table.c2
| Execute on: v_vmart_node0006

以下示例片段中的 All Permanent Nodes 输出指示节点列表仅适用于永久(非短暂)节点:

=> EXPLAIN SELECT * FROM my2table;
Access Path:
+-STORAGE ACCESS for my2table [Cost: 18, Rows:6 (NO STATISTICS)] (PATH ID: 1)
|  Projection: public.my2tablee_b0
|  Materialize: my2table.x, my2table.y, my2table.z
|  Execute on: All Permanent Nodes

1.3.13 - MERGE 路径

如果 MERGE 语句及其表满足 MERGE 优化中所述的标准,则 Vertica 会为该语句准备优化的查询计划。

使用 EXPLAIN 关键字可以确定 Vertica 是否可以为给定的 MERGE 语句生成优化的查询计划。如果可以进行优化,EXPLAIN 生成的输出将包含 [Semi] 路径,如以下示例片段所示:


...
Access Path:
+-DML DELETE [Cost: 0, Rows: 0]
|  Target Projection: public.A_b1 (DELETE ON CONTAINER)
|  Target Prep:
|  Execute on: All Nodes
| +---> JOIN MERGEJOIN(inputs presorted) [Semi] [Cost: 6, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
         Inner (RESEGMENT)
| |      Join Cond: (A.a1 = VAL(2))
| |      Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for A [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 2)
...

相反,如果 Vertica 无法创建优化计划,EXPLAIN 生成的输出将包含 RightOuter 路径:


...
Access Path: +-DML MERGE
 |  Target Projection: public.locations_b1
 |  Target Projection: public.locations_b0
 |  Target Prep:
 |  Execute on: All Nodes
 | +---> JOIN MERGEJOIN(inputs presorted) [RightOuter] [Cost: 28, Rows: 3 (NO STATISTICS)] (PATH ID: 1) Outer (RESEGMENT) Inner (RESEGMENT)
 | |      Join Cond: (locations.user_id = VAL(2)) AND (locations.location_x = VAL(2)) AND (locations.location_y = VAL(2))
 | |      Execute on: All Nodes
 | | +-- Outer -> STORAGE ACCESS for <No Alias> [Cost: 15, Rows: 2 (NO STATISTICS)] (PATH ID: 2)
...

2 - 定向查询

定向查询封装优化器可用于创建查询计划的信息。定向查询可以实现以下目标:

  • 在计划的升级前保留当前查询计划。大多数情况下,升级 Vertica 后,查询的执行效率会变得更高。在少数并非如此的情况下,您可以使用升级之前所创建的定向查询,以便重新创建早期版本的查询计划。

  • 使您能够创建改善优化工具性能的查询计划。有时,您可能希望影响优化器,使其在执行给定查询时做出更好的选择。例如,您可以选择不同的投影,或强制执行不同的联接顺序。在这种情况下,可使用定向查询来创建一个查询计划,替代优化器可能另外创建的计划。

  • 将输入查询重定向到使用不同语义的查询 — 例如,将联接查询映射到用来查询扁平表的 SELECT 语句

定向查询组件

定向查询为两个组件配对:

  • 输入查询:在此定向查询处于活动状态时,触发使用它的查询。

  • 带注释的查询:具有嵌入式优化器提示的 SQL 语句,其中的提示指导优化器如何为指定的输入查询创建查询计划。这些提示指定重要的查询计划元素,如联接顺序和投影选择。

Vertica 提供两种创建定向查询的方法:

  • 优化器可以从给定输入查询创建带注释的查询,并将这两个查询配对为定向查询。

  • 您可以编写自己的带注释查询并将其与输入查询配对。

有关这两种方法的描述,请参阅创建定向查询

2.1 - 创建定向查询

CREATE DIRECTED QUERY 将输入查询与使用优化器提示进行注释的查询相关联。它将此关联存储在一个唯一标识符下。CREATE DIRECTED QUERY 有两种变体:

  • CREATE DIRECTED QUERY OPTIMIZER 指示查询优化器从指定的输入查询生成带注释的 SQL。带注释的查询包含一些提示,优化器可使用这些提示为输入查询重新创建其当前查询计划。

  • CREATE DIRECTED QUERY CUSTOM 指定用户提供的带注释的查询。Vertica 将带注释的查询与最后一个 SAVE QUERY 语句指定的输入查询相关联。

在这两种情形中,Vertica 都会将带注释的查询与输入查询相关联,并将其关联注册在 query_name 下的系统表 DIRECTED_QUERIES 中。

这两种方法可以一起使用:您可以使用优化器创建的带注释 SQL 作为创建自己的(自定义)定向查询的基础。

2.1.1 - 优化器生成的定向查询

CREATE DIRECTED QUERY OPTIMIZER 将输入查询传递给优化器,优化器从自己的查询计划生成带注释的查询。然后,它将输入查询和带注释的查询配对并将其保存为定向查询。此定向查询可用于处理其他除了作为查询结果筛选依据的谓词字符串以外完全相同的查询。

可在升级之前使用优化器生成的定向查询来捕获查询计划。如果在升级后检测到给定查询的性能下降,这样做将会非常有用。在这种情况下,可使用相应的定向查询重新创建早期查询计划,并将其性能与当前优化器生成的计划相对比。

示例

以下 SQL 语句创建和激活定向查询 findEmployeesCityJobTitle_OPT


=> CREATE DIRECTED QUERY OPTIMIZER 'findEmployeesCityJobTitle_OPT'
     SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city='Boston' and job_title='Cashier' ORDER BY employee_last_name, employee_first_name;
CREATE DIRECTED QUERY

=> ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
ACTIVATE DIRECTED QUERY

在激活此定向查询计划后,优化器会使用它为此输入查询的所有后续调用以及其他类似调用生成查询计划。可通过调用 GET DIRECTED QUERY 或查询系统表 DIRECTED_QUERIES 来查看优化器生成的带注释查询:

=> SELECT input_query, annotated_query FROM V_CATALOG.DIRECTED_QUERIES WHERE query_name = 'findEmployeesCityJobTitle_OPT';
-[ RECORD 1 ]---+----------------------------------------------------------------------------
input_query     | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension
WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query | SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC

带注释的查询包括以下提示:

  • /*+verbatim*/ 指定完全按编写执行带注释的查询并相应地生成查询计划。
  • /*+projs('public.Emp_Dimension')*/ 指示优化器创建使用投影 public.Emp_Dimension 的查询计划。
  • /*+:v({{< codevar >}}n{{< /codevar >}})*/ (alias of /*+IGNORECONST(n)*/) 多次包含在带注释查询和输入查询中。这些提示限定查询谓词中的两个常量: Boston Cashier. Each :v 提示有一个整数实参 n,该实参将输入查询和带注释查询中的对应常量进行配对: *+:v(1)*/ (用于 Boston)和 /*+:v(2)*/(用于 Cashier)。这些提示告诉优化器在决定是否将此定向查询应用于其他类似的输入查询时忽略这些常量。因此,忽略常量提示可让您对不同的输入查询使用相同的定向查询。

以下查询对 employee_cityjob_title 列使用不同的值,但在其他方面与定向查询 EmployeesCityJobTitle_OPT 的原始输入查询相同:

=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;

如果定向查询 EmployeesCityJobTitle_OPT 处于活动状态,则优化器可以将其用于此查询:

=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------
 EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;

 The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
 SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
 WHERE ((employee_dimension.employee_city = 'San Francisco'::varchar(13)) AND (employee_dimension.job_title = 'Branch Manager'::varchar(14)))
 ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

 Access Path:
 +-SORT [Cost: 222, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: employee_dimension.employee_last_name ASC, employee_dimension.employee_first_name ASC
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for employee_dimension [Cost: 60, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.employee_dimension_super
 | |      Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
 | |      Filter: (employee_dimension.employee_city = 'San Francisco')
 | |      Filter: (employee_dimension.job_title = 'Branch Manager')
 | |      Execute on: All Nodes
 ...

2.1.2 - 自定义定向查询

CREATE DIRECTED QUERY CUSTOM 指定带注释的查询并将其与 SAVE QUERY 先前保存的输入查询配对。必须在同一用户会话中发出这两个语句。

例如,您可能希望某查询使用特定投影:

  1. 使用 SAVE QUERY 指定查询:

    => SAVE QUERY SELECT employee_first_name, employee_last_name FROM employee_dimension
        WHERE employee_city='Boston' AND job_title='Cashier';
    SAVE QUERY
    
  2. 使用 CREATE DIRECTED QUERY CUSTOM 创建自定义定向查询,它指定带注释的查询并将其与保存的查询相关联。带注释的查询包含一个 /*+projs*/ 提示,它指示优化器在用户调用保存的查询时使用投影 public.emp_dimension_unseg

    
    => CREATE DIRECTED QUERY CUSTOM 'findBostonCashiers_CUSTOM'
       SELECT employee_first_name, employee_last_name
       FROM employee_dimension /*+Projs('public.emp_dimension_unseg')*/
       WHERE employee_city='Boston' AND job_title='Cashier';
    CREATE DIRECTED QUERY
    
  3. 激活此定向查询:

    => ACTIVATE DIRECTED QUERY findBostonCashiers_CUSTOM;
    ACTIVATE DIRECTED QUERY
    
  4. 激活后,优化器将使用此定向查询为其输入查询的所有后续调用生成查询计划。下面的 EXPLAIN 输出验证优化器使用此定向查询的情况及其指定的投影:

    
    => EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension
       WHERE employee_city='Boston' AND job_title='Cashier';
    
    QUERY PLAN
    ------------------------------
    QUERY PLAN DESCRIPTION:
    ------------------------------
    EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension where employee_city='Boston' AND job_title='Cashier';
    
     The following active directed query(query name: findBostonCashiers_CUSTOM) is being executed:
     SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name
      FROM public.employee_dimension/*+Projs('public.emp_dimension_unseg')*/
      WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6)) AND (employee_dimension.job_title = 'Cashier'::varchar(7)))
    
     Access Path:
     +-STORAGE ACCESS for employee_dimension [Cost: 158, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
     |  Projection: public.emp_dimension_unseg
     |  Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
     |  Filter: (employee_dimension.employee_city = 'Boston')
     |  Filter: (employee_dimension.job_title = 'Cashier')
     |  Execute on: Query Initiator
    

另请参阅

重写联接查询

2.1.3 - 结合使用优化器生成的定向查询和自定义定向查询

可以根据优化器所创建的带注释 SQL 来创建自己的自定义定向查询。当评估优化器创建用于处理给定查询的计划以及测试计划修改内容时,此方法特别有用。

例如,您可能希望修改优化器对以下查询的实施方式:

=> SELECT COUNT(customer_name) Total, customer_region Region
    FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
    JOIN product_dimension p ON s.product_key = p.product_key
    WHERE p.category_description ilike '%Medical%'
      AND p.product_description ilike '%antibiotics%'
      AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
    GROUP BY customer_region;

对此查询运行 EXPLAIN 时,您会发现优化器将投影 customers_proj_age 用于 customer_dimension 表。此投影按列 customer_age 进行排序。因此,优化器将根据 customer_key 对表 store_salescustomer_dimension 进行哈希联接。

分析 customer_dimension 表数据后,您观察到大部分客户的年龄不到 30 岁,因此对按 customer_key 排序的 customer_dimension 表使用投影 customer_proj_id 更有意义:

您可以创建一个用于封装此更改的定向查询,如下所示:

  1. 使用 EXPLAIN ANNOTATED 获取由优化器生成的查询注释:

    => \o annotatedQuery
    => EXPLAIN ANNOTATED SELECT COUNT(customer_name) Total, customer_region Region
         FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
         JOIN product_dimension p ON s.product_key = p.product_key
         WHERE p.category_description ilike '%Medical%'
           AND p.product_description ilike '%antibiotics%'
           AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
         GROUP BY customer_region;
    => \o
    => \! cat annotatedQuery
    ...
    SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
     FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
        JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customers_proj_age')*/
          ON (c.customer_key = s.customer_key))
        JOIN /*+Distrib(L,B),JType(M)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
          ON (s.product_key = p.product_key))
     WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
         AND (c.customer_age <= 30)
         AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
         AND (p.product_description ~~* '%antibiotics%'::varchar(13))
     GROUP BY  /*+GByType(Hash)*/ 2
    (4 rows)
    
  2. 修改带注释的查询:

    
    SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
     FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
        JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customer_proj_id')*/
          ON (c.customer_key = s.customer_key))
        JOIN /*+Distrib(L,B),JType(H)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
          ON (s.product_key = p.product_key))
     WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
         AND (c.customer_age <= 30)
         AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
         AND (p.product_description ~~* '%antibiotics%'::varchar(13))
     GROUP BY  /*+GByType(Hash)*/ 2
    
  3. 使用修改后的带注释查询来创建所需的定向查询:

    • 使用 SAVE QUERY 保存所需的输入查询:

      
      => SAVE QUERY SELECT COUNT(customer_name) Total, customer_region Region
          FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
          JOIN product_dimension p ON s.product_key = p.product_key
          WHERE p.category_description ilike '%Medical%'
            AND p.product_description ilike '%antibiotics%'
            AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
          GROUP BY customer_region;
      
    • 创建一个自定义定向查询,将保存的输入查询与修改后的带注释查询相关联:

      
      => CREATE DIRECTED QUERY CUSTOM 'getCustomersUnder31'
         SELECT /*+syntactic_join,verbatim*/ count(c.customer_name) AS Total, c.customer_region AS Region
       FROM ((public.store_sales AS s/*+projs('public.store_sales_super')*/
          JOIN /*+Distrib(L,B),JType(H)*/ public.customer_dimension AS c/*+projs('public.customer_proj_id')*/
            ON (c.customer_key = s.customer_key))
          JOIN /*+Distrib(L,B),JType(H)*/ public.product_dimension AS p/*+projs('public.product_dimension')*/
            ON (s.product_key = p.product_key))
       WHERE ((date_part('year'::varchar(4), (s.sales_date)::timestamp(0)))::int = 2017)
           AND (c.customer_age <= 30)
           AND ((p.category_description)::varchar(32) ~~* '%Medical%'::varchar(9))
           AND (p.product_description ~~* '%antibiotics%'::varchar(13))
       GROUP BY  /*+GByType(Hash)*/ 2;
      CREATE DIRECTED QUERY
      
  4. 激活此定向查询:

    => ACTIVATE DIRECTED QUERY getCustomersUnder31;
    ACTIVATE DIRECTED QUERY
    

当优化器处理与此定向查询的输入查询相匹配的查询时,它会使用此定向查询的带注释查询来生成查询计划:

=> EXPLAIN SELECT COUNT(customer_name) Total, customer_region Region
     FROM (store_sales s JOIN customer_dimension c ON c.customer_key = s.customer_key)
     JOIN product_dimension p ON s.product_key = p.product_key
     WHERE p.category_description ilike '%Medical%'
       AND p.product_description ilike '%antibiotics%'
       AND c.customer_age <= 30 AND YEAR(s.sales_date)=2017
     GROUP BY customer_region;

 The following active directed query(query name: getCustomersUnder31) is being executed:
...

2.2 - 设置带注释查询中的提示

定向查询的带注释查询中的提示为优化器提供关于如何执行输入查询的说明。带注释查询支持以下提示:

  • 联接提示 用于指定联接顺序、联接类型和联接数据分布:SYNTACTIC_JOINDISTRIBJTYPEUTYPE

  • 表提示 用于指定在查询计划中包括和排除哪些投影:PROJSSKIP_PROJS

  • :v 及其别名 IGNORECONSTANT 标记谓词字符串常量,当优化器决定是否对给定输入查询使用定向查询时,您希望优化器忽略这些常量。有关详细信息,请参阅忽略定向查询中的常量

  • :c 提示标记不得忽略的谓词常量。

  • VERBATIM 完全按编写的那样强制执行带注释查询。

带注释查询中的其他提示(如 DIRECT 或 LABEL)不起作用。

您可以在 vsql 查询中使用与带注释查询中相同的提示,只有两个例外情况: :v (IGNORECONSTANT) 和 VERBATIM

2.3 - 忽略定向查询中的常量

优化器生成的定向查询通常包括一个或多个 :vIGNORECONSTANT 的别名)提示,这些提示标记谓词字符串常量,当优化器决定是否对给定输入查询使用定向查询时,您希望优化器忽略这些常量。 :v 提示使多个查询能够使用同一个定向查询,但前提是这些查询在除了其谓词字符串以外的所有其他方面都相同。

例如,在以下两个查询中,除了分别为 employee_cityjob_title 列指定的字符串常量 Boston|San FranciscoCashier|Branch Manager 外,所有其他方面都相同:


=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city='Boston' and job_title ='Cashier' ORDER BY employee_last_name, employee_first_name;

=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;

在这种情况下,您从一个查询创建的、由优化器生成的定向查询可用于这两个查询:


=> CREATE DIRECTED QUERY OPTIMIZER 'findEmployeesCityJobTitle_OPT'
     SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city='Boston' and job_title='Cashier' ORDER BY employee_last_name, employee_first_name;
CREATE DIRECTED QUERY

=> ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
ACTIVATE DIRECTED QUERY

定向查询的输入查询和带注释查询都包含 :v 提示:

=> SELECT input_query, annotated_query FROM V_CATALOG.DIRECTED_QUERIES WHERE query_name = 'findEmployeesCityJobTitle_OPT';
-[ RECORD 1 ]---+----------------------------------------------------------------------------
input_query     | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension
WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query | SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC

输入查询和带注释查询中的提示实参将两个谓词常量配对:

  • /*+:v(1)*/employee_city 的输入查询和带注释查询设置配对。

  • /*+:v(2)*/job_title 的输入查询和带注释查询设置配对。

:v 提示告诉优化器在决定是否可以将此定向查询用于给定输入查询时忽略这两列的值。

例如,以下查询虽然对 employee_cityjob_title 使用不同的值,但在其他方面与用于创建定向查询 EmployeesCityJobTitle_OPT 的查询相同:

=> SELECT employee_first_name, employee_last_name FROM public.employee_dimension
     WHERE employee_city = 'San Francisco' and job_title = 'Branch Manager' ORDER BY employee_last_name, employee_first_name;

如果定向查询 EmployeesCityJobTitle_OPT 处于活动状态,则优化器可以将该定向查询用在以下查询的查询计划中:

=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------
 EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='San Francisco' AND job_title='Branch Manager' ORDER BY employee_last_name, employee_first_name;

 The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
 SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
 WHERE ((employee_dimension.employee_city = 'San Francisco'::varchar(13)) AND (employee_dimension.job_title = 'Branch Manager'::varchar(14)))
 ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

 Access Path:
 +-SORT [Cost: 222, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: employee_dimension.employee_last_name ASC, employee_dimension.employee_first_name ASC
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for employee_dimension [Cost: 60, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.employee_dimension_super
 | |      Materialize: employee_dimension.employee_first_name, employee_dimension.employee_last_name
 | |      Filter: (employee_dimension.employee_city = 'San Francisco')
 | |      Filter: (employee_dimension.job_title = 'Branch Manager')
 | |      Execute on: All Nodes
 ...

映射一对多 :v 提示

目前所显示的示例演示了 :v 提示的一对一配对。您也可以使用 :v 提示将一个输入常数映射到带注释查询中的多个常数。当您想为优化器提供关于如何执行查询(用来联接表)的显式说明时,这种方法特别有用。

例如,以下查询联接了两个表:

SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;

在这种情况下,优化器可以推断 S.aT.b 具有相同的值,并相应地实施联接。

<a name="simpleJoinExample"></a>=> CREATE DIRECTED QUERY OPTIMIZER simpleJoin SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;
CREATE DIRECTED QUERY
=> SELECT input_query, annotated_query FROM directed_queries WHERE query_name = 'simpleJoin';
-[ RECORD 1 ]---+---------------------------------------------------------------------------------------------------------------------
input_query     | SELECT S.a, T.b FROM (public.S JOIN public.T ON ((S.a = T.b))) WHERE (S.a = 8 /*+:v(1)*/)
annotated_query | SELECT /*+syntactic_join,verbatim*/ S.a AS a, T.b AS b
FROM (public.S AS S/*+projs('public.S')*/ JOIN /*+Distrib(L,L),JType(M)*/ public.T AS T/*+projs('public.T')*/  ON (S.a = T.b))
WHERE (S.a = 8 /*+:v(1)*/) AND (T.b = 8 /*+:v(1)*/)
(1 row)

=> ACTIVATE DIRECTED QUERY simpleJoin;
ACTIVATED DIRECTED QUERY

现在,以下列输入查询为例:

SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3;

优化器忽略联接谓词常量并使用其查询计划中的定向查询 simpleJoin

 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3;

 The following active directed query(query name: simpleJoin) is being executed:
 SELECT /*+syntactic_join,verbatim*/  S.a, T.b FROM (public.S S/*+projs('public.S')*/ JOIN /*+Distrib('L', 'L'), JType('
M')*/public.T T/*+projs('public.T')*/ ON ((S.a = T.b))) WHERE ((S.a = 3) AND (T.b = 3))

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 21, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (S.a = T.b)
 |  Execute on: Query Initiator
 | +-- Outer -> STORAGE ACCESS for S [Cost: 12, Rows: 4 (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.S_b0
 | |      Materialize: S.a
 | |      Filter: (S.a = 3)
 | |      Execute on: Query Initiator
 | |      Runtime Filter: (SIP1(MergeJoin): S.a)
 | +-- Inner -> STORAGE ACCESS for T [Cost: 8, Rows: 3 (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.T_b0
 | |      Materialize: T.b
 | |      Filter: (T.b = 3)
 | |      Execute on: Query Initiator
 ...

在定向查询中保留谓词常量

默认情况下,优化器生成的定向查询在谓词常量上设置 :v 提示。您可以覆盖此行为,方法是使用 :c 提示来标记不得忽略的谓词常量。例如,以下语句创建一个定向查询,该定向查询只能用于联接谓词常量 8 与原始输入查询中相同的输入查询:

=> CREATE DIRECTED QUERY OPTIMIZER simpleJoin_KeepPredicateConstant SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8 /*+:c*/;
CREATE DIRECTED QUERY
=> ACTIVATE DIRECTED QUERY simpleJoin_KeepPredicateConstant;

以下对系统表 DIRECTED_QUERIES 的查询对定向查询 simpleJoin(在前面的示例中创建)和 simpleJoin_KeepPredicateConstant 进行了比较。与 simpleJoin 不同,simpleJoin_KeepPredicateConstant 的输入查询和带注释查询的联接谓词省略了 :v 提示:



=> SELECT query_name, input_query, annotated_query FROM directed_queries WHERE query_name ILIKE'%simpleJoin%';
-[ RECORD 1 ]---+
query_name      | simpleJoin
input_query     | SELECT S.a, T.b FROM (public.S JOIN public.T ON ((S.a = T.b))) WHERE (S.a = 8 /*+:v(1)*/)
annotated_query | SELECT /*+syntactic_join,verbatim*/ S.a AS a, T.b AS b
FROM (public.S AS S/*+projs('public.S')*/ JOIN /*+Distrib(L,L),JType(M)*/ public.T AS T/*+projs('public.T')*/  ON (S.a = T.b))
WHERE (S.a = 8 /*+:v(1)*/) AND (T.b = 8 /*+:v(1)*/)
-[ RECORD 2 ]---+
query_name      | simpleJoin_KeepPredicateConstant
input_query     | SELECT S.a, T.b FROM (public.S JOIN public.T ON ((S.a = T.b))) WHERE (S.a = 8)
annotated_query | SELECT /*+syntactic_join,verbatim*/ S.a AS a, T.b AS b
FROM (public.S AS S/*+projs('public.S')*/ JOIN /*+Distrib(L,L),JType(M)*/ public.T AS T/*+projs('public.T')*/  ON (S.a = T.b))
WHERE (S.a = 8) AND (T.b = 8)

如果停用定向查询 simpleJoin(否则该定向查询会优先于 simpleJoin_KeepPredicateConstant,因为前者创建的早),Vertica 仅将 simpleJoin_KeepPredicateConstant 应用于输入查询,这里联接谓词常量与原始输入查询中相同。例如,比较以下两个查询计划:

=> EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 8;

 The following active directed query(query name: simpleJoin_KeepPredicateConstant) is being executed:
 SELECT /*+syntactic_join,verbatim*/  S.a, T.b FROM (public.S S/*+projs('public.S')*/ JOIN /*+Distrib('L', 'L'), JType('
M')*/public.T T/*+projs('public.T')*/ ON ((S.a = T.b))) WHERE ((S.a = 8) AND (T.b = 8))

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 21, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (S.a = T.b)
 ...
=> EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3
 ...
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT * FROM S JOIN T ON S.a = T.b WHERE S.a = 3;

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 21, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (S.a = T.b)
 ...

2.4 - 重写查询

您可以使用定向查询来更改给定查询的语义,即用一个查询替换另一个查询。当您对 Vertica 数据库处理的输入查询的内容和格式没有或几乎没有控制权时,这一点尤其重要。您可以将这些查询映射到定向查询,从而重写原始输入以实现最佳执行效果。

以下部分介绍了两个用例:

重写联接查询

许多输入查询联接多个表。您已经确定,在许多情况下,更高效的做法是对多个扁平表中的大量数据进行反向标准化并直接查询这些表。您不能修改输入查询本身。但是,您可以使用定向查询将联接查询重定向到扁平表数据。

例如,以下查询通过联接 VMart 数据库中的三个表来聚合葡萄酒产品的区域销售额:

=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
    FROM store.store_sales_fact SF
    JOIN store.store_dimension SD ON SF.store_key=SD.store_key
    JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
  WHERE P.product_description ILIKE '%wine%'
  GROUP BY ROLLUP (SD.store_region, SD.store_city)
  ORDER BY Region,Total DESC;

您可以将联接的表数据合并到一个扁平表中,然后改为查询该表。通过这样做,您可以更快地访问同一数据。您可以使用以下 DDL 语句创建扁平表:

=> CREATE TABLE store.store_sales_wide AS SELECT * FROM store.store_sales_fact;
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_name VARCHAR(64)
     SET USING (SELECT store_name FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_city varchar(64)
    SET USING (SELECT store_city FROM store.store_dimension  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_state char(2)
    SET USING (SELECT store_state char FROM store.store_dimension WHERE store.store_sales_wide.store_key=store.store_dimension.store_key)
=> ALTER TABLE store.store_sales_wide ADD COLUMN store_region varchar(64)
    SET USING (SELECT store_region FROM store.store_dimension  WHERE store.store_sales_wide.store_key=store.store_dimension.store_key);
=> ALTER TABLE store.store_sales_wide ADD column product_description VARCHAR(128)
    SET USING (SELECT product_description FROM public.product_dimension
    WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);
=> ALTER TABLE store.store_sales_wide ADD COLUMN sku_number char(32)
    SET USING (SELECT sku_number char FROM product_dimension
    WHERE store_sales_wide.product_key||store_sales_wide.product_version = product_dimension.product_key||product_dimension.product_version);

=> SELECT REFRESH_COLUMNS ('store.store_sales_wide','', 'rebuild');

创建此表并刷新其 SET USING 列后,可以重写前面的查询,如下所示:

=> SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
    FROM store.store_sales_fact SF
    JOIN store.store_dimension SD ON SF.store_key=SD.store_key
    JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
  WHERE P.product_description ILIKE '%wine%'
  GROUP BY ROLLUP (SD.store_region, SD.store_city)
  ORDER BY Region,Total DESC;
  Region   |       City       |  Total
-----------+------------------+---------
 East      |                  | 1679788
 East      | Boston           |  138494
 East      | Elizabeth        |  138071
 East      | Sterling Heights |  137719
 East      | Allentown        |  137122
 East      | New Haven        |  117751
 East      | Lowell           |  102670
 East      | Washington       |   84595
 East      | Charlotte        |   83255
 East      | Waterbury        |   81516
 East      | Erie             |   80784
 East      | Stamford         |   59935
 East      | Hartford         |   59843
 East      | Baltimore        |   55873
 East      | Clarksville      |   54117
 East      | Nashville        |   53757
 East      | Manchester       |   53290
 East      | Columbia         |   52799
 East      | Memphis          |   52648
 East      | Philadelphia     |   29711
 East      | Portsmouth       |   29316
 East      | New York         |   27033
 East      | Cambridge        |   26111
 East      | Alexandria       |   23378
 MidWest   |                  | 1073224
 MidWest   | Lansing          |  145616
 MidWest   | Livonia          |  129349
--More--

查询扁平表更高效;但是,您仍然必须考虑继续使用早期联接语法的输入查询。这可以通过创建自定义定向查询来完成,这会将这些输入查询重定向到以扁平表为目标的语法:

  1. 保存输入查询

    => SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
        FROM store.store_sales_fact SF
        JOIN store.store_dimension SD ON SF.store_key=SD.store_key
        JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
      WHERE P.product_description ILIKE '%wine%'
      GROUP BY ROLLUP (SD.store_region, SD.store_city)
      ORDER BY Region,Total DESC;
    SAVE QUERY
    
  2. 将保存的查询映射到具有所需语法的定向查询,然后激活定向查询:

    => CREATE DIRECTED QUERY CUSTOM 'RegionalSalesWine'
        SELECT store_region AS Region,
          store_city AS City,
          SUM(gross_profit_dollar_amount) AS Total
        FROM store.store_sales_wide
        WHERE product_description ILIKE '%wine%'
        GROUP BY ROLLUP (region, city)
        ORDER BY Region,Total DESC;
    CREATE DIRECTED QUERY
    
    => ACTIVATE DIRECTED QUERY RegionalSalesWine;
    ACTIVATE DIRECTED QUERY
    

当定向查询 RegionalSalesWine 处于活动状态时,查询优化器会将所有与原始输入格式相匹配的查询映射到定向查询,如下面的查询计划所示:

=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
     FROM store.store_sales_fact SF
     JOIN store.store_dimension SD ON SF.store_key=SD.store_key
     JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
   WHERE P.product_description ILIKE '%wine%'
   GROUP BY ROLLUP (SD.store_region, SD.store_city)
   ORDER BY Region,Total DESC;
 ...
 The following active directed query(query name: RegionalSalesWine) is being executed:
 SELECT store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
  FROM store.store_sales_wide WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6))
  GROUP BY GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region),())
  ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC

 Access Path:
 +-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
 | |      Group By: store_sales_wide.store_region, store_sales_wide.store_city
 | |      Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_wide_b0
 | | |      Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
 | | |      Filter: (store_sales_wide.product_description ~~* '%wine%')
 | | |      Execute on: All Nodes

要对定向查询和原始输入查询的执行成本进行比较,请停用定向查询并对原始输入查询使用 EXPLAIN。优化器恢复到为输入查询创建计划,这会产生高得多的成本 - 188K 相对于 2K:

=> DEACTIVATE DIRECTED QUERY RegionalSalesWine;
DEACTIVATE DIRECTED QUERY
=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
     FROM store.store_sales_fact SF
     JOIN store.store_dimension SD ON SF.store_key=SD.store_key
     JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
   WHERE P.product_description ILIKE '%wine%'
   GROUP BY ROLLUP (SD.store_region, SD.store_city)
   ORDER BY Region,Total DESC;
 ...
  Access Path:
 +-SORT [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: SD.store_region ASC, sum(SF.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 188K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(SF.gross_profit_dollar_amount)
 | |      Group By: SD.store_region, SD.store_city
 | |      Grouping Sets: (SD.store_region, SD.store_city, <SVAR>), (SD.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> JOIN HASH [Cost: 12K, Rows: 5M (NO STATISTICS)] (PATH ID: 3) Inner (BROADCAST)
 | | |      Join Cond: (concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))
 | | |      Materialize at Input: SF.product_key, SF.product_version
 | | |      Materialize at Output: SF.gross_profit_dollar_amount
 | | |      Execute on: All Nodes
 | | | +-- Outer -> JOIN HASH [Cost: 2K, Rows: 5M (NO STATISTICS)] (PATH ID: 4) Inner (BROADCAST)
 | | | |      Join Cond: (SF.store_key = SD.store_key)
 | | | |      Execute on: All Nodes
 | | | | +-- Outer -> STORAGE ACCESS for SF [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 5)
 | | | | |      Projection: store.store_sales_fact_super
 | | | | |      Materialize: SF.store_key
 | | | | |      Execute on: All Nodes
 | | | | |      Runtime Filters: (SIP2(HashJoin): SF.store_key), (SIP1(HashJoin): concat((SF.product_key)::varchar, (SF.product_version)::varchar))
 | | | | +-- Inner -> STORAGE ACCESS for SD [Cost: 13, Rows: 250 (NO STATISTICS)] (PATH ID: 6)
 | | | | |      Projection: store.store_dimension_super
 | | | | |      Materialize: SD.store_key, SD.store_city, SD.store_region
 | | | | |      Execute on: All Nodes
 | | | +-- Inner -> STORAGE ACCESS for P [Cost: 201, Rows: 60K (NO STATISTICS)] (PATH ID: 7)
 | | | |      Projection: public.product_dimension_super
 | | | |      Materialize: P.product_key, P.product_version
 | | | |      Filter: (P.product_description ~~* '%wine%')
 | | | |      Execute on: All Nodes

创建查询模板

您可以使用定向查询来实施除了用于筛选查询结果的谓词字符串以外都完全相同的多个查询。例如,定向查询 RegionalSalesWine 仅处理对包含字符串 wineproduct_description 值进行筛选的输入查询。您可以创建此定向查询的修改版本,使其匹配多个输入查询的语法,这些输入查询仅在输入值(例如 tuna)上有所不同。

按以下步骤创建此查询模板:

  1. 创建两个优化器生成的定向查询

    • 从对联接表的原始查询:

      => CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProducts_JoinTables
           SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
             FROM store.store_sales_fact SF
             JOIN store.store_dimension SD ON SF.store_key=SD.store_key
             JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
             WHERE P.product_description ILIKE '%wine%'
           GROUP BY ROLLUP (SD.store_region, SD.store_city)
           ORDER BY Region,Total DESC;
      CREATE DIRECTED QUERY
      
    • 从对扁平表的查询:

      => CREATE DIRECTED QUERY OPTIMIZER RegionalSalesProduct
          SELECT store_region AS Region, store_city AS City, SUM(gross_profit_dollar_amount) AS Total
          FROM store.store_sales_wide
          WHERE product_description ILIKE '%wine%'
          GROUP BY ROLLUP (region, city)
          ORDER BY Region,Total DESC;
      CREATE DIRECTED QUERY
      
  2. 查询系统表 DIRECTED_QUERIES 并复制定向查询 RegionalSalesProducts_JoinTables 的输入查询:

    SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
    
  3. 通过 SAVE QUERY 使用复制的输入查询:

    SAVE QUERY SELECT SD.store_region AS Region, SD.store_city AS City, sum(SF.gross_profit_dollar_amount) AS Total
      FROM ((store.store_sales_fact SF
      JOIN store.store_dimension SD ON ((SF.store_key = SD.store_key)))
      JOIN public.product_dimension P ON ((concat((SF.product_key)::varchar, (SF.product_version)::varchar) = concat((P.product_key)::varchar, (P.product_version)::varchar))))
      WHERE (P.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/)
      GROUP BY GROUPING SETS((SD.store_region, SD.store_city), (SD.store_region), ())
      ORDER BY SD.store_region, sum(SF.gross_profit_dollar_amount) DESC
    (1 row)
    
  4. 查询系统表 DIRECTED_QUERIES 并复制定向查询 RegionalSalesProducts_FlatTables 的带注释查询:

    SELECT input_query FROM directed_queries WHERE query_name = 'RegionalSalesProducts_JoinTables';
    
  5. 使用复制的带注释查询创建自定义定向查询

    => CREATE DIRECTED QUERY CUSTOM RegionalSalesProduct  SELECT /*+verbatim*/ store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
         FROM store.store_sales_wide AS store_sales_wide/*+projs('store.store_sales_wide')*/
         WHERE (store_sales_wide.product_description ~~* '%wine%'::varchar(6) /*+:v(1)*/)
         GROUP BY  /*+GByType(Hash)*/ GROUPING SETS((1, 2), (1), ())
         ORDER BY 1 ASC, 3 DESC;
    CREATE DIRECTED QUERY
    
  6. 激活此定向查询:

    ACTIVATE DIRECTED QUERY RegionalSalesProduct;
    

激活此定向查询后,Vertica 可以将其用于和模板相匹配的输入查询中,不同之处仅在于 product_description 的谓词值:


=> EXPLAIN SELECT SD.store_region AS Region, SD.store_city AS City, SUM(SF.gross_profit_dollar_amount) Total
         FROM store.store_sales_fact SF
         JOIN store.store_dimension SD ON SF.store_key=SD.store_key
         JOIN product_dimension P ON SF.product_key||SF.product_version=P.product_key||P.product_version
       WHERE P.product_description ILIKE '%tuna%'
       GROUP BY ROLLUP (SD.store_region, SD.store_city)
       ORDER BY Region,Total DESC;
 ...
 The following active directed query(query name: RegionalSalesProduct) is being executed:
 SELECT /*+verbatim*/  store_sales_wide.store_region AS Region, store_sales_wide.store_city AS City, sum(store_sales_wide.gross_profit_dollar_amount) AS Total
  FROM store.store_sales_wide store_sales_wide/*+projs('store.store_sales_wide')*/
  WHERE (store_sales_wide.product_description ~~* '%tuna%'::varchar(6))
  GROUP BY /*+GByType(Hash)*/  GROUPING SETS((store_sales_wide.store_region, store_sales_wide.store_city), (store_sales_wide.store_region), ())
  ORDER BY store_sales_wide.store_region, sum(store_sales_wide.gross_profit_dollar_amount) DESC

 Access Path:
 +-SORT [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 |  Order: store_sales_wide.store_region ASC, sum(store_sales_wide.gross_profit_dollar_amount) DESC
 |  Execute on: All Nodes
 | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Aggregates: sum(store_sales_wide.gross_profit_dollar_amount)
 | |      Group By: store_sales_wide.store_region, store_sales_wide.store_city
 | |      Grouping Sets: (store_sales_wide.store_region, store_sales_wide.store_city, <SVAR>), (store_sales_wide.store_region, <SVAR>), (<SVAR>)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for store_sales_wide [Cost: 864, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | | |      Projection: store.store_sales_wide_b0
 | | |      Materialize: store_sales_wide.gross_profit_dollar_amount, store_sales_wide.store_city, store_sales_wide.store_region
 | | |      Filter: (store_sales_wide.product_description ~~* '%tuna%')
 | | |      Execute on: All Nodes

当您执行此查询时,它会返回以下结果:

  Region   |       City       |  Total
-----------+------------------+---------
 East      |                  | 1564582
 East      | Elizabeth        |  131328
 East      | Allentown        |  129704
 East      | Boston           |  128188
 East      | Sterling Heights |  125877
 East      | Lowell           |  112133
 East      | New Haven        |  101161
 East      | Waterbury        |   85401
 East      | Washington       |   76127
 East      | Erie             |   73002
 East      | Charlotte        |   67850
 East      | Memphis          |   53650
 East      | Clarksville      |   53416
 East      | Hartford         |   52583
 East      | Columbia         |   51950
 East      | Nashville        |   50031
 East      | Manchester       |   48607
 East      | Baltimore        |   48108
 East      | Stamford         |   47302
 East      | New York         |   30840
 East      | Portsmouth       |   26485
 East      | Alexandria       |   26391
 East      | Philadelphia     |   23092
 East      | Cambridge        |   21356
 MidWest   |                  |  980209
 MidWest   | Lansing          |  130044
 MidWest   | Livonia          |  118740
--More--

2.5.1 - 获取定向查询

可按以下两种方式获取有关定向查询的编录信息:

GET DIRECTED QUERY

GET DIRECTED QUERY 在指定输入查询上查询系统表 DIRECTED_QUERIES。它返回映射到输入查询的所有定向查询的详细信息。

以下 GET DIRECTED QUERY 语句返回映射到以下输入查询 findEmployeesCityJobTitle_OPT 的定向查询:

=> GET DIRECTED QUERY SELECT employee_first_name, employee_last_name from employee_dimension where employee_city='Boston' AND job_title='Cashier' order by employee_last_name, employee_first_name;
-[ RECORD 1 ]---+
query_name      | findEmployeesCityJobTitle_OPT
is_active       | t
vertica_version | Vertica Analytic Database v11.0.1-20210815
comment         | Optimizer-generated directed query
creation_date   | 2021-08-20 14:53:42.323963
annotated_query | SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/ WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

DIRECTED_QUERIES

可以直接查询系统表 DIRECTED_QUERIES。例如:

=> SELECT query_name, is_active FROM V_CATALOG.DIRECTED_QUERIES WHERE query_name ILIKE '%findEmployeesCityJobTitle%';
          query_name           | is_active
-------------------------------+-----------
 findEmployeesCityJobTitle_OPT | t
(1 row)

2.5.2 - 识别处于活动状态的定向查询

多个定向查询可以映射到同一输入查询。系统表 DIRECTED_QUERIES 中的 is_active 列标识处于活动状态的定向查询。如果多个定向查询对于同一输入查询均处于活动状态,优化器则使用要创建的第一个定向查询。在这种情况下,可使用 EXPLAIN 识别哪个定向查询处于活动状态。

DIRECTED_QUERIES 上的以下查询查找所有处于活动状态的定向查询,且其中的输入查询包含被查询表 employee_dimension 的名称:

=> SELECT * FROM directed_queries WHERE input_query ILIKE ('%employee_dimension%') AND is_active='t';
-[ RECORD 1 ]------+
query_name         | findEmployeesCityJobTitle_OPT
is_active          | t
vertica_version    | Vertica Analytic Database v11.0.1-20210815
comment            | Optimizer-generated directed query
save_plans_version | 0
username           | dbadmin
creation_date      | 2021-08-20 14:53:42.323963
since_date         |
input_query        | SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name
annotated_query    | SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/ WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

如果对输入查询运行 EXPLAIN,它将返回一个确认使用 findEmployeesCityJobTitle_OPT 作为活动定向查询的查询计划:

=> EXPLAIN SELECT employee_first_name, employee_last_name FROM employee_dimension WHERE employee_city='Boston' AND job_title ='Cashier' ORDER BY employee_last_name, employee_first_name;

 The following active directed query(query name: findEmployeesCityJobTitle_OPT) is being executed:
 SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/ WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6)) AND (employee_dimension.job_title = 'Cashier'::varchar(7))) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name

2.5.3 - 激活和停用定向查询

优化器仅使用处于活动状态的定向查询。如果多个定向查询共享同一输入查询,优化器将使用要创建的第一个查询。

定向查询分别使用 ACTIVATE DIRECTED QUERYDEACTIVATE DIRECTED QUERY 来激活和停用。例如,以下 ACTIVATE DIRECTED QUERY 语句停用 RegionalSalesProducts_JoinTables 并激活 RegionalSalesProduct


=> DEACTIVATE DIRECTED QUERY RegionalSalesProducts_JoinTables;
DEACTIVATE DIRECTED QUERY;
=> ACTIVATE DIRECTED QUERY RegionalSalesProduct;
ACTIVATE DIRECTED QUERY;

Vertica 为所有会话中的给定查询使用处于活动状态的定向查询,除非它被 DEACTIVATE DIRECTED QUERY 明确停用或被 DROP DIRECTED QUERY 从存储空间中移除。如果在数据库关闭时定向查询处于活动状态,当您重新启动数据库时,Vertica 会自动将其重新激活。

定向查询停用后,查询优化器通过使用另一个定向查询(如果存在这样的定向查询)处理输入查询的后续调用。否则,它会生成自己的查询计划。

2.5.4 - 从编录中导出定向查询

升级到新版本 Vertica 之前,您可以导出定向查询,从而将其用于实现最佳性能对系统而言至关重要的查询。

  1. 配合使用 EXPORT_CATALOG 和实参 DIRECTED_QUERIES 从数据库编录导出所有当前定向查询及其当前激活状态:

    => SELECT EXPORT_CATALOG('../../export_directedqueries', 'DIRECTED_QUERIES');
    EXPORT_CATALOG
    -------------------------------------
    Catalog data exported successfully
    
  2. EXPORT_CATALOG 创建一个脚本以重新创建定向查询,如以下示例所示:

    SAVE QUERY SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name;
    CREATE DIRECTED QUERY CUSTOM findEmployeesCityJobTitle_OPT COMMENT 'Optimizer-generated directed query' OPTVER 'Vertica Analytic Database v11.0.1-20210815' PSDATE '2021-08-20 14:53:42.323963' SELECT /*+verbatim*/  employee_dimension.employee_first_name, employee_dimension.employee_last_name
    FROM public.employee_dimension employee_dimension/*+projs('public.employee_dimension')*/
    WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/))
    ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name;
    ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
    
  3. 升级完成后,使用 DROP DIRECTED QUERY 从数据库编录中移除每个定向查询。或者,编辑导出脚本并在每个 CREATE DIRECTED QUERY 语句之前插入 DROP DIRECTED QUERY 语句。例如,您可以使用粗体显示的更改来修改先前生成的脚本:

    SAVE QUERY SELECT employee_dimension.employee_first_name, ... ;
    DROP DIRECTED QUERY findEmployeesCityJobTitle_OPT;
    CREATE DIRECTED QUERY CUSTOM findEmployeesCityJobTitle_OPT COMMENT 'Optimizer-generated ... ;
    ACTIVATE DIRECTED QUERY findEmployeesCityJobTitle_OPT;
    
  4. 当您运行此脚本时,Vertica 将重新创建定向查询并还原其激活状态:

    => \i /home/dbadmin/export_directedqueries
    SAVE QUERY
    DROP DIRECTED QUERY
    CREATE DIRECTED QUERY
    ACTIVATE DIRECTED QUERY
    

2.5.5 - 删除定向查询

DROP DIRECTED QUERY 从数据库编录中移除指定的定向查询。如果该定向查询处于活动状态,Vertica 会在移除前将其停用。

例如:

=> DROP DIRECTED QUERY findBostonCashiers_CUSTOM;
DROP DIRECTED QUERY

2.6 - 批量导出查询计划

在升级到新 Vertica 版本之前,您可能希望使用定向查询来保存查询计划,以便可能在新数据库中重新使用。您无法预测哪些查询计划可能会重新使用,因此可能要保存很多(或全部)数据库查询的查询计划。但是,每天运行的查询有数百个。通过重复调用 CREATE DIRECTED QUERY 将每个查询计划保存到数据库编录是不切实际的。此外,这样做会显著增加编录大小,可能还会影响性能。

在此情况下,可绕过数据库编录并将查询计划作为定向查询批量导出到外部 SQL 文件。通过为查询计划存储卸载,可以保存当前数据库中任意数量的查询计划,而不影响编录大小和性能。升级之后,您可以决定在新数据库中保留哪些查询计划,并有选择地导入相应的定向查询。

Vertica 提供一组支持此方法的元函数:

  • EXPORT_DIRECTED_QUERIES 从一组输入查询生成查询计划,并编写用于创建封装这些计划的定向查询的 SQL。

  • IMPORT_DIRECTED_QUERIES 将定向查询从由 EXPORT_DIRECTED_QUERIES 生成的 SQL 文件导入数据库编录中。

2.6.1 - 导出定向查询

可以将任意数量的查询计划作为定向查询批量导出到外部 SQL 文件中,如下所示:

  1. 创建一个 SQL 文件,其中包含要保存其查询计划的输入查询。请参阅下文的输入格式

  2. 对该 SQL 文件调用元函数 EXPORT_DIRECTED_QUERIES。此元函数具有两个实参:

    • 输入查询文件

    • (可选)输出文件的名称。EXPORT_DIRECTED_QUERIES 将用于创建定向查询的 SQL 写入此文件。如果您提供的是空字符串,则 Vertica 会将 SQL 写入标准输出。

例如,以下 EXPORT_DIRECTED_QUERIES 语句指定了输入文件 inputQueries 和输出文件 outputQueries

=> SELECT EXPORT_DIRECTED_QUERIES('/home/dbadmin/inputQueries','/home/dbadmin/outputQueries');
                               EXPORT_DIRECTED_QUERIES
---------------------------------------------------------------------------------------------
 1 queries successfully exported.
Queries exported to /home/dbadmin/outputQueries.

(1 row)

输入文件

您提供给 EXPORT_DIRECTED_QUERIES 的输入文件包含一个或多个输入查询。对于每个输入查询,您可以选择指定两个字段,这两个字段将用于生成的定向查询中:

  • DirQueryName 提供了定向查询的唯一标识符,即一个符合标识符中所述惯例的字符串。

  • DirQueryComment 指定了一个最多 128 个字符并以引号分隔的字符串。

您可以按如下方式设置每个输入查询的格式:

--DirQueryName=query-name
--DirQueryComment='comment'
input-query

例如,一个文件可以指定一个输入查询,如下所示:

/* Query: findEmployeesCityJobTitle_OPT */
/* Comment: This query finds all employees of a given city and job title, ordered by employee name */
SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name, employee_dimension.job_title FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6)) ORDER BY employee_dimension.job_title;

输出文件

EXPORT_DIRECTED_QUERIES 会将生成用于创建定向查询的 SQL,并将此 SQL 写入到指定文件或标准输出中。对于这两种情况,输出都遵循以下格式:

/* Query: directed-query-name */
/* Comment: directed-query-comment */
SAVE QUERY input-query;
CREATE DIRECTED QUERY CUSTOM 'directed-query-name'
COMMENT 'directed-query-comment'
OPTVER 'vertica-release-num'
PSDATE 'timestamp'
annotated-query

例如,给定前面的输入时,Vertica 会将以下输出写入 /home/dbadmin/outputQueries

/* Query: findEmployeesCityJobTitle_OPT */
/* Comment: This query finds all employees of a given city and job title, ordered by employee name */
SAVE QUERY SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE ((employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)) ORDER BY employee_dimension.employee_last_name, employee_dimension.employee_first_name;
CREATE DIRECTED QUERY CUSTOM 'findEmployeesCityJobTitle_OPT'
COMMENT 'This query finds all employees of a given city and job title, ordered by employee name'
OPTVER 'Vertica Analytic Database v11.1.0-20220102'
PSDATE '2022-01-06 13:45:17.430254'
SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name
FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) AND (employee_dimension.job_title = 'Cashier'::varchar(7) /*+:v(2)*/)
ORDER BY 2 ASC, 1 ASC;

如果给定的输入查询省略了 DirQueryNameDirQueryComment 字段,则 EXPORT_DIRECTED_QUERIES 会自动生成以下输出:

  • /* Query: Autoname:时间戳 .n */,其中 n 为基于零的整数索引,用于确保具有相同时间戳的自动生成名称具有唯一性。

  • /* Comment: Optimizer-generated directed query */

例如,以下输入文件包含一个 SELECT 语句,并且省略 DirQueryNameDirQueryComment 字段:


SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name
FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6))
ORDER BY employee_dimension.job_title;

给定这个文件,EXPORT_DIRECTED_QUERIES 会返回一个关于缺少输入字段的警告,它还会将该警告写入错误文件

> SELECT EXPORT_DIRECTED_QUERIES('/home/dbadmin/inputQueries2','/home/dbadmin/outputQueries3');
                                              EXPORT_DIRECTED_QUERIES
--------------------------------------------------------------------------------------------------------------
 1 queries successfully exported.
1 warning message was generated.
Queries exported to /home/dbadmin/outputQueries3.
See error report, /home/dbadmin/outputQueries3.err for details.
(1 row)

输出文件包含以下内容:

/* Query: Autoname:2022-01-06 14:11:23.071559.0 */
/* Comment: Optimizer-generated directed query */
SAVE QUERY SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/) ORDER BY employee_dimension.job_title;
CREATE DIRECTED QUERY CUSTOM 'Autoname:2022-01-06 14:11:23.071559.0'
COMMENT 'Optimizer-generated directed query'
OPTVER 'Vertica Analytic Database v11.1.0-20220102'
PSDATE '2022-01-06 14:11:23.071559'
SELECT /*+verbatim*/ employee_dimension.employee_first_name AS employee_first_name, employee_dimension.employee_last_name AS employee_last_name
FROM public.employee_dimension AS employee_dimension/*+projs('public.employee_dimension')*/
WHERE (employee_dimension.employee_city = 'Boston'::varchar(6) /*+:v(1)*/)
ORDER BY employee_dimension.job_title ASC;

错误文件

如果在执行 EXPORT_DIRECTED_QUERIES 期间发生任何错误或警告,它会返回一条类似于以下内容的消息:

1 queries successfully exported.
1 warning message was generated.
Queries exported to /home/dbadmin/outputQueries.
See error report, /home/dbadmin/outputQueries.err for details.

EXPORT_DIRECTED_QUERIES 会将所有错误或警告都写入到一个在与输出文件相同的路径下所创建的文件中,并使用输出文件的基名。

在上一示例中,输出文件名为 /home/dbadmin/outputQueries,因此 EXPORT_DIRECTED_QUERIES 会将错误写入 /home/dbadmin/outputQueries.err

错误文件可以捕获许多错误和警告,例如,可以捕获 EXPORT_DIRECTED_QUERIES 无法创建定向查询的所有实例。在以下示例中,错误文件包含一个警告,指示没有为指定的输入查询提供名称字段,并且还记录了自动为该查询生成的名称:


----------------------------------------------------------------------------------------------------
WARNING: Name field not supplied. Using auto-generated name: 'Autoname:2016-10-13 09:44:33.527548.0'
Input Query: SELECT employee_dimension.employee_first_name, employee_dimension.employee_last_name, employee_dimension.job_title FROM public.employee_dimension WHERE (employee_dimension.employee_city = 'Boston'::varchar(6)) ORDER BY employee_dimension.job_title;
END WARNING

2.6.2 - 导入定向查询

确定您希望在当前数据库中使用的已导出查询计划后,使用 IMPORT_DIRECTED_QUERIES 将其导入。您为此函数提供您使用 EXPORT_DIRECTED_QUERIES 创建的导出文件的名称以及您希望导入的定向查询的名称。例如:


=> SELECT IMPORT_DIRECTED_QUERIES('/home/dbadmin/outputQueries','FindEmployeesBoston');
                                    IMPORT_DIRECTED_QUERIES
------------------------------------------------------------------------------------------
 1 directed queries successfully imported.
To activate a query named 'my_query1':
=> ACTIVATE DIRECTED QUERY 'my_query1';

(1 row)

导入所需的定向查询后,必须使用 ACTIVATE DIRECTED QUERY 将其激活,然后才能使用它们创建查询计划。

2.7 - 半联接和交叉联接语义

Vertica 优化器在定向查询中使用多个关键字来重新创建交叉联接和半联接子查询。它还支持使用一组额外的关键字来表示复杂交叉联接和半联接。您还可以在直接从 vsql 中执行的查询中使用这些关键字。

有关详细信息,请参阅以下主题:

2.7.1 - 半联接子查询语义

Vertica 优化器在定向查询中使用多个关键字来重新创建与特定搜索运算符的半联接子查询,例如 ANYNOT IN

  • [SEMI JOIN](#SEMI)
  • [NULLAWARE ANTI JOIN](#NULLAWARE)
  • [SEMIALL JOIN](#SEMIALL)
  • [ANTI JOIN](#ANTI)

SEMI JOIN

重新创建用来执行 semi-join 的查询,其中包含以 INEXISTANY 运算符开头的子查询。

输入查询

SELECT product_description FROM product_dimension
  WHERE product_dimension.product_key IN (SELECT qty_in_stock from inventory_fact);

查询计划


QUERY PLAN DESCRIPTION:
------------------------------

explain SELECT product_description FROM product_dimension WHERE product_dimension.product_key IN (SELECT qty_in_stock from inventory_fact);

 Access Path:
 +-JOIN HASH [Semi] [Cost: 1K, Rows: 30K] (PATH ID: 1) Outer (FILTER) Inner (RESEGMENT)
 |  Join Cond: (product_dimension.product_key = VAL(2))
 |  Materialize at Output: product_dimension.product_description
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension
 | |      Materialize: product_dimension.product_key
 | |      Execute on: All Nodes
 | |      Runtime Filter: (SIP1(HashJoin): product_dimension.product_key)
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_b0
 | | |      Materialize: inventory_fact.qty_in_stock
 | | |      Execute on: All Nodes

优化器生成的带注释查询

SELECT /*+ syntactic_join */ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/ 
SEMI JOIN /*+Distrib(F,R),JType(H)*/ (SELECT inventory_fact.qty_in_stock AS qty_in_stock
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock))

NULLAWARE ANTI JOIN

重新创建用来执行 null-aware anti-join 的查询,其中包含以 NOT IN!=ALL 运算符开头的子查询。

输入查询

SELECT product_description FROM product_dimension
WHERE product_dimension.product_key NOT IN (SELECT qty_in_stock from inventory_fact);

查询计划

 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT product_description FROM product_dimension WHERE product_dimension.product_key not IN (SELECT qty_in_sto
ck from inventory_fact);

 Access Path:
 +-JOIN HASH [Anti][NotInAnti] [Cost: 7K, Rows: 30K] (PATH ID: 1) Inner (BROADCAST)
 |  Join Cond: (product_dimension.product_key = VAL(2))
 |  Materialize at Output: product_dimension.product_description
 |  Execute on: Query Initiator
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension_DBD_2_rep_VMartDesign
 | |      Materialize: product_dimension.product_key
 | |      Execute on: Query Initiator
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_DBD_9_seg_VMartDesign_b0
 | | |      Materialize: inventory_fact.qty_in_stock
 | | |      Execute on: All Nodes

优化器生成的带注释查询

SELECT /*+ syntactic_join */ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/ 
NULLAWARE ANTI JOIN /*+Distrib(L,B),JType(H)*/ (SELECT inventory_fact.qty_in_stock AS qty_in_stock
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock))

SEMIALL JOIN

重新创建用来执行 semi-all join 的查询,其中包含以 ALL 运算符开头的子查询。

输入查询

SELECT product_key, product_description FROM product_dimension
  WHERE product_dimension.product_key > ALL (SELECT product_key from inventory_fact);

查询计划


QUERY PLAN DESCRIPTION:
------------------------------

explain SELECT product_key, product_description FROM product_dimension WHERE product_dimension.product_key > ALL (SELECT product_key from inventory_fact);

Access Path:
+-JOIN HASH [Semi][All] [Cost: 7M, Rows: 30K] (PATH ID: 1) Outer (FILTER) Inner (BROADCAST)
|  Join Filter: (product_dimension.product_key > VAL(2))
|  Materialize at Output: product_dimension.product_description
|  Execute on: All Nodes
| +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
| |      Projection: public.product_dimension
| |      Materialize: product_dimension.product_key
| |      Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
| |      Execute on: All Nodes
| | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
| | |      Projection: public.inventory_fact_b0
| | |      Materialize: inventory_fact.product_key
| | |      Execute on: All Nodes

优化器生成的带注释查询


SELECT /*+ syntactic_join */ product_dimension.product_key AS product_key, product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/ 
SEMIALL JOIN /*+Distrib(F,B),JType(H)*/ (SELECT inventory_fact.product_key AS product_key FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key > subQ_1.product_key))

ANTI JOIN

重新创建用来执行 anti-join 的查询,其中包含以 NOT EXISTS 运算符开头的子查询。

输入查询

SELECT product_key, product_description FROM product_dimension
WHERE NOT EXISTS (SELECT inventory_fact.product_key FROM inventory_fact
WHERE inventory_fact.product_key=product_dimension.product_key);

查询计划


 QUERY PLAN DESCRIPTION:
 ------------------------------

 explain SELECT product_key, product_description FROM product_dimension WHERE NOT EXISTS (SELECT inventory_fact.product_
key FROM inventory_fact WHERE inventory_fact.product_key=product_dimension.product_key);

 Access Path:
 +-JOIN HASH [Anti] [Cost: 703, Rows: 30K] (PATH ID: 1) Outer (FILTER)
 |  Join Cond: (VAL(1) = product_dimension.product_key)
 |  Materialize at Output: product_dimension.product_description
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 152, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension_DBD_2_rep_VMartDesign
 | |      Materialize: product_dimension.product_key
 | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_DBD_9_seg_VMartDesign_b0
 | | |      Materialize: inventory_fact.product_key
 | | |      Execute on: All Nodes

优化器生成的带注释查询

SELECT /*+ syntactic_join */ product_dimension.product_key AS product_key, product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
ANTI JOIN /*+Distrib(F,L),JType(H)*/ (SELECT inventory_fact.product_key AS "inventory_fact.product_key"
FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (subQ_1."inventory_fact.product_key" = product_dimension.product_key))

2.7.2 - 复杂联接语义

优化器使用一组关键字来表达复杂交叉联接半联接。所有复杂联接都由关键字 COMPLEX 指示,该关键字插入在关键字 JOIN 之前,例如 CROSS COMPLEX JOIN。复杂半联接的语义有一个额外的要求,在下面有详细说明。

复杂交叉联接

Vertica 使用关键字短语 CROSS COMPLEX JOIN 来描述所有的复杂交叉联接。例如:

输入查询

SELECT
  (SELECT max(sales_quantity) FROM store.store_sales_fact) *
  (SELECT max(sales_quantity) FROM online_sales.online_sales_fact);

查询计划


 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT
   (SELECT max(sales_quantity) FROM store.store_sales_fact) *
   (SELECT max(sales_quantity) FROM online_sales.online_sales_fact);

 Access Path:
 +-JOIN  (CROSS JOIN) [Cost: 4K, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
 |  Execute on: Query Initiator
 | +-- Outer -> JOIN  (CROSS JOIN) [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
 | |      Execute on: Query Initiator
 | | +-- Outer -> STORAGE ACCESS for dual [Cost: 10, Rows: 1] (PATH ID: 3)
 | | |      Projection: v_catalog.dual_p
 | | |      Materialize: dual.dummy
 | | |      Execute on: Query Initiator
 | | +-- Inner -> SELECT [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 4)
 | | |      Execute on: Query Initiator
 | | | +---> GROUPBY NOTHING [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 5)
 | | | |      Aggregates: max(store_sales_fact.sales_quantity)
 | | | |      Execute on: All Nodes
 | | | | +---> STORAGE ACCESS for store_sales_fact [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 6)
 | | | | |      Projection: store.store_sales_fact_super
 | | | | |      Materialize: store_sales_fact.sales_quantity
 | | | | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 7)
 | |      Execute on: Query Initiator
 | | +---> GROUPBY NOTHING [Cost: 2K, Rows: 1 (NO STATISTICS)] (PATH ID: 8)
 | | |      Aggregates: max(online_sales_fact.sales_quantity)
 | | |      Execute on: All Nodes
 | | | +---> STORAGE ACCESS for online_sales_fact [Cost: 1K, Rows: 5M (NO STATISTICS)] (PATH ID: 9)
 | | | |      Projection: online_sales.online_sales_fact_super
 | | | |      Materialize: online_sales_fact.sales_quantity
 | | | |      Execute on: All Nodes

优化器生成的带注释查询

以下带注释的查询返回与前面显示的输入查询相同的结果。与优化器生成的所有带注释查询一样,您可以直接在 vsql 中执行此查询(无论是按照编写还是修改后):

SELECT /*+syntactic_join,verbatim*/ (subQ_1.max * subQ_2.max) AS "?column?"
FROM ((v_catalog.dual AS dual CROSS COMPLEX JOIN /*+Distrib(L,L),JType(H)*/
(SELECT max(store_sales_fact.sales_quantity) AS max
FROM store.store_sales_fact AS store_sales_fact/*+projs('store.store_sales_fact')*/) AS subQ_1 )
CROSS COMPLEX JOIN /*+Distrib(L,L),JType(H)*/ (SELECT max(online_sales_fact.sales_quantity) AS max
FROM online_sales.online_sales_fact AS online_sales_fact/*+projs('online_sales.online_sales_fact')*/) AS subQ_2 )

复杂半联接

复杂半联接由以下关键字之一表达:

  • SEMI COMPLEX JOIN

  • NULLAWARE ANTI COMPLEX JOIN

  • SEMIALL COMPLEX JOIN

  • ANTI COMPLEX JOIN

额外要求适用于所有复杂半联接:每个子查询的 SELECT 列表都以调用 Vertica 元函数 complex_join_marker() 的虚拟列(标记为 false)结尾。当子查询处理每一行时,complex_join_marker() 返回 truefalse 以指示结果集中包含或排除该行。结果集与这个标志一起返回到外部查询,外部查询可以使用来自这个和其他子查询中的标志来筛选它自己的结果集。

例如,查询优化器将以下输入查询重写为 NULLAWARE ANTI COMPLEX JOIN。该联接返回子查询中其 complex_join_marker() 标志设置为适当布尔值的所有行。

输入查询

SELECT product_dimension.product_description FROM public.product_dimension
   WHERE (NOT (product_dimension.product_key NOT IN (SELECT inventory_fact.qty_in_stock FROM public.inventory_fact)));

查询计划

 QUERY PLAN DESCRIPTION:
 ------------------------------

 EXPLAIN SELECT product_dimension.product_description FROM public.product_dimension
   WHERE (NOT (product_dimension.product_key NOT IN (SELECT inventory_fact.qty_in_stock FROM public.inventory_fact)));

 Access Path:
 +-JOIN HASH [Anti][NotInAnti] [Cost: 3K, Rows: 30K] (PATH ID: 1) Inner (BROADCAST)
 |  Join Cond: (product_dimension.product_key = VAL(2))
 |  Materialize at Output: product_dimension.product_description
 |  Filter: (NOT VAL(2))
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for product_dimension [Cost: 56, Rows: 60K] (PATH ID: 2)
 | |      Projection: public.product_dimension_super
 | |      Materialize: product_dimension.product_key
 | |      Execute on: All Nodes
 | +-- Inner -> SELECT [Cost: 248, Rows: 300K] (PATH ID: 3)
 | |      Execute on: All Nodes
 | | +---> STORAGE ACCESS for inventory_fact [Cost: 248, Rows: 300K] (PATH ID: 4)
 | | |      Projection: public.inventory_fact_super
 | | |      Materialize: inventory_fact.qty_in_stock
 | | |      Execute on: All Nodes

优化器生成的带注释查询

以下带注释的查询返回与前面显示的输入查询相同的结果。与优化器生成的所有带注释查询一样,您可以直接在 vsql 中执行此查询(无论是按照编写还是修改后)。例如,您可以通过修改外部查询的谓词对 subQ_1."false" 标志的求值方式来控制外部查询的输出。

SELECT /*+syntactic_join,verbatim*/ product_dimension.product_description AS product_description
FROM (public.product_dimension AS product_dimension/*+projs('public.product_dimension')*/
NULLAWARE ANTI COMPLEX JOIN /*+Distrib(L,B),JType(H)*/
   (SELECT inventory_fact.qty_in_stock AS qty_in_stock, complex_join_marker() AS "false"
    FROM public.inventory_fact AS inventory_fact/*+projs('public.inventory_fact')*/) AS subQ_1
ON (product_dimension.product_key = subQ_1.qty_in_stock)) WHERE (NOT subQ_1."false")

2.8 - 限制

定向查询支持各种查询,但也有一些例外。Vertica 通过优化器生成的警告处理所有例外。以下部分将这些限制分为几类。

表和投影

存在以下限制:

  • 优化器生成的定向查询不支持引用系统表或数据收集器表的查询。例外情况:V_CATALOG.DUAL 的显式和隐式引用。

  • 优化器生成的定向查询不支持所含表内具有访问策略的查询。

  • 定向查询不支持不含投影的表。

函数

不支持包括以下函数的查询:

运算符和子句

不支持包括以下内容的查询:

  • WITH 子句(启用实体化时

  • 包括日期/时间字面量的查询,这些字面量引用当前时间,如 NOWYESTERDAY

数据类型

不支持包括以下数据类型的查询: