这是本节的多页打印视图。 点击此处打印.

返回本页常规视图.

查询计划信息和结构

根据查询和数据库架构的情况,EXPLAIN 输出会包含以下信息:

  • 语句引用的表

  • 估计成本

  • 估计行基数

  • 路径 ID,一个用于链接至错误消息和分析计数器的整数,以便您可以更轻松地解决性能问题。有关详细信息,请参阅分析查询计划

  • 数据操作,如 SORTFILTERLIMITGROUP BY

  • 使用的投影

  • 关于统计的信息—例如,它们是当前的还是超出范围

  • 查询中为操作选择的算法,如 HASH/MERGEGROUPBY HASH/GROUPBY PIPELINED

  • 群集节点间的数据再分发(广播、分段)

示例

在下面的 EXPLAIN 输出中,优化器用三个步骤处理查询,每个步骤通过唯一路径 ID 标识:

  • 0:限制

  • 1:排序

  • 2:存储空间访问和筛选

1 - 查询计划统计信息

如果您查询统计信息不可用或已过时的表,优化器可能会选择不理想的查询计划。

可通过调用 ANALYZE_STATISTICS 来解决与表统计信息相关的许多问题。此函数可用于更新不同范围的统计信息:一个或多个表列、单个表或所有数据库表。

如果您更新统计信息后发现查询执行仍不理想,请通过 Database Designer 运行查询并选择增量设计作为设计类型。

有关更新数据库统计信息的详细信息,请参阅收集数据库统计信息

查询计划中的统计信息提示

查询计划可通过以下两个提示包含表统计信息相关信息: NO STATISTICSSTALE STATISTICS。例如,以下查询计划片段包括 NO STATISTICS,表示直方图不可用:

| | +-- Outer -> STORAGE ACCESS for fact [Cost: 604, Rows: 10K (NO STATISTICS)]

以下查询计划片段包括 STALE STATISTICS,表示谓词已超出直方图范围:

| | +-- Outer -> STORAGE ACCESS for fact [Cost: 35, Rows: 1 (STALE STATISTICS)]

2 - Cost 和 Row 路径

以下 EXPLAIN 输出显示 Cost 运算符:

 Access Path: +-SELECT  LIMIT 10 [Cost: 370, Rows: 10] (PATH ID: 0)
 |  Output Only: 10 tuples
 |  Execute on: Query Initiator
 | +---> SORT [Cost: 370, Rows: 544] (PATH ID: 1)
 | |      Order: customer_dimension.customer_name ASC
 | |      Output Only: 10 tuples
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 331, Rows: 544] (PATH ID: 2)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: customer_dimension.customer_state, customer_dimension.customer_name
 | | |      Filter: (customer_dimension.customer_gender = 'Male')
 | | |      Filter: (customer_dimension.customer_state = ANY (ARRAY['MA', 'NH']))
 | | |      Execute on: Query Initiator

Row 运算符是优化器估计查询将要返回的行数。数字后面的字母表示度量单位(K=千、M=百万、B=十亿、T=万亿),因此,以下查询的输出指示要返回的行数为 50 K

=> EXPLAIN SELECT customer_gender FROM customer_dimension;
 Access Path:
 +-STORAGE ACCESS for customer_dimension [Cost: 17, Rows: 50K (3 RLE)] (PATH ID: 1)
 |  Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 |  Materialize: customer_dimension.customer_gender
 |  Execute on: Query Initiator

STORAGE ACCESS 路径中对 (3 RLE) 的引用表明优化器估计存储访问运算符返回的行数为 50K。由于列为运行长度编码 (RLE),因此返回的 RLE 实际行数仅为三行:

  • 1 行用于女性

  • 1 行用于男性

  • 1 行用于表示未知 (NULL) 性别

3 - Projection 路径

通过查看文本输出中的 Projection 路径,您可以了解优化器为查询计划选择了哪些 投影

EXPLAIN SELECT
   customer_name,
   customer_state
 FROM customer_dimension
 WHERE customer_state in ('MA','NH')
 AND customer_gender = 'Male'
 ORDER BY customer_name
 LIMIT 10;
 Access Path:
 +-SELECT  LIMIT 10 [Cost: 370, Rows: 10] (PATH ID: 0)
 |  Output Only: 10 tuples
 |  Execute on: Query Initiator
 | +---> SORT [Cost: 370, Rows: 544] (PATH ID: 1)
 | |      Order: customer_dimension.customer_name ASC
 | |      Output Only: 10 tuples
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 331, Rows: 544] (PATH ID: 2)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmart_vmart_node0001
 | | |      Materialize: customer_dimension.customer_state, customer_dimension.customer_name
 | | |      Filter: (customer_dimension.customer_gender = 'Male')
 | | |      Filter: (customer_dimension.customer_state = ANY (ARRAY['MA', 'NH']))
 | | |      Execute on: Query Initiator

查询优化器会自动选取最佳投影,但如果没有合理准确的统计信息,则优化器为查询选择的投影或联接顺序可能会不太理想。有关详细信息,请参阅收集统计信息

Vertica 在为计划选择投影时会考虑以下几个方面:

  • 如何在查询中联接各个列

  • 如何对投影进行分组或排序

  • 是否应用 SQL 分析操作

  • 磁盘上投影存储中的所有列信息

当 Vertica 评估每个计划的可能性时,初始成本较高的投影可能会进入到最终计划之中,因为这些投影会减少联接开销。例如,可能会为某个查询提供许多可能的计划,优化器在从中作出选择之前会对这些计划进行评估。为提高效率,优化器会使用复杂的算法删除成本较高的中间部分计划片段。优化器知道中间计划片段可能最初看起来很糟糕(由于存储访问成本很高),但可能会因为它所允许的其他优化而产生出色的最终计划。

如果在统计信息已是最新的情况下查询仍然执行欠佳,请通过 Database Designer 运行该查询。有关详细信息,请参阅增量设计

提示

  • 要对已分段的不同投影进行测试,请在查询中按名称引用投影。

  • 要获得最佳性能,请将查询编写为将列按照投影列的排序方式进行排序。

另请参阅

4 - Join 路径

与引用两个或更多表的联接查询相似,查询计划中的 Join 步骤有两个输入分支:

  • 左输入,其为联接的外部表

  • 右输入,其为联接的内部表

在以下查询中,T1 表为左输入,因为它在 JOIN 关键字的左侧;T2 表为右输入,因为它在 JOIN 关键字的右侧:

SELECT * FROM T1 JOIN T2 ON T1.x = T2.x;

外部联接与内部联接

如果使用小型表作为联接的内部输入,则查询性能会更好。查询优化器会自动对联接的输入进行重新排序以确保这种情况,除非相关联接为外部联接。

以下示例显示左外部联接的查询及其计划:

=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> LEFT OUTER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [LeftOuter] [Cost: 4K, Rows: 5M] (PATH ID: 1)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Materialize at Output: OSI.sale_date_key
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

以下示例显示完整外部联接的查询及其计划:

=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> FULL OUTER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [FullOuter] [Cost: 18K, Rows: 5M] (PATH ID: 1) Outer (RESEGMENT) Inner (FILTER)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.sale_date_key, OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

散列联接和合并联接

Vertica 有两个联接算法可供选择:合并联接和散列联接。系统中给定查询和投影时,优化器会自动选择最适合的算法。

对于以下查询,优化器会选择散列联接。

=> EXPLAIN SELECT CD.annual_income,OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [Cost: 4K, Rows: 5M] (PATH ID: 1)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Materialize at Output: OSI.sale_date_key
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

在下一个示例中,优化器会选择合并联接。优化器的首次传递会执行合并联接,因为输入已预先排序,然后它再执行散列联接。

=> EXPLAIN SELECT count(*) FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD ON CD.customer_key = OSI.customer_key
-> INNER JOIN product_dimension PD ON PD.product_key = OSI.product_key;
 Access Path:
 +-GROUPBY NOTHING [Cost: 8K, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(*)
 |  Execute on: All Nodes
 | +---> JOIN HASH [Cost: 7K, Rows: 5M] (PATH ID: 2)
 | |      Join Cond: (PD.product_key = OSI.product_key)
 | |      Materialize at Input: OSI.product_key
 | |      Execute on: All Nodes
 | | +-- Outer -> JOIN MERGEJOIN(inputs presorted) [Cost: 4K, Rows: 5M] (PATH ID: 3)
 | | |      Join Cond: (CD.customer_key = OSI.customer_key)
 | | |      Execute on: All Nodes
 | | | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 4)
 | | | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | | | |      Materialize: OSI.customer_key
 | | | |      Execute on: All Nodes
 | | | +-- Inner -> STORAGE ACCESS for CD [Cost: 132, Rows: 50K] (PATH ID: 5)
 | | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | | |      Materialize: CD.customer_key
 | | | |      Execute on: All Nodes
 | | +-- Inner -> STORAGE ACCESS for PD [Cost: 152, Rows: 60K] (PATH ID: 6)
 | | |      Projection: public.product_dimension_DBD_2_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: PD.product_key
 | | |      Execute on: All Nodes

不等于联接

Vertica 使用等于谓词处理联接的效率非常高。查询计划将等于联接谓词显示为联接条件 (Join Cond)。

=> EXPLAIN SELECT CD.annual_income, OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD
-> ON CD.customer_key = OSI.customer_key;
 Access Path:
 +-JOIN HASH [Cost: 4K, Rows: 5M] (PATH ID: 1)
 |  Join Cond: (CD.customer_key = OSI.customer_key)
 |  Materialize at Output: OSI.sale_date_key
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 2)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for CD [Cost: 264, Rows: 50K] (PATH ID: 3)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.annual_income, CD.customer_key
 | |      Execute on: All Nodes

但是,对不等于联接的处理方式与交叉联接类似,其运行效率较低,您可以通过查看两个查询之间的成本变化对此进行了解:

=> EXPLAIN SELECT CD.annual_income, OSI.sale_date_key
-> FROM online_sales.online_sales_fact OSI
-> INNER JOIN customer_dimension CD
-> ON CD.customer_key < OSI.customer_key;
 Access Path:
 +-JOIN HASH [Cost: 98M, Rows: 5M] (PATH ID: 1)
 |  Join Filter: (CD.customer_key < OSI.customer_key)
 |  Materialize at Output: CD.annual_income
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for CD [Cost: 132, Rows: 50K] (PATH ID: 2)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.customer_key
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for OSI [Cost: 3K, Rows: 5M] (PATH ID: 3)
 | |      Projection: online_sales.online_sales_fact_DBD_12_seg_vmartdb_design_vmartdb_design
 | |      Materialize: OSI.sale_date_key, OSI.customer_key
 | |      Execute on: All Nodes

事件序列联接

事件序列联接由 INTERPOLATED 路径指示。

=> EXPLAIN SELECT * FROM hTicks h FULL OUTER JOIN aTicks a -> ON (h.time INTERPOLATE PREVIOUS
 Access Path:
 +-JOIN  (INTERPOLATED) [FullOuter] [Cost: 31, Rows: 4 (NO STATISTICS)] (PATH ID: 1)
   Outer (SORT ON JOIN KEY) Inner (SORT ON JOIN KEY)
 |  Join Cond: (h."time" = a."time")
 |  Execute on: Query Initiator
 | +-- Outer -> STORAGE ACCESS for h [Cost: 15, Rows: 4 (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.hTicks_node0004
 | |      Materialize: h.stock, h."time", h.price
 | |      Execute on: Query Initiator
 | +-- Inner -> STORAGE ACCESS for a [Cost: 15, Rows: 4 (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.aTicks_node0004
 | |      Materialize: a.stock, a."time", a.price
 | |      Execute on: Query Initiator

5 - 路径 ID

PATH ID 是 Vertica 为查询计划内的每个操作(路径)所分配的唯一标识符。以下项共享同一标识符:

路径 ID 可帮助您追踪问题的根本原因。例如,如果查询返回联接错误,请在查询前加上 EXPLAIN 并在查询计划中查找 PATH ID n,以查看查询中的哪个联接有问题。

例如,下面的 EXPLAIN 输出显示优化器查询计划中每个路径的路径 ID:

=> EXPLAIN SELECT * FROM fact JOIN dim ON x=y JOIN ext on y=z;
 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 815, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
 | Join Cond: (dim.y = ext.z)
 | Materialize at Output: fact.x
 | Execute on: All Nodes
 | +-- Outer -> JOIN MERGEJOIN(inputs presorted) [Cost: 408, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | | Join Cond: (fact.x = dim.y)
 | | Execute on: All Nodes
 | | +-- Outer -> STORAGE ACCESS for fact [Cost: 202, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | | | Projection: public.fact_super
 | | | Materialize: fact.x
 | | | Execute on: All Nodes
 | | +-- Inner -> STORAGE ACCESS for dim [Cost: 202, Rows: 10K (NO STATISTICS)] (PATH ID: 4)
 | | | Projection: public.dim_super
 | | | Materialize: dim.y
 | | | Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for ext [Cost: 202, Rows: 10K (NO STATISTICS)] (PATH ID: 5)
 | | Projection: public.ext_super
 | | Materialize: ext.z
 | | Execute on: All Nodes

6 - Filter 路径

Filter 步骤对单个表上的谓词求值。它接受一组行、清除其中的某些行(基于查询中提供的条件)并返回剩余行。例如,优化器可以筛选将要与其他重新分段的联接输入进行联接的联接输入的本地数据。

以下语句查询 customer_dimension 表,并使用 WHERE 子句仅筛选马萨诸塞州和新罕布什尔州的男性客户结果。

EXPLAIN SELECT
  CD.customer_name,
  CD.customer_state,
  AVG(CD.customer_age) AS avg_age,
  COUNT(*) AS count
FROM customer_dimension CD
WHERE CD.customer_state in ('MA','NH') AND CD.customer_gender = 'Male'
GROUP BY CD.customer_state, CD.customer_name;

查询计划输出如下:

 Access Path:
 +-GROUPBY HASH [Cost: 378, Rows: 544] (PATH ID: 1)
 |  Aggregates: sum_float(CD.customer_age), count(CD.customer_age), count(*)
 |  Group By: CD.customer_state, CD.customer_name
 |  Execute on: Query Initiator
 | +---> STORAGE ACCESS for CD [Cost: 372, Rows: 544] (PATH ID: 2)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | |      Materialize: CD.customer_state, CD.customer_name, CD.customer_age
 | |      Filter: (CD.customer_gender = 'Male')
 | |      Filter: (CD.customer_state = ANY (ARRAY['MA', 'NH']))
 | |      Execute on: Query Initiator

7 - GROUP BY 路径

GROUP BY 操作具有两种算法:

  • GROUPBY HASH 输入不按组列排序,因此 Vertica 用这些组列构建一个哈希表,以便处理聚合和 Group By 表达式。

  • GROUPBY PIPELINED 需要在组中指定的列上对输入进行预分类,这意味着 Vertica 仅需要在内存中保留当前组中的数据。GROUPBY PIPELINED 操作是首选算法,因为它们通常比 GROUPBY HASH 更快并且需要的内存更少。GROUPBY PIPELINED 对按列或 DISTINCT 聚合处理大量高基数组的查询尤为有用。

如果可能,查询优化器会选择更快的算法 GROUPBY PIPELINED 而非 GROUPBY HASH。

7.1 - GROUPBY HASH 查询计划

下面是一个关于 GROUPBY HASH 操作在 EXPLAIN 输出中的样式的示例。

=> EXPLAIN SELECT COUNT(DISTINCT annual_income)
     FROM customer_dimension
     WHERE customer_region='NorthWest';

该输出显示优化器选择了效率更低的 GROUPBY HASH 方法,这意味着投影未在 annual_income 列上预分类。如果此类投影可用,优化器将选择 GROUPBY PIPELINED 算法。

Access Path:
+-GROUPBY NOTHING [Cost: 256, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
|  Aggregates: count(DISTINCT customer_dimension.annual_income)
| +---> GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 253, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| |      Group By: customer_dimension.annual_income
| | +---> STORAGE ACCESS for customer_dimension [Cost: 227, Rows: 50K (NO STATISTICS)] (PATH ID: 3)
| | |      Projection: public.customer_dimension_super
| | |      Materialize: customer_dimension.annual_income
| | |      Filter: (customer_dimension.customer_region = 'NorthWest'
...

7.2 - GROUPBY PIPELINED 查询计划

如果投影已按 customer_gender 列排序,优化器将选择更快的 GROUPBY PIPELINED 操作:

 => EXPLAIN SELECT COUNT(distinct customer_gender) from customer_dimension;
Access Path:
 +-GROUPBY NOTHING [Cost: 22, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(DISTINCT customer_dimension.customer_gender)
 |  Execute on: Query Initiator
 | +---> GROUPBY PIPELINED [Cost: 20, Rows: 10K] (PATH ID: 2)
 | |      Group By: customer_dimension.customer_gender
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 17, Rows: 50K (3 RLE)] (PATH ID: 3)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: customer_dimension.customer_gender
 | | |      Execute on: Query Initiator

同样,使用相等谓词(如在以下查询中)将保留 GROUPBY PIPELINED

=> EXPLAIN SELECT COUNT(DISTINCT annual_income)    FROM customer_dimension
   WHERE customer_gender = 'Female';

 Access Path: +-GROUPBY NOTHING [Cost: 161, Rows: 1] (PATH ID: 1)
 |  Aggregates: count(DISTINCT customer_dimension.annual_income)
 | +---> GROUPBY PIPELINED [Cost: 158, Rows: 10K] (PATH ID: 2)
 | |      Group By: customer_dimension.annual_income
 | | +---> STORAGE ACCESS for customer_dimension [Cost: 144, Rows: 47K] (PATH ID: 3)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | |      Materialize: customer_dimension.annual_income
 | | |      Filter: (customer_dimension.customer_gender = 'Female')

8 - Sort 路径

SORT 运算符根据指定的列列表来排序数据。EXPLAIN 输出指示排序表达式以及排列顺序是升序 (ASC) 还是降序 (DESC)。

例如,以下查询计划显示 SORT 运算符的列列表性质:

EXPLAIN SELECT
  CD.customer_name,
  CD.customer_state,
  AVG(CD.customer_age) AS avg_age,
  COUNT(*) AS count
FROM customer_dimension CD
WHERE CD.customer_state in ('MA','NH')
  AND CD.customer_gender = 'Male'
GROUP BY CD.customer_state, CD.customer_name
ORDER BY avg_age, customer_name;
 Access Path:
 +-SORT [Cost: 422, Rows: 544] (PATH ID: 1)
 |  Order: (<SVAR> / float8(<SVAR>)) ASC, CD.customer_name ASC
 |  Execute on: Query Initiator
 | +---> GROUPBY HASH [Cost: 378, Rows: 544] (PATH ID: 2)
 | |      Aggregates: sum_float(CD.customer_age), count(CD.customer_age), count(*)
 | |      Group By: CD.customer_state, CD.customer_name
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for CD [Cost: 372, Rows: 544] (PATH ID: 3)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmart_vmart_node0001
 | | |      Materialize: CD.customer_state, CD.customer_name, CD.customer_age
 | | |      Filter: (CD.customer_gender = 'Male')
 | | |      Filter: (CD.customer_state = ANY (ARRAY['MA', 'NH']))
 | | |      Execute on: Query Initiator

如果将排序顺序更改为降序,查询计划将发生变化:

EXPLAIN SELECT
  CD.customer_name,
  CD.customer_state,
  AVG(CD.customer_age) AS avg_age,
  COUNT(*) AS count
FROM customer_dimension CD
WHERE CD.customer_state in ('MA','NH')
  AND CD.customer_gender = 'Male'
GROUP BY CD.customer_state, CD.customer_name
ORDER BY avg_age DESC, customer_name;
 Access Path:
 +-SORT [Cost: 422, Rows: 544] (PATH ID: 1)
 |  Order: (<SVAR> / float8(<SVAR>)) DESC, CD.customer_name ASC
 |  Execute on: Query Initiator
 | +---> GROUPBY HASH [Cost: 378, Rows: 544] (PATH ID: 2)
 | |      Aggregates: sum_float(CD.customer_age), count(CD.customer_age), count(*)
 | |      Group By: CD.customer_state, CD.customer_name
 | |      Execute on: Query Initiator
 | | +---> STORAGE ACCESS for CD [Cost: 372, Rows: 544] (PATH ID: 3)
 | | |      Projection: public.customer_dimension_DBD_1_rep_vmart_vmart_node0001
 | | |      Materialize: CD.customer_state, CD.customer_name, CD.customer_age
 | | |      Filter: (CD.customer_gender = 'Male')
 | | |      Filter: (CD.customer_state = ANY (ARRAY['MA', 'NH']))
 | | |      Execute on: Query Initiator

9 - Limit 路径

LIMIT 路径根据查询中的 LIMIT 子句限制结果行数。在具有数千行的查询中使用 LIMIT 子句可能会提高查询性能。

优化器在查询中尽可能向下推动 LIMIT 操作。查询中的一条 LIMIT 子句可以生成多个 Output Only 计划注释。

 => EXPLAIN SELECT COUNT(DISTINCT annual_income) FROM customer_dimension LIMIT 10;
Access Path:
 +-SELECT  LIMIT 10 [Cost: 161, Rows: 10] (PATH ID: 0)
 |  Output Only: 10 tuples
 | +---> GROUPBY NOTHING [Cost: 161, Rows: 1] (PATH ID: 1)
 | |      Aggregates: count(DISTINCT customer_dimension.annual_income)
 | |      Output Only: 10 tuples
 | | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 158, Rows: 10K] (PATH ID: 2)
 | | |      Group By: customer_dimension.annual_income
 | | | +---> STORAGE ACCESS for customer_dimension [Cost: 132, Rows: 50K] (PATH ID: 3)
 | | | |      Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001
 | | | |      Materialize: customer_dimension.annual_income

10 - Data Redistribution 路径

优化器可通过两种方式重新分布联接数据:

  • 广播

  • 重新分段

广播

广播操作会将中间结果的完整副本发送到群集中的所有节点。在以下情况下,可对联接使用广播。

  • 一个表与其他表相比很小(通常指内部表)。

  • Vertica 可避免其他大型上游重新分段操作。

  • 外部联接或子查询的语义需要复制联接一侧的内容。

例如:

=> EXPLAIN SELECT * FROM T1 LEFT JOIN T2 ON T1.a > T2.y;
 Access Path:
 +-JOIN HASH [LeftOuter] [Cost: 40K, Rows: 10K (NO STATISTICS)] (PATH ID: 1) Inner (BROADCAST)
 |  Join Filter: (T1.a > T2.y)
 |  Materialize at Output: T1.b
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for T1 [Cost: 151, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.T1_b0
 | |      Materialize: T1.a
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for T2 [Cost: 302, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.T2_b0
 | |      Materialize: T2.x, T2.y
 | |      Execute on: All Nodes

重新分段

重新分段操作会获取现有投影或中间关系,并在所有群集节点之间均匀地对数据进行重新分段。在重新分段操作结束时,输入关系中的每一行仅位于一个节点上。如果本地联接数据尚未进行分段,则重新分段将是对 Vertica 中的分布式联接最常使用的操作。有关更多详细信息,请参阅相同分段

例如:

=> CREATE TABLE T1 (a INT, b INT) SEGMENTED BY HASH(a) ALL NODES;
=> CREATE TABLE T2 (x INT, y INT) SEGMENTED BY HASH(x) ALL NODES;
=> EXPLAIN SELECT * FROM T1 JOIN T2 ON T1.a = T2.y;

 ------------------------------ QUERY PLAN DESCRIPTION: ------------------------------
 Access Path:
 +-JOIN HASH [Cost: 639, Rows: 10K (NO STATISTICS)] (PATH ID: 1) Inner (RESEGMENT)
 |  Join Cond: (T1.a = T2.y)
 |  Materialize at Output: T1.b
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for T1 [Cost: 151, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.T1_b0
 | |      Materialize: T1.a
 | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for T2 [Cost: 302, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.T2_b0
 | |      Materialize: T2.x, T2.y
 | |      Execute on: All Nodes

11 - 分析函数路径

Vertica 通过将同一查询中的多个 SQL-99 分析函数分组到 Analytic Group 区域,来尝试对这些分析函数进行优化。

对于每个分析组,Vertica 将根据需要执行数据分布式排序和重新分段。

可根据查询计划判断需要多少次排序和重新分段。

例如,以下查询计划显示 FIRST_VALUELAST_VALUE 函数在同一分析组中,因为它们的 OVER 子句相同。相反,ROW_NUMBER() 具有不同的 ORDER BY 子句,因此它在其他分析组中。由于这两个组共享同一 PARTITION BY deal_stage 子句,因此无需在组间对数据重新分段:

EXPLAIN SELECT
  first_value(deal_size) OVER (PARTITION BY deal_stage
  ORDER BY deal_size),
  last_value(deal_size) OVER (PARTITION BY deal_stage
  ORDER BY deal_size),
  row_number() OVER (PARTITION BY deal_stage
  ORDER BY largest_bill_amount)
  FROM customer_dimension;

 Access Path:
 +-ANALYTICAL [Cost: 1K, Rows: 50K] (PATH ID: 1)
 |  Analytic Group
 |   Functions: row_number()
 |   Group Sort: customer_dimension.deal_stage ASC, customer_dimension.largest_bill_amount ASC NULLS LAST
 |  Analytic Group
 |   Functions: first_value(), last_value()
 |   Group Filter: customer_dimension.deal_stage
 |   Group Sort: customer_dimension.deal_stage ASC, customer_dimension.deal_size ASC NULL LAST
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for customer_dimension [Cost: 263, Rows: 50K]
         (PATH ID: 2)
 | |      Projection: public.customer_dimension_DBD_1_rep_vmart_vmart_node0001
 | |      Materialize: customer_dimension.largest_bill_amount,
          customer_dimension.deal_stage, customer_dimension.deal_size
 | |      Execute on: All Nodes

另请参阅

调用分析函数

12 - 节点故障信息

Vertica 在群集节点出现故障时对性能进行优化,方法是将故障节点的工作均匀分摊到整个群集的可用节点上。

当群集中的某个节点出现故障时,查询计划会识别将在哪个节点执行查询。为了帮助您快速识别大型群集上的故障节点,当正在运行的节点数少于或等于六个时,EXPLAIN 输出最多列出六个节点;当正在运行的节点数多于六个时,将仅列出故障节点。

下表提供了更多详细信息:

示例

在以下示例中,故障节点为 v_vmart_node0005,节点 v_vmart_node0006 将执行查询的此次运行。

=> EXPLAIN SELECT * FROM test;
QUERY PLAN
-----------------------------------------------------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT * FROM my1table;
Access Path:
+-STORAGE ACCESS for my1table [Cost: 10, Rows: 2] (PATH ID: 1)
| Projection: public.my1table_b0
| Materialize: my1table.c1, my1table.c2
| Execute on: All Except v_vmart_node0005
+-STORAGE ACCESS for my1table (REPLACEMENT FOR DOWN NODE) [Cost: 66, Rows: 2]
| Projection: public.my1table_b1
| Materialize: my1table.c1, my1table.c2
| Execute on: v_vmart_node0006

以下示例片段中的 All Permanent Nodes 输出指示节点列表仅适用于永久(非短暂)节点:

=> EXPLAIN SELECT * FROM my2table;
Access Path:
+-STORAGE ACCESS for my2table [Cost: 18, Rows:6 (NO STATISTICS)] (PATH ID: 1)
|  Projection: public.my2tablee_b0
|  Materialize: my2table.x, my2table.y, my2table.z
|  Execute on: All Permanent Nodes

13 - MERGE 路径

如果 MERGE 语句及其表满足 MERGE 优化中所述的标准,则 Vertica 会为该语句准备优化的查询计划。

使用 EXPLAIN 关键字可以确定 Vertica 是否可以为给定的 MERGE 语句生成优化的查询计划。如果可以进行优化,EXPLAIN 生成的输出将包含 [Semi] 路径,如以下示例片段所示:


...
Access Path:
+-DML DELETE [Cost: 0, Rows: 0]
|  Target Projection: public.A_b1 (DELETE ON CONTAINER)
|  Target Prep:
|  Execute on: All Nodes
| +---> JOIN MERGEJOIN(inputs presorted) [Semi] [Cost: 6, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
         Inner (RESEGMENT)
| |      Join Cond: (A.a1 = VAL(2))
| |      Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for A [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 2)
...

相反,如果 Vertica 无法创建优化计划,EXPLAIN 生成的输出将包含 RightOuter 路径:


...
Access Path: +-DML MERGE
 |  Target Projection: public.locations_b1
 |  Target Projection: public.locations_b0
 |  Target Prep:
 |  Execute on: All Nodes
 | +---> JOIN MERGEJOIN(inputs presorted) [RightOuter] [Cost: 28, Rows: 3 (NO STATISTICS)] (PATH ID: 1) Outer (RESEGMENT) Inner (RESEGMENT)
 | |      Join Cond: (locations.user_id = VAL(2)) AND (locations.location_x = VAL(2)) AND (locations.location_y = VAL(2))
 | |      Execute on: All Nodes
 | | +-- Outer -> STORAGE ACCESS for <No Alias> [Cost: 15, Rows: 2 (NO STATISTICS)] (PATH ID: 2)
...