Manually choosing an ECS strategy
When the number of nodes in a subcluster is greater than the number of database shards, the Vertica query optimizer uses elastic crunch scaling (ECS) to involve all nodes in processing queries. For each shard, the optimizer divides responsibility for processing shard data among its subscribing nodes, using one of the following strategies:
Strategy | Description |
---|---|
I/O-optimized | Optimizer divides the list of ROS containers in the shard among the subscribing nodes. Use this strategy when nodes must fetch the data for the query from communal storage, rather than the depot. Nodes only fetch the ROS containers they need to resolve the query from communal storage, reducing the amount of data each needs to transfer from communal storage. Due to the arbitrary division of data among the nodes, this strategy does not support query optimizations that rely on data segmentation. |
Compute-optimized | Optimizer uses data segmentation to assign portions to each subscribing node. The nodes scan the entire shard, but use sub-segment filtering to find their assigned segments of the data. Use this strategy when most data for the query is in the depot, because nodes must scan the entire contents of the shard. Because this strategy uses data segmentation, it supports optimizations such as local joins that the I/O-optimized strategy cannot. |
The optimizer automatically chooses a strategy based on whether the query can take advantage of data segmentation. You can tell which strategy the optimizer chooses for a query by using EXPLAIN. The top of the plan explanation states whether ECS is preserving segmentation. For example, this simple query on a single table does not need to use segmentation, so it uses the I/O-optimized strategy:
=> EXPLAIN SELECT employee_last_name,
employee_first_name,employee_age
FROM employee_dimension
ORDER BY employee_age DESC;
QUERY PLAN
--------------------------------------------------------------------------------
------------------------------
QUERY PLAN DESCRIPTION:
The execution of this query involves non-participating nodes.
Crunch scaling strategy does not preserve data segmentation
------------------------------
. . .
A more complex query using a JOIN results in ECS preserving data segmentation by using the compute-optimized strategy. The query plan tells you that segmentation is preserved:
=> EXPLAIN SELECT sales_quantity, sales_dollar_amount, transaction_type, cc_name
FROM online_sales.online_sales_fact
INNER JOIN online_sales.call_center_dimension
ON (online_sales.online_sales_fact.call_center_key
= online_sales.call_center_dimension.call_center_key
AND sale_date_key = 156)
ORDER BY sales_dollar_amount DESC;
QUERY PLAN
--------------------------------------------------------------------------------
------------------------------
QUERY PLAN DESCRIPTION:
The execution of this query involves non-participating nodes.
Crunch scaling strategy preserves data segmentation
------------------------------
. . .
In most cases, the optimizer chooses the best strategy to use to split the data among the nodes subscribing to the same shard. However, you might occasionally find that some queries perform poorly. In these cases, the query can embed the ECSMODE hint to specify which strategy to use, or even disable ECS.
Setting the ECS strategy for individual queries
You can use the ECSMODE hint in a query to force the optimizer to use a specific ECS strategy (or disable ECS entirely). The ECSMODE hint takes one of the following arguments:
-
AUTO
: The optimizer chooses the strategy to use, useful only if ECS mode is set at the session level (see Setting the ECS Strategy for the Session or Database). -
IO_OPTIMIZED
: Use I/O-optimized strategy. -
COMPUTE_OPTIMIZED
: Use compute-optimized strategy. -
NONE
: Disable use of ECS for this query. Only participating nodes are involved in query execution; collaborating nodes are not.
The following example shows the query plan for a simple single-table query that is forced to use the compute-optimized strategy:
=> EXPLAIN SELECT /*+ECSMode(COMPUTE_OPTIMIZED)*/ employee_last_name,
employee_first_name,employee_age
FROM employee_dimension
ORDER BY employee_age DESC;
QUERY PLAN
--------------------------------------------------------------------------------
------------------------------
QUERY PLAN DESCRIPTION:
The execution of this query involves non-participating nodes.
Crunch scaling strategy preserves data segmentation
------------------------------
. . .
This example disable ECS in a six-node cluster in a three-shard database:
=> EXPLAIN SELECT /*+ECSMode(NONE)*/ employee_last_name,
employee_first_name,employee_age
FROM employee_dimension
ORDER BY employee_age DESC;
QUERY PLAN
--------------------------------------------------------------------------------
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT /*+ECSMode(NONE)*/ employee_last_name,
employee_first_name,employee_age
FROM employee_dimension
ORDER BY employee_age DESC;
Access Path:
+-SORT [Cost: 243, Rows: 10K] (PATH ID: 1)
| Order: employee_dimension.employee_age DESC
| Execute on: v_verticadb_node0007, v_verticadb_node0004, v_verticadb_node0005
| +---> STORAGE ACCESS for employee_dimension [Cost: 71, Rows: 10K] (PATH ID: 2)
| | Projection: public.employee_dimension_DBD_8_seg_vmart_b0
| | Materialize: employee_dimension.employee_first_name,
| | employee_dimension.employee_last_name, employee_dimension.employee_age
| | Execute on: v_verticadb_node0007, v_verticadb_node0004,
| | v_verticadb_node0005
. . .
Note that this query plan lacks the "this query involves non-participating nodes" statement, indicating that it does not use ECS. It also lists just three participating nodes. These nodes are marked as participating in the V_CATALOG.SESSION_SUBSCRIPTIONS system table.
Setting the ECS strategy for the session or database
You can use the ECSMode configuration parameter to set the ECS strategy for the current session. This parameter accepts the same values as the ECSMODE hint except NONE, which is valid only for individual queries.
The following example demonstrates using the configuration parameter to force a simple query to use the COMPUTE_OPTIMIZED strategy. It then sets the parameter back to its default value of AUTO:
=> EXPLAIN SELECT employee_first_name,employee_age
FROM employee_dimension ORDER BY employee_age DESC;
QUERY PLAN
--------------------------------------------------------------------------------
------------------------------
QUERY PLAN DESCRIPTION:
The execution of this query involves non-participating nodes.
Crunch scaling strategy does not preserve data segmentation
------------------------------
. . .
=> ALTER SESSION SET ECSMode = 'COMPUTE_OPTIMIZED';
ALTER SESSION
=> EXPLAIN SELECT employee_first_name,employee_age
FROM employee_dimension ORDER BY employee_age DESC;
QUERY PLAN
--------------------------------------------------------------------------------
------------------------------
QUERY PLAN DESCRIPTION:
The execution of this query involves non-participating nodes.
Crunch scaling strategy preserves data segmentation
------------------------------
. . .
=> ALTER SESSION SET ECSMode = 'AUTO';
ALTER SESSION
Individual query hints override the session-level settings. This example sets the session default to use COMPUTE_OPTIMIZED, then restores the default behavior for a query by using the ECSMode hint with the value AUTO:
=> ALTER SESSION SET ECSMode = 'COMPUTE_OPTIMIZED';
ALTER SESSION
=> EXPLAIN SELECT /*+ECSMode(AUTO)*/ employee_first_name,employee_age
FROM employee_dimension ORDER BY employee_age DESC;
QUERY PLAN
-----------------------------------------------------------------------------
------------------------------
QUERY PLAN DESCRIPTION:
The execution of this query involves non-participating nodes.
Crunch scaling strategy does not preserve data segmentation
------------------------------
Note that setting the ECSMode hint to AUTO let the optimizer pick the I/O-optimized strategy (which does not preserve segmentation) instead of using the compute-optimized strategy set at the session level.
You can also set the ECS strategy at the database level using ALTER DATABASE. However, doing so overrides the Vertica optimizer's settings for all users in all subclusters that use ECS. Before setting the ECS strategy at the database level, verify that the majority of the queries run by all users of the ECS-enabled subclusters must have the optimizer's default behavior overridden. If not, then use the session or query-level settings to override the optimizer for just the queries that benefit from a specific strategy.