Tuple mover

The Tuple Mover manages ROS data storage. On mergeout, it combines small ROS containers into larger ones and purges deleted data. The Tuple Mover automatically performs these tasks in the background.

The database mode affects which nodes perform Tuple Mover operations:

  • In an Enterprise Mode database, all nodes run the Tuple Mover to perform mergeout operations on the data they store.

  • In Eon Mode, the primary subscriber to each shard plans Tuple Mover mergeout operations on the ROS containers in the shard. It can delegate the execution of this plan to another node in the cluster.

Tuple Mover operations typically require no intervention. However, Vertica provides various ways to adjust Tuple Mover behavior. For details, see Managing the tuple mover.

The tuple mover in Eon Mode databases

In Eon Mode, the Tuple Mover's operations are broken into two parts: mergeout planning and mergeout execution. Mergeout planning is always carried out by the primary subscribers of the shards involved in the mergeout. These primary subscribers are part of the same primary subcluster. As part of its mergeout planning, the primary subscriber chooses a node to execute the mergeout plan. It uses two criteria to decide which node should execute the mergeout:

  • Only nodes that have memory allocated to their TM resource pool are eligible to perform a mergeout. The primary subscriber ignores all nodes in subclusters whose TM pool's MEMORYSIZE and MAXMEMORYSIZE settings are 0.

  • From the group of nodes able to execute a mergeout, the primary subscriber chooses the node that has the most ROS containers in its depot that are involved in the mergeout.
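You can observe which node actually executed recent mergeouts by querying the TUPLE_MOVER_OPERATIONS system table. The following query is a minimal sketch; exact column names can vary by Vertica version:

=> SELECT node_name, operation_name, operation_status, table_name
   FROM v_monitor.tuple_mover_operations
   WHERE operation_name = 'Mergeout'
   ORDER BY operation_start_timestamp DESC
   LIMIT 10;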

Limiting which subclusters perform mergeout tasks

You can prevent a secondary subcluster from being assigned mergeout tasks by changing the MEMORYSIZE and MAXMEMORYSIZE settings of its TM pool to 0. These settings prevent the primary subscribers from assigning mergeout tasks to nodes in the subcluster.

For example, this statement prevents the subcluster named dashboard from running mergeout tasks.

=> ALTER RESOURCE POOL TM FOR SUBCLUSTER dashboard MEMORYSIZE '0%'
   MAXMEMORYSIZE '0%';
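To confirm that the override took effect, you can check the per-node TM pool memory settings in the RESOURCE_POOL_STATUS system table (a sketch; column names can vary slightly across versions):

=> SELECT node_name, pool_name, memory_size_kb, max_memory_size_kb
   FROM v_monitor.resource_pool_status
   WHERE pool_name = 'tm';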

1 - Mergeout

Mergeout is a Tuple Mover process that consolidates ROS containers and purges deleted records. DML activities such as COPY and data partitioning generate new ROS containers that typically require consolidation, while deleting and repartitioning data requires reorganization of existing containers. The Tuple Mover constantly monitors these activities, and executes mergeout as needed to consolidate and reorganize containers. By doing so, the Tuple Mover seeks to avoid two problems:

  • Performance degradation when column data is fragmented across multiple ROS containers.

  • Risk of ROS pushback when ROS containers for a given projection increase faster than the Tuple Mover can handle them. A projection can have up to 1024 ROS containers; when it reaches that limit, Vertica starts to return ROS pushback errors on all attempts to query the projection.
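To gauge how close a projection is to this limit, you can count its ROS containers per node in the STORAGE_CONTAINERS system table. A minimal sketch:

=> SELECT node_name, schema_name, projection_name,
          COUNT(*) AS ros_container_count
   FROM v_monitor.storage_containers
   GROUP BY node_name, schema_name, projection_name
   ORDER BY ros_container_count DESC
   LIMIT 10;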

1.1 - Mergeout request types and precedence

The Tuple Mover constantly monitors all activity that generates new ROS containers. As it does so, it creates mergeout requests and queues them according to type. These types include, in descending order of precedence:

  1. RECOMPUTE_LIMITS: Sets criteria used by the Tuple Mover to determine when to queue new merge requests for a projection. This request type is queued in two cases:

    • When a projection is created.

    • When an existing projection changes—for example, a column is added or dropped, or a configuration parameter changes that affects ROS storage for that projection, such as ActivePartitionCount.

  2. MERGEOUT: Consolidate new containers. These containers typically contain data from recent load activity or table partitioning.

  3. DVMERGEOUT: Consolidate data marked for deletion, or delete vectors.

  4. PURGE: Purge aged-out delete vectors from containers.

The Tuple Mover also monitors how frequently containers are created for each projection, to determine which projections might be at risk from ROS pushback. Intense DML activity on projections typically causes a high rate of container creation. The Tuple Mover monitors MERGEOUT and DVMERGEOUT requests and, within each set, prioritizes them according to their level of projection activity. Mergeout requests for projections with the highest rate of container creation get priority for immediate execution.
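One way to see which projections currently hold the most ROS containers, and are therefore likely candidates for prioritized mergeout, is to query the PROJECTION_STORAGE system table. This is a sketch that assumes the ros_count column is available in your version:

=> SELECT anchor_table_name, projection_name, SUM(ros_count) AS total_ros_count
   FROM v_monitor.projection_storage
   GROUP BY anchor_table_name, projection_name
   ORDER BY total_ros_count DESC
   LIMIT 10;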

1.2 - Scheduled mergeout

At regular intervals set by configuration parameter MergeOutInterval, the Tuple Mover checks the mergeout request queue for pending requests:

  1. If the queue contains mergeout requests, the Tuple Mover does nothing and goes back to sleep.

  2. If the queue is empty, the Tuple Mover:

    • Processes pending storage location move requests.

    • Checks for new unqueued purge requests and adds them to the queue.

    It then goes back to sleep.

By default, this parameter is set to 600 (seconds).
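For example, to have the Tuple Mover check the queue every five minutes instead, you might lower the interval. This sketch uses the ALTER DATABASE syntax available in recent Vertica versions:

=> ALTER DATABASE DEFAULT SET PARAMETER MergeOutInterval = 300;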

1.3 - User-invoked mergeout

You can invoke mergeout at any time on one or more projections, by calling Vertica meta-function DO_TM_TASK:

DO_TM_TASK('mergeout' [, '[[database.]schema.]{ table | projection }' ])

The function scans the database catalog within the specified scope to identify outstanding mergeout tasks. If no table or projection is specified, DO_TM_TASK scans the entire catalog. Unlike the continuous TM service, which runs in the TM resource pool, DO_TM_TASK runs in the GENERAL pool. If DO_TM_TASK executes mergeout tasks that are pending in the merge request queue, the TM service removes these tasks from the queue with no action taken.
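For example, the following call runs mergeout on all projections of a single table, where public.store_orders is a hypothetical table name:

=> SELECT DO_TM_TASK('mergeout', 'public.store_orders');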

1.4 - Partition mergeout

Vertica keeps data from different table partitions or partition groups separate on disk. The Tuple Mover adheres to this separation policy when it consolidates ROS containers. When a partition is first created, it typically has frequent data loads and requires regular activity from the Tuple Mover. As a partition ages, it commonly transitions to a mostly read-only workload and requires much less activity.

The Tuple Mover has two different policies for managing these different partition workloads:

  • The active partition is the partition that was most recently created. The Tuple Mover uses a strata-based algorithm that seeks to minimize the number of times individual tuples undergo mergeout. A table's active partition count identifies how many of its partitions are active.

  • Inactive partitions are all older partitions. The Tuple Mover consolidates their ROS containers into a minimal set, while avoiding merging containers whose size exceeds MaxMrgOutROSSizeMB.

For details on how the Tuple Mover identifies active partitions, see Active and inactive partitions.

Partition mergeout thread allocation

The TM resource pool sets the number of threads that are available for mergeout with its MAXCONCURRENCY parameter. By default, this parameter is set to 7. Vertica allocates half the threads to active partitions, and the remaining half to both active and inactive partitions. If MAXCONCURRENCY is set to an uneven integer, Vertica rounds up to favor active partitions.

For example, if MAXCONCURRENCY is set to 7, then Vertica allocates four threads exclusively to active partitions, and allocates the remaining three threads to active and inactive partitions as needed. If additional threads are required to avoid ROS pushback, increase MAXCONCURRENCY with ALTER RESOURCE POOL.
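For example, the following statement raises MAXCONCURRENCY to 9, which allocates five threads exclusively to active partitions and four threads to active and inactive partitions as needed:

=> ALTER RESOURCE POOL TM MAXCONCURRENCY 9;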

1.5 - Deletion marker mergeout

When you delete data from the database, Vertica does not remove it. Instead, it marks the data as deleted. Using many DELETE statements to mark a small number of rows relative to the size of a table can result in creating many small containers—delete vectors—to hold data marked for deletion. Each delete vector container consumes resources, so a large number of such containers can adversely impact performance, especially during recovery.

After the Tuple Mover performs a mergeout, it looks for deletion marker containers that hold few entries. If such containers exist, the Tuple Mover merges them together into a single, larger container. This process helps lower the overhead of tracking deleted data by freeing resources used by multiple, individual containers. The Tuple Mover does not purge or otherwise affect the deleted data, but consolidates delete vectors for greater efficiency.
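To see how many delete vector containers each projection currently holds, you can query the DELETE_VECTORS system table. A minimal sketch:

=> SELECT schema_name, projection_name,
          COUNT(*) AS dv_container_count,
          SUM(deleted_row_count) AS deleted_rows
   FROM v_monitor.delete_vectors
   GROUP BY schema_name, projection_name
   ORDER BY dv_container_count DESC;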

1.6 - Disabling mergeout on specific tables

By default, mergeout is enabled for all tables and their projections. You can disable mergeout on a table with ALTER TABLE. For example:

=> ALTER TABLE public.store_orders_temp SET MERGEOUT 0;
ALTER TABLE

In general, it is useful to disable mergeout on tables that you create to serve a temporary purpose—for example, staging tables that are used to archive old partition data, or swap partitions between tables—which are deleted soon after the task is complete. By doing so, you avoid the mergeout-related overhead that the table would otherwise incur.

You can query system table TABLES to identify tables that have mergeout disabled:


=> SELECT table_schema, table_name, is_mergeout_enabled
   FROM v_catalog.tables
   WHERE is_mergeout_enabled = 0;
 table_schema |    table_name     | is_mergeout_enabled
--------------+-------------------+---------------------
 public       | store_orders_temp | f
(1 row)
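To re-enable mergeout on such a table later, restore the setting with ALTER TABLE:

=> ALTER TABLE public.store_orders_temp SET MERGEOUT 1;
ALTER TABLE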

1.7 - Purging ROS containers

Vertica periodically checks ROS storage containers to determine whether delete vectors are eligible for purge, as follows:

  1. Counts the number of aged-out delete vectors in each container—that is, delete vectors that are equal to or earlier than the ancient history mark (AHM) epoch.

  2. Calculates the percentage of aged-out delete vectors relative to the total number of records in the same ROS container.

  3. If this percentage exceeds the threshold set by configuration parameter PurgeMergeoutPercent (by default, 20 percent), Vertica automatically performs a mergeout on the ROS container that permanently removes all aged-out delete vectors. Vertica uses the TM resource pool's MAXCONCURRENCY setting to determine how many threads are available for the mergeout operation.

You can also manually purge all aged-out delete vectors from ROS containers with two Vertica meta-functions:

  • PURGE_TABLE

  • PURGE_PROJECTION

Both functions remove all aged-out delete vectors from ROS containers, regardless of how many are in a given container.
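For example, assuming a table named public.store_orders, you can manually purge all of its projections with PURGE_TABLE:

=> SELECT PURGE_TABLE('public.store_orders');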

1.8 - Mergeout strata algorithm

The mergeout operation uses a strata-based algorithm to ensure that each tuple undergoes mergeout a small, constant number of times, regardless of the process used to load the data. The mergeout operation uses this algorithm to choose which ROS containers to merge for non-partitioned tables and for active partitions in partitioned tables.

Vertica builds strata for each active partition and for projections anchored to non-partitioned tables. The number of strata, the size of each stratum, and the maximum number of ROS containers per stratum are computed based on disk size, memory, and the number of columns in the projection.

Merging small ROS containers before merging larger ones provides the maximum benefit during the mergeout process. The algorithm begins at stratum 0 and moves upward. It checks to see if the number of ROS containers in a stratum has reached a value equal to or greater than the maximum ROS containers allowed per stratum. The default value is 32. If the algorithm finds that a stratum is full, it marks the projections and the stratum as eligible for mergeout.
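Some Vertica versions expose the computed stratum geometry through the STRATA_STRUCTURES system table. If it is available in your version, a query like this sketch shows the strata parameters per projection (the column names shown are an assumption and can differ across versions):

=> SELECT node_name, schema_name, projection_name, strata_count, stratum_capacity
   FROM v_monitor.strata_structures
   LIMIT 10;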

2 - Managing the tuple mover

The Tuple Mover is preconfigured to handle typical workloads. However, some situations might require you to adjust Tuple Mover behavior. You can do so in the ways described in the following sections.

Configuring the TM resource pool

The Tuple Mover uses the built-in TM resource pool to handle its workload. Several settings of this resource pool can be adjusted to facilitate handling of high volume loads:

MEMORYSIZE

Specifies how much memory is reserved for the TM pool per node. The TM pool can grow beyond this lower limit by borrowing from the GENERAL pool. By default, this parameter is set to 5% of available memory. If MEMORYSIZE of the GENERAL resource pool is also set to a percentage, the TM pool can compete with it for memory. This value must always be less than or equal to the MAXMEMORYSIZE setting.

MAXMEMORYSIZE

Sets the upper limit of memory that can be allocated to the TM pool. The TM pool can grow beyond the value set by MEMORYSIZE by borrowing memory from the GENERAL pool. This value must always be equal to or greater than the MEMORYSIZE setting.

In an Eon Mode database, if you set this value to 0 at the subcluster level, the Tuple Mover is disabled on that subcluster.

MAXCONCURRENCY

Sets the maximum number of concurrent execution slots available to the TM pool across all nodes. In databases created in Vertica 9.3 and later, the default value is 7; in databases created in earlier versions, the default is 3. This setting specifies the maximum number of merges that can occur simultaneously on multiple threads.

PLANNEDCONCURRENCY

Specifies the preferred number of queries to execute concurrently in the resource pool, across all nodes. By default, this parameter is set to 6. The Resource Manager uses PLANNEDCONCURRENCY to calculate the target memory that is available to a given query:

TM-memory-size / PLANNEDCONCURRENCY

The PLANNEDCONCURRENCY setting must be proportional to the size of RAM, the CPU, and the storage subsystem. Depending on the storage type, increasing PLANNEDCONCURRENCY for Tuple Mover threads might create a storage I/O bottleneck. Monitor the storage subsystem; if it shows signs of saturation, such as more than two long I/O queues or high read and write latency, adjust PLANNEDCONCURRENCY to keep storage subsystem resources below the saturation level.
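For example, the following statement gives the TM pool more dedicated memory and a higher concurrency for a heavy load window. The values shown are illustrative, not recommendations:

=> ALTER RESOURCE POOL TM MEMORYSIZE '10%' PLANNEDCONCURRENCY 8 MAXCONCURRENCY 8;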

Managing active data partitions

The Tuple Mover assumes that all loads and updates to a partitioned table are targeted to one or more partitions that it identifies as active. In general, the partitions with the largest partition keys—typically, the most recently created partitions—are regarded as active. As the partition ages, its workload typically shrinks and becomes mostly read-only.

You can specify how many partitions are active for partitioned tables at two levels, in ascending order of precedence:

  • Configuration parameter ActivePartitionCount determines how many partitions are active for partitioned tables in the database. By default, ActivePartitionCount is set to 1. The Tuple Mover applies this setting to all tables that do not set their own active partition count.

  • Individual tables can supersede ActivePartitionCount by setting their own active partition count with CREATE TABLE and ALTER TABLE.
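For example, the following statements set the database-wide default and then override it for a hypothetical table public.store_orders (a sketch; the ALTER DATABASE syntax applies to recent Vertica versions):

=> ALTER DATABASE DEFAULT SET PARAMETER ActivePartitionCount = 1;
=> ALTER TABLE public.store_orders SET ACTIVEPARTITIONCOUNT 2;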

For details, see Active and inactive partitions.

See also

Best practices for managing workload resources