This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Mergeout

DML activities such as COPY and data partitioning generate new ROS containers that typically require consolidation, while deleting and repartitioning data requires reorganization of existing containers.

Mergeout is a Tuple Mover process that consolidates ROS containers and purges deleted records. DML activities such as COPY and data partitioning generate new ROS containers that typically require consolidation, while deleting and repartitioning data requires reorganization of existing containers. The Tuple Mover constantly monitors these activities, and executes mergeout as needed to consolidate and reorganize containers. By doing so, the Tuple Mover seeks to avoid two problems:

  • Performance degradation when column data is fragmented across multiple ROS containers.

  • Risk of ROS pushback when ROS containers for a given projection increase faster than the Tuple Mover can handle them. A projection can have up to 1024 ROS containers; when it reaches that limit, Vertica starts to return ROS pushback errors on all attempts to query the projection.

1 - Mergeout request types and precedence

The Tuple Mover constantly monitors all activity that generates new ROS containers.

The Tuple Mover constantly monitors all activity that generates new ROS containers. As it does so, it creates mergeout requests and queues them according to type. These types include, in descending order of precedence:

  1. RECOMPUTE_LIMITS: Sets criteria used by the Tuple Mover to determine when to queue new merge requests for a projection. This request type is queued in two cases:

    • When a projection is created.

    • When an existing projection changes—for example, a column is added or dropped, or a configuration parameter changes that affects ROS storage for that projection, such as ActivePartitionCount.

  2. MERGEOUT: Consolidate new containers. These containers typically contain data from recent load activity or table partitioning.

  3. DVMERGEOUT: Consolidate data marked for deletion, or delete vectors.

  4. PURGE: Purge aged-out delete vectors from containers.

The Tuple Mover also monitors how frequently containers are created for each projection, to determine which projections might be at risk from ROS pushback. Intense DML activity on projections typically causes a high rate of container creation. The Tuple Mover monitors MERGEOUT and DVMERGEOUT requests and, within each set, prioritizes them according to their level of projection activity. Mergeout requests for projections with the highest rate of container creation get priority for immediate execution.

2 - Scheduled mergeout

At regular intervals set by configuration parameter MergeOutInterval, the Tuple Mover checks the mergeout request queue for pending requests:.

At regular intervals set by configuration parameter MergeOutInterval, the Tuple Mover checks the mergeout request queue for pending requests:

  1. If the queue contains mergeout requests, the Tuple Mover does nothing and goes back to sleep.

  2. If the queue is empty, the Tuple Mover:

    • Processes pending storage location move requests.

    • Checks for new unqueued purge requests and adds them to the queue.

    It then goes back to sleep.

By default, this parameter is set to 600 (seconds).

3 - User-invoked mergeout

You can invoke mergeout at any time on one or more projections, by calling Vertica meta-function DO_TM_TASK:.

You can invoke mergeout at any time on one or more projections, by calling Vertica meta-function DO_TM_TASK:

DO_TM_TASK('mergeout'[, '[[database.]schema.]{table | projection} ]')

The function scans the database catalog within the specified scope to identify outstanding mergeout tasks. If no table or projection is specified, DO_TM_TASK scans the entire catalog. Unlike the continuous TM service, which runs in the TM resource pool, DO_TM_TASK runs in the GENERAL pool. If DO_TM_TASK executes mergeout tasks that are pending in the merge request queue, the TM service removes these tasks from the queue with no action taken.

4 - Partition mergeout

Vertica keeps data from different table partitions or partition groups separate on disk.

Vertica keeps data from different table partitions or partition groups separate on disk. The Tuple Mover adheres to this separation policy when it consolidates ROS containers. When a partition is first created, it typically has frequent data loads and requires regular activity from the Tuple Mover. As a partition ages, it commonly transitions to a mostly read-only workload and requires much less activity.

The Tuple Mover has two different policies for managing these different partition workloads:

  • Active partition is the partition that was most recently created. The Tuple Mover uses a strata-based algorithm that seeks to minimize the number of times individual tuples undergo mergeout. A table's active partition count identifies how many partitions are active for that table.

  • Inactive partitions are those that were not most recently created. The Tuple Mover consolidates ROS containers to a minimal set while avoiding merging containers whose size exceeds MaxMrgOutROSSizeMB.

For details on how the Tuple Mover identifies active partitions, see Active and inactive partitions.

Partition mergeout thread allocation

The TM resource pool sets the number of threads that are available for mergeout with its MAXCONCURRENCY parameter. By default , this parameter is set to 7. Vertica allocates half the threads to active partitions, and the remaining half to active and inactive partitions. If MAXCONCURRENCY is set to an uneven integer, Vertica rounds up to favor active partitions.

For example, if MAXCONCURRENCY is set to 7, then Vertica allocates four threads exclusively to active partitions, and allocates the remaining three threads to active and inactive partitions as needed. If additional threads are required to avoid ROS pushback, increase MAXCONCURRENCY with ALTER RESOURCE POOL.

5 - Deletion marker mergeout

When you delete data from the database, Vertica does not remove it.

When you delete data from the database, Vertica does not remove it. Instead, it marks the data as deleted. Using many DELETE statements to mark a small number of rows relative to the size of a table can result in creating many small containers—delete vectors—to hold data marked for deletion. Each delete vector container consumes resources, so a large number of such containers can adversely impact performance, especially during recovery.

After the Tuple Mover performs a mergeout, it looks for deletion marker containers that hold few entries. If such containers exist, the Tuple Mover merges them together into a single, larger container. This process helps lower the overhead of tracking deleted data by freeing resources used by multiple, individual containers. The Tuple Mover does not purge or otherwise affect the deleted data, but consolidates delete vectors for greater efficiency.

6 - Disabling mergeout on specific tables

By default, mergeout is enabled for all tables and their projections.

By default, mergeout is enabled for all tables and their projections. You can disable mergeout on a table with ALTER TABLE. For example:

=> ALTER TABLE public.store_orders_temp SET MERGEOUT 0;
ALTER TABLE

In general, it is useful to disable mergeout on tables that you create to serve a temporary purpose—for example, staging tables that are used to archive old partition data, or swap partitions between tables—which are deleted soon after the task is complete. By doing so, you avoid the mergeout-related overhead that the table would otherwise incur.

You can query system table TABLES to identify tables that have mergeout disabled:


=> SELECT table_schema, table_name, is_mergeout_enabled FROM v_catalog.tables WHERE is_mergeout_enabled= 0;
 table_schema |    table_name     | is_mergeout_enabled
--------------+-------------------+---------------------
 public       | store_orders_temp | f
(1 row)

7 - Purging ROS containers

Vertica periodically checks ROS storage containers to determine whether delete vectors are eligible for purge, as follows:.

Vertica periodically checks ROS storage containers to determine whether delete vectors are eligible for purge, as follows:

  1. Counts the number of aged-out delete vectors in each container—that is, delete vectors that are equal to or earlier than the ancient history mark (AHM) epoch.

  2. Calculates the percentage of aged-out delete vectors relative to the total number of records in the same ROS container.

  3. If this percentage exceeds the threshold set by configuration parameter PurgeMergeoutPercent (by default, 20 percent), Vertica automatically performs a mergeout on the ROS container that permanently removes all aged-out delete vectors. Vertica uses the TM resource pool's MAXCONCURRENCY setting to determine how many threads are available for the mergeout operation.

You can also manually purge all aged-out delete vectors from ROS containers with two Vertica meta-functions:

Both functions remove all aged-out delete vectors from ROS containers, regardless of how many are in a given container.

8 - Mergeout strata algorithm

The mergeout operation uses a strata-based algorithm to verify that each tuple is subjected to a mergeout operation a small, constant number of times, despite the process used to load the data.

The mergeout operation uses a strata-based algorithm to verify that each tuple is subjected to a mergeout operation a small, constant number of times, despite the process used to load the data. The mergeout operation uses this algorithm to choose which ROS containers to merge for non-partitioned tables and for active partitions in partitioned tables.

Vertica builds strata for each active partition and for projections anchored to non-partitioned tables. The number of strata, the size of each stratum, and the maximum number of ROS containers in a stratum is computed based on disk size, memory, and the number of columns in a projection.

Merging small ROS containers before merging larger ones provides the maximum benefit during the mergeout process. The algorithm begins at stratum 0 and moves upward. It checks to see if the number of ROS containers in a stratum has reached a value equal to or greater than the maximum ROS containers allowed per stratum. The default value is 32. If the algorithm finds that a stratum is full, it marks the projections and the stratum as eligible for mergeout.