Elastic cluster

Elastic Cluster is an Enterprise Mode-only feature.

1: Scaling factor
2: Viewing scaling factor settings
3: Setting the scaling factor
4: Local data segmentation

4.1: Enabling and disabling local segmentation

5: Elastic cluster best practices

Note

Elastic Cluster is an Enterprise Mode-only feature. For scaling your database under Eon Mode, see Scaling your Eon Mode database.

You can scale your cluster up or down to meet the needs of your database. The most common case is to add nodes to your database cluster to accommodate more data and provide better query performance. However, you can scale down your cluster if you find that it is over-provisioned, or if you need to divert hardware for other uses.

You scale your cluster by adding or removing nodes. Nodes can be added or removed without shutting down or restarting the database. After adding a node or before removing a node, Vertica begins a rebalancing process that moves data around the cluster to populate the new nodes or move data off nodes about to be removed from the database. During this process, nodes can exchange data that are not being added or removed to maintain robust intelligent K-safety. If Vertica determines that the data cannot be rebalanced in a single iteration due to lack of disk space, then the rebalance operation spans multiple iterations.

To help make data rebalancing due to cluster scaling more efficient, Vertica locally segments data storage on each node so it can be easily moved to other nodes in the cluster. When a new node is added to the cluster, existing nodes in the cluster give up some of their data segments to populate the new node. They also exchange segments to minimize the number of nodes that any one node depends upon. This strategy minimizes the number of nodes that might become critical when a node fails. When a node is removed from the cluster, its storage containers are moved to other nodes in the cluster (which also relocates data segments to minimize how many nodes might become critical when a node fails). This method of breaking data into portable segments is referred to as elastic cluster, as it facilitates enlarging or shrinking the cluster.

The alternative to elastic cluster is re-segmenting all projection data and redistributing it evenly among all database nodes any time a node is added or removed. This method requires more processing and more disk space, as it requires all data in all projections to be dumped and reloaded.

Elastic cluster scaling factor

In a new installation, each node has a scaling factor that specifies the number of local segments (see Scaling factor). Rebalance efficiently redistributes data by relocating local segments provided that, after nodes are added or removed, there are sufficient local segments in the cluster to redistribute the data evenly (determined by MAXIMUM_SKEW_PERCENT). For example, if the scaling factor = 8, and there are initially 5 nodes, then there are a total of 40 local segments cluster-wide.

If you add two additional nodes (seven nodes) Vertica relocates five local segments on two nodes, and six such segments on five nodes, resulting in roughly a 16.7 percent skew. Rebalance relocates local segments only if the resulting skew is less than the allowed threshold, as determined by MAXIMUM_SKEW_PERCENT. Otherwise, segmentation space (and hence data, if uniformly distributed over this space) is evenly distributed among the seven nodes, and new local segment boundaries are drawn for each node, such that each node again has eight local segments.

Note

By default, the scaling factor only has an effect while Vertica rebalances the database. While rebalancing, each node breaks the projection segments it contains into storage containers, which it then moves to other nodes if necessary. After rebalancing, the data is recombined into ROS containers. It is possible to have Vertica always group data into storage containers. See Local data segmentation for more information.

Enabling elastic cluster

You enable elastic cluster with ENABLE_ELASTIC_CLUSTER. Query the ELASTIC_CLUSTER system table to verify that elastic cluster is enabled:

=> SELECT is_enabled FROM ELASTIC_CLUSTER;
 is_enabled
------------
 t
(1 row)

1 - Scaling factor

To avoid an increased number of ROS containers, do not enable local segmentation and do not change the scaling factor.

2 - Viewing scaling factor settings

To view the scaling factor, query the ELASTIC_CLUSTER table:.

To view the scaling factor, query the ELASTIC_CLUSTER table:

=> SELECT scaling_factor FROM ELASTIC_CLUSTER;
scaling_factor
---------------
            4
(1 row)

=> SELECT SET_SCALING_FACTOR(6);
 SET_SCALING_FACTOR
--------------------
 SET
(1 row)

=> SELECT scaling_factor FROM ELASTIC_CLUSTER;
 scaling_factor
---------------
             6
(1 row)

3 - Setting the scaling factor

Use the SET_SCALING_FACTOR function to change your database's scaling factor.

The scaling factor determines the number of storage containers that Vertica uses to store each projection across the database during rebalancing when local segmentation is enabled. When setting the scaling factor, follow these guidelines:

The number of storage containers should be greater than or equal to the number of partitions multiplied by the number of local segments:

num-storage-containers>= (num-partitions*num-local-segments )
Set the scaling factor high enough so rebalance can transfer local segments to satisfy the skew threshold, but small enough so the number of storage containers does not result in too many ROS containers, and cause ROS pushback. The maximum number of ROS containers (by default 1024) is set by configuration parameter ContainersPerProjectionLimit.

Use the SET_SCALING_FACTOR function to change your database's scaling factor. The scaling factor can be an integer between 1 and 32.

=> SELECT SET_SCALING_FACTOR(12);
SET_SCALING_FACTOR
--------------------
 SET
(1 row)

4 - Local data segmentation

By default, the scaling factor only has an effect when Vertica rebalances the database.

By default, the scaling factor only has an effect when Vertica rebalances the database. During rebalancing, nodes break the projection segments they contain into storage containers which they can quickly move to other nodes.

This process is more efficient than re-segmenting the entire projection (in particular, less free disk space is required), but it still has significant overhead, since storage containers have to be separated into local segments, some of which are then transferred to other nodes. This overhead is not a problem if you rarely add or remove nodes from your database.

However, if your database is growing rapidly and is constantly busy, you may find the process of adding nodes becomes disruptive. In this case, you can enable local segmentation, which tells Vertica to always segment its data based on the scaling factor, so the data is always broken into containers that are easily moved. Having the data segmented in this way dramatically speeds up the process of adding or removing nodes, since the data is always in a state that can be quickly relocated to another node. The rebalancing process that Vertica performs after adding or removing a node just has to decide which storage containers to relocate, instead of first having to first break the data into storage containers.

Local data segmentation increases the number of storage containers stored on each node. This is not an issue unless a table contains many partitions. For example, if the table is partitioned by day and contains one or more years. If local data segmentation is enabled, then each of these table partitions is broken into multiple local storage segments, which potentially results in a huge number of files which can lead to ROS "pushback." Consider your table partitions and the effect enabling local data segmentation may have before enabling the feature.

4.1 - Enabling and disabling local segmentation

To enable local segmentation, use the ENABLE_LOCAL_SEGMENTS function.

To enable local segmentation, use the ENABLE_LOCAL_SEGMENTS function. To disable local segmentation, use the DISABLE_LOCAL_SEGMENTATION function:

=> SELECT ENABLE_LOCAL_SEGMENTS();
ENABLE_LOCAL_SEGMENTS
-----------------------
 ENABLED
(1 row)


=> SELECT is_local_segment_enabled FROM elastic_cluster;
 is_enabled
------------
 t
(1 row)

=> SELECT DISABLE_LOCAL_SEGMENTS();
 DISABLE_LOCAL_SEGMENTS
------------------------
 DISABLED
(1 row)

=> SELECT is_local_segment_enabled FROM ELASTIC_CLUSTER;
 is_enabled
------------
 f
(1 row)

5 - Elastic cluster best practices

The following are some best practices with regard to local segmentation.

Note

You should always perform a database backup before and after performing any of the operations discussed in this topic. You need to back up before changing any elastic cluster or local segmentation settings to guard against a hardware failure causing the rebalance process to leave the database in an unusable state. You should perform a full backup of the database after the rebalance procedure to avoid having to rebalance the database again if you need to restore from a backup.

When to enable local data segmentation

Local data segmentation can significantly speed up the process of resizing your cluster. You should enable local data segmentation if:

your database does not contain tables with hundreds of partitions.
the number of nodes in the database cluster is a power of two.
you plan to expand or contract the size of your cluster.

Local segmentation can result in an excessive number of storage containers with tables that have hundreds of partitions, or in clusters with a non-power-of-two number of nodes. If your database has these two features, take care when enabling local segmentation.