Duplicating a subcluster

Subclusters have many settings you can tune to get them to work just the way you want. After you have tuned a subcluster, you may want additional subclusters that are configured the same way. For example, suppose you have a subcluster that you have tuned to perform analytics workloads. To improve query throughput, you can create several more subclusters configured exactly like it. Instead of creating the new subclusters and then manually configuring them from scratch, you can duplicate the existing subcluster (called the source subcluster) to a new subcluster (the target subcluster).

When you create a new subcluster based on another subcluster, Vertica copies most of the source subcluster's settings, at both the subcluster level and the node level. The following sections list the settings that Vertica does and does not copy.

Requirements for the target subcluster

You must have a set of hosts in your database cluster that you will use as the target of the subcluster duplication. Vertica forms these hosts into a target subcluster that receives most of the settings of the source subcluster. The hosts for the target subcluster must meet the following requirements:

  • They must be part of your database cluster but not part of your database. For example, you can use hosts you have dropped from a subcluster or whose subcluster you have removed. Vertica returns an error if you attempt to duplicate a subcluster onto one or more nodes that are currently participating in the database.

  • The number of nodes you supply for the target subcluster must equal the number of nodes in the source subcluster. When duplicating the subcluster, Vertica performs a 1:1 copy of some node-level settings from each node in the source subcluster to a corresponding node in the target.

  • The hosts in the target subcluster should have at least as much RAM and disk space as the source nodes. Technically, the target nodes can have less RAM or disk space than the source nodes. However, you will usually see performance issues in the new subcluster, because the settings copied from the original subcluster will not be tuned for the target subcluster's resources.

You can duplicate a subcluster even if some of the nodes in the source subcluster or hosts in the target are down. If nodes in the target are down, they use the catalog Vertica copied from the source node when they recover.

Duplication of subcluster-level settings

The following table lists the subcluster-level settings that Vertica copies from the source subcluster to the target.

Basic subcluster settings: Whether the subcluster is a primary or secondary subcluster.

Large cluster settings: The number of control nodes in the subcluster.

Resource pool settings:

  • Vertica creates a new resource pool for every subcluster-specific resource pool in the source subcluster.

  • Subcluster-specific resource pool cascade settings are copied from the source subcluster and applied to the newly-created resource pool for the target subcluster.

  • Subcluster-level overrides of global resource pool settings, such as MEMORYSIZE, are copied. See Managing workload resources in an Eon Mode database for more information.

  • Grants on resource pools are copied from the source subcluster.

Connection load balancing settings: If the source subcluster is part of a subcluster-based load balancing group (that is, you created the load balancing group using CREATE LOAD BALANCE GROUP...WITH SUBCLUSTER), the new subcluster is added to the group. See Creating Connection Load Balance Groups.

Storage policy settings: Table and table partition pinning policies are copied from the source subcluster to the target. Any existing storage policies on the target subcluster are dropped before the policies are copied from the source. See Pinning Depot Objects for more information.

Vertica does not copy the following subcluster settings:

Basic subcluster settings:

  • The subcluster name is not copied; you must provide a new name for the target subcluster.

  • If the source is the default subcluster, that status is not copied to the target. Your Vertica database has a single default subcluster; if Vertica copied this value, the source subcluster could no longer be the default.

Connection load balancing settings: Address-based load balancing groups are not duplicated for the target subcluster. For example, suppose you created a load balancing group for the source subcluster by adding the network addresses of all of the subcluster's nodes. In this case, Vertica does not create a load balancing group for the target subcluster, because it does not duplicate the network addresses of the source nodes (see the next section). Because it does not copy the addresses, it cannot create an address-based group.
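For instance, you can check which subcluster is currently the default (and is therefore the one whose default status will not carry over to a duplicate) by querying the SUBCLUSTERS system table. This is a minimal sketch; verify the column names against your Vertica version:

=> SELECT DISTINCT subcluster_name, is_primary, is_default
     FROM subclusters;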

Duplication of node-level settings

When Vertica duplicates a subcluster, it maps each node in the source subcluster to a node in the target subcluster. Then it copies relevant node-level settings from each individual source node to the corresponding target node.

For example, suppose you have a three-node subcluster consisting of nodes named node01, node02, and node03. The target subcluster has nodes named node04, node05, and node06. In this case, Vertica copies the settings from node01 to node04, from node02 to node05, and from node03 to node06.

The node-level settings that Vertica copies from the source nodes to the target nodes are:

Configuration parameters:

Vertica copies the values of configuration parameters that you have set at the node level on the source node to the target node. For example, suppose you set CompressCatalogOnDisk on the source node using the statement:

=> ALTER NODE node01 SET CompressCatalogOnDisk = 0;

If you then duplicate the subcluster containing node01, Vertica copies the setting to the target node.

Eon Mode settings:

  • Shard subscriptions are copied from the source node to the target node.

  • Whether the node is the participating primary node for the shard.

Storage location settings:

The DATA, TEMP, DEPOT, and USER storage location paths on the source node are duplicated on the target node. When duplicating node-specific paths (such as DATA or DEPOT), Vertica adjusts the path names for the new node name. For example, suppose node 1 has a depot path of /vertica/depot/vmart/v_vmart_node0001_depot. If Vertica duplicates node 1 to node 4, it adjusts the path to /vertica/depot/vmart/v_vmart_node0004_depot.

Large cluster settings:

Control node assignments are copied from the source node to the target node:

  • If the source node is a control node, the target node is made into a control node.

  • If the source node depends on a control node, the target node becomes a dependent of the corresponding control node in the new subcluster.
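You can inspect the node-level storage location paths, before or after duplicating a subcluster, with a query against the STORAGE_LOCATIONS system table, for example:

=> SELECT node_name, location_usage, location_path
     FROM storage_locations
     ORDER BY node_name, location_usage;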

Vertica does not copy the following node-level settings:

Connection load balancing settings: Network addresses are not copied. The target node's network addresses do not depend on the settings of the source node, so Vertica cannot determine what the target node's addresses should be.

Depot settings: Depot-related configuration parameters that can be set at the node level (such as FileDeletionServiceInterval) are not copied from the source node to the target node.

Using admintools to duplicate a subcluster

To duplicate a subcluster, you use the same admintools db_add_subcluster tool that you use to create a new subcluster (see Creating subclusters). In addition to the required options to create a subcluster (the list of hosts, name for the new subcluster, database name, and so on), you also pass the --like option with the name of the source subcluster you want to duplicate.

The following examples demonstrate duplicating a three-node subcluster named analytics_1. The first example examines some of the settings in the analytics_1 subcluster:

  • An override of the global TM resource pool's memory size.

  • Its own resource pool, named analytics_pool.

  • Its membership in a subcluster-based load balancing group named analytics.

=> SELECT name, subcluster_name, memorysize FROM SUBCLUSTER_RESOURCE_POOL_OVERRIDES;
 name | subcluster_name | memorysize
------+-----------------+------------
 tm   | analytics_1     | 0%
(1 row)

=> SELECT name, subcluster_name, memorysize, plannedconcurrency
      FROM resource_pools WHERE subcluster_name IS NOT NULL;
      name      | subcluster_name | memorysize | plannedconcurrency
----------------+-----------------+------------+--------------------
 analytics_pool | analytics_1     | 70%        | 8
(1 row)

=> SELECT * FROM LOAD_BALANCE_GROUPS;
   name    |   policy   |  filter   |    type    | object_name
-----------+------------+-----------+------------+-------------
 analytics | ROUNDROBIN | 0.0.0.0/0 | Subcluster | analytics_1
(1 row)

The following example calls the admintools db_add_subcluster tool to duplicate the analytics_1 subcluster onto a set of three hosts, creating a subcluster named analytics_2.

$ admintools -t db_add_subcluster -d verticadb \
             -s 10.11.12.13,10.11.12.14,10.11.12.15 \
             -p mypassword --like=analytics_1 -c analytics_2

Creating new subcluster 'analytics_2'
Adding new hosts to 'analytics_2'
Eon database detected, creating new depot locations for newly added nodes
Creating depot locations for 1 nodes
 Warning when creating depot location for node: v_verticadb_node0007
 WARNING: Target node v_verticadb_node0007 is down, so depot size has been
          estimated from depot location on initiator. As soon as the node comes
          up, its depot size might be altered depending on its disk size
Eon database detected, creating new depot locations for newly added nodes
Creating depot locations for 1 nodes
 Warning when creating depot location for node: v_verticadb_node0008
 WARNING: Target node v_verticadb_node0008 is down, so depot size has been
          estimated from depot location on initiator. As soon as the node comes
          up, its depot size might be altered depending on its disk size
Eon database detected, creating new depot locations for newly added nodes
Creating depot locations for 1 nodes
 Warning when creating depot location for node: v_verticadb_node0009
 WARNING: Target node v_verticadb_node0009 is down, so depot size has been
          estimated from depot location on initiator. As soon as the node comes
          up, its depot size might be altered depending on its disk size
Cloning subcluster properties
NOTICE: Nodes in subcluster analytics_1 have network addresses, you
might need to configure network addresses for nodes in subcluster
analytics_2 in order to get load balance groups to work correctly.

    Replicating configuration to all nodes
    Generating new configuration information and reloading spread
    Starting nodes:
        v_verticadb_node0007 (10.11.12.81)
        v_verticadb_node0008 (10.11.12.209)
        v_verticadb_node0009 (10.11.12.186)
    Starting Vertica on all nodes. Please wait, databases with a large catalog
         may take a while to initialize.
    Checking database state for newly added nodes
    Node Status: v_verticadb_node0007: (DOWN) v_verticadb_node0008:
                 (DOWN) v_verticadb_node0009: (DOWN)
    Node Status: v_verticadb_node0007: (INITIALIZING) v_verticadb_node0008:
                 (INITIALIZING) v_verticadb_node0009: (INITIALIZING)
    Node Status: v_verticadb_node0007: (UP) v_verticadb_node0008:
                 (UP) v_verticadb_node0009: (UP)
Syncing catalog on verticadb with 2000 attempts.
    Multi-node DB add completed
Nodes added to subcluster analytics_2 successfully.
Subcluster added to verticadb successfully.

Re-running the queries in the first part of the example shows that the settings from analytics_1 have been duplicated in analytics_2:

=> SELECT name, subcluster_name, memorysize FROM SUBCLUSTER_RESOURCE_POOL_OVERRIDES;
 name | subcluster_name | memorysize
------+-----------------+------------
 tm   | analytics_1     | 0%
 tm   | analytics_2     | 0%
(2 rows)

=> SELECT name, subcluster_name, memorysize, plannedconcurrency
       FROM resource_pools WHERE subcluster_name IS NOT NULL;
      name      | subcluster_name | memorysize |  plannedconcurrency
----------------+-----------------+------------+--------------------
 analytics_pool | analytics_1     | 70%        | 8
 analytics_pool | analytics_2     | 70%        | 8
(2 rows)

=> SELECT * FROM LOAD_BALANCE_GROUPS;
   name    |   policy   |  filter   |    type    | object_name
-----------+------------+-----------+------------+-------------
 analytics | ROUNDROBIN | 0.0.0.0/0 | Subcluster | analytics_2
 analytics | ROUNDROBIN | 0.0.0.0/0 | Subcluster | analytics_1
(2 rows)

As noted earlier, even though the analytics_2 subcluster is part of the analytics load balancing group, its nodes do not have network addresses defined for them. Until you define network addresses for the nodes, Vertica cannot redirect client connections to them.
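A sketch of how you might define those addresses, using the IP addresses reported when the new nodes started. The address names here are hypothetical, and you should check the exact syntax against your Vertica version's CREATE NETWORK ADDRESS reference:

=> CREATE NETWORK ADDRESS node07_addr ON v_verticadb_node0007 WITH '10.11.12.81';
=> CREATE NETWORK ADDRESS node08_addr ON v_verticadb_node0008 WITH '10.11.12.209';
=> CREATE NETWORK ADDRESS node09_addr ON v_verticadb_node0009 WITH '10.11.12.186';

Because the analytics group is subcluster-based, the new nodes should become reachable through the group once their network addresses exist.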