Duplicating a subcluster
Subclusters have many settings you can tune to get them to work just the way you want. After you have tuned a subcluster, you may want additional subclusters that are configured the same way. For example, suppose you have a subcluster that you have tuned to perform analytics workloads. To improve query throughput, you can create several more subclusters configured exactly like it. Instead of creating the new subclusters and then manually configuring them from scratch, you can duplicate the existing subcluster (called the source subcluster) to a new subcluster (the target subcluster).
When you create a new subcluster based on another subcluster, Vertica copies most of the source subcluster's settings. See below for a list of the settings that Vertica copies. These settings are both on the node level and the subcluster level.
Note
After you duplicate a subcluster, the target is not connected to the source in any way. Any changes you make to the source subcluster's settings after duplication are not copied to the target. The subclusters are completely independent after duplication.
Requirements for the target subcluster
You must have a set of hosts in your database cluster that you will use as the target of the subcluster duplication. Vertica forms these hosts into a target subcluster that receives most of the settings of the source subcluster. The hosts for the target subcluster must meet the following requirements:
- They must be part of your database cluster but not part of your database. For example, you can use hosts you have dropped from a subcluster or whose subcluster you have removed. Vertica returns an error if you attempt to duplicate a subcluster onto one or more nodes that are currently participating in the database.
Tip
If you want to duplicate the settings of a subcluster to another existing subcluster, first remove the target subcluster (see Removing subclusters). Then duplicate the source subcluster onto the hosts of the now-removed target subcluster.
- The number of nodes you supply for the target subcluster must equal the number of nodes in the source subcluster. When duplicating the subcluster, Vertica performs a 1:1 copy of some node-level settings from each node in the source subcluster to a corresponding node in the target.
- The RAM and disk allocation for the hosts in the target subcluster should be at least equal to that of the source nodes. Technically, your target nodes can have less RAM or disk space than the source nodes. However, you will usually see performance issues in the new subcluster because the settings of the original subcluster will not be tuned for the resources of the target subcluster.
You can duplicate a subcluster even if some of the nodes in the source subcluster or hosts in the target are down. If nodes in the target are down, they use the catalog Vertica copied from the source node when they recover.
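The node-count requirement above is easy to check before you run the duplication. The following is a minimal sketch, not part of admintools: the function names count_hosts and hosts_match_source are hypothetical, and the source node count is a value you supply yourself.

```shell
# Count the entries in a comma-separated host list (the format passed to -s).
count_hosts() {
    # Put one host on each line, then count the non-empty lines.
    printf '%s\n' "$1" | tr ',' '\n' | awk 'NF { n++ } END { print n + 0 }'
}

# Succeed only when the number of target hosts equals the source node count.
# $1 = comma-separated target hosts, $2 = node count of the source subcluster
# (an assumption: you look this number up yourself).
hosts_match_source() {
    [ "$(count_hosts "$1")" -eq "$2" ]
}

if hosts_match_source "10.11.12.13,10.11.12.14,10.11.12.15" 3; then
    echo "host count matches the source subcluster"
else
    echo "host count does not match the source subcluster" >&2
fi
```

The check covers only the node count. It does not verify that the hosts are outside the database or that their RAM and disk allocations are adequate.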
Duplication of subcluster-level settings
The following table lists the subcluster-level settings that Vertica copies from the source subcluster to the target.
Setting Type | Setting Details |
---|---|
Basic subcluster settings | Whether the subcluster is a primary or secondary subcluster. |
Large cluster settings | The number of control nodes in the subcluster. |
Resource pool settings | Note: Duplicating a subcluster can fail due to subcluster-specific resource pools. If creating the subcluster-specific resource pools leaves less than 25% of the total memory free for the general pool, Vertica stops the duplication and reports an error. |
Connection load balancing settings | If the source subcluster is part of a subcluster-based load balancing group (you created the load balancing group using CREATE LOAD BALANCE GROUP...WITH SUBCLUSTER), the new subcluster is added to the group. See Creating Connection Load Balance Groups. Important: Vertica adds the new subcluster to the subcluster-based load balancing group. However, it does not create network addresses for the nodes in the target subcluster. Load balancing policies cannot direct connections to the new subcluster until you create network addresses for the nodes in the target subcluster. See Creating network addresses for the steps you must take. |
Storage policy settings | Table and table partition pinning policies are copied from the source to the target subcluster. See Pinning Depot Objects for more information. Any existing storage policies on the target subcluster are dropped before the policies are copied from the source. |
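The 25% rule in the resource pool note above comes down to simple arithmetic. The following sketch is a simplification: it assumes every subcluster-specific pool's memory size is a whole-number percentage and ignores memory reserved by any other pools. The function names are hypothetical and are not part of Vertica.

```shell
# Print the percentage of memory left for the general pool after reserving
# the given pool sizes (whole-number percentages, for example 70 for 70%).
general_pool_remaining() {
    total=0
    for pct in "$@"; do
        total=$((total + pct))
    done
    echo $((100 - total))
}

# Succeed when at least 25% remains for the general pool.
leaves_enough_for_general() {
    [ "$(general_pool_remaining "$@")" -ge 25 ]
}

# A single 70% pool leaves 30% for the general pool, which passes the check.
if leaves_enough_for_general 70; then
    echo "enough memory remains for the general pool"
else
    echo "less than 25% would remain for the general pool" >&2
fi
```

Under these assumptions, adding a second pool of 10% to the 70% pool would leave only 20%, so the check would fail.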
Vertica does not copy the following subcluster settings:
Setting Type | Setting Details |
---|---|
Basic subcluster settings |
|
Connection load balancing settings | Address-based load balancing groups are not duplicated for the target subcluster. For example, suppose you created a load balancing group for the source subcluster by adding the network addresses of all of the subcluster's nodes. In this case, Vertica does not create a load balancing group for the target subcluster because it does not duplicate the network addresses of the source nodes (see the next section). Because it does not copy the addresses, it cannot create an address-based group. |
Duplication of node-level settings
When Vertica duplicates a subcluster, it maps each node in the source subcluster to a node in the target subcluster. Then it copies relevant node-level settings from each individual source node to the corresponding target node.
For example, suppose you have a three-node subcluster consisting of nodes named node01, node02, and node03. The target subcluster has nodes named node04, node05, and node06. In this case, Vertica copies the settings from node01 to node04, from node02 to node05, and from node03 to node06.
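The pairing in this example can be written out as a short sketch. It assumes nodes are paired by their position in the two lists, which matches the example but is not a statement of how Vertica chooses the pairs internally. The function name pair_nodes is hypothetical.

```shell
# Print "source -> target" for each position in two comma-separated node lists.
# Fails if the lists differ in length, because the mapping must be 1:1.
pair_nodes() {
    awk -v src="$1" -v tgt="$2" 'BEGIN {
        n = split(src, s, ",")
        m = split(tgt, t, ",")
        if (n != m) exit 1
        for (i = 1; i <= n; i++) print s[i] " -> " t[i]
    }' || {
        echo "error: source and target node counts differ" >&2
        return 1
    }
}

pair_nodes "node01,node02,node03" "node04,node05,node06"
```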
The node-level settings that Vertica copies from the source nodes to the target nodes are:
Setting Type | Setting Details |
---|---|
Configuration parameters | Vertica copies the values of configuration parameters that you have set at the node level on the source node to the target node. For example, suppose you set CompressCatalogOnDisk at the node level on node01. If you then duplicate the subcluster containing node01, the setting is copied to the corresponding target node. |
Eon Mode settings |
|
Storage location settings |
The DATA, TEMP, DEPOT, and USER storage location paths on the source node are duplicated on the target node. When duplicating node-specific paths (such as DATA or DEPOT) the path names are adjusted for the new node name. For example, suppose node 1 has a depot path of
Important: The directories for these storage locations on the target node must be empty. They must also have the correct file permissions to allow Vertica to read and write to them. Vertica does not duplicate a storage location if it cannot access its directory on the target node or if the directory is not empty. In this case, the target node will not have the location defined after the duplication process finishes. Admintools does not warn you if any locations were not duplicated. If you find that storage locations have not been duplicated on one or more target nodes, you must fix the issues with the directories on the target nodes. Then re-run the duplication command. |
Large cluster settings |
Control node assignments are copied from the source node to the target node:
|
Vertica does not copy the following node-level settings:
Setting Type | Setting Details |
---|---|
Connection load balancing settings | Network addresses are not copied. The target node's network addresses do not depend on the settings of the source node. Therefore, Vertica cannot determine what the target node's addresses should be. |
Depot settings | Depot-related configuration parameters that can be set on a node level (such as FileDeletionServiceInterval) are not copied from the source node to the target node. |
Using admintools to duplicate a subcluster
To duplicate a subcluster, you use the same admintools db_add_subcluster tool that you use to create a new subcluster (see Creating subclusters). In addition to the required options to create a subcluster (the list of hosts, name for the new subcluster, database name, and so on), you also pass the --like option with the name of the source subcluster you want to duplicate.
Important
When you use the --like option, you cannot use the --is-secondary or --control-set-size options. Vertica determines whether the new subcluster is secondary and the number of control nodes it contains based on the source subcluster. If you supply these options along with the --like option, admintools returns an error.
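If you script the duplication, you can catch the conflicting options before admintools does. The following sketch only builds and prints the command; it does not run it. The function name build_duplicate_cmd is hypothetical, and the sketch leaves out the -p password option used in the example later in this section, so add it if your environment needs it.

```shell
# Build (but do not run) an admintools command that duplicates a subcluster.
# Arguments: database, comma-separated hosts, source subcluster, target
# subcluster, followed by any extra options to pass through unchanged.
build_duplicate_cmd() {
    db=$1
    hosts=$2
    source_sc=$3
    target_sc=$4
    shift 4
    # Reject the options that cannot be combined with --like.
    for opt in "$@"; do
        case $opt in
            --is-secondary*|--control-set-size*)
                echo "error: $opt cannot be combined with --like" >&2
                return 1
                ;;
        esac
    done
    cmd="admintools -t db_add_subcluster -d $db -s $hosts --like=$source_sc -c $target_sc"
    if [ "$#" -gt 0 ]; then
        cmd="$cmd $*"
    fi
    echo "$cmd"
}

build_duplicate_cmd verticadb 10.11.12.13,10.11.12.14,10.11.12.15 analytics_1 analytics_2
```

Printing the command rather than running it lets you review the final command line before you run it yourself.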
The following examples demonstrate duplicating a three-node subcluster named analytics_1. The first example examines some of the settings in the analytics_1 subcluster:
- An override of the global TM resource pool's memory size.
- Its own resource pool named analytics_pool.
- Its membership in a subcluster-based load balancing group named analytics.
=> SELECT name, subcluster_name, memorysize FROM SUBCLUSTER_RESOURCE_POOL_OVERRIDES;
name | subcluster_name | memorysize
------+-----------------+------------
tm | analytics_1 | 0%
(1 row)
=> SELECT name, subcluster_name, memorysize, plannedconcurrency
FROM resource_pools WHERE subcluster_name IS NOT NULL;
name | subcluster_name | memorysize | plannedconcurrency
----------------+-----------------+------------+--------------------
analytics_pool | analytics_1 | 70% | 8
(1 row)
=> SELECT * FROM LOAD_BALANCE_GROUPS;
name | policy | filter | type | object_name
-----------+------------+-----------+------------+-------------
analytics | ROUNDROBIN | 0.0.0.0/0 | Subcluster | analytics_1
(1 row)
The following example calls the admintools db_add_subcluster tool to duplicate the analytics_1 subcluster onto a set of three hosts to create a subcluster named analytics_2.
$ admintools -t db_add_subcluster -d verticadb \
-s 10.11.12.13,10.11.12.14,10.11.12.15 \
-p mypassword --like=analytics_1 -c analytics_2
Creating new subcluster 'analytics_2'
Adding new hosts to 'analytics_2'
Eon database detected, creating new depot locations for newly added nodes
Creating depot locations for 1 nodes
Warning when creating depot location for node: v_verticadb_node0007
WARNING: Target node v_verticadb_node0007 is down, so depot size has been
estimated from depot location on initiator. As soon as the node comes
up, its depot size might be altered depending on its disk size
Eon database detected, creating new depot locations for newly added nodes
Creating depot locations for 1 nodes
Warning when creating depot location for node: v_verticadb_node0008
WARNING: Target node v_verticadb_node0008 is down, so depot size has been
estimated from depot location on initiator. As soon as the node comes
up, its depot size might be altered depending on its disk size
Eon database detected, creating new depot locations for newly added nodes
Creating depot locations for 1 nodes
Warning when creating depot location for node: v_verticadb_node0009
WARNING: Target node v_verticadb_node0009 is down, so depot size has been
estimated from depot location on initiator. As soon as the node comes
up, its depot size might be altered depending on its disk size
Cloning subcluster properties
NOTICE: Nodes in subcluster analytics_1 have network addresses, you
might need to configure network addresses for nodes in subcluster
analytics_2 in order to get load balance groups to work correctly.
Replicating configuration to all nodes
Generating new configuration information and reloading spread
Starting nodes:
v_verticadb_node0007 (10.11.12.81)
v_verticadb_node0008 (10.11.12.209)
v_verticadb_node0009 (10.11.12.186)
Starting Vertica on all nodes. Please wait, databases with a large catalog
may take a while to initialize.
Checking database state for newly added nodes
Node Status: v_verticadb_node0007: (DOWN) v_verticadb_node0008:
(DOWN) v_verticadb_node0009: (DOWN)
Node Status: v_verticadb_node0007: (INITIALIZING) v_verticadb_node0008:
(INITIALIZING) v_verticadb_node0009: (INITIALIZING)
Node Status: v_verticadb_node0007: (UP) v_verticadb_node0008:
(UP) v_verticadb_node0009: (UP)
Syncing catalog on verticadb with 2000 attempts.
Multi-node DB add completed
Nodes added to subcluster analytics_2 successfully.
Subcluster added to verticadb successfully.
Re-running the queries in the first part of the example shows that the settings from analytics_1 have been duplicated in analytics_2:
=> SELECT name, subcluster_name, memorysize FROM SUBCLUSTER_RESOURCE_POOL_OVERRIDES;
name | subcluster_name | memorysize
------+-----------------+------------
tm | analytics_1 | 0%
tm | analytics_2 | 0%
(2 rows)
=> SELECT name, subcluster_name, memorysize, plannedconcurrency
FROM resource_pools WHERE subcluster_name IS NOT NULL;
name | subcluster_name | memorysize | plannedconcurrency
----------------+-----------------+------------+--------------------
analytics_pool | analytics_1 | 70% | 8
analytics_pool | analytics_2 | 70% | 8
(2 rows)
=> SELECT * FROM LOAD_BALANCE_GROUPS;
name | policy | filter | type | object_name
-----------+------------+-----------+------------+-------------
analytics | ROUNDROBIN | 0.0.0.0/0 | Subcluster | analytics_2
analytics | ROUNDROBIN | 0.0.0.0/0 | Subcluster | analytics_1
(2 rows)
As noted earlier, even though the analytics_2 subcluster is part of the analytics load balancing group, its nodes do not have network addresses defined for them. Until you define network addresses for the nodes, Vertica cannot redirect client connections to them.
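One way to prepare the missing network addresses is to generate a CREATE NETWORK ADDRESS statement for each new node and review the statements before running them in vsql. The following sketch only prints the statements. The function name and the addr_ prefix for the address names are hypothetical. The node names and IP addresses are taken from the admintools output above. Check the statement form against the Creating network addresses topic for your Vertica version.

```shell
# Print one CREATE NETWORK ADDRESS statement per "node_name:ip" argument.
# The address names (addr_<node_name>) are hypothetical; choose your own.
network_address_sql() {
    for pair in "$@"; do
        node=${pair%%:*}   # text before the first colon
        ip=${pair#*:}      # text after the first colon
        echo "CREATE NETWORK ADDRESS addr_${node} ON ${node} WITH '${ip}';"
    done
}

network_address_sql \
    v_verticadb_node0007:10.11.12.81 \
    v_verticadb_node0008:10.11.12.209 \
    v_verticadb_node0009:10.11.12.186
```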