Creating sandboxes

In order to create a sandbox for a secondary subcluster, all nodes in the subcluster must have a status of UP and collectively provide full-subscription coverage for all shards.

To sandbox the first subcluster in a sandbox, use the sandbox_subcluster admintools command:

$ adminTools -t sandbox_subcluster -h
Usage: sandbox_subcluster [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to be modified
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  -c SCNAME, --subcluster=SCNAME
                        Name of subcluster to be sandboxed
  -b SBNAME, --sandbox=SBNAME
                        Name of the sandbox
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.

At command runtime, under a global catalog lock (GCLX), the nodes in the specified subcluster create a checkpoint of the catalog. When these nodes auto-restart in the sandbox cluster, they form a primary subcluster that uses the data and catalog checkpoint from the main cluster. After the nodes successfully restart, the sandbox cluster and the main cluster are mutually isolated and can diverge.

While the nodes in the main cluster sync their metadata to /path-to-communal-storage/metadata/db_name, the nodes in the sandbox sync to /path-to-communal-storage/metadata/sandbox_name.

Note

Though the sandbox cluster and main cluster are mutually isolated, the nodes in both clusters remain in the same Spread ring. However, if the main cluster and sandbox cluster partition from each other—which can be caused by events like Spread communication problems—the two clusters are not affected.

You can perform standard database operations and queries, such as loading data or creating new tables, in either cluster without affecting the other cluster. For example, dropping a table in the sandbox cluster does not drop the table in the main cluster, and vice versa.

Because both clusters reference the same data files, neither cluster can delete files that existed at the time of sandbox creation. However, the sandbox can remove files that it creates after spin-off from the main cluster. Files in the main cluster are queued for removal, but they are not processed until all active sandboxes are removed.

You cannot nest sandboxes, remove a sandboxed subcluster from the database, or add or remove nodes to an existing sandbox, but you can have multiple individual sandboxes active at the same time.

Adding subclusters to existing sandboxes

You can add additional secondary subclusters to existing sandbox clusters using the SANDBOX_SUBCLUSTER function. Added subclusters must be secondary, cannot be a member of any other sandbox cluster, and all member nodes must have a status of UP.

When sandboxing additional subclusters, you must first call the SANDBOX_SUBCLUSTER function in the sandbox cluster and then in the main cluster, providing the same sandbox cluster and subcluster names in both calls. This makes both clusters aware of the status of the sandboxed subcluster. After the function is called in both clusters, the subcluster automatically restarts, joins the sandbox cluster as a secondary subcluster, and subscribes to the sandbox cluster's shards.

Examples

The following command sandboxes the sc01 secondary subcluster into a sandbox named sand:

$ admintools -t sandbox_subcluster -d verticadb -p password -c sc_01 -b sand

Validating sandboxing conditions

Sandboxing subcluster sc_01 as sand...
Subcluster 'sc_01' has been sandboxed to 'sand'. It is going to auto-restart and re-form.

Checking for sandboxed nodes to be UP...
    Node Status: v_verticadb_node0004: (DOWN) v_verticadb_node0005: (DOWN) v_verticadb_node0006: (DOWN)
    Node Status: v_verticadb_node0004: (UP) v_verticadb_node0005: (UP) v_verticadb_node0006: (UP)

Sandboxing complete. Subcluster sc_01 is ready for use

If you query the NODES system table from the main cluster, you can see that the sc_01 nodes have a status of UNKNOWN and are listed as member of the sand sandbox:

=> SELECT node_name, subcluster_name, node_state, sandbox FROM NODES;
      node_name       |  subcluster_name   | node_state | sandbox
----------------------+--------------------+------------+---------
 v_verticadb_node0001 | default_subcluster | UP         |
 v_verticadb_node0002 | default_subcluster | UP         |
 v_verticadb_node0003 | default_subcluster | UP         |
 v_verticadb_node0004 | sc_01              | UNKNOWN    | sand
 v_verticadb_node0005 | sc_01              | UNKNOWN    | sand
 v_verticadb_node0006 | sc_01              | UNKNOWN    | sand
 v_verticadb_node0007 | sc_02              | UP         | 
 v_verticadb_node0008 | sc_02              | UP         | 
 v_verticadb_node0009 | sc_02              | UP         | 
(9 rows)

Note

If the sandboxed and main clusters are sufficiently incompatible or are Spread-partitioned from each other, the status of the sandboxed nodes might appear as DOWN in the above query.

When you issue the same query on one of the sandboxed nodes, the table shows that the sandboxed nodes are UP and the nodes from the main cluster are UNKNOWN, confirming that the cluster is successfully sandboxed and isolated from the main cluster:

=> SELECT node_name, subcluster_name, node_state, sandbox FROM NODES;
      node_name       |  subcluster_name   | node_state | sandbox
----------------------+--------------------+------------+---------
 v_verticadb_node0001 | default_subcluster | UNKNOWN    |
 v_verticadb_node0002 | default_subcluster | UNKNOWN    |
 v_verticadb_node0003 | default_subcluster | UNKNOWN    |
 v_verticadb_node0004 | sc_01              | UP         | sand
 v_verticadb_node0005 | sc_01              | UP         | sand
 v_verticadb_node0006 | sc_01              | UP         | sand
 v_verticadb_node0007 | sc_02              | UNKNOWN    | 
 v_verticadb_node0008 | sc_02              | UNKNOWN    | 
 v_verticadb_node0009 | sc_02              | UNKNOWN    | 
(9 rows)

You can now perform standard database operations in either cluster without impacting the other cluster. For instance, if you create a machine learning dataset named train_data in the sandboxed subcluster, the new table does not propagate to the main cluster:

--In the sandboxed subcluster
=> CREATE TABLE train_data(time timestamp, Temperature float);
CREATE TABLE

=> COPY train_data FROM LOCAL 'daily-min-temperatures.csv' DELIMITER ',';
 Rows Loaded
-------------
3650
(1 row)

=> SELECT * FROM train_data LIMIT 5;
        time         | Temperature
---------------------+-------------
 1981-01-27 00:00:00 |        19.4
 1981-02-20 00:00:00 |        15.7
 1981-02-27 00:00:00 |        17.5
 1981-03-04 00:00:00 |          16
 1981-04-24 00:00:00 |        11.5
(5 rows)

--In the main cluster
=> SELECT * FROM train_data LIMIT 5;
ERROR 4566:  Relation "train_data" does not exist

Similarly, if you drop a table in the main cluster, the table is not subsequently dropped in the sandboxed cluster:

--In the main cluster
=> SELECT * FROM transaction_data LIMIT 5;
 first_name | last_name |   store   |  cost   | fraud
------------+-----------+-----------+---------+-------
 Adam       | Rice      | Gembucket | 8757.35 | FALSE
 Alan       | Gordon    | Wrapsafe  | 3874.48 | FALSE
 Albert     | Harvey    | Treeflex  | 1558.27 | FALSE
 Andrea     | Bryant    | Greenlam  |  1148.2 | FALSE
 Andrew     | Simmons   | Home Ing  | 8400.03 | FALSE
(5 rows)

=> DROP TABLE transaction_data;
DROP TABLE

--In the sandboxed subcluster
=> SELECT * FROM transaction_data LIMIT 5;
 first_name | last_name |   store   |  cost   | fraud
------------+-----------+-----------+---------+-------
 Adam       | Rice      | Gembucket | 8757.35 | FALSE
 Alan       | Gordon    | Wrapsafe  | 3874.48 | FALSE
 Albert     | Harvey    | Treeflex  | 1558.27 | FALSE
 Andrea     | Bryant    | Greenlam  |  1148.2 | FALSE
 Andrew     | Simmons   | Home Ing  | 8400.03 | FALSE
(5 rows)

To add an additional secondary subcluster to the sandbox cluster, call SANDBOX_SUBCLUSTER in the sandbox cluster and then in the main cluster:

--In the sandbox cluster
=> SELECT SANDBOX_SUBCLUSTER('sand', 'sc_02', '');
                                                sandbox_subcluster                                                      
------------------------------------------------------------------------------------------------------------------------------
 Subcluster 'sc_02' has been added to 'sand'. When the same command is executed in the main cluster, it can join the sandbox.
(1 row)

--In the main cluster
=> SELECT SANDBOX_SUBCLUSTER('sand', 'sc_02', '');
                                    sandbox_subcluster                                     
-------------------------------------------------------------------------------------------
 Subcluster 'sc_02' has been sandboxed to 'sand'. It is going to auto-restart and re-form.
(1 row)

--In the sandbox cluster
=> SELECT node_name, subcluster_name, node_state, sandbox FROM NODES;
      node_name       |  subcluster_name   | node_state | sandbox
----------------------+--------------------+------------+---------
 v_verticadb_node0001 | default_subcluster | UNKNOWN    |
 v_verticadb_node0002 | default_subcluster | UNKNOWN    |
 v_verticadb_node0003 | default_subcluster | UNKNOWN    |
 v_verticadb_node0004 | sc_01              | UP         | sand
 v_verticadb_node0005 | sc_01              | UP         | sand
 v_verticadb_node0006 | sc_01              | UP         | sand
 v_verticadb_node0007 | sc_02              | UP         | sand
 v_verticadb_node0008 | sc_02              | UP         | sand
 v_verticadb_node0009 | sc_02              | UP         | sand
(9 rows)

If you decide to upgrade your sandboxed cluster, you can confirm that the main cluster and sandboxed cluster are running two different Vertica versions by comparing their build_info values in the NODES system table:

=> SELECT node_name, subcluster_name, node_state, sandbox, build_info FROM NODES;
      node_name       |  subcluster_name   | node_state | sandbox | build_info
----------------------+--------------------+------------+---------+------------
 v_verticadb_node0001 | default_subcluster | UP         |         | v12.0.4-0
 v_verticadb_node0002 | default_subcluster | UP         |         | v12.0.4-0
 v_verticadb_node0003 | default_subcluster | UP         |         | v12.0.4-0
 v_verticadb_node0004 | sc_01              | UNKNOWN    | sand    | v12.0.4-1
 v_verticadb_node0005 | sc_01              | UNKNOWN    | sand    | v12.0.4-1
 v_verticadb_node0006 | sc_01              | UNKNOWN    | sand    | v12.0.4-1
 v_verticadb_node0007 | sc_02              | UNKNOWN    | sand    | v12.0.4-1
 v_verticadb_node0008 | sc_02              | UNKNOWN    | sand    | v12.0.4-1
 v_verticadb_node0009 | sc_02              | UNKNOWN    | sand    | v12.0.4-1
(9 rows)

Creating sandboxes

Note

Adding subclusters to existing sandboxes

Examples

Note

See also