Starting and stopping subclusters

Subclusters make it convenient to start and stop a group of nodes as needed. You can start and stop them with admintools commands, Vertica functions, or Management Console.

Starting a subcluster

To start a subcluster, use the admintools command restart_subcluster:

$ adminTools -t restart_subcluster -h
Usage: restart_subcluster [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database whose subcluster is to be restarted
  -c SCNAME, --subcluster=SCNAME
                        Name of subcluster to be restarted
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
  -F, --force           Force the nodes in the subcluster to start and auto
                        recover if necessary

This example starts the subcluster analytics_cluster:

$ adminTools -t restart_subcluster -c analytics_cluster \
          -d verticadb -p password
*** Restarting subcluster for database verticadb ***
        Restarting host [10.11.12.192] with catalog [v_verticadb_node0006_catalog]
        Restarting host [10.11.12.181] with catalog [v_verticadb_node0004_catalog]
        Restarting host [10.11.12.205] with catalog [v_verticadb_node0005_catalog]
        Issuing multi-node restart
        Starting nodes:
                v_verticadb_node0004 (10.11.12.181)
                v_verticadb_node0005 (10.11.12.205)
                v_verticadb_node0006 (10.11.12.192)
        Starting Vertica on all nodes. Please wait, databases with a large
            catalog may take a while to initialize.
        Node Status: v_verticadb_node0002: (UP) v_verticadb_node0004: (DOWN)
                     v_verticadb_node0005: (DOWN) v_verticadb_node0006: (DOWN)
        Node Status: v_verticadb_node0002: (UP) v_verticadb_node0004: (DOWN)
                     v_verticadb_node0005: (DOWN) v_verticadb_node0006: (DOWN)
        Node Status: v_verticadb_node0002: (UP) v_verticadb_node0004: (DOWN)
                     v_verticadb_node0005: (DOWN) v_verticadb_node0006: (DOWN)
        Node Status: v_verticadb_node0002: (UP) v_verticadb_node0004: (DOWN)
                     v_verticadb_node0005: (DOWN) v_verticadb_node0006: (DOWN)
        Node Status: v_verticadb_node0002: (UP) v_verticadb_node0004: (UP)
                     v_verticadb_node0005: (UP) v_verticadb_node0006: (UP)
Communal storage detected: syncing catalog

Restart Subcluster result:  1

Stopping a subcluster

You can stop a subcluster gracefully with the function SHUTDOWN_WITH_DRAIN, or immediately with SHUTDOWN_SUBCLUSTER. You can also shut down subclusters with the admintools command stop_subcluster.

Graceful shutdown

The SHUTDOWN_WITH_DRAIN function drains a subcluster's client connections before shutting it down. The function first marks all nodes in the specified subcluster as draining. Work from existing user sessions continues on draining nodes, but the nodes refuse new client connections and are excluded from load-balancing operations. A dbadmin user can still connect to draining nodes. For more information about client connection draining, see Drain client connections.

To run the SHUTDOWN_WITH_DRAIN function, you must specify a timeout value. The function's behavior depends on the sign of the timeout value:

Positive: The nodes drain until either all the existing connections close or the function reaches the runtime limit set by the timeout value. As soon as one of these conditions is met, the function sends a shutdown message to the subcluster and returns.
Zero: The function immediately closes any active user sessions on the subcluster and then shuts down the subcluster and returns.
Negative: The function marks the subcluster's nodes as draining and waits to shut down the subcluster until all active user sessions disconnect.

After all nodes in a draining subcluster are down, the nodes are automatically reset to a not-draining status.
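The three timeout cases above can be summarized as a small decision function. This is only an illustrative sketch in Python: the names DrainMode and drain_mode are hypothetical and are not part of Vertica. It models the documented behavior and does not call the database.

```python
from enum import Enum


class DrainMode(Enum):
    # Hypothetical labels for the three documented behaviors.
    WAIT_UNTIL_TIMEOUT = "drain until sessions close or the timeout expires, then shut down"
    CLOSE_IMMEDIATELY = "close active sessions immediately, then shut down"
    WAIT_INDEFINITELY = "drain until all sessions disconnect, then shut down"


def drain_mode(timeout_seconds: int) -> DrainMode:
    """Map a SHUTDOWN_WITH_DRAIN timeout value to the behavior described above."""
    if timeout_seconds > 0:
        return DrainMode.WAIT_UNTIL_TIMEOUT
    if timeout_seconds == 0:
        return DrainMode.CLOSE_IMMEDIATELY
    return DrainMode.WAIT_INDEFINITELY


if __name__ == "__main__":
    for timeout in (300, 0, -1):
        print(f"timeout={timeout}: {drain_mode(timeout).value}")
```

A helper like this could be useful in an operations script that needs to tell the operator what a given timeout will do before the function is actually called.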

The following example demonstrates how you can use a positive timeout value to give active user sessions time to finish their work before shutting down the subcluster:

=> SELECT node_name, subcluster_name, is_draining, count_client_user_sessions, oldest_session_user FROM draining_status ORDER BY 1;
      node_name       |  subcluster_name   | is_draining | count_client_user_sessions | oldest_session_user
----------------------+--------------------+-------------+----------------------------+---------------------
 v_verticadb_node0001 | default_subcluster | f           |                          0 |
 v_verticadb_node0002 | default_subcluster | f           |                          0 |
 v_verticadb_node0003 | default_subcluster | f           |                          0 |
 v_verticadb_node0004 | analytics          | f           |                          1 | analyst
 v_verticadb_node0005 | analytics          | f           |                          0 |
 v_verticadb_node0006 | analytics          | f           |                          0 |
(6 rows)

=> SELECT SHUTDOWN_WITH_DRAIN('analytics', 300);
NOTICE 0:  Draining has started on subcluster (analytics)
NOTICE 0:  Begin shutdown of subcluster (analytics)
                              SHUTDOWN_WITH_DRAIN
--------------------------------------------------------------------------------------------------------------------
Set subcluster (analytics) to draining state
Waited for 3 nodes to drain
Shutdown message sent to subcluster (analytics)

(1 row)

You can query the NODES system table to confirm that the subcluster shut down:

=> SELECT subcluster_name, node_name, node_state FROM nodes;
  subcluster_name   |      node_name       | node_state
--------------------+----------------------+------------
 default_subcluster | v_verticadb_node0001 | UP
 default_subcluster | v_verticadb_node0002 | UP
 default_subcluster | v_verticadb_node0003 | UP
 analytics          | v_verticadb_node0004 | DOWN
 analytics          | v_verticadb_node0005 | DOWN
 analytics          | v_verticadb_node0006 | DOWN
(6 rows)
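If you run this check from a script, the same confirmation can be made programmatically. The following Python sketch assumes the rows have already been fetched by whatever client you use, in the column order of the query above. The function name subcluster_is_down is hypothetical, and the sample rows are copied from the output shown here.

```python
from typing import Iterable, Tuple

# (subcluster_name, node_name, node_state), matching the query's column order.
NodeRow = Tuple[str, str, str]


def subcluster_is_down(rows: Iterable[NodeRow], subcluster: str) -> bool:
    """Return True only if the subcluster has nodes and all of them are DOWN."""
    states = [state for name, _node, state in rows if name == subcluster]
    # No matching rows means the subcluster was not found, so do not report it as down.
    return bool(states) and all(state == "DOWN" for state in states)


SAMPLE_ROWS = [
    ("default_subcluster", "v_verticadb_node0001", "UP"),
    ("default_subcluster", "v_verticadb_node0002", "UP"),
    ("default_subcluster", "v_verticadb_node0003", "UP"),
    ("analytics", "v_verticadb_node0004", "DOWN"),
    ("analytics", "v_verticadb_node0005", "DOWN"),
    ("analytics", "v_verticadb_node0006", "DOWN"),
]

if __name__ == "__main__":
    print(subcluster_is_down(SAMPLE_ROWS, "analytics"))
    print(subcluster_is_down(SAMPLE_ROWS, "default_subcluster"))
```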

To see more information about the draining and shutdown events, such as whether all user sessions finished their work before the timeout, query the dc_draining_events table. In this case, the subcluster still had one active user session when the function reached its timeout:

=> SELECT event_type, event_type_name, event_description, event_result, event_result_name FROM dc_draining_events;
 event_type |       event_type_name        |                          event_description                          | event_result | event_result_name
------------+------------------------------+---------------------------------------------------------------------+--------------+-------------------
          0 | START_DRAIN_SUBCLUSTER       | START_DRAIN for SHUTDOWN of subcluster (analytics)                  |            0 | SUCCESS
          2 | START_WAIT_FOR_NODE_DRAIN    | Wait timeout is 300 seconds                                         |            4 | INFORMATIONAL
          4 | INTERVAL_WAIT_FOR_NODE_DRAIN | 1 sessions remain after 0 seconds                                   |            4 | INFORMATIONAL
          4 | INTERVAL_WAIT_FOR_NODE_DRAIN | 1 sessions remain after 60 seconds                                  |            4 | INFORMATIONAL
          4 | INTERVAL_WAIT_FOR_NODE_DRAIN | 1 sessions remain after 120 seconds                                 |            4 | INFORMATIONAL
          4 | INTERVAL_WAIT_FOR_NODE_DRAIN | 1 sessions remain after 125 seconds                                 |            4 | INFORMATIONAL
          4 | INTERVAL_WAIT_FOR_NODE_DRAIN | 1 sessions remain after 180 seconds                                 |            4 | INFORMATIONAL
          4 | INTERVAL_WAIT_FOR_NODE_DRAIN | 1 sessions remain after 240 seconds                                 |            4 | INFORMATIONAL
          4 | INTERVAL_WAIT_FOR_NODE_DRAIN | 1 sessions remain after 250 seconds                                 |            4 | INFORMATIONAL
          4 | INTERVAL_WAIT_FOR_NODE_DRAIN | 1 sessions remain after 300 seconds                                 |            4 | INFORMATIONAL
          3 | END_WAIT_FOR_NODE_DRAIN      | Wait for drain ended with 1 sessions remaining                      |            2 | TIMEOUT
          5 | BEGIN_SHUTDOWN_AFTER_DRAIN   | Staring shutdown of subcluster (analytics) following drain          |            4 | INFORMATIONAL
(12 rows)
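A script can read the same outcome from these rows by looking at the END_WAIT_FOR_NODE_DRAIN event. The Python sketch below is a minimal illustration that works on rows already fetched from dc_draining_events. The function name drain_outcome is hypothetical, and the sample rows are a subset of the output above. TIMEOUT is the only result name this example shows for that event, so the helper returns whatever name it finds rather than assuming a fixed set of values.

```python
from typing import Iterable, Optional, Tuple

# (event_type_name, event_description, event_result_name)
EventRow = Tuple[str, str, str]


def drain_outcome(rows: Iterable[EventRow]) -> Optional[str]:
    """Return the result name of the END_WAIT_FOR_NODE_DRAIN event, or None if absent."""
    for event_type_name, _description, event_result_name in rows:
        if event_type_name == "END_WAIT_FOR_NODE_DRAIN":
            return event_result_name
    return None


SAMPLE_EVENTS = [
    ("START_DRAIN_SUBCLUSTER", "START_DRAIN for SHUTDOWN of subcluster (analytics)", "SUCCESS"),
    ("START_WAIT_FOR_NODE_DRAIN", "Wait timeout is 300 seconds", "INFORMATIONAL"),
    ("INTERVAL_WAIT_FOR_NODE_DRAIN", "1 sessions remain after 300 seconds", "INFORMATIONAL"),
    ("END_WAIT_FOR_NODE_DRAIN", "Wait for drain ended with 1 sessions remaining", "TIMEOUT"),
]

if __name__ == "__main__":
    print(drain_outcome(SAMPLE_EVENTS))
```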

After you restart the subcluster, you can query the DRAINING_STATUS system table to confirm that the nodes have reset their draining statuses to not draining:

=> SELECT node_name, subcluster_name, is_draining, count_client_user_sessions, oldest_session_user FROM draining_status ORDER BY 1;
      node_name       |  subcluster_name   | is_draining | count_client_user_sessions | oldest_session_user
----------------------+--------------------+-------------+----------------------------+---------------------
 v_verticadb_node0001 | default_subcluster | f           |                          0 |
 v_verticadb_node0002 | default_subcluster | f           |                          0 |
 v_verticadb_node0003 | default_subcluster | f           |                          0 |
 v_verticadb_node0004 | analytics          | f           |                          0 |
 v_verticadb_node0005 | analytics          | f           |                          0 |
 v_verticadb_node0006 | analytics          | f           |                          0 |
(6 rows)

Immediate shutdown

To shut down a subcluster immediately, call SHUTDOWN_SUBCLUSTER. The following example shuts down the analytics subcluster immediately, without checking for active client connections:

=> SELECT SHUTDOWN_SUBCLUSTER('analytics');
 SHUTDOWN_SUBCLUSTER
---------------------
Subcluster shutdown
(1 row)

admintools

You can use the stop_subcluster tool to stop a subcluster:

$ adminTools -t stop_subcluster -h
Usage: stop_subcluster [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database whose subcluster is to be stopped
  -c SCNAME, --subcluster=SCNAME
                        Name of subcluster to be stopped
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  -n DRAIN_SECONDS, --drain-seconds=DRAIN_SECONDS
                        Seconds to wait for user connections to close.
                        Default value is 60 seconds.
                        When the time expires, connections will be forcibly closed
                        and the db will shut down.
  -F, --force           Force the subcluster to shutdown immediately,
                        even if users are connected.
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.

By default, stop_subcluster calls SHUTDOWN_WITH_DRAIN to gracefully shut down the target subcluster. The shutdown process drains client connections from the subcluster before shutting it down.

The -n (--drain-seconds) option, which has a default value of 60 seconds, allows you to specify the number of seconds to wait before forcefully closing client connections and shutting down the subcluster. If you set a negative -n value, the subcluster is marked as draining but is not shut down until all active user sessions disconnect.
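When stop_subcluster is driven from automation, it can help to assemble the argument list in one place. The following Python sketch only builds and prints the command line and does not run admintools. The function name stop_subcluster_argv is hypothetical. It uses only options listed in the help text above. Rejecting a call that sets both a drain time and force is a choice made for this helper, not documented admintools behavior.

```python
import shlex
from typing import List, Optional


def stop_subcluster_argv(
    database: str,
    subcluster: str,
    password: str,
    drain_seconds: Optional[int] = None,
    force: bool = False,
) -> List[str]:
    """Build an argument list for `admintools -t stop_subcluster` (not executed here)."""
    if force and drain_seconds is not None:
        # This helper's own guard: the two shutdown styles are treated as alternatives.
        raise ValueError("choose either drain_seconds or force, not both")
    argv = ["admintools", "-t", "stop_subcluster",
            "-d", database, "-c", subcluster, "-p", password]
    if force:
        argv.append("--force")
    elif drain_seconds is not None:
        # The --option=value form keeps a negative value attached to its option.
        argv.append(f"--drain-seconds={drain_seconds}")
    return argv


if __name__ == "__main__":
    # Graceful shutdown that waits up to 200 seconds.
    print(shlex.join(stop_subcluster_argv("verticadb", "analytics", "password", drain_seconds=200)))
    # Negative value: wait until all active user sessions disconnect.
    print(shlex.join(stop_subcluster_argv("verticadb", "analytics", "password", drain_seconds=-1)))
```

Omitting drain_seconds leaves the option off entirely, so admintools applies its default of 60 seconds.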

In the following example, the subcluster named analytics initially has an active client session. The session closes before the timeout limit is reached, and the subcluster then shuts down:

$ admintools -t stop_subcluster -d verticadb -c analytics --password password --drain-seconds 200
--- Subcluster shutdown ---
Verifying subcluster 'analytics'
Node 'v_verticadb_node0004' will shutdown
Node 'v_verticadb_node0005' will shutdown
Node 'v_verticadb_node0006' will shutdown
Connecting to database to begin shutdown of subcluster 'analytics'
Shutdown will use connection draining.
Shutdown will wait for all client sessions to complete, up to 200 seconds
Then it will force a shutdown.
Poller has been running for 0:00:00.000022 seconds since 2022-07-28 12:18:04.891781

------------------------------------------------------------
client_sessions     |node_count          |node_names
--------------------------------------------------------------
0                   |5                   |v_verticadb_node0002,v_verticadb_node0004,v_verticadb_node0003,v_verticadb_node0...
1                   |1                   |v_verticadb_node0005
STATUS: vertica.engine.api.db_client.module is still running on 1 host: nodeIP as of 2022-07-28 12:18:14. See /opt/vertica/log/adminTools.log for full details.
Poller has been running for 0:00:10.383018 seconds since 2022-07-28 12:18:04.891781

...

------------------------------------------------------------
client_sessions     |node_count          |node_names
--------------------------------------------------------------
0                   |3                   |v_verticadb_node0002,v_verticadb_node0001,v_verticadb_node0003
down                |3                   |v_verticadb_node0004,v_verticadb_node0005,v_verticadb_node0006
Stopping poller drain_status because it was canceled
SUCCESS running the shutdown metafunction
Not waiting for processes to completely exit
Shutdown operation was successful

You can use the -F (or --force) option to shut down a subcluster immediately, without checking for active user sessions or draining the subcluster:

$ admintools -t stop_subcluster -d verticadb -c analytics --password password -F
--- Subcluster shutdown ---
Verifying subcluster 'analytics'
Node 'v_verticadb_node0004' will shutdown
Node 'v_verticadb_node0005' will shutdown
Node 'v_verticadb_node0006' will shutdown
Connecting to database to begin shutdown of subcluster 'analytics'
Running shutdown metafunction. Not using connection draining
STATUS: vertica.engine.api.db_client.module is still running on 1 host: 192.168.111.31 as of 2022-07-28 13:13:57. See /opt/vertica/log/adminTools.log for full details.
STATUS: vertica.engine.api.db_client.module is still running on 1 host: 192.168.111.31 as of 2022-07-28 13:14:07. See /opt/vertica/log/adminTools.log for full details.
SUCCESS running the shutdown metafunction
Not waiting for processes to completely exit
Shutdown operation was successful

If you want to shut down all subclusters in a database, see Stopping an Eon Mode Database.