Replacing nodes

If you have a database, you can replace nodes, as necessary, without bringing the system down.

If you have a K-Safe database, you can replace nodes, as necessary, without bringing the system down. For example, you might want to replace an existing node if you:

  • Need to repair an existing host system that no longer functions and restore it to the cluster

  • Want to exchange an existing host system for another more powerful system

The process you use to replace a node depends on whether you are replacing the node with:

  • A host that uses the same name and IP address

  • A host that uses a different name and IP address

  • An active standby node

Prerequisites

  • Configure the replacement hosts for Vertica. See Before you Install Vertica.

  • Read the Important Tipssections under Adding hosts to a cluster and Removing hosts from a cluster.

  • Ensure that the database administrator user exists on the new host and is configured identically to the existing hosts. Vertica will setup passwordless ssh as needed.

  • Ensure that directories for Catalog Path, Data Path, and any storage locations are added to the database when you create it and/or are mounted correctly on the new host and have read and write access permissions for the database administrator user. Also ensure that there is sufficient disk space.

  • Follow the best practice procedure below for introducing the failed hardware back into the cluster to avoid spurious full-node rebuilds.

Best practice for restoring failed hardware

Following this procedure will prevent Vertica from misdiagnosing missing disk or bad mounts as data corruptions, which would result in a time-consuming, full-node recovery.

If a server fails due to hardware issues, for example a bad disk or a failed controller, upon repairing the hardware:

  1. Reboot the machine into runlevel 1, which is a root and console-only mode.

    Runlevel 1 prevents network connectivity and keeps Vertica from attempting to reconnect to the cluster.

  2. In runlevel 1, validate that the hardware has been repaired, the controllers are online, and any RAID recover is able to proceed.

  3. Once the hardware is confirmed consistent, only then reboot to runlevel 3 or higher.

At this point, the network activates, and Vertica rejoins the cluster and automatically recovers any missing data. Note that, on a single-node database, if any files that were associated with a projection have been deleted or corrupted, Vertica will delete all files associated with that projection, which could result in data loss.