High availability with fault groups
Use fault groups to reduce the risk of correlated failures inherent in your physical environment. Correlated failures occur when two or more nodes fail as a result of a single failure. For example, nodes that share a power supply, network switch, or storage can all fail together when that shared resource fails.
Vertica minimizes the risk of correlated failures by letting you define fault groups on your cluster. Vertica then uses the fault groups to distribute data segments across the cluster, so the database continues running if a single failure event occurs.
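The idea can be illustrated with a small Python check. This is a simplified sketch, not Vertica's placement algorithm: it assumes each data segment is stored on two nodes, and the node names, rack names, and the `survives_group_failure` helper are all hypothetical.

```python
def survives_group_failure(placement, group_of, failed_group):
    """Return True if every segment keeps at least one copy outside failed_group."""
    return all(
        any(group_of[node] != failed_group for node in nodes)
        for nodes in placement.values()
    )


# Hypothetical cluster: four nodes spread over two racks (the fault groups).
group_of = {
    "node01": "rack1", "node02": "rack1",
    "node03": "rack2", "node04": "rack2",
}

# Placement that ignores racks: both copies of a segment share a rack.
naive = {"seg1": ["node01", "node02"], "seg2": ["node03", "node04"]}

# Placement that respects racks: the copies of a segment sit in different racks.
aware = {"seg1": ["node01", "node03"], "seg2": ["node02", "node04"]}

print(survives_group_failure(naive, group_of, "rack1"))  # False
print(survives_group_failure(aware, group_of, "rack1"))  # True
```

With the rack-aware placement, losing all of rack1 still leaves one copy of every segment on rack2.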
Note
If your cluster layout is managed by a single network switch, a switch failure can be a single point of failure. Fault groups cannot help with single-point failures.
Vertica supports complex, hierarchical fault groups of different shapes and sizes. You can integrate fault groups with elastic cluster and large cluster arrangements to add cluster flexibility and reliability.
Making Vertica aware of cluster topology with fault groups
You can also use fault groups to make Vertica aware of the topology of the cluster on which your Vertica database is running. Making Vertica aware of your cluster's topology is required when using terrace routing, which can significantly reduce message buffering on a large cluster database.
Automatic fault groups
When you configure a cluster of 120 nodes or more, Vertica automatically creates fault groups around control nodes. Control nodes are a subset of cluster nodes that manage spread (control messaging). Vertica places nodes that share a control node in the same fault group. See Large cluster for details.
User-defined fault groups
Define your own fault groups if:
- Your cluster layout has the potential for correlated failures.
- You want to influence which cluster hosts manage control messaging.
Example cluster topology
The following diagram provides an example of hierarchical fault groups configured on a single cluster:
- Fault group FG-A contains nodes only.
- Fault group FG-B (parent) contains child fault groups FG-C and FG-D. Each child fault group also contains nodes.
- Fault group FG-E (parent) contains child fault groups FG-F and FG-G. The parent fault group FG-E also contains nodes.
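The hierarchy described above can be modeled as a small tree. The following Python sketch is only an illustration: the `FaultGroup` class is hypothetical, and the node names and per-group node counts are made up, because the original diagram's node assignments are not given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class FaultGroup:
    name: str
    nodes: list = field(default_factory=list)      # nodes attached directly to this group
    children: list = field(default_factory=list)   # child fault groups

    def all_nodes(self):
        """Return the nodes of this group followed by those of all descendants."""
        found = list(self.nodes)
        for child in self.children:
            found.extend(child.all_nodes())
        return found


# FG-A contains nodes only.
fg_a = FaultGroup("FG-A", nodes=["node01", "node02"])

# FG-B has no nodes of its own; its children FG-C and FG-D hold the nodes.
fg_b = FaultGroup("FG-B", children=[
    FaultGroup("FG-C", nodes=["node03", "node04"]),
    FaultGroup("FG-D", nodes=["node05", "node06"]),
])

# FG-E contains both nodes and child fault groups.
fg_e = FaultGroup("FG-E", nodes=["node07"], children=[
    FaultGroup("FG-F", nodes=["node08", "node09"]),
    FaultGroup("FG-G", nodes=["node10", "node11"]),
])

for group in (fg_a, fg_b, fg_e):
    print(group.name, group.all_nodes())
```

Calling `all_nodes()` on a parent shows which nodes would be affected if everything under that fault group failed at once.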
How to create fault groups
Before you define fault groups, you must have a thorough knowledge of your physical cluster layout. Fault groups require careful planning.
To define fault groups, create an input file of your cluster arrangement. Then, pass the file to a script supplied by Vertica, and the script returns the SQL statements you need to run. See Fault Groups for details.
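To give a sense of the kind of output such a workflow produces, here is a minimal Python sketch that turns a nested description of a cluster into `CREATE FAULT GROUP` and `ALTER FAULT GROUP` statements. It is not the script supplied by Vertica and does not read its input file format; the `fault_group_ddl` function, the dictionary layout, and the group and node names are assumptions made for illustration. Check the statement syntax against the SQL reference for your version before running anything.

```python
def fault_group_ddl(topology):
    """Build fault group DDL statements from a nested topology dictionary.

    Each entry maps a fault group name to a dict with optional
    "nodes" (list of node names) and "children" (nested dict of groups).
    """
    statements = []

    def visit(name, spec):
        statements.append(f"CREATE FAULT GROUP {name};")
        for node in spec.get("nodes", []):
            statements.append(f"ALTER FAULT GROUP {name} ADD NODE {node};")
        for child_name, child_spec in spec.get("children", {}).items():
            # Create the child first, then attach it to its parent.
            visit(child_name, child_spec)
            statements.append(
                f"ALTER FAULT GROUP {name} ADD FAULT GROUP {child_name};"
            )

    for name, spec in topology.items():
        visit(name, spec)
    return statements


# Hypothetical topology: one parent group with two child groups.
topology = {
    "fg_b": {
        "children": {
            "fg_c": {"nodes": ["v_db_node0003"]},
            "fg_d": {"nodes": ["v_db_node0004"]},
        }
    }
}

ddl = fault_group_ddl(topology)
print("\n".join(ddl))
```

The sketch creates each child group before attaching it to its parent, so the statements can be run in the order they are printed.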