Hybrid Kubernetes clusters

An Eon Mode database can run some of its hosts outside of Kubernetes and others within Kubernetes.

Important

Image versions 24.1.0-0 and higher do not support any of the following features:

Backup and restore with vbr
Hybrid Kubernetes clusters

These operations require Administration tools (admintools), which is not included in the Vertica server image beginning with version 24.1.0-0. To perform these tasks, you must use image version 23.4.0-0 or lower with one of the following API version configurations:

v1beta1
v1 and vertica.com/vcluster-ops annotation set to false

A hybrid architecture is useful in the following scenarios:

Leveraging Kubernetes tooling to quickly create a secondary subcluster for a database.
Creating an isolated sandbox environment to run ad hoc queries on a communal dataset.
Experimenting with the Vertica on Kubernetes performance overhead without migrating your primary subcluster into Kubernetes.

Define the Kubernetes portion of a hybrid architecture with a custom resource (CR). The custom resource has no knowledge of Vertica hosts that exist separately from the custom resource. This limits the operator's functionality and requires that you manually complete some tasks that the operator automates for a standard Vertica on Kubernetes custom resource.

Requirements and restrictions

The hybrid Kubernetes architecture has the following requirements and restrictions:

Hybrid Kubernetes clusters require a tool that enables Border Gateway Protocol (BGP) so that pods are accessible to your on-premises subcluster for external communication. For example, you can use the Calico CNI plugin to enable BGP.
You cannot use network address translation (NAT) between the Kubernetes pods and the on-premises cluster.

Operator limitations

In a hybrid architecture, the operator has no visibility outside of the custom resource. This limited visibility means that the operator cannot interact with the Eon Mode database or the primary subcluster. Within the scope of the custom resource, the operator automates only the following:

Schedules pods based on the manifest.
Creates service objects for the subcluster.
Creates a PersistentVolumeClaim (PVC) that persists data for each pod.
Executes the restart_node administration tool command if the Vertica server process is not running. To override this default behavior, set the autoRestartVertica custom resource parameter to false.

Defining a hybrid cluster

To define a hybrid cluster, you must set up SSH communications between the Eon Mode nodes and containers, and then define the hybrid CR.

SSH between environments

In an Eon Mode database, nodes communicate through SSH. Vertica containers use SSH with a static key. Because the CR has no knowledge of any of the Eon Mode hosts, you must make the containers aware of the Eon Mode SSH keys.

You can create a Secret for the CR that stores SSH credentials for both the Eon Mode database and the Vertica container. The Secret must contain the following:

id_rsa: private key shared among the pods.
id_rsa.pub: public key shared among the pods.
authorized_keys: file that contains the following keys:
- id_rsa.pub for pod-to-pod traffic.
- public key of the on-premises root account.
- public key of the on-premises dbadmin account.

The following command creates a Secret named ssh-keys that stores these SSH credentials. The Secret persists between life cycles to allow secure connections between the on-premises nodes and the CR:

$ kubectl create secret generic ssh-keys --from-file=$HOME/.ssh
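When --from-file points at a directory, kubectl packages every regular file in that directory into the Secret, so it can help to stage only the three required files in a dedicated directory first. The following is a minimal sketch, not part of the Vertica tooling: the stage_ssh_secret_dir function name and all file paths are hypothetical, and it assumes that the pod key pair and the on-premises public keys are already available on the machine that runs kubectl.

```shell
# Hypothetical helper: stage the files that the Secret must contain.
# Usage: stage_ssh_secret_dir POD_PRIVATE_KEY POD_PUBLIC_KEY ROOT_PUBLIC_KEY DBADMIN_PUBLIC_KEY OUT_DIR
# Assumes each public key file ends with a newline, as ssh-keygen output does.
stage_ssh_secret_dir() {
    pod_key=$1 pod_pub=$2 root_pub=$3 dbadmin_pub=$4 out_dir=$5

    mkdir -p "$out_dir" || return 1
    cp "$pod_key" "$out_dir/id_rsa" || return 1
    cp "$pod_pub" "$out_dir/id_rsa.pub" || return 1
    # authorized_keys: the pod key first, then the on-premises root and dbadmin keys
    cat "$pod_pub" "$root_pub" "$dbadmin_pub" > "$out_dir/authorized_keys" || return 1
    chmod 600 "$out_dir/id_rsa" "$out_dir/authorized_keys"
}

# Example (all paths are placeholders):
# stage_ssh_secret_dir ./pod_id_rsa ./pod_id_rsa.pub ./root.pub ./dbadmin.pub ./ssh-secret
# kubectl create secret generic ssh-keys --from-file=./ssh-secret
```

Staging the files this way keeps unrelated content of $HOME/.ssh, such as known_hosts, out of the Secret.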

Hybrid CR definition

Create a custom resource to define a subcluster that runs outside your standard Eon Mode database:

apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: hybrid-secondary-sc
spec:
  image: vertica/vertica-k8s:latest
  initPolicy: ScheduleOnly
  sshSecret: ssh-keys
  local:
    dataPath: /data
    depotPath: /depot
  dbName: vertdb
  subclusters:
    - name: sc1
      size: 3
    - name: sc2
      size: 3

Deprecated

The v1beta1 API version is deprecated and will be removed in a future release. Vertica recommends that you upgrade to API version v1. For details, see Upgrading Vertica on Kubernetes and VerticaDB custom resource definition.

In the previous example:

initPolicy: Hybrid clusters require that you set this to ScheduleOnly.
sshSecret: The Secret that contains SSH keys that authenticate connections to Vertica hosts outside the CR.
local: Required. The values persist data to the PersistentVolume (PV). These values must match the directory locations in the Eon Mode database that is associated with the Kubernetes pods.
dbName: This value must match the name of the standard Eon Mode database that is associated with this subcluster.
subclusters: Definition for each subcluster.

Note

Hybrid custom resources ignore configuration parameters that control settings outside the scope of the hybrid subcluster, such as the communal.* and the subclusters[i].isPrimary parameters.

For complete implementation details, see VerticaDB custom resource definition. For details about each setting, see Custom resource definition parameters.
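Before you apply a hybrid manifest, a quick text check can catch the settings that the previous list describes as required. The following is a rough sketch under stated assumptions: check_hybrid_manifest is a hypothetical helper, it matches lines with grep rather than parsing YAML, and it does not verify that the values match your Eon Mode database.

```shell
# Hypothetical pre-flight check for a hybrid VerticaDB manifest.
# Prints a message and returns non-zero when a required setting is missing.
check_hybrid_manifest() {
    manifest=$1

    # Hybrid clusters require initPolicy: ScheduleOnly
    grep -Eq '^[[:space:]]*initPolicy:[[:space:]]*ScheduleOnly[[:space:]]*$' "$manifest" || {
        echo "initPolicy must be ScheduleOnly" >&2; return 1; }

    # Each of these keys must be present with a non-empty value
    for key in sshSecret dbName dataPath depotPath; do
        grep -Eq "^[[:space:]]*${key}:[[:space:]]*[^[:space:]]" "$manifest" || {
            echo "missing ${key}" >&2; return 1; }
    done
}

# Example (the file name is a placeholder):
# check_hybrid_manifest hybrid-secondary-sc.yaml && kubectl apply -f hybrid-secondary-sc.yaml
```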

Maintaining quorum

If quorum is lost, you must manually restart the cluster with admintools:

$ /opt/vertica/bin/admintools -t restart_db --database database-name

For details about maintaining quorum, see Data integrity and high availability in an Eon Mode database.
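As a reminder of the arithmetic involved, the following sketch assumes the usual Eon Mode rule that quorum holds only while more than half of the primary nodes are up. The has_quorum function is a hypothetical illustration, not a Vertica command, and it checks the node count only; it does not account for shard coverage.

```shell
# Returns 0 when strictly more than half of the primary nodes are up.
# Usage: has_quorum UP_PRIMARY_NODES TOTAL_PRIMARY_NODES
has_quorum() {
    up=$1 total=$2
    [ "$total" -gt 0 ] && [ $((up * 2)) -gt "$total" ]
}

# Example: with 4 primary nodes, 2 up nodes is exactly half and is not enough.
# has_quorum 2 4 || echo "quorum lost: restart_db is required"
```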

Scaling the Kubernetes subcluster

When you scale a hybrid cluster, you add nodes from the primary subcluster to the secondary subcluster on Kubernetes.

HDFS with Kerberos authentication

If you are scaling a cluster that authenticates Hadoop file storage (HDFS) data with Kerberos, you must alter the database configuration before you scale.

In the default configuration, the Vertica server process running in the Kubernetes pods cannot access the HDFS data due to incorrect permissions on the keytab file mounted in the pod. To work around this, disable the keytab permission check and then restart the cluster:

Set the KerberosEnableKeytabPermissionCheck configuration parameter to 0:

=> ALTER DATABASE DEFAULT SET KerberosEnableKeytabPermissionCheck = 0;
WARNING 4324:  Parameter KerberosEnableKeytabPermissionCheck will not take effect until database restart
ALTER DATABASE

Restart the cluster with admintools so that the new setting takes effect:

$ /opt/vertica/bin/admintools -t restart_db --database database-name

For additional details about Vertica on Kubernetes and HDFS, see Configuring communal storage.

Scale the subcluster

When you add nodes from the primary subcluster to the secondary subcluster on Kubernetes, you must set up the configuration directory for the new nodes and change operator behavior during the scaling event:

Execute the update_vertica script to set up the configuration directory. Vertica on Kubernetes requires the following configuration options for update_vertica:

$ /opt/vertica/sbin/update_vertica \
    --accept-eula \
    --add-hosts host-list \
    --dba-user-password dba-user-password \
    --failure-threshold NONE \
    --no-system-configuration \
    --point-to-point \
    --data-dir /data-dir \
    --dba-user dbadmin \
    --no-package-checks \
    --no-ssh-key-install

Set autoRestartVertica to false so that the operator does not interfere with the scaling operation:

$ kubectl patch vdb database-name --type=merge --patch='{"spec": {"autoRestartVertica": false}}'

Add the new nodes with the admintools db_add_node option:

$ /opt/vertica/bin/admintools \
 -t db_add_node \
 --hosts host-list \
 --database database-name \
 --subcluster sc-name \
 --noprompt

For details, see Adding and removing nodes from subclusters.

After the scaling operation, set autoRestartVertica back to true:

$ kubectl patch vdb database-name --type=merge --patch='{"spec": {"autoRestartVertica": true}}'
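The disable, add, and re-enable steps above can be sketched as a small wrapper that restores autoRestartVertica even when db_add_node fails. This is an illustration under stated assumptions, not a supported script: the function names are hypothetical, it assumes that kubectl and admintools are both usable from the same shell, and it assumes that update_vertica has already been run for the new hosts.

```shell
# Build the merge patch that toggles autoRestartVertica (value: true or false).
auto_restart_patch() {
    case $1 in
        true|false) printf '{"spec": {"autoRestartVertica": %s}}\n' "$1" ;;
        *) echo "expected true or false, got: $1" >&2; return 1 ;;
    esac
}

# Join host names or IP addresses (one per line on stdin) into a
# comma-separated host list.
join_hosts() {
    paste -s -d ',' -
}

# Hypothetical wrapper: disable auto-restart, add the nodes, then re-enable
# auto-restart regardless of whether db_add_node succeeded.
# Usage: scale_hybrid_subcluster VDB_NAME DB_NAME SUBCLUSTER HOST_LIST
scale_hybrid_subcluster() {
    vdb=$1 db=$2 sc=$3 hosts=$4

    kubectl patch vdb "$vdb" --type=merge --patch="$(auto_restart_patch false)" || return 1

    /opt/vertica/bin/admintools -t db_add_node --hosts "$hosts" \
        --database "$db" --subcluster "$sc" --noprompt
    status=$?

    kubectl patch vdb "$vdb" --type=merge --patch="$(auto_restart_patch true)" || return 1
    return "$status"
}

# Example (all names and addresses are placeholders):
# hosts=$(printf '10.0.0.1\n10.0.0.2\n' | join_hosts)
# scale_hybrid_subcluster hybrid-secondary-sc vertdb sc1 "$hosts"
```

Capturing the exit status of db_add_node before the second patch lets the wrapper report the failure while still returning the operator to its default behavior.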