Hybrid Kubernetes clusters
Important
Image version 24.1.0-0 does not support any of the following features:
- Backup and restore
- Hybrid Kubernetes clusters
These operations require Administration tools (admintools), which is not included in the 24.1.0-0 image. To perform these tasks, you must use image version 23.4.0-0 or lower with one of the following API version configurations:
- v1beta1
- v1 with the vertica.com/vcluster-ops annotation set to false
An Eon Mode database can run hosts separate from the database and within Kubernetes. This architecture is useful in the following scenarios:
- Leveraging Kubernetes tooling to quickly create a secondary subcluster for a database.
- Creating an isolated sandbox environment to run ad hoc queries on a communal dataset.
- Experimenting with the Vertica on Kubernetes performance overhead without migrating your primary subcluster into Kubernetes.
Define the Kubernetes portion of a hybrid architecture with a custom resource (CR). The custom resource has no knowledge of Vertica hosts that exist separately from the custom resource. This limits the operator's functionality and requires that you manually complete some tasks that the operator automates for a standard Vertica on Kubernetes custom resource.
Requirements and restrictions
The hybrid Kubernetes architecture has the following requirements and restrictions:
- Hybrid Kubernetes clusters require a tool that enables Border Gateway Protocol (BGP) so that pods are accessible to your on-premises subcluster for external communication. For example, you can use the Calico CNI plugin to enable BGP.
- You cannot use network address translation (NAT) between the Kubernetes pods and the on-premises cluster.
Operator limitations
In a hybrid architecture, the operator has no visibility outside of the custom resource. This limited visibility means that the operator cannot interact with the Eon Mode database or the primary subcluster. Within the scope of the custom resource, the operator automates only the following:
- Schedules pods based on the manifest.
- Creates service objects for the subcluster.
- Creates a PersistentVolumeClaim (PVC) that persists data for each pod.
- Executes the restart_node administration tool command if the Vertica server process is not running. To override this default behavior, set the autoRestartVertica custom resource parameter to false.
Defining a hybrid cluster
To define a hybrid cluster, you must set up SSH communications between the Eon Mode nodes and containers, and then define the hybrid CR.
SSH between environments
In an Eon Mode database, nodes communicate through SSH. Vertica containers use SSH with a static key. Because the CR has no knowledge of any of the Eon Mode hosts, you must make the containers aware of the Eon Mode SSH keys.
You can create a Secret for the CR that stores SSH credentials for both the Eon Mode database and the Vertica container. The Secret must contain the following:
- id_rsa: private key shared among the pods.
- id_rsa.pub: public key shared among the pods.
- authorized_keys: file that contains the following keys:
  - id_rsa.pub for pod-to-pod traffic.
  - public key of the on-premises root account.
  - public key of the on-premises dbadmin account.
The following command creates a Secret named ssh-keys that stores these SSH credentials. The Secret persists between life cycles to allow secure connections between the on-premises nodes and the CR:
$ kubectl create secret generic ssh-keys --from-file=$HOME/.ssh
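A minimal sketch of assembling these files before creating the Secret. It assumes ssh-keygen is available; the on-premises root and dbadmin public keys are represented here by placeholder files (keydir, root.pub, and dbadmin.pub are illustrative names):

```shell
# Build the key material the ssh-keys Secret expects.
mkdir -p keydir

# Static key pair shared among the pods (id_rsa and id_rsa.pub).
ssh-keygen -q -t rsa -N '' -f keydir/id_rsa

# Placeholders for the on-premises public keys; in practice, copy the
# real root and dbadmin public keys from the Eon Mode hosts.
echo "ssh-rsa AAAA...placeholder root@onprem"    > keydir/root.pub
echo "ssh-rsa AAAA...placeholder dbadmin@onprem" > keydir/dbadmin.pub

# authorized_keys = pod public key + both on-premises public keys.
cat keydir/id_rsa.pub keydir/root.pub keydir/dbadmin.pub > keydir/authorized_keys
```

You would then point the kubectl create secret command above at this directory (for example, --from-file=keydir), or pass each of id_rsa, id_rsa.pub, and authorized_keys with its own --from-file flag so the Secret contains exactly those keys.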
Hybrid CR definition
Create a custom resource to define a subcluster that runs outside your standard Eon Mode database:
apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: hybrid-secondary-sc
spec:
  image: vertica/vertica-k8s:latest
  initPolicy: ScheduleOnly
  sshSecret: ssh-keys
  local:
    dataPath: /data
    depotPath: /depot
  dbName: vertdb
  subclusters:
    - name: sc1
      size: 3
    - name: sc2
      size: 3
Deprecated
The v1beta1 API version is deprecated and will be removed in a future release. Vertica recommends that you upgrade to API version v1. For details, see Upgrading Vertica on Kubernetes and VerticaDB CRD.
In the previous example:
- initPolicy: Hybrid clusters require that you set this to ScheduleOnly.
- sshSecret: The Secret that contains SSH keys that authenticate connections to Vertica hosts outside the CR.
- local: Required. The values persist data to the PersistentVolume (PV). These values must match the directory locations in the Eon Mode database that is associated with the Kubernetes pods.
- dbName: This value must match the name of the standard Eon Mode database that is associated with this subcluster.
- subclusters: Definition for each subcluster.
Note
Hybrid custom resources ignore configuration parameters that control settings outside the scope of the hybrid subcluster, such as the communal.* and subclusters[i].isPrimary parameters.
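To use the example, the manifest can be saved to a file and passed to kubectl apply -f. A minimal sketch that writes the file (the filename is illustrative; the grep is only a quick sanity check that the hybrid-specific settings are present):

```shell
# Write the hybrid CR manifest from the example above to a file.
cat > hybrid-secondary-sc.yaml <<'EOF'
apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: hybrid-secondary-sc
spec:
  image: vertica/vertica-k8s:latest
  initPolicy: ScheduleOnly
  sshSecret: ssh-keys
  local:
    dataPath: /data
    depotPath: /depot
  dbName: vertdb
  subclusters:
    - name: sc1
      size: 3
    - name: sc2
      size: 3
EOF

# Confirm the hybrid-specific settings made it into the file.
grep -E 'initPolicy|sshSecret' hybrid-secondary-sc.yaml
```

You would then run kubectl apply -f hybrid-secondary-sc.yaml against the cluster.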
For complete implementation details, see VerticaDB CRD. For details about each setting, see Custom resource definition parameters.
Maintaining quorum
If quorum is lost, you must manually restart the cluster with admintools:
$ /opt/vertica/bin/admintools -t restart_db --database database-name;
For details about maintaining quorum, see Data integrity and high availability in an Eon Mode database.
Scaling the Kubernetes subcluster
When you scale a hybrid cluster, you add nodes from the primary subcluster to the secondary subcluster on Kubernetes.
HDFS with Kerberos authentication
If you are scaling a cluster that authenticates Hadoop Distributed File System (HDFS) data with Kerberos, you must alter the database configuration before you scale.
In the default configuration, the Vertica server process running in the Kubernetes pods cannot access the HDFS data due to incorrect permissions on the keytab file mounted in the pod. This requires that you set the KerberosEnableKeytabPermissionCheck Kerberos parameter:
- Set the KerberosEnableKeytabPermissionCheck configuration parameter to 0:
  => ALTER DATABASE DEFAULT SET KerberosEnableKeytabPermissionCheck = 0;
  WARNING 4324: Parameter KerberosEnableKeytabPermissionCheck will not take effect until database restart
  ALTER DATABASE
- Restart the cluster with admintools so that the new setting takes effect:
$ /opt/vertica/bin/admintools -t restart_db --database database-name;
For additional details about Vertica on Kubernetes and HDFS, see Configuring communal storage.
Scale the subcluster
When you add nodes from the primary subcluster to the secondary subcluster on Kubernetes, you must set up the configuration directory for the new nodes and change operator behavior during the scaling event:
- Execute the update_vertica script to set up the configuration directory. Vertica on Kubernetes requires the following configuration options for update_vertica:
  $ /opt/vertica/sbin/update_vertica \
      --accept-eula \
      --add-hosts host-list \
      --dba-user-password dba-user-password \
      --failure-threshold NONE \
      --no-system-configuration \
      --point-to-point \
      --data-dir /data-dir \
      --dba-user dbadmin \
      --no-package-checks \
      --no-ssh-key-install
- Set autoRestartVertica to false so that the operator does not interfere with the scaling operation:
  $ kubectl patch vdb database-name --type=merge --patch='{"spec": {"autoRestartVertica": false}}'
- Add the new nodes with the admintools db_add_node option:
  $ /opt/vertica/bin/admintools \
      -t db_add_node \
      --hosts host-list \
      --database database-name \
      --subcluster sc-name \
      --noprompt
  For details, see Adding and removing nodes from subclusters.
- After the scaling operation, set autoRestartVertica back to true:
  $ kubectl patch vdb database-name --type=merge --patch='{"spec": {"autoRestartVertica": true}}'
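The autoRestartVertica toggles in the steps above are plain JSON merge patches (--type=merge). Before sending one to kubectl, you can sanity-check that the payload is valid JSON locally; a small sketch using the Python standard library's JSON tool:

```shell
# The same patch payload used in the scaling steps above.
patch='{"spec": {"autoRestartVertica": false}}'

# Pretty-prints the patch if it is valid JSON; exits nonzero otherwise.
echo "$patch" | python3 -m json.tool
```

A malformed payload (for example, a missing quote) fails here with a decode error instead of being rejected later by the API server.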