VerticaDB custom resource definition
Important
Beginning with version 24.1.0, the VerticaDB operator version 2.0.0 manages deployments with vclusterops, a Go library that uses a high-level REST interface to leverage the Node Management Agent and HTTPS service. The vclusterops library replaces Administration tools (admintools), so you cannot access a shell within a container and execute admintools commands.
Version 24.1.0 also introduces API version v1. All examples in this section use API version v1 and the vcluster deployment type. API version v1beta1 is deprecated, and Vertica recommends that you migrate to API version v1. For migration details, see Upgrading Vertica on Kubernetes.
The Kubernetes API server stores only one format version of the custom resource. If you migrate to API version v1 and then create a custom resource with API version v1beta1, the conversion webhook converts the custom resource to API version v1 automatically.
If you migrated to API version v1, you can view the v1beta1 equivalent of your custom resource with the following command:
$ kubectl get verticadbs.v1beta1.vertica.com cr-name -o yaml
The VerticaDB custom resource definition (CRD) deploys an Eon Mode database. Each subcluster is a StatefulSet, a workload resource type that persists data with ephemeral Kubernetes objects.
A VerticaDB custom resource (CR) requires a primary subcluster and a connection to a communal storage location to persist its data. The VerticaDB operator monitors the CR to maintain its desired state and validate state changes.
The following sections provide a YAML-formatted manifest that defines the minimum required fields to create a VerticaDB CR, and each subsequent section implements a production-ready recommendation or best practice using custom resource parameters. For a comprehensive list of all parameters and their definitions, see custom resource parameters.
Prerequisites
- Complete Installing the VerticaDB operator.
- Configure a dynamic volume provisioner.
- Confirm that you have the resources to deploy objects you plan to create.
- Optionally, acquire a Vertica license. By default, the Helm chart deploys the free Community Edition license. This license limits you to a three-node cluster and 1TB of data.
- Configure a supported communal storage location with an empty communal path bucket.
- Understand Kubernetes Secrets and how Vertica manages Secrets. Secrets conceal sensitive information in your custom resource.
Minimal manifest
At minimum, a VerticaDB CR requires a connection to an empty communal storage bucket and a primary subcluster definition. The operator is namespace-scoped, so make sure that you apply the CR manifest in the same namespace as the operator.
The following VerticaDB CR connects to S3 communal storage and deploys a three-node primary subcluster. This manifest serves as the starting point for all implementations detailed in the subsequent sections:
apiVersion: vertica.com/v1
kind: VerticaDB
metadata:
  name: cr-name
spec:
  licenseSecret: vertica-license
  passwordSecret: su-password
  communal:
    path: "s3://bucket-name/key-name"
    endpoint: "https://path/to/s3-endpoint"
    credentialSecret: s3-creds
    region: region
  subclusters:
    - name: primary
      size: 3
  shardCount: 6
The following sections detail the minimal manifest's CR parameters, and how to create the CR in the current namespace.
Required fields
Each VerticaDB manifest begins with required fields that describe the version, resource type, and metadata:
- apiVersion: The API group and Kubernetes API version in api-group/version format.
- kind: The resource type. VerticaDB is the name of the Vertica custom resource type.
- metadata: Data that identifies objects in the namespace.
- metadata.name: The name of this CR object. Provide a unique metadata.name value so that you can identify the CR and its resources in its namespace.
spec definition
The spec field defines the desired state of the CR. The operator control loop compares the spec definition to the current state and reconciles any differences.
Nest all fields that define your StatefulSet under the spec field.
Add a license
By default, the Helm chart pulls the free Vertica Community Edition (CE) image. The CE image has a restricted license that limits you to a three-node cluster and 1TB of data.
To add your license so that you can deploy more nodes and use more data, store your license in a Secret and add it to the manifest:
- Create a Secret from your Vertica license file:
$ kubectl create secret generic vertica-license --from-file=license.dat=/path/to/license-file.dat
- Add the Secret to the licenseSecret field:
  ...
  spec:
    licenseSecret: vertica-license
  ...
The licenseSecret value is mounted in the Vertica server container in the /home/dbadmin/licensing/mnt directory.
Add password authentication
The passwordSecret field enables password authentication for the database. You must define this field when you create the CR; you cannot define a password for an existing database.
To create a database password, conceal it in a Secret before you add it to the manifest:
- Create a Secret from a literal string. You must use password as the key:
  $ kubectl create secret generic su-password --from-literal=password=password-value
- Add the Secret to the passwordSecret field:
  ...
  spec:
    ...
    passwordSecret: su-password
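If you need the password later, for example to connect a client, you can decode it from the Secret. The following sketch assumes the su-password Secret created above:
$ kubectl get secret su-password -o jsonpath='{.data.password}' | base64 --decode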
Connect to communal storage
Vertica on Kubernetes supports multiple communal storage locations. For implementation details for each communal storage location, see Configuring communal storage.
This CR connects to an S3 communal storage location. Define your communal storage location with the communal field:
...
spec:
  ...
  communal:
    path: "s3://bucket-name/key-name"
    endpoint: "https://path/to/s3-endpoint"
    credentialSecret: s3-creds
    region: region
...
This manifest sets the following parameters:
- credentialSecret: The Secret that contains your communal access and secret key credentials. The following command stores both your S3-compatible communal access and secret key credentials in a Secret named s3-creds:
  $ kubectl create secret generic s3-creds --from-literal=accesskey=accesskey --from-literal=secretkey=secretkey
  Note
  Omit credentialSecret for environments that authenticate to S3 communal storage with Identity and Access Management (IAM) or IAM roles for service accounts (IRSA); these methods do not require that you store your credentials in a Secret. For details, see Configuring communal storage.
- endpoint: The S3 endpoint URL.
- path: The location of the S3 storage bucket, in S3 bucket notation. This bucket must exist before you create the custom resource. After you create the custom resource, you cannot change this value.
- region: The geographic location of the communal storage resources. This field is valid for AWS and GCP only. If you set the wrong region, you cannot connect to the communal storage location.
Define a primary subcluster
Each CR requires a primary subcluster or it returns an error. At minimum, you must define the name and size of the subcluster:
...
spec:
  ...
  subclusters:
    - name: primary
      size: 3
...
This manifest sets the following parameters:
- name: The name of the subcluster.
- size: The number of pods in the subcluster.
When you define a CR with a single subcluster, the operator designates it as the primary subcluster. If your manifest includes multiple subclusters, you must use the type parameter to identify the primary subcluster. For example:
spec:
  ...
  subclusters:
    - name: primary
      size: 3
      type: primary
    - name: secondary
      size: 3
For additional details about primary and secondary subclusters, see Subclusters.
Set the shard count
shardCount specifies the number of shards in the database, which determines how subcluster nodes subscribe to communal storage data. You cannot change this value after you instantiate the CR. When you change the number of pods in a subcluster or add or remove a subcluster, the operator rebalances shards automatically.
Vertica recommends that the shard count equals double the number of nodes in the cluster. Because this manifest creates a three-node cluster with one Vertica server container per node, set shardCount to 6:
...
spec:
  ...
  shardCount: 6
For guidance on selecting the shard count, see Configuring your Vertica cluster for Eon Mode. For details about limiting each node to one Vertica server container, see Node affinity.
Apply the manifest
After you define the minimal manifest in a YAML-formatted file, use kubectl to create the VerticaDB CR. The following command creates a CR in the current namespace:
$ kubectl apply -f minimal.yaml
verticadb.vertica.com/cr-name created
After you apply the manifest, the operator creates the primary subcluster, connects to the communal storage, and creates the database. You can use kubectl wait to see when the database is ready:
$ kubectl wait --for=condition=DBInitialized=True vdb/cr-name --timeout=10m
verticadb.vertica.com/cr-name condition met
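You can also list the pods that the operator created for the subcluster. This sketch assumes the app.kubernetes.io/name=vertica pod label that the node affinity example later in this section also uses:
$ kubectl get pods --selector app.kubernetes.io/name=vertica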
Specify an image
Each time the operator launches a container, it pulls the image for the most recently released Vertica version from the OpenText Dockerhub repository. Vertica recommends that you explicitly set the image that the operator pulls for your CR. For a list of available Vertica images, see the OpenText Dockerhub registry.
To run a specific image version, set the image parameter in docker-registry-hostname/image-name:tag format:
spec:
  ...
  image: vertica/vertica-k8s:version
When you specify an image other than the latest, the operator pulls the image only when it is not available locally. You can control when the operator pulls the image with the imagePullPolicy custom resource parameter.
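For example, the following sketch pins a specific server image and pulls it only when it is not already present on the node. It assumes that imagePullPolicy nests directly under spec and uses the standard Kubernetes IfNotPresent value:
spec:
  ...
  image: vertica/vertica-k8s:version
  imagePullPolicy: IfNotPresent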
Communal storage authentication
Your communal storage validates HTTPS connections with a self-signed certificate authority (CA) bundle. You must make the CA bundle's root certificate available to each Vertica server container so that the communal storage can authenticate requests from your subcluster.
This authentication requires that you set the following parameters:
- certSecrets: Adds a Secret that contains the root certificate. This parameter is a list of Secrets that encrypt internal and external communications for your CR. Each certificate is mounted in the Vertica server container filesystem in the /certs/Secret-name/cert-name directory.
- communal.caFile: Makes the communal storage location aware of the mount path that stores the certificate Secret.
Complete the following to add these parameters to the manifest:
- Create a Secret that contains the PEM-encoded root certificate. The following command creates a Secret named aws-cert:
  $ kubectl create secret generic aws-cert --from-file=root_cert.pem
- Add the certSecrets and communal.caFile parameters to the manifest:
  spec:
    ...
    communal:
      ...
      caFile: /certs/aws-cert/root_cert.pem
    certSecrets:
      - name: aws-cert
Now, the communal storage authenticates requests with the /certs/aws-cert/root_cert.pem file, whose contents are stored in the aws-cert Secret.
External client connections
Each subcluster communicates with external clients and internal pods through a service object. To configure the service object to accept external client connections, set the following parameters:
- serviceName: Assigns a custom name to the service object. A custom name lets you identify it among multiple subclusters. Service object names use the metadata.name-serviceName naming convention.
- serviceType: Defines the type of the subcluster service object. By default, a subcluster uses the ClusterIP serviceType, which sets a stable IP and port that is accessible from within Kubernetes only. In many circumstances, external client applications need to connect to a subcluster that is fine-tuned for that specific workload. For external client access, set the serviceType to NodePort or LoadBalancer.
  Note
  The LoadBalancer service type is an external service type that is managed by your cloud provider. For implementation details, refer to the Kubernetes documentation and your cloud provider's documentation.
- serviceAnnotations: Assigns a custom annotation to the service object for implementation-specific services.
Add these external client connection parameters under the subclusters field:
spec:
  ...
  subclusters:
    ...
    serviceName: connections
    serviceType: LoadBalancer
    serviceAnnotations:
      service.beta.kubernetes.io/load-balancer-source-ranges: 10.0.0.0/24
This example creates a LoadBalancer service object named cr-name-connections. The serviceAnnotations parameter defines the CIDRs that can access the network load balancer (NLB). For additional details, see the AWS Load Balancer Controller documentation.
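After the operator creates the service object, you can retrieve its external address. This sketch assumes the metadata.name-serviceName naming convention described above:
$ kubectl get service cr-name-connections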
Note
If you run your CR on Amazon Elastic Kubernetes Service (EKS), Vertica recommends the AWS Load Balancer Controller. To use the AWS Load Balancer Controller, apply the following annotations:
serviceAnnotations:
  service.beta.kubernetes.io/aws-load-balancer-type: external
  service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
  service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
For longer-running queries, you might need to configure TCP keepalive settings.
For additional details about Vertica and service objects, see Containerized Vertica on Kubernetes.
Authenticate clients
You might need to connect applications or command-line interface (CLI) tools to your VerticaDB CR. You can add TLS certificates that authenticate client requests with the certSecrets parameter:
- Create a Secret that contains your TLS certificates. The following command creates a Secret named mtls:
  $ kubectl create secret generic mtls --from-file=mtls=/path/to/mtls-cert
- Add the Secret to the certSecrets parameter:
  spec:
    ...
    certSecrets:
      ...
      - name: mtls
  This mounts the TLS certificates in the /certs/mtls/mtls-cert directory.
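With external access and password authentication configured, a client such as vsql can connect to the subcluster. The following sketch is an assumption-laden example: it assumes the default dbadmin superuser, the default port 5433, and the external address that your service object exposes:
$ vsql -h load-balancer-address -p 5433 -U dbadmin -w 'password-value'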
Sidecar logger
A sidecar is a utility container that runs in the same pod as your main application container and performs a task for that main application's process. The VerticaDB CR uses a sidecar container to handle logs for the Vertica server container. You can use the vertica-logger image to add a sidecar that sends logs from vertica.log to standard output on the host node for log aggregation.
Add a sidecar with the sidecars parameter. This parameter accepts a list of sidecar definitions, where each element specifies the following:
- name: Name of the sidecar. name indicates the beginning of a sidecar element.
- image: Image for the sidecar container.
The following example adds a single sidecar container that shares a pod with each Vertica server container:
spec:
  ...
  sidecars:
    - name: sidecar-container
      image: sidecar-image:latest
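Because the sidecar sends vertica.log to standard output, you can read it with kubectl logs. This sketch uses a hypothetical pod name that assumes the metadata.name-subclusterName-index naming convention:
$ kubectl logs cr-name-primary-0 -c sidecar-container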
This configuration persists logs only for the lifecycle of the container. To persist log data between pod lifecycles, you must mount a custom volume in the sidecar filesystem.
Persist logs with a volume
An external service that requires long-term access to Vertica server data should use a volume to persist that data between pod lifecycles. For details about volumes, see the Kubernetes documentation.
The following parameters add a volume to your CR and mount it in a sidecar container:
- volumes: Makes a custom volume available to the CR so that you can mount it in a container filesystem. This parameter requires a name value and a volume type.
- sidecars[i].volumeMounts: Mounts one or more volumes in the sidecar container filesystem. This parameter requires a name value and a mountPath value that defines where the volume is mounted in the sidecar container.
  Note
  Vertica also provides a spec.volumeMounts parameter so you can mount volumes for other use cases. This parameter behaves like sidecars[i].volumeMounts, but it mounts volumes in the Vertica server container filesystem. For details, see Custom resource definition parameters.
The following example creates a volume of type emptyDir, and mounts it in the sidecar-container filesystem:
spec:
  ...
  volumes:
    - name: sidecar-vol
      emptyDir: {}
  ...
  sidecars:
    - name: sidecar-container
      image: sidecar-image:latest
      volumeMounts:
        - name: sidecar-vol
          mountPath: /path/to/sidecar-vol
Resource limits and requests
You should limit the amount of CPU and memory resources that each host node allocates for the Vertica server pod, and set the amount of resources each pod can request.
To control these values, set the following parameters under the subclusters.resources field:
- limits.cpu: Maximum number of CPUs that each server pod can consume.
- limits.memory: Maximum amount of memory that each server pod can consume.
- requests.cpu: Number of CPUs that each pod requests from the host node.
- requests.memory: Amount of memory that each pod requests from the host node.
When you change resource settings, Kubernetes restarts each pod with the updated settings.
Note
Select resource settings that your host nodes can accommodate. When a pod is started or rescheduled, Kubernetes searches for host nodes with enough resources available to start the pod. If no host node has enough resources available, the pod status remains Pending until resources become available.
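To see why a pod is stuck in Pending, you can inspect its scheduling events. The pod name in this sketch is a hypothetical pod from the primary subcluster:
$ kubectl describe pod cr-name-primary-0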
For guidance on setting production limits and requests, see Recommendations for Sizing Vertica Nodes and Clusters.
As a best practice, set resource.limits.* and resource.requests.* to equal values so that the pods are assigned to the Guaranteed Quality of Service (QoS) class. Equal settings also provide the best safeguard against the Out Of Memory (OOM) Killer in constrained environments.
The following example allocates 32 CPUs and 96 gigabytes of memory on the host node, and limits the requests to the same values. Because the limits.* and requests.* values are equal, the pods are assigned the Guaranteed QoS class:
spec:
  ...
  subclusters:
    ...
    resources:
      limits:
        cpu: 32
        memory: 96Gi
      requests:
        cpu: 32
        memory: 96Gi
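To confirm that Kubernetes assigned the pods the Guaranteed QoS class, you can query the pod status. The pod name in this sketch is hypothetical:
$ kubectl get pod cr-name-primary-0 -o jsonpath='{.status.qosClass}'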
Node affinity
Kubernetes affinity and anti-affinity settings control which resources the operator uses to schedule pods. As a best practice, you should set affinity to ensure that a single node does not serve more than one Vertica pod.
The following example creates an anti-affinity rule that schedules only one Vertica server pod per node:
spec:
  ...
  subclusters:
    ...
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - vertica
            topologyKey: "kubernetes.io/hostname"
The following provides a detailed explanation about all settings in the previous example:
- affinity: Provides control over pod and host scheduling using labels.
- podAntiAffinity: Uses pod labels to prevent scheduling on certain resources.
- requiredDuringSchedulingIgnoredDuringExecution: The rules defined under this statement must be met before a pod is scheduled on a host node.
- labelSelector: Identifies the pods affected by this affinity rule.
- matchExpressions: A list of pod selector requirements that consists of a key, operator, and values definition. This matchExpression rule checks if the host node is running another pod that uses a vertica label.
- topologyKey: Defines the scope of the rule. Because this uses the hostname topology label, this applies the rule in terms of pods and host nodes.
For additional details, see the Kubernetes documentation.
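To verify that the rule spread the Vertica server pods across host nodes, you can list the pods with their node assignments:
$ kubectl get pods --selector app.kubernetes.io/name=vertica -o wide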