These tips can help you avoid issues related to your Vertica on Kubernetes deployment and troubleshoot any problems that occur.
Download the kubectl command line tool to debug your Kubernetes resources.
When you deploy a custom resource (CR), you might encounter a variety of issues. To pinpoint an issue, use the following commands to inspect the objects that the CR creates:
kubectl get returns basic information about deployed objects:

kubectl describe returns detailed information about deployed objects:
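For example, a sketch of both commands against a custom resource (the CR name verticadb-sample and the label selector are assumptions; substitute the names and labels your deployment uses):

```shell
# Basic status for the CR and the pods it creates.
# "verticadb-sample" is a placeholder CR name.
kubectl get verticadb verticadb-sample
kubectl get pods --selector app.kubernetes.io/instance=verticadb-sample

# Detailed state and recent events for the same objects.
kubectl describe verticadb verticadb-sample
kubectl describe pods --selector app.kubernetes.io/instance=verticadb-sample
```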
Because the operator takes time to perform tasks, updates to the custom resource are not effective immediately. Use the kubectl command line tool to verify that changes are applied.
You can use the kubectl wait command to wait for a specified condition. For example, the operator uses the UpgradeInProgress condition to provide an upgrade status. After you begin the image version upgrade, wait until the operator acknowledges the upgrade and sets this condition to True:
After the upgrade begins, you can wait until the operator leaves upgrade mode and sets this condition to False:
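A sketch of both waits (the CR name vertica-db and the vdb short name are assumptions; substitute your CR name, and adjust the timeouts to your cluster):

```shell
# Block until the operator acknowledges the upgrade
# and sets UpgradeInProgress=True.
kubectl wait --for=condition=UpgradeInProgress=True vdb/vertica-db --timeout=180s

# Later, block until the operator leaves upgrade mode
# and sets UpgradeInProgress=False.
kubectl wait --for=condition=UpgradeInProgress=False vdb/vertica-db --timeout=800s
```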
For more information about kubectl wait, see the kubectl reference documentation.
When you check the pods in your cluster, the pods are running but the database is not ready:
To find the root cause of the issue, use kubectl logs to check the operator manager. The following example shows that the communal storage bucket does not exist:
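A sketch of the logs command (the namespace and deployment name are assumptions; they vary by operator version and install method, so substitute the values from your operator installation):

```shell
# Stream the operator manager logs and filter for communal storage errors.
kubectl logs -n vertica-operator deployment/verticadb-operator-manager | grep -i bucket
```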
Create an S3 bucket for the cluster:
Use kubectl get pods to verify that the cluster uses the new S3 bucket and the database is ready:
After you create a custom resource instance, the database is not available. The kubectl get custom-resource command does not display information:
Use kubectl describe custom-resource to check the events for the pods to identify any issues:
In this circumstance, the custom resource uses a Secret named su-passwd to store the superuser password, but no such Secret exists. Create a Secret named su-passwd to store the password:
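A minimal sketch with kubectl create secret (the key name password is an assumption; match the key that your CR's superuser password setting expects, and replace the literal value with your own password):

```shell
# Create the su-passwd Secret holding the superuser password
# under the key "password".
kubectl create secret generic su-passwd --from-literal=password='secret-password'
```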
For detailed steps about creating the Secret manifest and applying it to a namespace, see the Kubernetes documentation.
For details about Vertica and secret credentials, see Secrets management.
Use kubectl get custom-resource to verify the issue is resolved:
You might receive an ImagePullBackOff error when you deploy a Vertica cluster with Helm charts without pre-pulling the Vertica image from the local registry server:

This occurs because the Vertica image is too large to pull from the registry while the cluster deploys. Execute the following command on a Kubernetes host to check the image size:
To solve this issue, complete one of the following:
Pull the Vertica images on each node before creating the Vertica StatefulSet:
Use the reduced-size vertica/vertica-k8s:latest image for the Vertica server.
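For the first option, a sketch of the pre-pull step (run it on each host node; use your container runtime's equivalent command if the node does not run Docker):

```shell
# Pull the Vertica server image on the host before creating the StatefulSet.
docker pull vertica/vertica-k8s:latest
```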
If your host nodes do not have enough resources to fulfill the resource request from a pod, the pod stays in pending status.
In the following example, the pod requests 40 CPUs on the host node, and the pod stays in Pending:
Confirm the resources available on the host node. The following command confirms that the host node has only 40 allocatable CPUs:
To correct this issue, reduce the resource.requests in the subcluster to values lower than the maximum allocatable CPUs. The following example uses a YAML-formatted file named patch.yaml to lower the resource requests for the pod:
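A sketch of the patch (the CR name vertica-db and the subcluster name defaultsubcluster are placeholders; note that a merge patch replaces the whole subclusters list, so include every subcluster your CR defines):

```shell
# patch.yaml lowers the CPU request below the node's 40 allocatable CPUs.
cat > patch.yaml <<'EOF'
spec:
  subclusters:
    - name: defaultsubcluster
      resources:
        requests:
          cpu: "36"
EOF

# Apply the patch to the VerticaDB custom resource.
kubectl patch verticadb vertica-db --type=merge --patch-file patch.yaml
```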
When you remove a host node from your Kubernetes cluster, a Vertica pod might stay in pending status if the pod uses a PersistentVolume (PV) that has a node affinity rule that prevents the pod from running on another node.
To resolve this issue, you must verify that the pods are pending because of an affinity rule, and then use the vdb-gen tool to revive the entire cluster.
First, determine if the pod is pending because of a node affinity rule. This requires details about the pending pod, the PersistentVolumeClaim (PVC) associated with the pod, and the PersistentVolume (PV) associated with the PVC:
Use kubectl describe to return details about the pending pod:
The Message column verifies that the pod was not scheduled due to a volume node affinity conflict.
Get the name of the PVC associated with the pod:
Use the PVC to get the PV. PVs are associated with nodes:
Use the PV to get the name of the node that has the affinity rule:
Verify that the node with the affinity rule is the node that was removed from the Kubernetes cluster.
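The steps above can be sketched as follows (the pod and PVC names are placeholders; the local-data- PVC prefix is an assumption, so substitute the names from your cluster):

```shell
# 1. Why is the pod pending? Look for "volume node affinity conflict".
kubectl describe pod vertica-db-defaultsubcluster-0

# 2. Get the PV bound to the pod's PVC.
kubectl get pvc local-data-vertica-db-defaultsubcluster-0 \
  -o jsonpath='{.spec.volumeName}{"\n"}'

# 3. Get the node that the PV's affinity rule pins it to.
#    Replace <pv-name> with the output of the previous command.
kubectl get pv <pv-name> \
  -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}{"\n"}'
```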
Next, you must revive the entire cluster to get all pods running again. When you revive the cluster, you create new PVCs that restore the association between each pod and a PV to satisfy the node affinity rule.
While you have nodes running in the cluster, you can use the vdb-gen tool to generate a manifest and revive the database:
Download the vdb-gen tool from the vertica-kubernetes GitHub repository:
Copy the tool into a pod that has a running Vertica process:
The vdb-gen tool requires the database name, so retrieve it with the following command:
Run the vdb-gen tool with the database name. The following command runs the tool and pipes the output to a file named revive.yaml:
Copy revive.yaml to your local machine so that you can use it after you remove the cluster:
Save the current VerticaDB Custom Resource (CR). For example, the following command saves a CR named vertdb to a file named orig.yaml:
Update revive.yaml with any parts of orig.yaml that vdb-gen did not capture, such as custom resource limits.
Delete the existing Vertica cluster:
Confirm that all PVCs that are associated with the deleted cluster were removed:
Retrieve the PVC names. A PVC name uses the dbname-subcluster-podindex format:
Delete the PVCs:
Revive the database with revive.yaml:
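A sketch of this final step, assuming revive.yaml is in the current directory:

```shell
# Apply the manifest that vdb-gen generated to revive the database.
kubectl apply -f revive.yaml
```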
After the revive completes, all Vertica pods are running, and PVCs are recreated on new nodes. Wait for the operator to start the database.
Vertica does not officially support Istio because the Istio sidecar port requirement conflicts with the port that Vertica requires for internal node communication. However, you can deploy Vertica on Kubernetes to Istio with changes to the Istio InboundInterceptionMode setting. Vertica provides access to this setting with annotations on the VerticaDB CR.
REDIRECT
REDIRECT mode is the default InboundInterceptionMode setting, and it requires that you disable network address translation (NAT) on port 5434, the port that the pods use for internal communication. Disable NAT on this port with the excludeInboundPorts annotation:
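A sketch using kubectl annotate (the CR name vertica-db is a placeholder, and this assumes your operator version propagates CR annotations to the pods it creates; otherwise set the annotation in the CR manifest's metadata.annotations):

```shell
# Exclude the internal communication port from Istio's inbound NAT redirection.
kubectl annotate verticadb vertica-db \
  traffic.sidecar.istio.io/excludeInboundPorts="5434"
```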
If you use custom certificates when you install the operator with the Helm chart, the helm install or kubectl apply command might return an error similar to the following:
$ kubectl apply -f ../operatorcrd.yaml
Error from server (InternalError): error when creating "../operatorcrd.yaml": Internal error occurred: failed calling webhook "mverticadb.kb.io": Post "https://verticadb-operator-webhook-service.namespace.svc:443/mutate-vertica-com-v1-verticadb?timeout=10s": x509: certificate is valid for ip-10-0-21-169.ec2.internal, test-bastion, not verticadb-operator-webhook-service.default.svc
You receive this error when the TLS key's Domain Name System (DNS) or Subject Alternative Name (SAN) is incorrect. To correct this error, define the DNS and SAN in a configuration file in the following format:
commonName = verticadb-operator-webhook-service.namespace.svc
...
[alt_names]
DNS.1 = verticadb-operator-webhook-service.namespace.svc
DNS.2 = verticadb-operator-webhook-service.namespace.svc.cluster.local
For additional details, see Installing the VerticaDB operator.
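A sketch of generating a self-signed key and certificate with matching SAN entries directly on the command line (requires OpenSSL 1.1.1 or later for -addext; replace "namespace" with the namespace where the operator runs, and the file names are placeholders):

```shell
# Generate a key and certificate whose CN and SAN match the
# webhook service DNS names.
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout webhook.key -out webhook.crt \
  -subj "/CN=verticadb-operator-webhook-service.namespace.svc" \
  -addext "subjectAltName=DNS:verticadb-operator-webhook-service.namespace.svc,DNS:verticadb-operator-webhook-service.namespace.svc.cluster.local"
```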
Vertica provides the vlogger image that sends logs from vertica.log to standard output on the host node for log aggregation.
To add the sidecar to the CR, add an element to the spec.sidecars definition:
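A minimal sketch of the sidecar element (the sidecar name and image tag are assumptions; use the vlogger image your deployment references):

```yaml
spec:
  sidecars:
    - name: vlogger
      image: vertica/vertica-logger:latest
```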
To test the sidecar, run the following command and verify that it returns logs:
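For example (the pod name vertica-db-main-0 is a placeholder, and the container name must match the name in spec.sidecars):

```shell
# Read the sidecar's stdout to confirm it streams vertica.log.
kubectl logs vertica-db-main-0 -c vlogger
```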
In some circumstances, you might need to examine a core file that contains information about the Vertica server container process.
The following steps generate a core file for the Vertica server process:
Use the securityContext value to set the privileged property to true:
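A minimal sketch of the change in the VerticaDB CR (verify the field placement against your operator version's CR schema):

```yaml
spec:
  securityContext:
    privileged: true
```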
On the host machine, verify that /proc/sys/kernel/core_pattern is set to core:
The /proc/sys/kernel/core_pattern file is not namespaced, so setting this value affects all containers running on that host.
When Vertica generates a core, the machine writes a message to vertica.log that indicates where you can locate the core file.
If you want to generate a core file in OpenShift, you must add the SYS_PTRACE capability in the CR to collect vstacks:
Use the securityContext value to set the capabilities.add property to ["SYS_PTRACE"]:
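A minimal sketch of the capability change in the VerticaDB CR (verify the field placement against your operator version's CR schema):

```yaml
spec:
  securityContext:
    capabilities:
      add: ["SYS_PTRACE"]
```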
Apply the changes:
Get a shell in the container and execute vstack as the superuser:
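A sketch of these commands (the pod name, container name, and vstack path are assumptions; substitute the values for your deployment):

```shell
# Open a shell in the Vertica server container.
kubectl exec -it vertica-db-main-0 -c server -- /bin/bash

# Inside the container, run vstack as the superuser:
sudo /opt/vertica/bin/vstack
```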
You might notice that your VerticaAutoScaler is not scaling correctly according to CPU utilization:
You receive this error because the metrics server is not installed:
To install the metrics server:
Download the components.yaml file:
Optionally, disable TLS:
Apply the YAML file:
Verify that the metrics server is running:
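The steps above can be sketched as follows (the download URL is the metrics-server project's release location; the TLS flag is only appropriate for test clusters):

```shell
# 1. Download the metrics server manifest.
curl -LO https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# 2. (Optional, test clusters only) disable TLS verification by adding
#    --kubelet-insecure-tls to the metrics-server container args in
#    components.yaml.

# 3. Apply the manifest and verify that the deployment is running.
kubectl apply -f components.yaml
kubectl get deployment metrics-server -n kube-system
```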
You might receive an error that states:
failed to get cpu utilization: missing request for cpu
You get this error because you must set resource limits on all containers, including sidecar containers. To correct this error:
Verify the error:
Add resource limits to the CR:
Apply the update:
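A sketch of adding resource settings to a sidecar container with a JSON patch (the CR name, sidecar index, and CPU values are placeholders; set requests and limits on every container in the pod):

```shell
# Add CPU requests and limits to the first sidecar container in the CR.
kubectl patch verticadb vertica-db --type=json -p '[
  {"op": "add",
   "path": "/spec/sidecars/0/resources",
   "value": {"requests": {"cpu": "100m"}, "limits": {"cpu": "500m"}}}
]'
```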
When you set a new CPU resource limit, Kubernetes reschedules each pod in the StatefulSet in a rolling update until all pods have the updated CPU resource limit.