Subclusters on Kubernetes
Eon Mode uses subclusters for workload isolation and scaling. The VerticaDB operator provides tools to direct external client communications to specific subclusters, and automate scaling without stopping your database.
The custom resource definition (CRD) provides parameters that allow you to fine-tune each subcluster for specific workloads. For example, you can increase the subcluster size setting for greater throughput, or adjust the resource requests and limits to manage compute power. When you create a custom resource instance, the operator deploys each subcluster as a StatefulSet. Each StatefulSet has a service object, which allows an external client to connect to a specific subcluster.
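For example, the subcluster definition in the CR exposes these tuning knobs. The following sketch uses illustrative values; a complete secondary-subcluster example appears later in this section:
spec:
  ...
  subclusters:
    - type: primary
      name: primary-subcluster
      size: 3                     # number of pods (StatefulSet replicas) in the subcluster
      resources:
        requests:
          cpu: 8
          memory: 32Gi
        limits:
          cpu: 8
          memory: 32Gi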
Naming conventions
Kubernetes derives names for the subcluster StatefulSet, service object, and pod from the subcluster name. This naming convention tightly couples the subcluster objects to help Kubernetes manage the cluster effectively. If you want to rename a subcluster, you must delete it from the CRD and redefine it so that the operator can create new objects with a derived name.
Kubernetes forms an object's fully qualified domain name (FQDN) with its resource type name, so resource type names must follow FQDN naming conventions. The underscore character ("_") does not follow FQDN rules, but you can use it in the subcluster name. Vertica converts each underscore to a hyphen ("-") in the FQDN for any object name derived from the subcluster name. For example, Vertica generates a default subcluster and names it default_subcluster, and then converts the corresponding portion of the derived object's FQDN to default-subcluster.
For additional naming guidelines, see the Kubernetes documentation.
External client connections
External clients can target specific subclusters that are fine-tuned to handle their workload. Each subcluster has a service object that handles external connections. To target multiple subclusters with a single service object, assign each subcluster the same spec.subclusters.serviceName value in the custom resource (CR). For implementation details, see VerticaDB custom resource definition.
The operator performs health monitoring that checks whether the Vertica daemon is running on each pod. If the daemon is running, then the operator allows the service object to route traffic to the pod.
By default, the service object derives its name from the custom resource name and the associated subcluster and uses the following format:
customResourceName-subclusterName
To override this default format, set the subclusters[i].serviceName CR parameter, which changes the format to the following:
metadata.name-serviceName
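For example, a sketch like the following (the CR and subcluster names are illustrative) gives two subclusters the same serviceName, so both are reachable through a single service object named vertica-db-connections:
metadata:
  name: vertica-db
spec:
  ...
  subclusters:
    - name: sc1
      type: primary
      serviceName: connections    # service object name: vertica-db-connections
    - name: sc2
      type: secondary
      serviceName: connections    # shares the same service object as sc1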
Vertica supports the following service object types:
- ClusterIP: The default service type. This service provides internal load balancing, and sets a stable IP and port that is accessible only from within the Kubernetes cluster.
- NodePort: Provides external client access. You can specify a port number for each host node in the subcluster to open for client connections.
- LoadBalancer: Uses a cloud provider load balancer to create NodePort and ClusterIP services as needed. For details about implementation, see the Kubernetes documentation and your cloud provider documentation.
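The service type is set per subcluster in the CR. A minimal sketch, with an illustrative port value:
spec:
  ...
  subclusters:
    - name: sc1
      type: primary
      serviceType: NodePort       # expose this subcluster to external clients
      clientNodePort: 32001       # port opened on each host node for client connections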
For configuration details, see VerticaDB custom resource definition.
Managing internal and external workloads
The Vertica StatefulSet is associated with an external service object. All external client requests are sent through this service object and load balanced among the pods in the cluster.
Import and export
Importing and exporting data between the Vertica database on Kubernetes and a cluster outside of Kubernetes requires that you expose the service with the NodePort or LoadBalancer service type and properly configure the network.
Important
When importing or exporting data, each node must have a static IP address. Rescheduled pods might be on different host nodes, so you must monitor and update the static IP addresses to reflect the new node.
For more information, see Configuring the Network to Import and Export Data.
1 - Client proxy for subclusters
Enables you to configure a client proxy for each subcluster; clients connect to the proxy, which communicates with all of the subcluster's nodes, instead of connecting directly to the database nodes.
Important
This is a beta feature.
A proxy between the client and the Vertica server helps manage communication. You can configure one or more client proxy pods for each subcluster; the proxy communicates with all nodes in the subcluster, so clients do not connect directly to the database nodes. The VerticaDB operator mounts a config map as the configuration file in the proxy pods and automatically updates the config map when the state of the subcluster changes.
For each subcluster, the operator creates a client proxy deployment named <vdb-name>-<subcluster-name>-proxy and a client proxy config map named <vdb-name>-<subcluster-name>-proxy-cm. You can verify that the deployment and config map with these names have been created, but you must not edit them.
When a new connection request is made, it is redirected to a node based on the workload specified in the request. If no workload is provided, the default workload is used. The proxy retrieves the list of available nodes for that workload and redirects the request according to the load balancing policy. To reduce performance impact, the proxy caches the node list for a predefined period, which minimizes server calls and improves overall performance.
During an online upgrade, Vertica transfers active connections from a subcluster that is scheduled to shut down. The proxy detects and handles session transfer messages from the server.
Enabling client proxy pod
To enable client proxy for the Vertica database, set the vertica.com/use-client-proxy annotation to true.
metadata:
  annotations:
    vertica.com/use-client-proxy: "true"
    vertica.com/client-proxy-log-level: INFO
    ...
spec:
  ...
  proxy:
    image: opentext/client-proxy:latest
  ...
  subclusters:
    - affinity: {}
      name: sc1
      proxy:
        replicas: 1
        resources: {}
Note
- Replicas must be >=1 (for any new subcluster in the VerticaDB).
- The vertica.com/use-client-proxy annotation cannot be changed (updated, added, or deleted) after the VerticaDB is created.
- spec.proxy.image cannot be changed (updated, added, or deleted) after the VerticaDB is created.
- You can set the log level to one of the following values to control logging granularity: TRACE, DEBUG, INFO, WARN, FATAL, or NONE. INFO is the default log level.
Creating replicas of client proxy pod
You can create more than one client proxy pod for a subcluster. To do this, set spec.subclusters[].proxy.replicas to a value greater than 1, based on your requirements.
...
subclusters:
  - affinity: {}
    name: sc1
    proxy:
      replicas: 2
      resources: {}
Verifying deployment and config map
After client proxy is enabled, you can verify the deployment and config map.
To check the deployment:
$ kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
vertica-db-sc1-proxy 1/1 1 1 5m57s
verticadb-operator-manager 1/1 1 1 3h42m
To check the config map:
$ kubectl get cm
NAME DATA AGE
vertica-db-sc1-proxy-cm 1 6m10s
verticadb-operator-manager-config 24 3h42m
$ kubectl describe configmap vertica-db-sc1-proxy-cm
Name: vertica-db-sc1-proxy-cm
Namespace: vertica
Labels: app.kubernetes.io/component=database
app.kubernetes.io/instance=vertica-db
app.kubernetes.io/managed-by=verticadb-operator
app.kubernetes.io/name=vertica
app.kubernetes.io/version=25.1.0-0
vertica.com/database=vertica
Annotations: vertica.com/operator-deployment-method: helm
vertica.com/operator-version: 25.1.0-0
Data
====
config.yaml:
----
listener:
  host: ""
  port: 5433
database:
  nodes:
    - vertica-db-sc1-0.vertica-db.vertica.svc.cluster.local:5433
    - vertica-db-sc1-1.vertica-db.vertica.svc.cluster.local:5433
    - vertica-db-sc1-2.vertica-db.vertica.svc.cluster.local:5433
log:
  level: INFO
BinaryData
====
Events: <none>
Connecting to Vertica nodes through client proxy
You can run the following command to verify that the client proxy pod is created:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
vertica-db-sc1-0 2/2 Running 0 19h 10.244.1.244 k8s-ubuntu20-05.verticacorp.com <none> <none>
vertica-db-sc1-1 2/2 Running 0 19h 10.244.1.246 k8s-ubuntu20-05.verticacorp.com <none> <none>
vertica-db-sc1-2 2/2 Running 0 19h 10.244.2.218 k8s-ubuntu20-06 <none> <none>
vertica-db-sc1-proxy-b46578c96-bhs5r 1/1 Running 0 19h 10.244.2.214 k8s-ubuntu20-06 <none> <none>
verticadb-operator-manager-75ddffb477-qmbpf 1/1 Running 0 23h 10.244.1.214 k8s-ubuntu20-05.verticacorp.com <none> <none>
In this example, the IP of the client proxy pod is 10.244.2.214.
You can still use NodePort or a load balancer to connect to the service of the subcluster through the client proxy. The service now redirects the connection to the client proxy instead of the Vertica nodes. Here, the service vertica-db-sc1 has a load balancer at a24fb01e0875e4adc844aa046951366f-55b4172b9dacecfb.elb.us-east-1.amazonaws.com.
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
vertica-db ClusterIP None <none> 5434/TCP,4803/TCP,8443/TCP,5554/TCP 13d
vertica-db-sc1 LoadBalancer 172.30.84.160 a24fb01e0875e4adc844aa046951366f-55b4172b9dacecfb.elb.us-east-1.amazonaws.com 5433:31475/TCP,8443:30239/TCP 13d
In the following example, we use the vsql client to connect via the service:
$ /opt/vertica/bin/vsql -h a24fb01e0875e4adc844aa046951366f-55b4172b9dacecfb.elb.us-east-1.amazonaws.com -U dbadmin
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
vertica=> select node_name,client_hostname,client_type,client_os_hostname from current_session;
node_name | client_hostname | client_type | client_os_hostname
--------------------+--------------------+-------------+---------------------------------
v_vertica_node0001 | 10.244.2.214:46750 | vsql | k8s-ubuntu20-04.verticacorp.com
(1 row)
You will notice that in the server session, client_hostname shows the client proxy pod's IP address (10.244.2.214 in this case) instead of the actual client machine.
2 - Scaling subclusters
The operator enables you to scale the number of subclusters and the number of pods per subcluster automatically. This utilizes or conserves resources depending on the immediate needs of your workload.
The following sections explain how to scale resources for new workloads. For details about scaling resources for existing workloads, see VerticaAutoscaler custom resource definition.
Prerequisites
Scaling the number of subclusters
Adjust the number of subclusters in your custom resource to fine-tune resources for short-running dashboard queries. For example, increase the number of subclusters to increase throughput. For more information, see Improving query throughput using subclusters.
- Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named vdb for editing:
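$ kubectl edit verticadb vdb    # assumes the verticadb resource type; substitute the name of your CR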
- In the spec section of the custom resource, locate the subclusters subsection. Begin with the type field to define a new subcluster. The type field indicates the subcluster type. Because there is already a primary subcluster, enter secondary:
spec:
  ...
  subclusters:
    ...
    - type: secondary
- Follow the steps in VerticaDB custom resource definition to complete the subcluster definition. The following completed example adds a secondary subcluster for dashboard queries:
spec:
  ...
  subclusters:
    - type: primary
      name: primary-subcluster
      ...
    - type: secondary
      name: dashboard
      clientNodePort: 32001
      resources:
        limits:
          cpu: 32
          memory: 96Gi
        requests:
          cpu: 32
          memory: 96Gi
      serviceType: NodePort
      size: 3
- Save and close the custom resource file. When the update completes, you receive a message similar to the following:
verticadb.vertica.com/vertica-db edited
- Use the kubectl wait command to monitor when the new pods are ready:
$ kubectl wait --for=condition=Ready pod --selector app.kubernetes.io/name=verticadb --timeout 180s
pod/vdb-dashboard-0 condition met
pod/vdb-dashboard-1 condition met
pod/vdb-dashboard-2 condition met
Scaling the pods in a subcluster
For long-running, analytic queries, increase the pod count for a subcluster. See Using elastic crunch scaling to improve query performance.
- Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named verticadb for editing:
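$ kubectl edit verticadb verticadb    # assumes the verticadb resource type and a CR named verticadb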
- Update the subclusters.size value to 6:
spec:
  ...
  subclusters:
    ...
    - type: secondary
      ...
      size: 6
Shards are rebalanced automatically.
- Save and close the custom resource file. You receive a message similar to the following when you successfully update the file:
verticadb.vertica.com/verticadb edited
- Use the kubectl wait command to monitor when the new pods are ready:
$ kubectl wait --for=condition=Ready pod --selector app.kubernetes.io/name=verticadb --timeout 180s
pod/vdb-subcluster1-3 condition met
pod/vdb-subcluster1-4 condition met
pod/vdb-subcluster1-5 condition met
Removing a subcluster
Remove a subcluster when it is no longer needed, or to preserve resources.
Important
Because each custom resource instance requires a primary subcluster, you cannot remove all subclusters.
- Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named verticadb for editing:
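$ kubectl edit verticadb verticadb    # assumes the verticadb resource type and a CR named verticadb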
- In the subclusters subsection nested under spec, locate the subcluster that you want to delete. Delete the element in the subclusters array that represents that subcluster. Each element is identified by a hyphen (-).
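As a sketch, assuming the dashboard subcluster from the earlier example (your subcluster names might differ), the element to remove is the whole block that starts at its hyphen:
spec:
  ...
  subclusters:
    - type: primary
      name: primary-subcluster
      ...
    # delete this entire element to remove the subcluster:
    - type: secondary
      name: dashboard
      ...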
- After you delete the subcluster and save, you receive a message similar to the following:
verticadb.vertica.com/verticadb edited