Subclusters on Kubernetes

Eon Mode uses subclusters for workload isolation and scaling. The VerticaDB operator provides tools to direct external client communications to specific subclusters and to automate scaling without stopping your database.

The custom resource definition (CRD) provides parameters that allow you to fine-tune each subcluster for specific workloads. For example, you can increase the subcluster size setting for increased throughput, or adjust the resource requests and limits to manage compute power. When you create a custom resource instance, the operator deploys each subcluster as a StatefulSet. Each StatefulSet has a service object, which allows an external client to connect to a specific subcluster.
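For example, a minimal sketch of a subcluster definition that tunes the size and compute resources (the subcluster name and the resource values are illustrative, not recommended settings):

spec:
  subclusters:
  - name: analytics
    type: secondary
    size: 3
    resources:
      requests:
        cpu: 8
        memory: 32Gi
      limits:
        cpu: 8
        memory: 32Gi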

Naming conventions

Kubernetes derives names for the subcluster StatefulSet, service object, and pod from the subcluster name. This naming convention tightly couples the subcluster objects so that Kubernetes can manage the cluster effectively. If you want to rename a subcluster, you must delete it from the CRD and redefine it so that the operator can create new objects with a derived name.

Kubernetes forms an object's fully qualified domain name (FQDN) with its resource type name, so resource type names must follow FQDN naming conventions. The underscore character ( "_" ) does not follow FQDN rules, but you can use it in the subcluster name. Vertica converts each underscore to a hyphen ( "-" ) in the FQDN for any object name derived from the subcluster name. For example, Vertica generates a default subcluster and names it default_subcluster, and then converts the corresponding portion of the derived object's FQDN to default-subcluster.
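As an illustration, assume a custom resource named vertica-db with the default subcluster default_subcluster. The derived object names would look similar to the following:

Subcluster name:   default_subcluster
StatefulSet name:  vertica-db-default-subcluster
Service object:    vertica-db-default-subcluster
Pod names:         vertica-db-default-subcluster-0, vertica-db-default-subcluster-1, ...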

For additional naming guidelines, see the Kubernetes documentation.

External client connections

External clients can target specific subclusters that are fine-tuned to handle their workload. Each subcluster has a service object that handles external connections. To target multiple subclusters with a single service object, assign each subcluster the same spec.subclusters.serviceName value in the custom resource (CR). For implementation details, see VerticaDB custom resource definition.
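For example, the following sketch routes two subclusters through a single service object by assigning them the same serviceName (the subcluster names and the connections value are illustrative). With a custom resource named vertica-db, this produces one service object named vertica-db-connections, following the metadata.name-serviceName format described below:

spec:
  subclusters:
  - name: primary-subcluster
    type: primary
    size: 3
    serviceName: connections
  - name: secondary-subcluster
    type: secondary
    size: 3
    serviceName: connections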

The operator performs health monitoring that checks whether the Vertica daemon is running on each pod. If the daemon is running, then the operator allows the service object to route traffic to the pod.

By default, the service object derives its name from the custom resource name and the associated subcluster and uses the following format:

customResourceName-subclusterName

To override this default format, set the subclusters[i].serviceName CR parameter, which changes the format to the following:

metadata.name-serviceName

Vertica supports the following service object types:

  • ClusterIP: The default service type. This service provides internal load balancing, and sets a stable IP and port that is accessible only from within the Kubernetes cluster.

  • NodePort: Provides external client access. You can specify a port number for each host node in the subcluster to open for client connections.

  • LoadBalancer: Uses a cloud provider load balancer to create NodePort and ClusterIP services as needed. For details about implementation, see the Kubernetes documentation and your cloud provider documentation.
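For example, a minimal sketch that exposes a subcluster through a cloud provider load balancer (the subcluster name is illustrative):

spec:
  subclusters:
  - name: analytics
    type: secondary
    size: 3
    serviceType: LoadBalancer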

For configuration details, see VerticaDB custom resource definition.

Managing internal and external workloads

The Vertica StatefulSet is associated with an external service object. All external client requests are sent through this service object and load balanced among the pods in the cluster.

Import and export

Importing and exporting data between your Vertica on Kubernetes cluster and a cluster outside of Kubernetes requires that you expose the service with the NodePort or LoadBalancer service type and properly configure the network.
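For example, a sketch that exposes a subcluster for import and export traffic with the NodePort service type (the subcluster name and port value are illustrative; the port must fall within your cluster's configured node port range):

spec:
  subclusters:
  - name: etl
    type: secondary
    size: 3
    serviceType: NodePort
    clientNodePort: 32001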

1 - Client proxy for subclusters

Enables you to configure a client proxy for each subcluster so that clients communicate with the proxy, which connects to all of the subcluster's nodes, instead of connecting directly to the database nodes.

A proxy between the client and the Vertica server helps manage communication. You can configure one or more client proxy pods for each subcluster; clients connect to the proxy, which communicates with all nodes in the subcluster, instead of connecting directly to the database nodes. The VerticaDB operator mounts a config map as the configuration file in the proxy pods and automatically updates the config map when the state of the subcluster changes.

For each subcluster, the operator creates a client proxy deployment named <vdb-name>-<subcluster-name>-proxy and a client proxy config map named <vdb-name>-<subcluster-name>-proxy-cm. You can verify that these objects exist, but you must not edit them.

When a new connection request is made, it is redirected to a node based on the workload specified in the request. If no workload is provided, the default workload is used. The proxy retrieves the list of available nodes for that workload and redirects the request according to the load balancing policy. To reduce performance impact, the proxy caches the node list for a predefined period, which minimizes server calls and improves overall performance.

During an online upgrade, Vertica transfers active connections from a subcluster that is scheduled to shut down. The proxy detects and handles session transfer messages from the server.

Enabling client proxy pod

To enable client proxy for the Vertica database, set the vertica.com/use-client-proxy annotation to true.

metadata:
  annotations:
    vertica.com/use-client-proxy: "true"
    vertica.com/client-proxy-log-level: INFO
...
spec:
...
  proxy:
    image: opentext/client-proxy:latest
  ...
  subclusters:
  - affinity: {}
    name: sc1
    proxy:
      replicas: 1
      resources: {}

Creating replicas of client proxy pod

You can create more than one client proxy pod for a subcluster. To do this, set spec.subclusters[].proxy.replicas to a value greater than 1. For example, the following sets two proxy replicas for subcluster sc1:

  ...
  subclusters:
  - affinity: {}
    name: sc1
    proxy:
      replicas: 2
      resources: {}

Verifying deployment and config map

After you enable the client proxy, you can verify the deployment and config map.

To check the deployment:

$ kubectl get deployment
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
vertica-db-sc1-proxy         1/1     1            1           5m57s
verticadb-operator-manager   1/1     1            1           3h42m

To check the config map:

$ kubectl get cm
NAME                                DATA   AGE
vertica-db-sc1-proxy-cm             1      6m10s
verticadb-operator-manager-config   24     3h42m
$ kubectl describe configmap vertica-db-sc1-proxy-cm
Name:         vertica-db-sc1-proxy-cm
Namespace:    vertica
Labels:       app.kubernetes.io/component=database
              app.kubernetes.io/instance=vertica-db
              app.kubernetes.io/managed-by=verticadb-operator
              app.kubernetes.io/name=vertica
              app.kubernetes.io/version=25.1.0-0
              vertica.com/database=vertica
Annotations:  vertica.com/operator-deployment-method: helm
              vertica.com/operator-version: 25.1.0-0
 
Data
====
config.yaml:
----
listener:
  host: ""
  port: 5433
database:
  nodes:
  - vertica-db-sc1-0.vertica-db.vertica.svc.cluster.local:5433
  - vertica-db-sc1-1.vertica-db.vertica.svc.cluster.local:5433
  - vertica-db-sc1-2.vertica-db.vertica.svc.cluster.local:5433
log:
  level: INFO
 
 
BinaryData
====
 
Events:  <none>

Connecting to Vertica nodes through client proxy

You can run the following command to verify that the client proxy pod is created:

$ kubectl get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP             NODE                              NOMINATED NODE   READINESS GATES
vertica-db-sc1-0                              2/2     Running   0          19h   10.244.1.244   k8s-ubuntu20-05.verticacorp.com   <none>           <none>
vertica-db-sc1-1                              2/2     Running   0          19h   10.244.1.246   k8s-ubuntu20-05.verticacorp.com   <none>           <none>
vertica-db-sc1-2                              2/2     Running   0          19h   10.244.2.218   k8s-ubuntu20-06                   <none>           <none>
vertica-db-sc1-proxy-b46578c96-bhs5r          1/1     Running   0          19h   10.244.2.214   k8s-ubuntu20-06                   <none>           <none>
verticadb-operator-manager-75ddffb477-qmbpf   1/1     Running   0          23h   10.244.1.214   k8s-ubuntu20-05.verticacorp.com   <none>           <none>

In this example, the IP of the client proxy pod is 10.244.2.214.

You can still use a NodePort or load balancer to connect to the subcluster's service through the client proxy. The service now redirects the connection to the client proxy instead of the Vertica nodes. Here, the service vertica-db-sc1 has the load balancer address a24fb01e0875e4adc844aa046951366f-55b4172b9dacecfb.elb.us-east-1.amazonaws.com.

$ kubectl get svc
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP                                                                     PORT(S)                               AGE
vertica-db       ClusterIP      None            <none>                                                                          5434/TCP,4803/TCP,8443/TCP,5554/TCP   13d
vertica-db-sc1   LoadBalancer   172.30.84.160   a24fb01e0875e4adc844aa046951366f-55b4172b9dacecfb.elb.us-east-1.amazonaws.com   5433:31475/TCP,8443:30239/TCP         13d

In the following example, we use the vsql client to connect via the service:

$ /opt/vertica/bin/vsql -h a24fb01e0875e4adc844aa046951366f-55b4172b9dacecfb.elb.us-east-1.amazonaws.com -U dbadmin
Welcome to vsql, the Vertica Analytic Database interactive terminal.
 
Type:  \h or \? for help with vsql commands
       \g or terminate with semicolon to execute query
       \q to quit
 
vertica=> select node_name,client_hostname,client_type,client_os_hostname from current_session;
     node_name      |  client_hostname   | client_type |       client_os_hostname
--------------------+--------------------+-------------+---------------------------------
 v_vertica_node0001 | 10.244.2.214:46750 | vsql        | k8s-ubuntu20-04.verticacorp.com
(1 row)

Notice that in the server session, client_hostname shows the client proxy pod's IP address (10.244.2.214 in this case) instead of the actual client machine.
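If you need to troubleshoot the proxy, you can inspect its log output, which is written at the level set by the vertica.com/client-proxy-log-level annotation shown earlier. A quick sketch using the deployment name from this example:

$ kubectl logs deployment/vertica-db-sc1-proxy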

2 - Scaling subclusters

The operator enables you to scale the number of subclusters and the number of pods per subcluster automatically. This utilizes or conserves resources depending on the immediate needs of your workload.

The following sections explain how to scale resources for new workloads. For details about scaling resources for existing workloads, see VerticaAutoscaler custom resource definition.

Prerequisites

Before you scale subclusters, deploy a VerticaDB custom resource. See VerticaDB custom resource definition.

Scaling the number of subclusters

Adjust the number of subclusters in your custom resource to fine-tune resources for short-running dashboard queries. For example, increase the number of subclusters to increase throughput. For more information, see Improving query throughput using subclusters.

  1. Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named vdb for editing:

    $ kubectl edit vdb
    
  2. In the spec section of the custom resource, locate the subclusters subsection. Begin with the type field to define a new subcluster.

    The type field indicates the subcluster type. Because there is already a primary subcluster, enter secondary:

    spec:
    ...
      subclusters:
      ...
      - type: secondary
    
  3. Follow the steps in VerticaDB custom resource definition to complete the subcluster definition. The following completed example adds a secondary subcluster for dashboard queries:

    spec:
    ...
      subclusters:
      - type: primary
        name: primary-subcluster
      ...
      - type: secondary
        name: dashboard
        clientNodePort: 32001
        resources:
          limits:
            cpu: 32
            memory: 96Gi
          requests:
            cpu: 32
            memory: 96Gi
        serviceType: NodePort
        size: 3
    
  4. Save and close the custom resource file. When the update completes, you receive a message similar to the following:

    verticadb.vertica.com/vertica-db edited
    
  5. Use the kubectl wait command to monitor when the new pods are ready:

    $ kubectl wait --for=condition=Ready pod --selector app.kubernetes.io/name=verticadb --timeout 180s
    pod/vdb-dashboard-0 condition met
    pod/vdb-dashboard-1 condition met
    pod/vdb-dashboard-2 condition met
    
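Because the operator deploys each subcluster as a StatefulSet with its own service object, you can also confirm that the new subcluster's objects exist. A sketch, assuming a custom resource named vdb and the dashboard subcluster defined above (the derived names follow the customResourceName-subclusterName convention):

$ kubectl get statefulset vdb-dashboard
$ kubectl get svc vdb-dashboard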

Scaling the pods in a subcluster

For long-running, analytic queries, increase the pod count for a subcluster. See Using elastic crunch scaling to improve query performance.

  1. Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named verticadb for editing:

    $ kubectl edit verticadb
    
  2. Update the subclusters.size value to 6:

    spec:
    ...
      subclusters:
      ...
      - type: secondary
        ...
        size: 6
    

    Shards are rebalanced automatically.

  3. Save and close the custom resource file. You receive a message similar to the following when you successfully update the file:

    verticadb.vertica.com/verticadb edited

  4. Use the kubectl wait command to monitor when the new pods are ready:

    $ kubectl wait --for=condition=Ready pod --selector app.kubernetes.io/name=verticadb --timeout 180s
    pod/vdb-subcluster1-3 condition met
    pod/vdb-subcluster1-4 condition met
    pod/vdb-subcluster1-5 condition met
    
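As an alternative to the interactive kubectl edit workflow above, you can apply the same size change with kubectl patch. A sketch, assuming a custom resource named verticadb and that the subcluster you are resizing is the second element (index 1) of the subclusters array:

$ kubectl patch verticadb verticadb --type=json \
    -p '[{"op": "replace", "path": "/spec/subclusters/1/size", "value": 6}]'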

Removing a subcluster

Remove a subcluster when it is no longer needed, or to preserve resources.

  1. Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named verticadb for editing:

    $ kubectl edit verticadb
    
  2. In the subclusters subsection nested under spec, locate the subcluster that you want to remove. Delete the element in the subclusters array that represents that subcluster. Each element is identified by a hyphen (-).

  3. After you delete the subcluster and save, you receive a message similar to the following:

    verticadb.vertica.com/verticadb edited
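After the operator reconciles the change, the removed subcluster's pods and associated objects no longer exist. A sketch to confirm that the pods are gone, using the same selector as in the scaling examples:

$ kubectl get pods --selector app.kubernetes.io/name=verticadb

The removed subcluster's pods no longer appear in the output.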