Subclusters on Kubernetes

Eon Mode uses subclusters for workload isolation and scaling. The VerticaDB operator provides tools to direct external client communications to specific subclusters and to automate scaling without stopping your database.

The custom resource definition (CRD) provides parameters that allow you to fine-tune each subcluster for specific workloads. For example, you can increase the subcluster size setting for increased throughput, or adjust the resource requests and limits to manage compute power. When you create a custom resource instance, the operator deploys each subcluster as a StatefulSet. Each StatefulSet has a service object, which allows an external client to connect to a specific subcluster.
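For example, a minimal sketch of a subcluster definition that tunes the size and compute resources (the subcluster name and the resource values are illustrative, not recommended settings):

spec:
  subclusters:
  - name: analytics
    type: secondary
    size: 3
    resources:
      requests:
        cpu: 8
        memory: 32Gi
      limits:
        cpu: 8
        memory: 32Gi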

Naming conventions

Kubernetes derives names for the subcluster StatefulSet, service object, and pod from the subcluster name. This naming convention tightly couples the subcluster objects so that Kubernetes can manage the cluster effectively. If you want to rename a subcluster, you must delete it from the CRD and redefine it so that the operator can create new objects with a derived name.

Kubernetes forms an object's fully qualified domain name (FQDN) with its resource type name, so resource type names must follow FQDN naming conventions. The underscore character ( "_" ) does not follow FQDN rules, but you can use it in the subcluster name. Vertica converts each underscore to a hyphen ( "-" ) in the FQDN for any object name derived from the subcluster name. For example, Vertica generates a default subcluster and names it default_subcluster, and then converts the corresponding portion of the derived object's FQDN to default-subcluster.
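As an illustration, assume a custom resource named vertica-db with the default subcluster default_subcluster. The derived object names would look similar to the following:

Subcluster name:   default_subcluster
StatefulSet name:  vertica-db-default-subcluster
Service object:    vertica-db-default-subcluster
Pod names:         vertica-db-default-subcluster-0, vertica-db-default-subcluster-1, ...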

For additional naming guidelines, see the Kubernetes documentation.

External client connections

External clients can target specific subclusters that are fine-tuned to handle their workload. Each subcluster has a service object that handles external connections. To target multiple subclusters with a single service object, assign each subcluster the same spec.subclusters.serviceName value in the custom resource (CR). For implementation details, see VerticaDB custom resource definition.
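For example, the following sketch routes two subclusters through a single service object by assigning them the same serviceName (the subcluster names and the connections value are illustrative). With a custom resource named vertica-db, this produces one service object named vertica-db-connections, following the metadata.name-serviceName format described below:

spec:
  subclusters:
  - name: primary-subcluster
    type: primary
    size: 3
    serviceName: connections
  - name: secondary-subcluster
    type: secondary
    size: 3
    serviceName: connections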

The operator performs health monitoring that checks whether the Vertica daemon is running on each pod. If the daemon is running, then the operator allows the service object to route traffic to the pod.

By default, the service object derives its name from the custom resource name and the associated subcluster and uses the following format:

customResourceName-subclusterName

To override this default format, set the subclusters[i].serviceName CR parameter, which changes the format to the following:

metadata.name-serviceName

Vertica supports the following service object types:

  • ClusterIP: The default service type. This service provides internal load balancing, and sets a stable IP and port that is accessible only from within the Kubernetes cluster.

  • NodePort: Provides external client access. You can specify a port number for each host node in the subcluster to open for client connections.

  • LoadBalancer: Uses a cloud provider load balancer to create NodePort and ClusterIP services as needed. For details about implementation, see the Kubernetes documentation and your cloud provider documentation.
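For example, a minimal sketch that exposes a subcluster through a cloud provider load balancer (the subcluster name is illustrative):

spec:
  subclusters:
  - name: analytics
    type: secondary
    size: 3
    serviceType: LoadBalancer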

For configuration details, see VerticaDB custom resource definition.

Managing internal and external workloads

The Vertica StatefulSet is associated with an external service object. All external client requests are sent through this service object and load balanced among the pods in the cluster.

Import and export

Importing and exporting data between your Vertica on Kubernetes cluster and a cluster outside of Kubernetes requires that you expose the service with the NodePort or LoadBalancer service type and properly configure the network.
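For example, a sketch that exposes a subcluster for import and export traffic with the NodePort service type (the subcluster name and port value are illustrative; the port must fall within your cluster's configured node port range):

spec:
  subclusters:
  - name: etl
    type: secondary
    size: 3
    serviceType: NodePort
    clientNodePort: 32001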

1 - Client proxy for subclusters

Enables you to configure a client proxy for each subcluster so that clients communicate with the proxy, which connects to all of the subcluster's nodes, instead of connecting directly to the database nodes.

A proxy between the client and the Vertica server helps manage communication. You can configure one or more client proxy pods for each subcluster; clients connect to the proxy, which communicates with all nodes in the subcluster, instead of connecting directly to the database nodes. The VerticaDB operator mounts a config map as the configuration file in the proxy pods and automatically updates the config map when the state of the subcluster changes.

For each subcluster, the operator creates a client proxy deployment named <vdb-name>-<subcluster-name>-proxy and a client proxy config map named <vdb-name>-<subcluster-name>-proxy-cm. You can verify that these objects exist, but you must not edit them.

When a new connection request is made, it is redirected to a node based on the workload specified in the request. If no workload is provided, the default workload is used. The proxy retrieves the list of available nodes for that workload and redirects the request according to the load balancing policy. To reduce performance impact, the proxy caches the node list for a predefined period, which minimizes server calls and improves overall performance.

During an online upgrade, Vertica transfers active connections from a subcluster that is scheduled to shut down. The proxy detects and handles session transfer messages from the server.

Enabling client proxy pod

To enable client proxy for the Vertica database, set the vertica.com/use-client-proxy annotation to true.

metadata:
  annotations:
    vertica.com/use-client-proxy: "true"
    vertica.com/client-proxy-log-level: INFO
...
spec:
...
  proxy:
    image: opentext/client-proxy:latest
  ...
  subclusters:
  - affinity: {}
    name: sc1
    proxy:
      replicas: 1
      resources: {}

Creating replicas of client proxy pod

You can create more than one client proxy pod for a subcluster. To do this, set spec.subclusters[].proxy.replicas to a value greater than 1. For example, the following sets two proxy replicas for subcluster sc1:

  ...
  subclusters:
  - affinity: {}
    name: sc1
    proxy:
      replicas: 2
      resources: {}

Verifying deployment and config map

After you enable the client proxy, you can verify the deployment and config map.

To check the deployment:

$ kubectl get deployment
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
vertica-db-sc1-proxy         1/1     1            1           5m57s
verticadb-operator-manager   1/1     1            1           3h42m

To check the config map:

$ kubectl get cm
NAME                                DATA   AGE
vertica-db-sc1-proxy-cm             1      6m10s
verticadb-operator-manager-config   24     3h42m
$ kubectl describe configmap vertica-db-sc1-proxy-cm
Name:         vertica-db-sc1-proxy-cm
Namespace:    vertica
Labels:       app.kubernetes.io/component=database
              app.kubernetes.io/instance=vertica-db
              app.kubernetes.io/managed-by=verticadb-operator
              app.kubernetes.io/name=vertica
              app.kubernetes.io/version=25.1.0-0
              vertica.com/database=vertica
Annotations:  vertica.com/operator-deployment-method: helm
              vertica.com/operator-version: 25.1.0-0
 
Data
====
config.yaml:
----
listener:
  host: ""
  port: 5433
database:
  nodes:
  - vertica-db-sc1-0.vertica-db.vertica.svc.cluster.local:5433
  - vertica-db-sc1-1.vertica-db.vertica.svc.cluster.local:5433
  - vertica-db-sc1-2.vertica-db.vertica.svc.cluster.local:5433
log:
  level: INFO
 
 
BinaryData
====
 
Events:  <none>

Connecting to Vertica nodes through client proxy

You can run the following command to verify that the client proxy pod is created:

$ kubectl get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP             NODE                              NOMINATED NODE   READINESS GATES
vertica-db-sc1-0                              2/2     Running   0          19h   10.244.1.244   k8s-ubuntu20-05.verticacorp.com   <none>           <none>
vertica-db-sc1-1                              2/2     Running   0          19h   10.244.1.246   k8s-ubuntu20-05.verticacorp.com   <none>           <none>
vertica-db-sc1-2                              2/2     Running   0          19h   10.244.2.218   k8s-ubuntu20-06                   <none>           <none>
vertica-db-sc1-proxy-b46578c96-bhs5r          1/1     Running   0          19h   10.244.2.214   k8s-ubuntu20-06                   <none>           <none>
verticadb-operator-manager-75ddffb477-qmbpf   1/1     Running   0          23h   10.244.1.214   k8s-ubuntu20-05.verticacorp.com   <none>           <none>

In this example, the IP of the client proxy pod is 10.244.2.214.

You can still use a NodePort or load balancer to connect to the subcluster's service through the client proxy. The service now redirects the connection to the client proxy instead of the Vertica nodes. Here, the service vertica-db-sc1 has the load balancer address a24fb01e0875e4adc844aa046951366f-55b4172b9dacecfb.elb.us-east-1.amazonaws.com.

$ kubectl get svc
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP                                                                     PORT(S)                               AGE
vertica-db       ClusterIP      None            <none>                                                                          5434/TCP,4803/TCP,8443/TCP,5554/TCP   13d
vertica-db-sc1   LoadBalancer   172.30.84.160   a24fb01e0875e4adc844aa046951366f-55b4172b9dacecfb.elb.us-east-1.amazonaws.com   5433:31475/TCP,8443:30239/TCP         13d

In the following example, we use the vsql client to connect via the service:

$ /opt/vertica/bin/vsql -h a24fb01e0875e4adc844aa046951366f-55b4172b9dacecfb.elb.us-east-1.amazonaws.com -U dbadmin
Welcome to vsql, the Vertica Analytic Database interactive terminal.
 
Type:  \h or \? for help with vsql commands
       \g or terminate with semicolon to execute query
       \q to quit
 
vertica=> select node_name,client_hostname,client_type,client_os_hostname from current_session;
     node_name      |  client_hostname   | client_type |       client_os_hostname
--------------------+--------------------+-------------+---------------------------------
 v_vertica_node0001 | 10.244.2.214:46750 | vsql        | k8s-ubuntu20-04.verticacorp.com
(1 row)

Notice that in the server session, client_hostname shows the client proxy pod's IP address (10.244.2.214 in this case) instead of the actual client machine.
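If you need to troubleshoot the proxy, you can inspect its log output, which is written at the level set by the vertica.com/client-proxy-log-level annotation shown earlier. A quick sketch using the deployment name from this example:

$ kubectl logs deployment/vertica-db-sc1-proxy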

2 - Scaling subclusters

The operator enables you to scale the number of subclusters and the number of pods per subcluster automatically. This utilizes or conserves resources depending on the immediate needs of your workload.

The following sections explain how to scale resources for new workloads. For details about scaling resources for existing workloads, see VerticaAutoscaler custom resource definition.

Prerequisites

Before you scale subclusters, deploy a VerticaDB custom resource. See VerticaDB custom resource definition.

Scaling the number of subclusters

Adjust the number of subclusters in your custom resource to fine-tune resources for short-running dashboard queries. For example, increase the number of subclusters to increase throughput. For more information, see Improving query throughput using subclusters.

  1. Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named vdb for editing:

    $ kubectl edit vdb
    
  2. In the spec section of the custom resource, locate the subclusters subsection. Begin with the type field to define a new subcluster.

    The type field indicates the subcluster type. Because there is already a primary subcluster, enter secondary:

    spec:
    ...
      subclusters:
      ...
      - type: secondary
    
  3. Follow the steps in VerticaDB custom resource definition to complete the subcluster definition. The following completed example adds a secondary subcluster for dashboard queries:

    spec:
    ...
      subclusters:
      - type: primary
        name: primary-subcluster
      ...
      - type: secondary
        name: dashboard
        clientNodePort: 32001
        resources:
          limits:
            cpu: 32
            memory: 96Gi
          requests:
            cpu: 32
            memory: 96Gi
        serviceType: NodePort
        size: 3
    
  4. Save and close the custom resource file. When the update completes, you receive a message similar to the following:

    verticadb.vertica.com/vertica-db edited
    
  5. Use the kubectl wait command to monitor when the new pods are ready:

    $ kubectl wait --for=condition=Ready pod --selector app.kubernetes.io/name=verticadb --timeout 180s
    pod/vdb-dashboard-0 condition met
    pod/vdb-dashboard-1 condition met
    pod/vdb-dashboard-2 condition met
    
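Because the operator deploys each subcluster as a StatefulSet with its own service object, you can also confirm that the new subcluster's objects exist. A sketch, assuming a custom resource named vdb and the dashboard subcluster defined above (the derived names follow the customResourceName-subclusterName convention):

$ kubectl get statefulset vdb-dashboard
$ kubectl get svc vdb-dashboard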

Scaling the pods in a subcluster

For long-running, analytic queries, increase the pod count for a subcluster. See Using elastic crunch scaling to improve query performance.

  1. Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named verticadb for editing:

    $ kubectl edit verticadb
    
  2. Update the subclusters.size value to 6:

    spec:
    ...
      subclusters:
      ...
      - type: secondary
        ...
        size: 6
    

    Shards are rebalanced automatically.

  3. Save and close the custom resource file. You receive a message similar to the following when you successfully update the file:

    verticadb.vertica.com/verticadb edited

  4. Use the kubectl wait command to monitor when the new pods are ready:

    $ kubectl wait --for=condition=Ready pod --selector app.kubernetes.io/name=verticadb --timeout 180s
    pod/vdb-subcluster1-3 condition met
    pod/vdb-subcluster1-4 condition met
    pod/vdb-subcluster1-5 condition met
    
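As an alternative to the interactive kubectl edit workflow above, you can apply the same size change with kubectl patch. A sketch, assuming a custom resource named verticadb and that the subcluster you are resizing is the second element (index 1) of the subclusters array:

$ kubectl patch verticadb verticadb --type=json \
    -p '[{"op": "replace", "path": "/spec/subclusters/1/size", "value": 6}]'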

Removing a subcluster

Remove a subcluster when it is no longer needed, or to preserve resources.

  1. Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named verticadb for editing:

    $ kubectl edit verticadb
    
  2. In the subclusters subsection nested under spec, locate the subcluster that you want to remove. Delete the element in the subclusters array that represents that subcluster. Each element is identified by a hyphen (-).

  3. After you delete the subcluster and save, you receive a message similar to the following:

    verticadb.vertica.com/verticadb edited
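After the operator reconciles the change, the removed subcluster's pods and associated objects no longer exist. A sketch to confirm that the pods are gone, using the same selector as in the scaling examples:

$ kubectl get pods --selector app.kubernetes.io/name=verticadb

The removed subcluster's pods no longer appear in the output.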