Subclusters on Kubernetes
Eon Mode uses subclusters for workload isolation and scaling. The Vertica operator provides tools to direct external client communications to specific subclusters, and automate scaling without stopping your database.
The custom resource definition (CRD) provides parameters that allow you to fine-tune each subcluster for specific workloads. For example, you can increase the subcluster size setting for increased throughput, or adjust the resource requests and limits to manage compute power. When you create a custom resource instance, the operator deploys each subcluster as a StatefulSet. Each StatefulSet has a service object, which allows an external client to connect to a specific subcluster.
Kubernetes uses the subcluster name to derive names for the subcluster StatefulSet, service object, and pods. This naming convention tightly couples the subcluster objects to help Kubernetes effectively manage the cluster. If you want to rename a subcluster, you must delete it from the CRD and redefine it so that the operator can create new objects with a derived name.
Important
The default subcluster name that the Vertica server generates is default_subcluster. This name is invalid for Kubernetes resource types. You must provide a valid name that follows Kubernetes guidelines.
External client connections
External clients can target specific subclusters that are fine-tuned to handle their workload. Each subcluster has a service object that handles external connections. To target multiple subclusters with a single service object, assign each subcluster the same spec.subclusters.serviceName value in the custom resource (CR). For implementation details, see Creating a custom resource.
The operator performs health monitoring that checks if the Vertica daemon is running on each pod. If it is, then the operator allows the service object to route traffic to the pod.
By default, the service object derives its name from the custom resource name and the associated subcluster, and uses the customResourceName-subclusterName format. Use the subclusters[i].serviceName CR parameter to override the default naming format and use the metadata.name-serviceName format instead.
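For example, the following sketch assigns the same serviceName to two subclusters so that one service object, named according to the metadata.name-serviceName format, load balances external connections across both. The subcluster names analytics-1 and analytics-2 and the serviceName analytics are illustrative values, not requirements:
spec:
  ...
  subclusters:
    # Both subclusters share one service object derived from the serviceName value
    - name: analytics-1
      size: 3
      serviceName: analytics
    - name: analytics-2
      size: 3
      serviceName: analytics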
Vertica supports the following service object types:
- ClusterIP: The default service type. This service provides internal load balancing, and sets a stable IP and port that is accessible from within the subcluster only.
- NodePort: Provides external client access. You can specify a port number for each host node in the subcluster to open for client connections.
- LoadBalancer: Uses a cloud provider load balancer to create NodePort and ClusterIP services as needed. For details about implementation, see the Kubernetes documentation and your cloud provider documentation.
For configuration details, see Creating a custom resource.
Managing internal and external workloads
The Vertica StatefulSet is associated with an external service object. All external client requests are sent through this service object and load balanced among the pods in the cluster.
Import and export
Importing and exporting data between the cluster and a cluster outside of Kubernetes requires that you expose the service with the NodePort or LoadBalancer service type and properly configure the network.
Important
When importing or exporting data, each node must have a static IP address. Rescheduled pods might be on different host nodes, so you must monitor and update the static IP addresses to reflect the new node.
For more information, see Configuring the Network to Import and Export Data.
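For example, the following sketch exposes a subcluster for import and export traffic with the LoadBalancer service type. The subcluster name etl is illustrative, and the sketch assumes that your cloud provider supplies the load balancer:
spec:
  ...
  subclusters:
    # Exposed externally through a cloud provider load balancer
    - name: etl
      size: 3
      serviceType: LoadBalancer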
1 - Scaling subclusters
The operator enables you to scale the number of subclusters, and the number of pods per subcluster automatically. This allows you to utilize or conserve resources depending on the immediate needs of your workload.
The following sections explain how to scale resources for new workloads. For details about scaling resources for existing workloads, see VerticaAutoscaler custom resource.
Prerequisites
- A VerticaDB custom resource (CR) that the VerticaDB operator deployed and manages. For details, see Creating a custom resource.
Scaling the number of subclusters
Adjust the number of subclusters in your custom resource to fine-tune resources for short-running dashboard queries. For example, increase the number of subclusters to increase throughput. For more information, see Improving query throughput using subclusters.
- Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named vdb for editing:
$ kubectl edit vdb
- In the spec section of the custom resource, locate the subclusters subsection. Begin a new array element with the isPrimary field to define a new subcluster.
The isPrimary field accepts a boolean that specifies whether the subcluster is a primary or secondary. Because there is already a primary subcluster in this custom resource, enter false:
spec:
  ...
  subclusters:
    ...
    - isPrimary: false
- Follow the steps in Creating a custom resource to complete the subcluster definition. The following completed example adds a secondary subcluster for dashboard queries:
spec:
  ...
  subclusters:
    - isPrimary: true
      name: primary-subcluster
    ...
    - isPrimary: false
      name: dashboard
      nodePort: 32001
      resources:
        limits:
          cpu: 32
          memory: 96Gi
        requests:
          cpu: 32
          memory: 96Gi
      serviceType: NodePort
      size: 3
- Save and close the custom resource file. You receive a message similar to the following when you successfully update the file:
verticadb.vertica.com/vdb edited
- Use the kubectl wait command to monitor when the new pods are ready:
$ kubectl wait --for=condition=Ready pod --selector app.kubernetes.io/name=vertica-db --timeout 180s
pod/vdb-dashboard-0 condition met
pod/vdb-dashboard-1 condition met
pod/vdb-dashboard-2 condition met
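Optionally, confirm that the operator created the StatefulSet and service object for the new subcluster. The following commands are a sketch; the object names are derived from the custom resource and subcluster names, so a CR named vdb with a dashboard subcluster produces objects named vdb-dashboard:
# List the StatefulSets and service objects that the operator manages
$ kubectl get statefulsets
$ kubectl get svc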
Scaling the pods in a subcluster
For long-running, analytic queries, increase the pod count for a subcluster. See Using elastic crunch scaling to improve query performance.
- Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named vdb for editing:
$ kubectl edit vdb
- Update the subclusters.size value to 6:
spec:
  ...
  subclusters:
    ...
    - isPrimary: false
      ...
      size: 6
Shards are rebalanced automatically.
- Save and close the custom resource file. You receive a message similar to the following when you successfully update the file:
verticadb.vertica.com/vdb edited
- Use the kubectl wait command to monitor when the new pods are ready:
$ kubectl wait --for=condition=Ready pod --selector app.kubernetes.io/name=vertica-db --timeout 180s
pod/vdb-subcluster1-3 condition met
pod/vdb-subcluster1-4 condition met
pod/vdb-subcluster1-5 condition met
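Optionally, verify that shards were rebalanced across the new pods by querying the NODE_SUBSCRIPTIONS system table. The pod name below is an assumption based on the derived naming convention, and the sketch assumes a database with no superuser password (add vsql connection options as needed):
# Count shard subscriptions per node after the subcluster grows to 6 pods
$ kubectl exec -it vdb-subcluster1-0 -- vsql -c \
    "SELECT node_name, COUNT(*) AS subscriptions FROM node_subscriptions GROUP BY node_name ORDER BY node_name;"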
Removing a subcluster
Remove a subcluster when it is no longer needed, or to preserve resources.
Important
Because each custom resource instance requires a primary subcluster, you cannot remove all subclusters.
- Use kubectl edit to open your default text editor and update the YAML file for the specified custom resource. The following command opens a custom resource named vdb for editing:
$ kubectl edit vdb
- In the subclusters subsection nested under spec, locate the subcluster that you want to delete. Delete the element in the subclusters array that represents that subcluster; each element is identified by a hyphen (-). See the sketch after these steps for an example.
- After you delete the subcluster and save, you receive a message similar to the following:
verticadb.vertica.com/vdb edited
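For example, the following sketch shows the subclusters array after removing the dashboard subcluster that was added earlier; only the primary subcluster element remains (names are illustrative):
spec:
  ...
  subclusters:
    - isPrimary: true
      name: primary-subcluster
    # The '- isPrimary: false' element for the dashboard subcluster was deleted here.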
2 - VerticaAutoscaler custom resource
The VerticaAutoscaler custom resource (CR) is a HorizontalPodAutoscaler that automatically scales resources for existing subclusters using one of the following strategies:
- Subcluster scaling: create or delete entire subclusters.
- Pod scaling: increase or decrease the size of an existing subcluster.
The VerticaAutoscaler CR scales using resource or custom metrics. Vertica manages subclusters by workload, which helps you pinpoint the best metrics to trigger a scaling event. To maintain data integrity, the operator does not scale down unless all connections to the pods are drained and sessions are closed.
For details about the algorithm that determines when the VerticaAutoscaler scales, see the Kubernetes documentation.
Additionally, the VerticaAutoscaler provides a webhook to validate state changes. By default, this webhook is enabled. You can configure this webhook with the webhook.enable Helm chart parameter.
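For example, the following sketch disables the webhook by setting webhook.enable to false during a Helm upgrade. The release name vdb-op and chart reference vertica-charts/verticadb-operator are assumptions about how you installed the operator; substitute your own values:
# Assumes the operator was installed as release "vdb-op" from the "vertica-charts" Helm repository
$ helm upgrade vdb-op vertica-charts/verticadb-operator --set webhook.enable=false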
Parameters

verticaDBName
Required. Name of the VerticaDB CR that the VerticaAutoscaler CR scales resources for.

scalingGranularity
Required. The scaling strategy. This parameter accepts one of the following values:
- Subcluster: Create or delete entire subclusters. To create a new subcluster, the operator uses a template or an existing subcluster with the same serviceName.
- Pod: Increase or decrease the size of an existing subcluster.
Default: Subcluster

serviceName
Required. Refers to the subclusters[i].serviceName for the VerticaDB CR. VerticaAutoscaler uses this value as a selector when scaling subclusters together.

template
When scalingGranularity is set to Subcluster, you can use this parameter to define how VerticaAutoscaler scales the new subcluster. The following is an example:
spec:
  verticaDBName: dbname
  scalingGranularity: Subcluster
  serviceName: service-name
  template:
    name: autoscaler-name
    size: 2
    serviceName: service-name
    isPrimary: false
If you set template.size to 0, VerticaAutoscaler selects an existing subcluster that uses service-name as the template. This setting is ignored when scalingGranularity is set to Pod.
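Putting these parameters together, the following sketch wraps the template example above in a complete VerticaAutoscaler manifest that scales by subcluster and creates new subclusters from the template. The subcluster name analytics-template and the other names are illustrative, not values that the operator requires:
apiVersion: vertica.com/v1beta1
kind: VerticaAutoscaler
metadata:
  name: autoscaler-name
spec:
  verticaDBName: dbname
  scalingGranularity: Subcluster
  serviceName: service-name
  template:
    # New subclusters are created from this template
    name: analytics-template
    size: 2
    serviceName: service-name
    isPrimary: false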
Examples
The examples in this section use the following VerticaDB custom resource. Each example uses CPU to trigger scaling:
apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: dbname
spec:
  communal:
    path: "path/to/communal-storage"
    endpoint: "path/to/communal-endpoint"
    credentialSecret: credentials-secret
  subclusters:
    - name: primary1
      size: 3
      isPrimary: true
      serviceName: primary1
      resources:
        limits:
          cpu: "8"
        requests:
          cpu: "4"
Prerequisites
- Set a value for the metric that triggers scaling. For example, if you want to scale by CPU utilization, you must set CPU limits and requests.
Subcluster scaling
Automatically adjust the number of subclusters in your custom resource to fine-tune resources for short-running dashboard queries. For example, increase the number of subclusters to increase throughput. For more information, see Improving query throughput using subclusters.
All subclusters share the same service object, so there are no required changes to external service objects. Pods in the new subcluster are load balanced by the existing service object.
The following example creates a VerticaAutoscaler custom resource that scales by subcluster when the VerticaDB uses 50% of the node's available CPU:
- Define the VerticaAutoscaler custom resource in a YAML-formatted manifest:
apiVersion: vertica.com/v1beta1
kind: VerticaAutoscaler
metadata:
  name: autoscaler-name
spec:
  verticaDBName: dbname
  scalingGranularity: Subcluster
  serviceName: primary1
- Create the VerticaAutoscaler with the kubectl autoscale command:
$ kubectl autoscale verticaautoscaler autoscaler-name --cpu-percent=50 --min=3 --max=12
The previous command creates a HorizontalPodAutoscaler object that:
- Sets the target CPU utilization to 50%.
- Scales to a minimum of three pods in one subcluster, and a maximum of 12 pods in four subclusters.
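If you prefer a declarative workflow, you can create the equivalent HorizontalPodAutoscaler with a manifest instead of kubectl autoscale. The following sketch uses the autoscaling/v2 API and an assumed object name, autoscaler-name-hpa; it targets the VerticaAutoscaler defined above:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: autoscaler-name-hpa
spec:
  # Scale the VerticaAutoscaler rather than a Deployment or StatefulSet
  scaleTargetRef:
    apiVersion: vertica.com/v1beta1
    kind: VerticaAutoscaler
    name: autoscaler-name
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50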
Pod scaling
For long-running, analytic queries, increase the pod count for a subcluster. For additional information about Vertica and analytic queries, see Using elastic crunch scaling to improve query performance.
When you scale pods in an Eon Mode database, you must consider the impact on database shards. For details, see Shards and subscriptions.
The following example creates a VerticaAutoscaler custom resource that scales by pod when the VerticaDB uses 50% of the node's available CPU:
- Define the VerticaAutoscaler custom resource in a YAML-formatted manifest:
apiVersion: vertica.com/v1beta1
kind: VerticaAutoscaler
metadata:
  name: autoscaler-name
spec:
  verticaDBName: dbname
  scalingGranularity: Pod
  serviceName: primary1
- Create the autoscaler instance with the kubectl autoscale command:
$ kubectl autoscale verticaautoscaler autoscaler-name --cpu-percent=50 --min=3 --max=12
The previous command creates a HorizontalPodAutoscaler object that:
- Sets the target CPU utilization to 50%.
- Scales to a minimum of three pods in one subcluster, and a maximum of 12 pods in one subcluster.
Event monitoring
To view the VerticaAutoscaler object, use the kubectl describe hpa command:
$ kubectl describe hpa autoscaler-name
Name: autoscaler-name
Namespace: vertica
Labels: <none>
Annotations: <none>
CreationTimestamp: Tue, 12 Apr 2022 15:11:28 -0300
Reference: VerticaAutoscaler/autoscaler-name
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 0% (9m) / 50%
Min replicas: 3
Max replicas: 12
VerticaAutoscaler pods: 3 current / 3 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
When a scaling event occurs, you can view the admintools commands to scale the cluster. Use kubectl to view the StatefulSets:
$ kubectl get statefulsets
NAME READY AGE
db-name-as-instance-name-0 0/3 71s
db-name-primary1 3/3 39m
Use kubectl describe to view the executing commands:
$ kubectl describe vdb dbname | tail
Upgrade Status:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ReviveDBStart 41m verticadb-operator Calling 'admintools -t revive_db'
Normal ReviveDBSucceeded 40m verticadb-operator Successfully revived database. It took 25.255683916s
Normal ClusterRestartStarted 40m verticadb-operator Calling 'admintools -t start_db' to restart the cluster
Normal ClusterRestartSucceeded 39m verticadb-operator Successfully called 'admintools -t start_db' and it took 44.713787718s
Normal SubclusterAdded 10s verticadb-operator Added new subcluster 'as-0'
Normal AddNodeStart 9s verticadb-operator Calling 'admintools -t db_add_node' for pod(s) 'db-name-as-instance-name-0-0, db-name-as-instance-name-0-1, db-name-as-instance-name-0-2'