Upgrading Vertica on Kubernetes

The operator automates Vertica server version upgrades for a custom resource (CR). Use the upgradePolicy setting in the CR to determine whether your cluster remains online or is taken offline during the version upgrade.

Prerequisites

Before you begin, complete the following:

Setting the policy

The upgradePolicy CR parameter setting determines how the operator upgrades Vertica server versions. It provides the following options:

Offline

The operator shuts down the cluster to prevent multiple versions from running simultaneously.

The operator performs all server version upgrades using the Offline setting in the following circumstances:

  • You have only one subcluster

  • You are upgrading from a Vertica server version prior to version 11.1.0

ReadOnlyOnline

The cluster continues to operate during a read-only online upgrade. The database is in read-only mode while the operator upgrades the image for the primary subcluster.

Online

The cluster continues to operate during an online upgrade. You can modify the data while the operator upgrades the database.

Auto

The default setting. The operator selects either Offline or ReadOnlyOnline depending on the configuration. The operator performs a ReadOnlyOnline upgrade if all of the following are true:

  • A license Secret exists

  • K-Safety is 1

  • The cluster is currently running Vertica server version 11.1.0 or higher

If the current configuration does not meet all of the previous requirements, the operator performs an Offline upgrade.
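
For example, to request a read-only online upgrade explicitly rather than relying on Auto, set upgradePolicy in the CR spec (the surrounding fields are unchanged):

spec:
  ...
  upgradePolicy: ReadOnlyOnline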

Reconcile loop iteration time

During an upgrade, the operator runs the reconcile loop to compare the actual state of the objects to the desired state defined in the CR. The operator requeues any unfinished work, and the reconcile loop compares states with a set period of time between each reconcile iteration.
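
The wait between iterations is tunable. As a sketch, assuming your operator version supports the vertica.com/upgrade-requeue-time annotation (value in seconds; confirm the annotation name against your operator's parameter reference), you can shorten the interval in the CR:

metadata:
  annotations:
    vertica.com/upgrade-requeue-time: "30"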

Online upgrade

An online upgrade lets you load data with minimal downtime, keeping the database available for continuous writes through replication. Instead of shutting down the primary subclusters and limiting secondary subclusters to read-only mode, the operator sandboxes a secondary subcluster. The sandbox allows ongoing read and write access to the database while the primary subcluster is upgraded.

Online upgrade workflow

The following outlines the workflow during an online upgrade:

  1. Enable no-ddl mode: This mode restricts certain actions, such as creating new users or views. You can only insert data into existing tables or create new tables.
  2. Create a sandbox: The operator creates a new sandbox that replicates the main cluster. This temporarily requires additional resources. In the following example, vertica-db-sc1 is the original cluster and vertica-db-sc1-sb is the sandboxed copy.
    $ kubectl get pods
    NAME                                          READY   STATUS    RESTARTS   AGE
    vertica-db-sc1-0                              2/2     Running   0          23m
    vertica-db-sc1-1                              2/2     Running   0          23m
    vertica-db-sc1-2                              2/2     Running   0          23m
    vertica-db-sc1-sb-0                           2/2     Running   0          83s
    vertica-db-sc1-sb-1                           2/2     Running   0          83s
    vertica-db-sc1-sb-2                           2/2     Running   0          83s
    verticadb-operator-manager-5f4564f946-qmklq   1/1     Running   163        7d4h
    
  3. Upgrade the sandbox: The sandbox is upgraded in Offline mode.
  4. Replicate data and redirect connections: Changes are synchronized by replicating data from the main cluster to the sandbox and connections are redirected to the sandbox environment.
  5. Promote the sandbox: The sandbox is now promoted to the main cluster.
  6. Remove the old cluster: After redirect is complete, the old cluster is removed. The new StatefulSet name and Vertica node names will differ from those of the old cluster.
    $ kubectl get pods
    NAME                                          READY   STATUS    RESTARTS   AGE
    vertica-db-sc1-sb-0                           2/2     Running   0          3m34s
    vertica-db-sc1-sb-1                           2/2     Running   0          3m34s
    vertica-db-sc1-sb-2                           2/2     Running   0          3m33s
    verticadb-operator-manager-5f4564f946-qmklq   1/1     Running   163        7d4h
    

Client session transfer

During an online upgrade, the operator pauses write operations to replicate data from the main cluster to the sandbox. After data replication is complete, client sessions are transferred from the existing Vertica version on the main cluster to a sandboxed subcluster on the upgraded Vertica version.

Routing client traffic during a ReadOnlyOnline upgrade

During a read-only online upgrade, the operator begins by upgrading the Vertica server version in the primary subcluster to form a cluster with the new version. When the operator restarts the primary nodes, it places the secondary subclusters in read-only mode. Next, the operator upgrades any secondary subclusters one at a time. During the upgrade for any subcluster, all client connections are drained, and traffic is rerouted to either an existing subcluster or a temporary subcluster.

Read-only online upgrades require more than one subcluster so that the operator can reroute client traffic for the subcluster while it is upgrading. By default, the operator selects which subcluster receives the rerouted traffic using the following rules:

  • When rerouting traffic for the primary subcluster, the operator selects the first secondary subcluster defined in the CR.

  • When restarting the first secondary subcluster after the upgrade, the operator selects the first subcluster defined in the CR that is up.

  • If no secondary subclusters exist, you cannot perform a read-only online upgrade. The operator selects the first primary subcluster defined in the CR and performs an offline upgrade.

Route to an existing subcluster

You might want to control which subclusters handle rerouted client traffic due to subcluster capacity or licensing limitations. You can set the temporarySubclusterRouting.names parameter to specify an existing subcluster to receive the rerouted traffic:

spec:
  ...
  temporarySubclusterRouting:
    names:
      - subcluster-2
      - subcluster-1

In the previous example, the operator reroutes traffic to subcluster-2 first. When subcluster-2 is down, subcluster-1 accepts the rerouted traffic.

Route to a temporary subcluster

To create a temporary subcluster that exists for the duration of the upgrade process, use the temporarySubclusterRouting.template parameter to provide a name and size for the temporary subcluster:

spec:
  ...
  temporarySubclusterRouting:
    template:
      name: transient
      size: 3

If you choose to upgrade with a temporary subcluster, ensure that you have the necessary resources.

Migrating deployment types

Beginning with Vertica server version 24.1.0, the operator manages deployments with vclusterops, a Go library that uses a high-level REST interface to perform database operations with the Node Management Agent (NMA) and HTTPS service. The vclusterops library replaces Administration tools (admintools), a traditional command-line interface that executes administrator commands through STDIN and requires SSH keys for internal node communications. The vclusterops deployment is more efficient in containerized environments than the admintools deployment.

Because version 24.1.0 does not include admintools, you must migrate to the vcluster deployment type when you upgrade from an earlier server version.

Migrate the VerticaDB CR

Before you can migrate deployment types, you must upgrade the VerticaDB operator to version 2.0.0.

To migrate deployment types, update the manifest and apply it:

  1. Update the manifest to a vcluster deployment. The following sample manifest includes all fields that are required to migrate to a vclusterops deployment:

    apiVersion: vertica.com/v1
    kind: VerticaDB
    metadata:
      name: cr-name
      annotations:
        vertica.com/vcluster-ops: "true"
        vertica.com/run-nma-in-sidecar: "false"
    spec:
      image: "vertica/vertica-k8s:24.1.0-0"
      ...
    

    This manifest sets the following parameters:

    • apiVersion: By default, v1 supports vcluster deployments. Deprecated API version v1beta1 also supports vcluster, but Vertica recommends that you change to v1.
    • vertica.com/vcluster-ops: Set to true. With API version v1, this field and setting are optional. If you use the deprecated v1beta1, this setting is required or the migration fails.
    • vertica.com/run-nma-in-sidecar: You must set this to false for vcluster deployments. For additional details, see VerticaDB custom resource definition.
    • spec.image: Set this to a 24.1.0 image version. For a list of images, see Vertica images.
  2. Apply the updated manifest to complete the migration:

    $ kubectl apply -f migration.yaml
    

Upgrade the Vertica server version

After you select your upgrade policy, use the kubectl command line tool to perform the upgrade and monitor its progress. The following steps demonstrate an online upgrade:

  1. Set the upgrade policy to Online:

    $ kubectl patch verticadb cluster-name --type=merge --patch '{"spec": {"upgradePolicy": "Online"}}'
    
  2. Update the image setting in the CR:

    $ kubectl patch verticadb cluster-name --type=merge --patch '{"spec": {"image": "vertica/vertica-k8s:new-version"}}'
    
  3. Use kubectl wait to wait until the operator leaves upgrade mode:

    $ kubectl wait --for=condition=UpgradeInProgress=False vdb/cluster-name --timeout=800s
    

View the upgrade process

To view the current phase of the upgrade process, use kubectl get to inspect the upgradeStatus status field:

$ kubectl get vdb database-name -n namespace -o jsonpath='{.status.upgradeStatus}{"\n"}'
Restarting cluster with new image

To view the entire upgrade process, use kubectl describe to list the events the operator generated during the upgrade:

$ kubectl describe vdb cluster-name

...
Events:
  Type     Reason                                   Age                From                Message
  ----     ------                                   ----               ----                -------
  Normal   SubclusterRemoved                        32m                verticadb-operator  Removed subcluster 'sc_3'
  Normal   SubclusterRemoved                        32m                verticadb-operator  Removed subcluster 'sc2'
  Normal   SubclusterAdded                          18m                verticadb-operator  Added new subcluster 'sc1-sb'
  Normal   AddNodeStart                             18m                verticadb-operator  Starting add database node for pod(s) 'vertica-db-sc1-0, vertica-db-sc1-1, vertica-db-sc1-2'
  Normal   AddNodeSucceeded                         17m                verticadb-operator  Successfully added database nodes and it took 38s
  Normal   RebalanceShards                          17m                verticadb-operator  Successfully called 'rebalance_shards' for 'sc1-sb'
  Normal   SandboxSubclusterStart                   17m                verticadb-operator  Starting add subcluster "sc1-sb" to sandbox "replica-group-b-e904a"
  Normal   SandboxSubclusterSucceeded               17m                verticadb-operator  Successfully added subcluster "sc1-sb" to sandbox "replica-group-b-e904a"
  Normal   UpgradeStart                             16m (x2 over 18m)  verticadb-operator  Vertica server upgrade has started.
  Normal   ClusterShutdownStarted                   16m                verticadb-operator  Starting stop database on sandbox replica-group-b-e904a
  Normal   ClusterShutdownSucceeded                 16m                verticadb-operator  Successfully shutdown the database on sandbox replica-group-b-e904a and it took 17s
  Warning  LowLocalDataAvailSpace                   16m                verticadb-operator  Low disk space in persistent volume attached to vertica-db-sc1-sb-1
  Normal   ClusterRestartStarted                    14m (x3 over 15m)  verticadb-operator  Starting restart of the sandbox replica-group-b-e904a
  Normal   ClusterRestartSucceeded                  13m                verticadb-operator  Successfully restarted the sandbox replica-group-b-e904a and it took 70s
  Normal   PromoteSandboxSubclusterToMainStart      12m                verticadb-operator  Starting promote sandbox "replica-group-b-e904a" to main
  Normal   PromoteSandboxSubclusterToMainSucceeded  11m                verticadb-operator  Successfully promote sandbox "replica-group-b-e904a" to main
  Normal   SubclusterRemoved                        11m                verticadb-operator  Removed subcluster 'sc1'
  Normal   RenameSubclusterStart                    11m                verticadb-operator  Starting rename subcluster "sc1-sb" to "sc1"
  Normal   RenameSubclusterSucceeded                11m                verticadb-operator  Successfully rename subcluster "sc1-sb" to "sc1"
  Normal   UpgradeSucceeded                         11m                verticadb-operator  Vertica server upgrade has completed successfully.  New image is 'vertica/vertica-k8s:new-version'