Upgrading Vertica on Kubernetes

The operator automates Vertica server version upgrades for a custom resource (CR). Use the upgradePolicy setting in the CR to determine whether your cluster remains online or is taken offline during the version upgrade.

Prerequisites

Before you begin, complete the following:

Setting the policy

The upgradePolicy CR parameter setting determines how the operator upgrades Vertica server versions. It provides the following options:

Setting Description
Offline

The operator shuts down the cluster to prevent multiple versions from running simultaneously.

The operator performs all server version upgrades using the Offline setting in the following circumstances:

  • You have only one subcluster

  • You are upgrading from a Vertica server version prior to version 11.1.0

Online

The cluster continues to operate during an online upgrade. The data is in read-only mode while the operator upgrades the image for the primary subcluster.

Auto

The default setting. The operator selects either Offline or Online depending on the configuration. The operator performs an Online upgrade if all of the following are true:

  • A license Secret exists

  • K-Safety is 1

  • The cluster is currently running Vertica version 11.1.0 or higher

If the current configuration does not meet all of the previous requirements, the operator performs an Offline upgrade.
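
For example, to set the policy explicitly rather than rely on Auto, add upgradePolicy to the CR spec. The following is a minimal sketch; replace the value with Offline or Auto as needed:

spec:
  ...
  upgradePolicy: Online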

Set the reconcile loop iteration time

During an upgrade, the operator runs the reconcile loop to compare the actual state of the objects to the desired state defined in the CR. The operator requeues any unfinished work, and the reconcile loop compares states with a set period of time between each reconcile iteration. Set the upgradeRequeueTime parameter to determine the amount of time between each reconcile loop iteration.
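
For example, the following sketch sets upgradeRequeueTime in the CR spec. It assumes the value is interpreted as a number of seconds; 120 is only an illustrative choice:

spec:
  ...
  upgradeRequeueTime: 120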

Routing client traffic during an online upgrade

During an online upgrade, the operator begins by upgrading the Vertica server version in the primary subcluster to form a cluster with the new version. When the operator restarts the primary nodes, it places the secondary subclusters in read-only mode. Next, the operator upgrades any secondary subclusters one at a time. During the upgrade for any subcluster, all client connections are drained, and traffic is rerouted to either an existing subcluster or a temporary subcluster.

Online upgrades require more than one subcluster so that the operator can reroute client traffic for the subcluster while it is upgrading. By default, the operator selects which subcluster receives the rerouted traffic using the following rules:

  • When rerouting traffic for the primary subcluster, the operator selects the first secondary subcluster defined in the CR.

  • When restarting the first secondary subcluster after the upgrade, the operator selects the first subcluster defined in the CR that is up.

  • If no secondary subclusters exist, you cannot perform an online upgrade. The operator selects the first primary subcluster defined in the CR and performs an offline upgrade.

Routing client traffic to an existing subcluster

You might want to control which subclusters handle rerouted client traffic due to subcluster capacity or licensing limitations. You can set the temporarySubclusterRouting.names parameter to specify one or more existing subclusters to receive the rerouted traffic:

spec:
  ...
  temporarySubclusterRouting:
    names:
      - subcluster-2
      - subcluster-1

In the previous example, subcluster-2 accepts the rerouted traffic while another subcluster is offline for its upgrade. If subcluster-2 is also down, subcluster-1 accepts the traffic.

Routing client traffic to a temporary subcluster

To create a temporary subcluster that exists for the duration of the upgrade process, use the temporarySubclusterRouting.template parameter to provide a name and size for the temporary subcluster:

spec:
  ...
  temporarySubclusterRouting:
    template:
      name: transient
      size: 3

If you choose to upgrade with a temporary subcluster, ensure that your Kubernetes cluster has enough spare capacity to schedule the temporary subcluster's pods for the duration of the upgrade.
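
One way to gauge spare capacity before you start is to review current node usage and allocations. The following commands are a general sketch and assume that the Kubernetes metrics server is installed so that kubectl top works:

$ kubectl top nodes
$ kubectl describe nodes | grep -A 5 "Allocated resources"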

Upgrading the Vertica server version

After you set the upgradePolicy and optionally configure temporary subcluster routing, use the kubectl command line tool to perform the upgrade and monitor its progress.

The following steps perform an online version upgrade:

  1. Set the upgrade policy. The following command uses the kubectl patch command to set the upgradePolicy value to Online:

    $ kubectl patch verticadb cluster-name --type=merge --patch '{"spec": {"upgradePolicy": "Online"}}'
    
  2. Update the image value in the CR with kubectl patch:

    $ kubectl patch verticadb cluster-name --type=merge --patch '{"spec": {"image": "vertica/vertica-k8s:new-version"}}'
    
  3. Use kubectl wait to wait until the operator acknowledges the new image and begins upgrade mode:

    $ kubectl wait --for=condition=ImageChangeInProgress=True vdb/cluster-name --timeout=180s
    
  4. Use kubectl wait to wait until the operator leaves upgrade mode:

    $ kubectl wait --for=condition=ImageChangeInProgress=False vdb/cluster-name --timeout=800s
    

Viewing the upgrade process

To view the current phase of the upgrade process, use kubectl get to inspect the upgradeStatus status field:

$ kubectl get vdb -n namespace database-name -o jsonpath='{.status.upgradeStatus}{"\n"}'
Restarting cluster with new image

To view the entire upgrade process, use kubectl describe to list the events the operator generated during the upgrade:

$ kubectl describe vdb cluster-name

...
Events:
  Type    Reason                   Age    From                Message
  ----    ------                   ----   ----                -------
  Normal  UpgradeStart             5m10s  verticadb-operator  Vertica server upgrade has started.  New image is 'vertica-k8s:new-version'
  Normal  ClusterShutdownStarted   5m12s  verticadb-operator  Calling 'admintools -t stop_db'
  Normal  ClusterShutdownSucceeded 4m08s  verticadb-operator  Successfully called 'admintools -t stop_db' and it took 56.22132s
  Normal  ClusterRestartStarted    4m25s  verticadb-operator  Calling 'admintools -t start_db' to restart the cluster
  Normal  ClusterRestartSucceeded  25s    verticadb-operator  Successfully called 'admintools -t start_db' and it took 240s
  Normal  UpgradeSucceeded         5s     verticadb-operator  Vertica server upgrade has completed successfully