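Troubleshooting your Kubernetes cluster

These tips can help you avoid issues with your Vertica on Kubernetes deployment and troubleshoot any problems that occur.

Download the kubectl command line tool to debug your Kubernetes resources.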

Helm install failure

When you install the VerticaDB operator and admission controller Helm chart, the helm install command might return the following error:

$ helm install vdb-op vertica-charts/verticadb-operator
Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Certificate" in version "cert-manager.io/v1", unable to recognize "": no matches for kind "Issuer" in version "cert-manager.io/v1"]

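This error indicates that you have not met the TLS prerequisites for the admission controller webhook. To resolve this issue, install cert-manager or configure custom certificates. The following steps install cert-manager.

Install the cert-manager YAML manifest: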

$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.3/cert-manager.yaml

Verify the cert-manager installation.

If you try to install the Helm chart immediately after you install cert-manager, you might receive the following error:

$ helm install vdb-op vertica-charts/verticadb-operator
Error: failed to create resource: Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.232.154:443: connect: connection refused

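You receive this error because cert-manager needs time to create its pods and register the webhook with the cluster. Wait a few minutes, then verify the cert-manager installation with the following command: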

$ kubectl get pods --namespace cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-7dd5854bb4-skks7              1/1     Running   5          12d
cert-manager-cainjector-64c949654c-9nm2z   1/1     Running   5          12d
cert-manager-webhook-6bdffc7c9d-b7r2p      1/1     Running   5          12d
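
Rather than waiting a fixed amount of time, you can ask kubectl to block until the cert-manager Deployments are available. The following is a minimal sketch and not part of the Vertica tooling: the function name wait_for_cert_manager and the 300s default timeout are assumptions chosen for illustration, and it assumes that cert-manager runs in the cert-manager namespace, as in the output above.

```shell
# Hypothetical helper: block until every Deployment in the cert-manager
# namespace reports the Available condition, or until the timeout expires.
# The optional first argument overrides the assumed 300s default timeout.
wait_for_cert_manager() {
  cm_timeout="${1:-300s}"
  kubectl wait --for=condition=Available deployment --all \
    --namespace cert-manager --timeout="$cm_timeout"
}
```

For example, wait_for_cert_manager 120s returns a nonzero exit code if the Deployments are not available within two minutes. An available Deployment does not guarantee that the webhook already accepts requests, so you might still need to retry helm install once.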

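For details about verifying the cert-manager installation, see the cert-manager documentation.

After you verify the cert-manager installation, you must uninstall the Helm chart and then reinstall it: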
```
$ helm uninstall vdb-op
$ helm install vdb-op vertica-charts/verticadb-operator
```

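For additional information, see Installing the VerticaDB operator.

Custom certificate Helm install error

If you use custom certificates when you install the operator with the Helm chart, the helm install or kubectl apply command might return an error similar to the following: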

$ kubectl apply -f ../operatorcrd.yaml
Error from server (InternalError): error when creating "../operatorcrd.yaml": Internal error occurred: failed calling webhook "mverticadb.kb.io": Post "https://verticadb-operator-webhook-service.namespace.svc:443/mutate-vertica-com-v1beta1-verticadb?timeout=10s": x509: certificate is valid for ip-10-0-21-169.ec2.internal, test-bastion, not verticadb-operator-webhook-service.default.svc

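You receive this error when the Domain Name System (DNS) name or Subject Alternative Name (SAN) in the TLS key is incorrect. To correct this error, define the DNS and SAN in a configuration file in the following format, replacing namespace with the namespace where the operator is installed: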

commonName = verticadb-operator-webhook-service.namespace.svc
...
[alt_names]
DNS.1 = verticadb-operator-webhook-service.namespace.svc
DNS.2 = verticadb-operator-webhook-service.namespace.svc.cluster.local
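
Because every name in this fragment depends on the namespace, it can help to generate the entries instead of typing them by hand. The following sketch prints the same lines for a given namespace. The function name webhook_cert_names is a hypothetical helper, and the output is only the fragment shown above, not a complete OpenSSL configuration file.

```shell
# Hypothetical helper: print the commonName and SAN entries that the webhook
# certificate needs for the namespace passed as the first argument.
webhook_cert_names() {
  ns="$1"
  svc="verticadb-operator-webhook-service.${ns}.svc"
  printf 'commonName = %s\n' "$svc"
  printf '[alt_names]\n'
  printf 'DNS.1 = %s\n' "$svc"
  printf 'DNS.2 = %s.cluster.local\n' "$svc"
}
```

For the error shown earlier, webhook_cert_names default prints entries for verticadb-operator-webhook-service.default.svc, the name that the existing certificate lacks. To inspect the names in an existing certificate, you can run openssl x509 -in tls.crt -noout -text and look at the Subject Alternative Name section, where tls.crt is a placeholder file name.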

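For more details, see Installing the VerticaDB operator.

Validating updates to the custom resource

Because the operator takes time to perform tasks, updates to the custom resource do not take effect immediately. Use the kubectl command line tool to verify that changes are applied.

You can use the kubectl wait command to wait for a specified condition. For example, the operator uses the ImageChangeInProgress condition to provide an upgrade status. After you begin the image version upgrade, wait until the operator acknowledges the upgrade and sets this condition to True: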

$ kubectl wait --for=condition=ImageChangeInProgress=True vdb/cluster-name --timeout=180s

After the upgrade begins, you can wait until the operator leaves upgrade mode and sets this condition to False:

$ kubectl wait --for=condition=ImageChangeInProgress=False vdb/cluster-name --timeout=800s

For more information about kubectl wait, see the kubectl reference documentation.
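
The two waits can be chained so that a script proceeds only after the upgrade has both started and finished. The following is a sketch under stated assumptions: the function name wait_for_image_change is hypothetical, and the timeouts are the same example values used in the commands above, not recommended settings.

```shell
# Hypothetical helper: wait for an image change on the VerticaDB resource
# named by the first argument to start, then wait for it to finish.
wait_for_image_change() {
  vdb_name="$1"
  # The operator acknowledges the upgrade by setting the condition to True.
  kubectl wait --for=condition=ImageChangeInProgress=True \
    "vdb/${vdb_name}" --timeout=180s || return 1
  # The operator leaves upgrade mode by setting the condition to False.
  kubectl wait --for=condition=ImageChangeInProgress=False \
    "vdb/${vdb_name}" --timeout=800s
}
```

If the first wait times out, the function returns immediately with a nonzero exit code and does not run the second wait.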

Pods are running but the database is not ready

When you check the pods in your cluster, the pods are running but the database is not ready:

$ kubectl get pods
NAME                                                    READY   STATUS    RESTARTS   AGE
vertica-crd-sc1-0                                       0/1     Running   0          12m
vertica-crd-sc1-1                                       0/1     Running   1          12m
vertica-crd-sc1-2                                       0/1     Running   0          12m
verticadb-operator-controller-manager-5d9cdc9b8-kw9nv   2/2     Running   0          24m
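
In a larger cluster, it can be easier to list only the pods whose READY count is below the expected total. The following sketch filters the default kubectl get pods table. The function name not_ready_pods is hypothetical, and the filter assumes the default column layout shown above, with READY as the second column.

```shell
# Hypothetical helper: read `kubectl get pods` output on standard input and
# print the names of pods whose ready container count is below the total.
not_ready_pods() {
  # Skip the header row, split READY (for example "0/1") on the slash,
  # and print the pod name when the two counts differ.
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}
```

For example, kubectl get pods | not_ready_pods applied to the output above lists the three vertica-crd-sc1 pods and omits the operator pod.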

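To find the root cause of the issue, use kubectl logs to check the operator manager. The following example shows that the communal storage bucket does not exist: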

$ kubectl logs -l app.kubernetes.io/name=verticadb-operator -c manager -f
2021-08-04T20:03:00.289Z        INFO    controllers.VerticaDB   ExecInPod entry {"verticadb": "default/vertica-crd", "pod": {"namespace": "default", "name": "vertica-crd-sc1-0"}, "command": "bash -c ls -l /opt/vertica/config/admintools.conf && grep '^node\\|^v_\\|^host' /opt/vertica/config/admintools.conf "}
2021-08-04T20:03:00.369Z        INFO    controllers.VerticaDB   ExecInPod stream        {"verticadb": "default/vertica-crd", "pod": {"namespace": "default", "name": "vertica-crd-sc1-0"}, "err": null, "stdout": "-rw-rw-r-- 1 dbadmin verticadba 1243 Aug  4 20:00 /opt/vertica/config/admintools.conf\nhosts = 10.244.1.5,10.244.2.4,10.244.4.6\nnode0001 = 10.244.1.5,/data,/data\nnode0002 = 10.244.2.4,/data,/data\nnode0003 = 10.244.4.6,/data,/data\n", "stderr": ""}
2021-08-04T20:03:00.369Z        INFO    controllers.VerticaDB   ExecInPod entry {"verticadb": "default/vertica-crd", "pod": {"namespace": "default", "name": "vertica-crd-sc1-0"}, "command": "/opt/vertica/bin/admintools -t create_db --skip-fs-checks --hosts=10.244.1.5,10.244.2.4,10.244.4.6 --communal-storage-location=s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c --communal-storage-params=/home/dbadmin/auth_parms.conf --sql=/home/dbadmin/post-db-create.sql --shard-count=12 --depot-path=/depot --database verticadb --force-cleanup-on-failure --noprompt --password ******* "}
2021-08-04T20:03:00.369Z        DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"VerticaDB","namespace":"default","name":"vertica-crd","uid":"26100df1-93e5-4e64-b665-533e14abb67c","apiVersion":"vertica.com/v1beta1","resourceVersion":"11591"}, "reason": "CreateDBStart", "message": "Calling 'admintools -t create_db'"}
2021-08-04T20:03:17.051Z        INFO    controllers.VerticaDB   ExecInPod stream        {"verticadb": "default/vertica-crd", "pod": {"namespace": "default", "name": "vertica-crd-sc1-0"}, "err": "command terminated with exit code 1", "stdout": "Default depot size in use\nDistributing changes to cluster.\n\tCreating database verticadb\nBootstrap on host 10.244.1.5 return code 1 stdout '' stderr 'Logged exception in writeBufferToFile: RecvFiles failed in closing file [s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/verticadb_rw_access_test.txt]: The specified bucket does not exist. Writing test data to file s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/verticadb_rw_access_test.txt failed.\\nTesting rw access to communal location s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/ failed\\n'\n\nError: Bootstrap on host 10.244.1.5 return code 1 stdout '' stderr 'Logged exception in writeBufferToFile: RecvFiles failed in closing file [s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/verticadb_rw_access_test.txt]: The specified bucket does not exist. Writing test data to file s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/verticadb_rw_access_test.txt failed.\\nTesting rw access to communal location s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/ failed\\n'\n\n", "stderr": ""}
2021-08-04T20:03:17.051Z        INFO    controllers.VerticaDB   aborting reconcile of VerticaDB {"verticadb": "default/vertica-crd", "result": {"Requeue":true,"RequeueAfter":0}, "err": null}
2021-08-04T20:03:17.051Z        DEBUG   controller-runtime.manager.events       Warning {"object": {"kind":"VerticaDB","namespace":"default","name":"vertica-crd","uid":"26100df1-93e5-4e64-b665-533e14abb67c","apiVersion":"vertica.com/v1beta1","resourceVersion":"11591"}, "reason": "S3BucketDoesNotExist", "message": "The bucket in the S3 path 's3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c' does not exist"}

Create an S3 bucket for the cluster:

$ S3_BUCKET=newbucket
$ S3_CLUSTER_IP=$(kubectl get svc | grep minio | head -1 | awk '{print $3}')
$ export AWS_ACCESS_KEY_ID=minio
$ export AWS_SECRET_ACCESS_KEY=minio123
$ aws s3 mb s3://$S3_BUCKET --endpoint-url http://$S3_CLUSTER_IP
make_bucket: newbucket

Use kubectl get pods to verify that the cluster uses the new S3 bucket and that the database is ready:

$ kubectl get pods
NAME                                                    READY   STATUS    RESTARTS   AGE
minio-ss-0-0                                            1/1     Running   0          18m
minio-ss-0-1                                            1/1     Running   0          18m
minio-ss-0-2                                            1/1     Running   0          18m
minio-ss-0-3                                            1/1     Running   0          18m
vertica-crd-sc1-0                                       1/1     Running   0          20m
vertica-crd-sc1-1                                       1/1     Running   0          20m
vertica-crd-sc1-2                                       1/1     Running   0          20m
verticadb-operator-controller-manager-5d9cdc9b8-kw9nv   2/2     Running   0          63m

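Database is not available

After you create a custom resource instance, the database is not available. The kubectl get custom-resource command does not display information: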

$ kubectl get vdb
NAME          AGE   SUBCLUSTERS   INSTALLED   DBADDED   UP
vertica-crd   4s

Use kubectl describe custom-resource to check the events for the custom resource and identify any issues:

$ kubectl describe vdb
Name:         vertica-crd
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  vertica.com/v1beta1
Kind:         VerticaDB
Metadata:
  ...
  Superuser Password Secret:  su-passwd
Events:
  Type     Reason                           Age                From                Message
  ----     ------                           ----               ----                -------
  Warning  SuperuserPasswordSecretNotFound  5s (x12 over 15s)  verticadb-operator  Secret for superuser password 'su-passwd' was not found

In this case, the custom resource uses a Secret named su-passwd to store the superuser password, but no such Secret is available. Create a Secret named su-passwd to store the password:

$ kubectl create secret generic su-passwd --from-literal=password=sup3rs3cr3t
secret/su-passwd created

Use kubectl get custom-resource to verify that the issue is resolved:

$ kubectl get vdb
NAME          AGE   SUBCLUSTERS   INSTALLED   DBADDED   UP
vertica-crd   89s   1             0           0         0

Image pull failure

You receive an ImagePullBackOff error when you deploy a Vertica cluster with Helm charts but did not pre-pull the Vertica image from the local registry server:

$ kubectl describe pod pod-name-0
...
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  ...
  Warning  Failed            2m32s                  kubelet            Failed to pull image "k8s-rhel7-01:5000/vertica-k8s:default-1": rpc error: code = Unknown desc = context canceled
  Warning  Failed            2m32s                  kubelet            Error: ErrImagePull
  Normal   BackOff           2m32s                  kubelet            Back-off pulling image "k8s-rhel7-01:5000/vertica-k8s:default-1"
  Warning  Failed            2m32s                  kubelet            Error: ImagePullBackOff
  Normal   Pulling           2m18s (x2 over 4m22s)  kubelet            Pulling image "k8s-rhel7-01:5000/vertica-k8s:default-1"

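This occurs because the Vertica image is too large to pull from the registry while the Vertica cluster is being deployed. To check the image size, execute the following command on a Kubernetes host: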

$ docker image list | grep vertica-k8s
k8s-rhel7-01:5000/vertica-k8s default-1 2d6f5d3d90d6 9 days ago 1.55GB

To resolve this issue, complete one of the following:

Pull the Vertica image on each node before you create the Vertica StatefulSet:

$ NODES=$(kubectl get nodes --no-headers | awk '{print $1}')
$ for node in $NODES; do ssh $node docker pull $DOCKER_REGISTRY:5000/vertica-k8s:$K8S_TAG; done

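Use the reduced-size vertica/vertica-k8s:latest image for the Vertica server.

Pending pods due to insufficient CPU

If your host nodes do not have enough resources to fulfill the resource requests from a pod, the pod stays in the Pending state.

Note

As a best practice, do not request the maximum amount of resources available on a host node, so that resources remain for other processes on that node.

In the following example, the pod requests 40 CPUs on the host node, and the pod is stuck in the Pending state: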

$ kubectl describe pod cluster-vertica-defaultsubcluster-0
...
Status:         Pending
...
Containers:
  server:
    Image:       docker.io/library/vertica-k8s:default-1
    Ports:       5433/TCP, 5434/TCP, 22/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      /opt/vertica/bin/docker-entrypoint.sh
      restart-vertica-node
    Limits:
      memory:  200Gi
    Requests:
      cpu: 40
      memory:  200Gi
...
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3h20m  default-scheduler  0/5 nodes are available: 5 Insufficient cpu.

Confirm the resources available on the host node. The following command confirms that the host node has only 40 allocatable CPUs:

$ kubectl describe node host-node-1
...
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sat, 20 Mar 2021 22:39:10 -0400   Sat, 20 Mar 2021 13:07:02 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sat, 20 Mar 2021 22:39:10 -0400   Sat, 20 Mar 2021 13:07:02 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sat, 20 Mar 2021 22:39:10 -0400   Sat, 20 Mar 2021 13:07:02 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sat, 20 Mar 2021 22:39:10 -0400   Sat, 20 Mar 2021 13:07:12 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  172.19.0.5
  Hostname:    eng-g9-191
Capacity:
  cpu:                40
  ephemeral-storage:  285509064Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             263839236Ki
  pods:               110
Allocatable:
  cpu:                40
  ephemeral-storage:  285509064Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             263839236Ki
  pods:               110
...
Non-terminated Pods:          (3 in total)
  Namespace                   Name                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                   ------------  ----------  ---------------  -------------  ---
  default                     cluster-vertica-defaultsubcluster-0    38 (95%)      0 (0%)      200Gi (79%)      200Gi (79%)    51m
  kube-system                 kube-flannel-ds-8brv9                  100m (0%)     100m (0%)   50Mi (0%)        50Mi (0%)      9h
  kube-system                 kube-proxy-lgjhp                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         9h
...
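
A request for all 40 CPUs cannot be scheduled even though the node reports 40 allocatable CPUs, because the scheduler also counts the requests of pods that already run on the node, such as the 100m requested by kube-flannel. The following sketch makes that arithmetic explicit. Both function names are hypothetical, and the conversion handles only whole-CPU values such as 40 and millicore values such as 100m, not fractional values such as 0.5.

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity to millicores.
# Handles whole CPUs ("40") and millicores ("100m") only.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# Hypothetical helper: succeed when a new request fits on a node.
# Arguments: new request, allocatable CPU, CPU already requested (default 0).
cpu_request_fits() {
  request=$(cpu_to_millicores "$1")
  allocatable=$(cpu_to_millicores "$2")
  in_use=$(cpu_to_millicores "${3:-0}")
  [ $(( request + in_use )) -le "$allocatable" ]
}
```

With the values from this example, cpu_request_fits 40 40 100m fails because 40100 millicores exceed 40000, while cpu_request_fits 38 40 100m succeeds. You can read the allocatable value directly with kubectl get node host-node-1 -o jsonpath='{.status.allocatable.cpu}'.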

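To correct this issue, reduce the resources.requests in the subcluster to a value lower than the maximum allocatable CPUs. The following example uses a YAML-formatted file named patch.yaml to lower the resource requests for the pod: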

$ cat patch.yaml
spec:
  subclusters:
    - name: defaultsubcluster
      resources:
        requests:
          memory: 238Gi
          cpu: "38"
        limits:
          memory: 238Gi
$ kubectl patch vdb cluster-vertica --type=merge --patch "$(cat patch.yaml)"
verticadb.vertica.com/cluster-vertica patched

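Adding and testing the vlogger sidecar

Vertica provides the vlogger image, which sends logs from vertica.log to standard output on the host node for log aggregation.

To add the sidecar to the CR, add an element to the spec.sidecars definition: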

spec:
  ...
  sidecars:
    - name: vlogger
      image: vertica/vertica-logger:1.0.0

To test the sidecar, run the following command and verify that it returns logs:

$ kubectl logs pod-name -c vlogger

2021-12-08 14:39:08.538 DistCall Dispatch:0x7f3599ffd700-c000000000997e [Txn
2021-12-08 14:40:48.923 INFO New log
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> Log /data/verticadb/v_verticadb_node0002_catalog/vertica.log opened; #1
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> Processing command line: /opt/vertica/bin/vertica -D /data/verticadb/v_verticadb_node0002_catalog -C verticadb -n v_verticadb_node0002 -h 10.20.30.40 -p 5433 -P 4803 -Y ipv4
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> Starting up Vertica Analytic Database v11.0.2-20211201
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO>
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> vertica(v11.0.2) built by @re-docker5 from master@a44ffabdf3f05e8d104426506b088192f741c485 on 'Wed Dec  1 06:10:34 2021' $BuildId$
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> CPU architecture: x86_64
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> 64-bit Optimized Build
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> Compiler Version: 7.3.1 20180303 (Red Hat 7.3.1-5)
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> LD_LIBRARY_PATH=/opt/vertica/lib
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> LD_PRELOAD=
2021-12-08 14:40:48.925 Main Thread:0x7fbbe2cf6280 <LOG> @v_verticadb_node0002: 00000/5081: Total swap memory used: 0
2021-12-08 14:40:48.925 Main Thread:0x7fbbe2cf6280 <LOG> @v_verticadb_node0002: 00000/4435: Process size resident set: 28651520
2021-12-08 14:40:48.925 Main Thread:0x7fbbe2cf6280 <LOG> @v_verticadb_node0002: 00000/5075: Total Memory free + cache: 59455180800
2021-12-08 14:40:48.925 Main Thread:0x7fbbe2cf6280 [Txn] <INFO> Looking for catalog at: /data/verticadb/v_verticadb_node0002_catalog/Catalog
...

VerticaAutoscaler cannot find CPU metrics

You might notice that the VerticaAutoscaler is not scaling correctly based on CPU utilization:

$ kubectl get hpa
NAME                REFERENCE                           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
autoscaler-name     VerticaAutoscaler/autoscaler-name   <unknown>/50%   3         12        0          19h

$ kubectl describe hpa
Warning: autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
Name: autoscaler-name
Namespace: namespace
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 12 May 2022 10:25:02 -0400
Reference: VerticaAutoscaler/autoscaler-name
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 50%
Min replicas: 3
Max replicas: 12
VerticaAutoscaler pods: 3 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 7s horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Warning FailedComputeMetricsReplicas 7s horizontal-pod-autoscaler invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)

You receive this error because the metrics server is not installed:

$ kubectl top nodes
error: Metrics API not available

To install the metrics server:

Download the components.yaml file:

$ curl -LO https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

(Optional) Disable TLS verification of the kubelet certificates:

$ if ! grep kubelet-insecure-tls components.yaml; then
  sed -i 's/- args:/- args:\n        - --kubelet-insecure-tls/' components.yaml;
fi

Apply the YAML file:
```
$ kubectl apply -f components.yaml
```

Verify that the metrics server is running:

$ kubectl get svc metrics-server -n namespace
NAME             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
metrics-server   ClusterIP   10.105.239.175   <none>        443/TCP   19h
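
After the metrics server is running, the TARGETS column of kubectl get hpa should show a percentage instead of <unknown>. The following sketch lists any autoscalers that still lack metrics. The function name hpa_without_metrics is hypothetical, and the filter assumes the default table output shown earlier in this section.

```shell
# Hypothetical helper: read `kubectl get hpa` output on standard input and
# print the names of autoscalers whose current metric is still <unknown>.
hpa_without_metrics() {
  # Skip the header row and match the marker anywhere on the line, so the
  # filter does not depend on the exact position of the TARGETS column.
  awk 'NR > 1 && /<unknown>/ { print $1 }'
}
```

For example, kubectl get hpa | hpa_without_metrics prints nothing once every autoscaler reports a CPU value. Metrics can take a short time to appear after the metrics server starts.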

CPU request error with VerticaAutoscaler

You might receive an error message that states:

failed to get cpu utilization: missing request for cpu

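You receive this error because every container, including sidecar containers, must have resource requests and limits set. To correct this error:

Verify the error: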

$ kubectl get hpa
NAME                REFERENCE                           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
autoscaler-name     VerticaAutoscaler/autoscaler-name   <unknown>/50%   3         12        0          19h

$ kubectl describe hpa
Warning: autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
Name: autoscaler-name
Namespace: namespace
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 12 May 2022 15:58:31 -0400
Reference: VerticaAutoscaler/autoscaler-name
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 50%
Min replicas: 3
Max replicas: 12
VerticaAutoscaler pods: 3 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get cpu utilization: missing request for cpu
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 4s (x5 over 64s) horizontal-pod-autoscaler failed to get cpu utilization: missing request for cpu
Warning FailedComputeMetricsReplicas 4s (x5 over 64s) horizontal-pod-autoscaler invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: missing request for cpu

Add resource requests and limits to the CR:

$ cat /tmp/vdb.yaml
apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: vertica-vdb
spec:
  sidecars:
    - name: vlogger
      image: vertica/vertica-logger:latest
      resources:
        requests:
          memory: "100Mi"
          cpu: "100m"
        limits:
          memory: "100Mi"
          cpu: "100m"
  communal:
    credentialSecret: communal-creds
    endpoint: https://endpoint
    path: s3://bucket-location
  dbName: verticadb
  image: vertica/vertica-k8s:latest
  subclusters:
  - isPrimary: true
    name: sc1
    resources:
      requests:
        memory: "4Gi"
        cpu: 2
      limits:
        memory: "4Gi"
        cpu: 2
    serviceType: ClusterIP
    serviceName: sc1
    size: 3
  upgradePolicy: Auto

Apply the update:

$ kubectl apply -f /tmp/vdb.yaml
verticadb.vertica.com/vertica-vdb created

When you set a new CPU resource limit, Kubernetes reschedules each pod in the StatefulSet in a rolling update until all pods have the updated CPU resource limit.