Kubernetes 群集故障排除

这些提示帮助您避免有关 Vertica 在 Kerberos 上部署的问题,并对产生的任何问题进行故障排除。

下载 kubectl 命令行工具以调试您的 Kubernetes 资源。

Helm 安装失败

安装 VerticaDB 运算符和准入控制器 Helm 图表时,helm install 命令可能会返回以下错误:

$ helm install vdb-op vertica-charts/verticadb-operator
Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Certificate" in version "cert-manager.io/v1", unable to recognize "": no matches for kind "Issuer" in version "cert-manager.io/v1"]

该错误表明您尚未满足准入控制器 webhook 的 TLS 先决条件。要解决此问题,请安装 cert-manager 或配置自定义证书。以下步骤安装 cert-manager。

  1. 安装 cert-manager YAML 清单:

    $ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.3/cert-manager.yaml
    
  2. 验证 cert-manager 安装。

    如果您在安装 cert-manager 后立即尝试安装 Helm 图表,则可能收到以下错误:

    $ helm install vdb-op vertica-charts/verticadb-operator
    Error: failed to create resource: Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.232.154:443: connect: connection refused
    

    您收到此错误是因为 cert-manager 需要时间来创建其 pod 并将 webhook 注册到群集。等待几分钟,然后使用以下命令验证 cert-manager 安装:

    $ kubectl get pods --namespace cert-manager
    NAME                                       READY   STATUS    RESTARTS   AGE
    cert-manager-7dd5854bb4-skks7              1/1     Running   5          12d
    cert-manager-cainjector-64c949654c-9nm2z   1/1     Running   5          12d
    cert-manager-webhook-6bdffc7c9d-b7r2p      1/1     Running   5          12d
    

    有关 cert-manager 安装验证的更多详细信息,请参阅 cert-manager 文档

  3. 验证 cert-manager 安装后,您必须卸载 Helm 图表,然后重新安装:

    $ helm uninstall vdb-op
    $ helm install vdb-op vertica-charts/verticadb-operator
    

有关其他信息,请参阅安装 Vertica DB 操作器

自定义证书 Helm 安装错误

如果您在使用 Helm 图表安装运算符时使用自定义证书,则 helm installkubectl apply 命令可能会返回类似以下内容的错误:

$ kubectl apply -f ../operatorcrd.yaml
Error from server (InternalError): error when creating "../operatorcrd.yaml": Internal error occurred: failed calling webhook "mverticadb.kb.io": Post "https://verticadb-operator-webhook-service.namespace.svc:443/mutate-vertica-com-v1beta1-verticadb?timeout=10s": x509: certificate is valid for ip-10-0-21-169.ec2.internal, test-bastion, not verticadb-operator-webhook-service.default.svc

当 TLS 密钥的域名系统 (DNS) 或主题备用名称 (SAN) 不正确时,您会收到此错误。要更正此错误,请按以下格式在配置文件中定义 DNS 和 SAN:

commonName = verticadb-operator-webhook-service.namespace.svc
...
[alt_names]
DNS.1 = verticadb-operator-webhook-service.namespace.svc
DNS.2 = verticadb-operator-webhook-service.namespace.svc.cluster.local

有关更多详细信息,请参阅安装 Vertica DB 操作器

验证对自定义资源的更新

由于操作员需要时间来执行任务,因此对自定义资源的更新不会立即生效。使用 kubectl 命令行工具验证是否应用了更改。

您可以使用 kubectl wait 命令等待指定的条件。例如,操作员使用 ImageChangeInProgress 条件来提供升级状态。开始映像版本升级后,等待操作员确认升级并将此条件设置为 True:

$ kubectl wait --for=condition=ImageChangeInProgress=True vdb/cluster-name –-timeout=180s

升级开始后,您可以等到操作员离开升级模式并将此条件设置为 False:

$ kubectl wait --for=condition=ImageChangeInProgress=False vdb/cluster-name –-timeout=800s

有关 kubectl wait 的详细信息,请参阅 kubectl 参考文档

pod 正在运行,但数据库尚未就绪

当您检查群集中的 pod 时,pod 正在运行,但数据库尚未就绪:

$ kubectl get pods
NAME                                                    READY   STATUS    RESTARTS   AGE
vertica-crd-sc1-0                                       0/1     Running   0          12m
vertica-crd-sc1-1                                       0/1     Running   1          12m
vertica-crd-sc1-2                                       0/1     Running   0          12m
verticadb-operator-controller-manager-5d9cdc9b8-kw9nv   2/2     Running   0          24m

要查找问题的根本原因,请使用 kubectl logs 检查操作员管理器。以下示例显示公共存储桶不存在:

$ kubectl logs -l app.kubernetes.io/name=verticadb-operator -c manager -f
2021-08-04T20:03:00.289Z        INFO    controllers.VerticaDB   ExecInPod entry {"verticadb": "default/vertica-crd", "pod": {"namespace": "default", "name": "vertica-crd-sc1-0"}, "command": "bash -c ls -l /opt/vertica/config/admintools.conf && grep '^node\\|^v_\\|^host' /opt/vertica/config/admintools.conf "}
2021-08-04T20:03:00.369Z        INFO    controllers.VerticaDB   ExecInPod stream        {"verticadb": "default/vertica-crd", "pod": {"namespace": "default", "name": "vertica-crd-sc1-0"}, "err": null, "stdout": "-rw-rw-r-- 1 dbadmin verticadba 1243 Aug  4 20:00 /opt/vertica/config/admintools.conf\nhosts = 10.244.1.5,10.244.2.4,10.244.4.6\nnode0001 = 10.244.1.5,/data,/data\nnode0002 = 10.244.2.4,/data,/data\nnode0003 = 10.244.4.6,/data,/data\n", "stderr": ""}
2021-08-04T20:03:00.369Z        INFO    controllers.VerticaDB   ExecInPod entry {"verticadb": "default/vertica-crd", "pod": {"namespace": "default", "name": "vertica-crd-sc1-0"}, "command": "/opt/vertica/bin/admintools -t create_db --skip-fs-checks --hosts=10.244.1.5,10.244.2.4,10.244.4.6 --communal-storage-location=s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c --communal-storage-params=/home/dbadmin/auth_parms.conf --sql=/home/dbadmin/post-db-create.sql --shard-count=12 --depot-path=/depot --database verticadb --force-cleanup-on-failure --noprompt --password ******* "}
2021-08-04T20:03:00.369Z        DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"VerticaDB","namespace":"default","name":"vertica-crd","uid":"26100df1-93e5-4e64-b665-533e14abb67c","apiVersion":"vertica.com/v1beta1","resourceVersion":"11591"}, "reason": "CreateDBStart", "message": "Calling 'admintools -t create_db'"}
2021-08-04T20:03:17.051Z        INFO    controllers.VerticaDB   ExecInPod stream        {"verticadb": "default/vertica-crd", "pod": {"namespace": "default", "name": "vertica-crd-sc1-0"}, "err": "command terminated with exit code 1", "stdout": "Default depot size in use\nDistributing changes to cluster.\n\tCreating database verticadb\nBootstrap on host 10.244.1.5 return code 1 stdout '' stderr 'Logged exception in writeBufferToFile: RecvFiles failed in closing file [s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/verticadb_rw_access_test.txt]: The specified bucket does not exist. Writing test data to file s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/verticadb_rw_access_test.txt failed.\\nTesting rw access to communal location s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/ failed\\n'\n\nError: Bootstrap on host 10.244.1.5 return code 1 stdout '' stderr 'Logged exception in writeBufferToFile: RecvFiles failed in closing file [s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/verticadb_rw_access_test.txt]: The specified bucket does not exist. Writing test data to file s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/verticadb_rw_access_test.txt failed.\\nTesting rw access to communal location s3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c/ failed\\n'\n\n", "stderr": ""}
2021-08-04T20:03:17.051Z        INFO    controllers.VerticaDB   aborting reconcile of VerticaDB {"verticadb": "default/vertica-crd", "result": {"Requeue":true,"RequeueAfter":0}, "err": null}
2021-08-04T20:03:17.051Z        DEBUG   controller-runtime.manager.events       Warning {"object": {"kind":"VerticaDB","namespace":"default","name":"vertica-crd","uid":"26100df1-93e5-4e64-b665-533e14abb67c","apiVersion":"vertica.com/v1beta1","resourceVersion":"11591"}, "reason": "S3BucketDoesNotExist", "message": "The bucket in the S3 path 's3://newbucket/db/26100df1-93e5-4e64-b665-533e14abb67c' does not exist"}

为群集创建一个 S3 存储桶:

$ S3_BUCKET=newbucket
$ S3_CLUSTER_IP=$(kubectl get svc | grep minio | head -1 | awk '{print $3}')
$ export AWS_ACCESS_KEY_ID=minio
$ export AWS_SECRET_ACCESS_KEY=minio123
$ aws s3 mb s3://$S3_BUCKET --endpoint-url http://$S3_CLUSTER_IP
make_bucket: newbucket

使用 kubectl get pods 验证群集使用新的 S3 存储桶并且数据库已准备就绪:

$ kubectl get pods
NAME                                                    READY   STATUS    RESTARTS   AGE
minio-ss-0-0                                            1/1     Running   0          18m
minio-ss-0-1                                            1/1     Running   0          18m
minio-ss-0-2                                            1/1     Running   0          18m
minio-ss-0-3                                            1/1     Running   0          18m
vertica-crd-sc1-0                                       1/1     Running   0          20m
vertica-crd-sc1-1                                       1/1     Running   0          20m
vertica-crd-sc1-2                                       1/1     Running   0          20m
verticadb-operator-controller-manager-5d9cdc9b8-kw9nv   2/2     Running   0          63m

数据库不可用

创建自定义资源实例后,数据库不可用。 kubectl get custom-resource 命令不显示信息:

$ kubectl get vdb
NAME          AGE   SUBCLUSTERS   INSTALLED   DBADDED   UP
vertica-crd   4s

使用 kubectl describe custom-resource 检查 pod 的事件以确定任何问题:

$ kubectl describe vdb
Name:         vertica-crd
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  vertica.com/v1beta1
Kind:         VerticaDB
Metadata:
  ...
  Superuser Password Secret:  su-passwd
Events:
  Type     Reason                           Age                From                Message
  ----     ------                           ----               ----                -------
  Warning  SuperuserPasswordSecretNotFound  5s (x12 over 15s)  verticadb-operator  Secret for superuser password 'su-passwd' was not found

在这种情况下,自定义资源使用名为 su-passwd 的 Secret 来存储 Superuser Password Secret,但没有这样的 Secret 可用。创建名为 su-passwd 的 Secret 来存储 Secret:

$ kubectl create secret generic su-passwd --from-literal=password=sup3rs3cr3t
secret/su-passwd created

使用 kubectl get custom-resource 来验证问题是否已解决:

$ kubectl get vdb
NAME          AGE   SUBCLUSTERS   INSTALLED   DBADDED   UP
vertica-crd   89s   1             0           0         0

映像拉取失败

使用 Helm 图表部署 Vertica 群集时收到 ImagePullBackOff 错误,但您未从本地注册表服务器预先拉取 Vertica 映像:

$ kubectl describe pod pod-name-0
...
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  ...
  Warning  Failed            2m32s                  kubelet            Failed to pull image "k8s-rhel7-01:5000/vertica-k8s:default-1": rpc error: code = Unknown desc = context canceled
  Warning  Failed            2m32s                  kubelet            Error: ErrImagePull
  Normal   BackOff           2m32s                  kubelet            Back-off pulling image "k8s-rhel7-01:5000/vertica-k8s:default-1"
  Warning  Failed            2m32s                  kubelet            Error: ImagePullBackOff
  Normal   Pulling           2m18s (x2 over 4m22s)  kubelet            Pulling image "k8s-rhel7-01:5000/vertica-k8s:default-1"

出现这种情况是因为 Vertica 映像太大而无法在部署 Vertica 群集时从注册表中拉取。在 Kubernetes 主机上执行以下命令:

$ docker image list | grep vertica-k8s
k8s-rhel7-01:5000/vertica-k8s default-1 2d6f5d3d90d6 9 days ago 1.55GB

要解决此问题,请完成以下操作之一:

  • 在创建 Vertica StatefulSet 之前拉取每个节点上的 Vertica 映像:

    $ NODES=`kubectl get nodes | grep -v NAME | awk '{print $1}'`
    $ for node in $NODES; do ssh $node docker pull $DOCKER_REGISTRY:5000/vertica-k8s:$K8S_TAG; done
    
  • 为 Vertica 服务器使用大小减小的 vertica/vertica-k8s:latest 映像。

由于 CPU 不足而挂起 pod

如果您的主机节点没有足够的资源来满足来自 pod 的资源请求,则 pod 将保持挂起状态。

在以下示例中,pod 请求主机节点上的 40 个 CPU,并且 pod 停留在 Pending 状态:

$ kubectl describe pod cluster-vertica-defaultsubcluster-0
...
Status:         Pending
...
Containers:
  server:
    Image:       docker.io/library/vertica-k8s:default-1
    Ports:       5433/TCP, 5434/TCP, 22/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      /opt/vertica/bin/docker-entrypoint.sh
      restart-vertica-node
    Limits:
      memory:  200Gi
    Requests:
      cpu: 40
      memory:  200Gi
...
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3h20m  default-scheduler  0/5 nodes are available: 5 Insufficient cpu.

确认主机节点上可用的资源。以下命令确认主机节点只有 40 个可分配 CPU:

$ kubectl describe node host-node-1
...
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sat, 20 Mar 2021 22:39:10 -0400   Sat, 20 Mar 2021 13:07:02 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sat, 20 Mar 2021 22:39:10 -0400   Sat, 20 Mar 2021 13:07:02 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sat, 20 Mar 2021 22:39:10 -0400   Sat, 20 Mar 2021 13:07:02 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sat, 20 Mar 2021 22:39:10 -0400   Sat, 20 Mar 2021 13:07:12 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  172.19.0.5
  Hostname:    eng-g9-191
Capacity:
  cpu:                40
  ephemeral-storage:  285509064Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             263839236Ki
  pods:               110
Allocatable:
  cpu:                40
  ephemeral-storage:  285509064Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             263839236Ki
  pods:               110
...
Non-terminated Pods:          (3 in total)
  Namespace                   Name                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                   ------------  ----------  ---------------  -------------  ---
  default                     cluster-vertica-defaultsubcluster-0    38 (95%)      0 (0%)      200Gi (79%)      200Gi (79%)    51m
  kube-system                 kube-flannel-ds-8brv9                  100m (0%)     100m (0%)   50Mi (0%)        50Mi (0%)      9h
  kube-system                 kube-proxy-lgjhp                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         9h
...

要更正此问题,请将子群集中的 resource.requests 减少到低于最大可分配 CPU 的值。以下示例使用名为 patch.yaml 的 YAML 格式文件来降低 pod 的资源请求:

$ cat patch.yaml
spec:
  subclusters:
    - name: defaultsubcluster
      resources:
        requests:
          memory: 238Gi
          cpu: "38"
        limits:
          memory: 238Gi
$ kubectl patch vdb cluster-vertica –-type=merge --patch “$(cat patch.yaml)”
verticadb.vertica.com/cluster-vertica patched

添加和测试 vlogger sidecar

Vertica 提供了 vlogger 映像,该映像将日志从 vertica.log 发送到主机节点上的标准输出以进行日志聚合。

要将 sidecar 添加到 CR,请在 spec.sidecars 定义中添加一个元素:

spec:
  ...
  sidecars:
    - name: vlogger
      image: vertica/vertica-logger:1.0.0

要测试 sidecar,请运行以下命令并验证它是否返回日志:

$ kubectl logs pod-name -c vlogger

2021-12-08 14:39:08.538 DistCall Dispatch:0x7f3599ffd700-c000000000997e [Txn
2021-12-08 14:40:48.923 INFO New log
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> Log /data/verticadb/v_verticadb_node0002_catalog/vertica.log opened; #1
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> Processing command line: /opt/vertica/bin/vertica -D /data/verticadb/v_verticadb_node0002_catalog -C verticadb -n v_verticadb_node0002 -h 10.20.30.40 -p 5433 -P 4803 -Y ipv4
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> Starting up Vertica Analytic Database v11.0.2-20211201
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO>
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> vertica(v11.0.2) built by @re-docker5 from master@a44ffabdf3f05e8d104426506b088192f741c485 on 'Wed Dec  1 06:10:34 2021' $BuildId$
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> CPU architecture: x86_64
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> 64-bit Optimized Build
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> Compiler Version: 7.3.1 20180303 (Red Hat 7.3.1-5)
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> LD_LIBRARY_PATH=/opt/vertica/lib
2021-12-08 14:40:48.923 Main Thread:0x7fbbe2cf6280 [Init] <INFO> LD_PRELOAD=
2021-12-08 14:40:48.925 Main Thread:0x7fbbe2cf6280 <LOG> @v_verticadb_node0002: 00000/5081: Total swap memory used: 0
2021-12-08 14:40:48.925 Main Thread:0x7fbbe2cf6280 <LOG> @v_verticadb_node0002: 00000/4435: Process size resident set: 28651520
2021-12-08 14:40:48.925 Main Thread:0x7fbbe2cf6280 <LOG> @v_verticadb_node0002: 00000/5075: Total Memory free + cache: 59455180800
2021-12-08 14:40:48.925 Main Thread:0x7fbbe2cf6280 [Txn] <INFO> Looking for catalog at: /data/verticadb/v_verticadb_node0002_catalog/Catalog
...

使用 VerticaAutoscaler 找不到 CPU 指标

您可能会注意到 VerticaAutoScaler 没有根据 CPU 利用率正确缩放:

$ kubectl get hpa
NAME                REFERENCE                           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
autoscaler-name     VerticaAutoscaler/autoscaler-name   <unknown>/50%   3         12        0          19h

$ kubectl describe hpa
Warning: autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
Name: autoscaler-name
Namespace: namespace
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 12 May 2022 10:25:02 -0400
Reference: VerticaAutoscaler/autoscaler-name
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 50%
Min replicas: 3
Max replicas: 12
VerticaAutoscaler pods: 3 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 7s horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Warning FailedComputeMetricsReplicas 7s horizontal-pod-autoscaler invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)

您收到此错误是因为未安装指标服务器:

$ kubectl top nodes
error: Metrics API not available

要安装指标服务器:

  1. 下载 components.yaml 文件:

    $ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
    
  2. (可选)禁用 TLS:

    $ if ! grep kubelet-insecure-tls components.yaml; then
      sed -i 's/- args:/- args:\n - --kubelet-insecure-tls/' components.yaml;
    
  3. 应用 YAML 文件:

    $ kubectl apply -f components.yaml
    
  4. 验证指标服务器是否正在运行:

    $ kubectl get svc metrics-server -n namespace
    NAME             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
    metrics-server   ClusterIP   10.105.239.175   <none>        443/TCP   19h
    

VerticaAutoscaler 的 CPU 请求错误

您可能会收到一条错误消息,指出:

failed to get cpu utilization: missing request for cpu

收到此错误是因为您必须对所有容器(包括 sidecar 容器)设置资源限制。要更正此错误:

  1. 验证错误:

    $ kubectl get hpa
    NAME                REFERENCE                           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
    autoscaler-name     VerticaAutoscaler/autoscaler-name   <unknown>/50%   3         12        0          19h
    
    $ kubectl describe hpa
    Warning: autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
    Name: autoscaler-name
    Namespace: namespace
    Labels: <none>
    Annotations: <none>
    CreationTimestamp: Thu, 12 May 2022 15:58:31 -0400
    Reference: VerticaAutoscaler/autoscaler-name
    Metrics: ( current / target )
    resource cpu on pods (as a percentage of request): <unknown> / 50%
    Min replicas: 3
    Max replicas: 12
    VerticaAutoscaler pods: 3 current / 0 desired
    Conditions:
    Type Status Reason Message
    ---- ------ ------ -------
    AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
    ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get cpu utilization: missing request for cpu
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning FailedGetResourceMetric 4s (x5 over 64s) horizontal-pod-autoscaler failed to get cpu utilization: missing request for cpu
    Warning FailedComputeMetricsReplicas 4s (x5 over 64s) horizontal-pod-autoscaler invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: missing request for cpu
    
  2. 向 CR 添加资源限制:

    $ cat /tmp/vdb.yaml
    apiVersion: vertica.com/v1beta1
    kind: VerticaDB
    metadata:
      name: vertica-vdb
    spec:
      sidecars:
        - name: vlogger
          image: vertica/vertica-logger:latest
          resources:
            requests:
              memory: "100Mi"
              cpu: "100m"
            limits:
              memory: "100Mi"
              cpu: "100m"
      communal:
        credentialSecret: communal-creds
        endpoint: https://endpoint
            path: s3://bucket-location
      dbName: verticadb
      image: vertica/vertica-k8s:latest
      subclusters:
      - isPrimary: true
        name: sc1
        resources:
          requests:
            memory: "4Gi"
            cpu: 2
          limits:
            memory: "4Gi"
            cpu: 2
        serviceType: ClusterIP
        serviceName: sc1
        size: 3
      upgradePolicy: Auto
    
  3. 应用更新:

    $ kubectl apply -f /tmp/vdb.yaml
    verticadb.vertica.com/vertica-vdb created
    

当您设置新的 CPU 资源限制时,Kubernetes 会以滚动更新的方式重新调度 StatefulSet 中的每个 pod,直到所有 pod 都具有更新的 CPU 资源限制。