Prometheus integration


Vertica on Kubernetes integrates with Prometheus to scrape time series metrics about the VerticaDB operator and Vertica server process. These metrics build a detailed model of your application over time. The model provides performance and troubleshooting insights, and it facilitates internal and external communications and service discovery in microservice and containerized architectures.

Prometheus requires that you set up targets—metrics that you want to monitor. Each target is exposed on an endpoint, and Prometheus periodically scrapes that endpoint to collect target data. Vertica exports metrics and provides access methods for both the VerticaDB operator and server process.

Server metrics

Vertica exports server metrics on port 8443 at the following endpoint:

https://host-address:8443/api-version/metrics

Only the superuser can authenticate to the HTTPS service, and the service accepts only mutual TLS (mTLS) authentication. The setup for both Vertica on Kubernetes and non-containerized Vertica environments is identical. For details, see HTTPS service.

Vertica on Kubernetes lets you set a custom port for the HTTPS service with the subclusters[i].verticaHTTPNodePort custom resource parameter. This parameter applies only to subclusters that use the NodePort serviceType.

For request and response examples, see the /metrics endpoint description. For a list of available metrics, see Prometheus metrics.

Grafana dashboards

You can visualize Vertica server time series metrics with Grafana dashboards. Vertica dashboards that use a Prometheus data source are available at Grafana Dashboards.

You can also download the source for each dashboard from the vertica/grafana-dashboards repository.

Operator metrics

The VerticaDB operator supports the Operator SDK framework, which requires that an authorization proxy impose role-based access control (RBAC) to access operator metrics over HTTPS. To increase flexibility, Vertica provides the following options to access the Prometheus /metrics endpoint:

HTTPS access: Meet Operator SDK requirements and use a sidecar container as an RBAC proxy to authorize connections.
HTTP access: Expose the /metrics endpoint to external connections without RBAC. Any client with network access can read from /metrics.
Disable Prometheus entirely.

Vertica provides Helm chart parameters and YAML manifests to configure each option.

Note

If you installed the VerticaDB operator with OperatorHub.io, you can use the Prometheus integration with the default Helm chart settings. OperatorHub.io installations cannot configure any Helm chart parameters.

Prerequisites

Complete Installing the VerticaDB operator.
Install the kubectl command line tool.

HTTPS with RBAC

The Operator SDK framework requires that operators use an authorization proxy for metrics access. Because the operator sends metrics to localhost only, Vertica meets this requirement with a sidecar container that has localhost access and enforces RBAC.

RBAC rules are cluster-scoped, and the sidecar authorizes connections from clients associated with a service account that has the correct ClusterRole and ClusterRoleBindings. Vertica provides the following example manifests:

verticadb-operator-proxy-role-cr: ClusterRole that has TokenReviews and SubjectAccessReviews access so that the sidecar can verify privileges on connections.
verticadb-operator-proxy-rolebinding-crb: ClusterRoleBinding that associates the ClusterRole that the sidecar uses to verify privileges with a service account.
verticadb-operator-metrics-reader-cr: ClusterRole that allows HTTP GET requests on the /metrics endpoint for non-Kubernetes resources.
verticadb-operator-metrics-reader-crb: ClusterRoleBinding that associates the metrics reader ClusterRole with a service account.

For additional details about ClusterRoles and ClusterRoleBindings, see the Kubernetes documentation.

Create RBAC rules

Note

This section details how to create RBAC rules for environments that require that you set up ClusterRole and ClusterRoleBinding objects outside of the Helm chart installation.

The following steps create the ClusterRole and ClusterRoleBinding objects that grant access to the /metrics endpoint to a non-Kubernetes resource such as Prometheus. Because RBAC rules are cluster-scoped, you must either create a ClusterRoleBinding or add the service account to an existing one:

Create a ClusterRoleBinding that binds the role for the RBAC sidecar proxy with a service account:

Create a ClusterRoleBinding:

$ kubectl create clusterrolebinding verticadb-operator-proxy-rolebinding \
    --clusterrole=verticadb-operator-proxy-role \
    --serviceaccount=namespace:serviceaccount

Add a service account to an existing ClusterRoleBinding:

$ kubectl patch clusterrolebinding verticadb-operator-proxy-rolebinding \
    --type='json' \
    -p='[{"op": "add", "path": "/subjects/-", "value": {"kind": "ServiceAccount", "name": "serviceaccount","namespace": "namespace" } }]'

Create a ClusterRoleBinding that binds the role for the non-Kubernetes object to the RBAC sidecar proxy service account:

Create a ClusterRoleBinding:

$ kubectl create clusterrolebinding verticadb-operator-metrics-reader \
    --clusterrole=verticadb-operator-metrics-reader \
    --serviceaccount=namespace:serviceaccount \
    --group=system:authenticated

Bind the service account to an existing ClusterRoleBinding:

$ kubectl patch clusterrolebinding verticadb-operator-metrics-reader \
    --type='json' \
    -p='[{"op": "add", "path": "/subjects/-", "value": {"kind": "ServiceAccount", "name": "serviceaccount","namespace": "namespace"}},{"op":"add","path":"/subjects/-","value":{"kind": "Group", "name": "system:authenticated"}}]'

$ kubectl patch clusterrolebinding verticadb-operator-metrics-reader \
    --type='json' \
    -p='[{"op": "add", "path": "/subjects/-", "value": {"kind": "ServiceAccount", "name": "serviceaccount","namespace": "namespace" } }]'
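
The -p argument in the patch commands above is a JSON Patch document, and an unbalanced brace in a hand-written one is easy to miss. As a minimal sketch, the following Python builds the same document from plain data structures so that the result is always valid JSON. The function name subject_patch and the namespace and service account values are hypothetical and are not part of Vertica or kubectl:

```python
import json


def subject_patch(namespace, serviceaccount, add_authenticated_group=False):
    """Return a JSON Patch string that appends subjects to a ClusterRoleBinding."""
    ops = [{
        "op": "add",
        "path": "/subjects/-",
        "value": {
            "kind": "ServiceAccount",
            "name": serviceaccount,
            "namespace": namespace,
        },
    }]
    if add_authenticated_group:
        # Mirrors the second operation in the metrics-reader patch command.
        ops.append({
            "op": "add",
            "path": "/subjects/-",
            "value": {"kind": "Group", "name": "system:authenticated"},
        })
    return json.dumps(ops)


if __name__ == "__main__":
    # Hypothetical namespace and service account names.
    print(subject_patch("vertica", "prometheus", add_authenticated_group=True))
```

You can pass the printed string as the value of the -p option in the kubectl patch commands.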

When you install the Helm chart, the ClusterRole and ClusterRoleBindings are created automatically. By default, the prometheus.expose parameter is set to EnableWithProxy, which creates the service object and exposes the operator's /metrics endpoint.

For details about creating a sidecar container, see VerticaDB custom resource definition.

Service object

Connect to the /metrics endpoint at port 8443 with the following path:

https://verticadb-operator-metrics-service.namespace.svc.cluster.local:8443/metrics

Bearer token authentication

Kubernetes authenticates requests to the API server with service account credentials. Each pod is associated with a service account and has the following credentials stored in the filesystem of each container in the pod:

Token at /var/run/secrets/kubernetes.io/serviceaccount/token
Certificate authority (CA) bundle at /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

Use these credentials to authenticate to the /metrics endpoint through the service object. You must use the credentials for the service account that you used to create the ClusterRoleBindings.

For example, the following cURL request accesses the /metrics endpoint. Include the --insecure option only if you do not want to verify the serving certificate:

$ curl --insecure --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://verticadb-operator-metrics-service.vertica:8443/metrics

For additional details about service account credentials, see the Kubernetes documentation.
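
For clients that run inside a pod and do not use cURL, the same request can be sketched in Python with only the standard library. This is an illustration under assumptions, not a Vertica-provided client: the function names are hypothetical, and the sketch always verifies the serving certificate against the mounted CA bundle (the equivalent of omitting --insecure), which succeeds only if the proxy serves a certificate that this CA signed:

```python
import ssl
import urllib.request

# Default location of the service account credentials inside a container.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def build_metrics_request(url, token):
    """Return a GET request that carries the service account token as a bearer token."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token.strip()}"}
    )


def fetch_metrics(url, token_path=f"{SA_DIR}/token", ca_path=f"{SA_DIR}/ca.crt"):
    """Fetch the raw /metrics text, verifying the server certificate against ca_path."""
    with open(token_path, encoding="utf-8") as token_file:
        token = token_file.read()
    context = ssl.create_default_context(cafile=ca_path)
    request = build_metrics_request(url, token)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return response.read().decode("utf-8")
```

For example, fetch_metrics("https://verticadb-operator-metrics-service.vertica:8443/metrics") issues the same request as the cURL command, using the token of the pod's own service account.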

TLS client certificate authentication

Some environments might prevent you from authenticating to the /metrics endpoint with the service account token. For example, you might run Prometheus outside of Kubernetes. To allow external client connections to the /metrics endpoint, you have to supply the RBAC proxy sidecar with TLS certificates.

You must create a Secret that contains the certificates, and then use the prometheus.tlsSecret Helm chart parameter to pass the Secret to the RBAC proxy sidecar when you install the Helm chart. The following steps create the Secret and install the Helm chart:

Create a Secret that contains the certificates:

$ kubectl create secret generic metrics-tls --from-file=tls.key=/path/to/tls.key --from-file=tls.crt=/path/to/tls.crt --from-file=ca.crt=/path/to/ca.crt

Install the Helm chart with prometheus.tlsSecret set to the Secret that you just created:
```
$ helm install operator-name --namespace namespace --create-namespace vertica-charts/verticadb-operator \
  --set prometheus.tlsSecret=metrics-tls
```
The prometheus.tlsSecret parameter forces the RBAC proxy to use the TLS certificates stored in the Secret. Otherwise, the RBAC proxy sidecar generates its own self-signed certificate.

After you install the Helm chart, you can authenticate to the /metrics endpoint with the certificates in the Secret. For example:

$ curl --key tls.key --cert tls.crt --cacert ca.crt https://verticadb-operator-metrics-service.vertica.svc:8443/metrics
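
A client outside of cURL can present the same certificates. The following Python sketch uses only the standard library; the function names are hypothetical, and the default file names assume that you extracted tls.key, tls.crt, and ca.crt from the Secret into the working directory:

```python
import ssl
import urllib.request


def make_mtls_context(ca_path, cert_path, key_path):
    """Build a TLS context that verifies the server against ca_path
    and presents the client certificate in cert_path."""
    context = ssl.create_default_context(cafile=ca_path)
    context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return context


def fetch_metrics_mtls(url, ca_path="ca.crt", cert_path="tls.crt", key_path="tls.key"):
    """Fetch the raw /metrics text over a mutually authenticated TLS connection."""
    context = make_mtls_context(ca_path, cert_path, key_path)
    with urllib.request.urlopen(url, context=context, timeout=10) as response:
        return response.read().decode("utf-8")
```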

HTTP access

You might have an environment that does not require privileged access to Prometheus metrics. For example, you might run Prometheus outside of Kubernetes.

To allow external access to the /metrics endpoint with HTTP, set prometheus.expose to EnableWithoutAuth. For example:

$ helm install operator-name --namespace namespace --create-namespace vertica-charts/verticadb-operator \
    --set prometheus.expose=EnableWithoutAuth

Service object

Vertica provides a service object, verticadb-operator-metrics-service, to access the Prometheus /metrics endpoint. The VerticaDB operator does not manage this service object. By default, the service object uses the ClusterIP service type, so you must change the serviceType for external client access. The service object's fully qualified domain name (FQDN) is as follows:

verticadb-operator-metrics-service.namespace.svc.cluster.local

Connect to the /metrics endpoint at port 8443 with the following path:

http://verticadb-operator-metrics-service.namespace.svc.cluster.local:8443/metrics
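
The HTTPS and HTTP service object URLs differ only in the scheme, which follows from the prometheus.expose setting. The following Python sketch captures that mapping for scripts that need to derive the scrape URL. The function and constant names are hypothetical, and the cluster.local suffix assumes the default cluster domain used in the paths in this document:

```python
SERVICE_NAME = "verticadb-operator-metrics-service"
PORT = 8443

# prometheus.expose value -> URL scheme (None means no endpoint is exposed).
SCHEME_BY_EXPOSE = {
    "EnableWithProxy": "https",
    "EnableWithoutAuth": "http",
    "Disable": None,
}


def operator_metrics_url(namespace, expose="EnableWithProxy"):
    """Return the /metrics URL for the operator in namespace, or None if disabled."""
    if expose not in SCHEME_BY_EXPOSE:
        raise ValueError(f"unknown prometheus.expose value: {expose!r}")
    scheme = SCHEME_BY_EXPOSE[expose]
    if scheme is None:
        return None
    return f"{scheme}://{SERVICE_NAME}.{namespace}.svc.cluster.local:{PORT}/metrics"


if __name__ == "__main__":
    # "vertica" is a hypothetical namespace.
    print(operator_metrics_url("vertica", "EnableWithoutAuth"))
```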

Prometheus operator integration (optional)

Vertica on Kubernetes integrates with the Prometheus operator, which provides custom resources (CRs) that simplify targeting metrics. Vertica supports the ServiceMonitor CR, which discovers the VerticaDB operator automatically and authenticates requests with a bearer token.

The ServiceMonitor CR is available as a release artifact in our GitHub repository. See Helm chart parameters for details about the prometheus.createServiceMonitor parameter.

Disabling Prometheus

To disable Prometheus, set the prometheus.expose Helm chart parameter to Disable:

$ helm install operator-name --namespace namespace --create-namespace vertica-charts/verticadb-operator \
    --set prometheus.expose=Disable

For details about Helm install commands, see Installing the VerticaDB operator.

Metrics

The following table describes the available VerticaDB operator metrics:

Name	Type	Description
`controller_runtime_active_workers`	gauge	Number of currently used workers per controller.
`controller_runtime_max_concurrent_reconciles`	gauge	Maximum number of concurrent reconciles per controller.
`controller_runtime_reconcile_errors_total`	counter	Total number of reconciliation errors per controller.
`controller_runtime_reconcile_time_seconds`	histogram	Length of time per reconciliation per controller.
`controller_runtime_reconcile_total`	counter	Total number of reconciliations per controller.
`controller_runtime_webhook_latency_seconds`	histogram	Histogram of the latency of processing admission requests.
`controller_runtime_webhook_requests_in_flight`	gauge	Current number of admission requests being served.
`controller_runtime_webhook_requests_total`	counter	Total number of admission requests by HTTP status code.
`go_gc_duration_seconds`	summary	A summary of the pause duration of garbage collection cycles.
`go_goroutines`	gauge	Number of goroutines that currently exist.
`go_info`	gauge	Information about the Go environment.
`go_memstats_alloc_bytes`	gauge	Number of bytes allocated and still in use.
`go_memstats_alloc_bytes_total`	counter	Total number of bytes allocated, even if freed.
`go_memstats_buck_hash_sys_bytes`	gauge	Number of bytes used by the profiling bucket hash table.
`go_memstats_frees_total`	counter	Total number of frees.
`go_memstats_gc_sys_bytes`	gauge	Number of bytes used for garbage collection system metadata.
`go_memstats_heap_alloc_bytes`	gauge	Number of heap bytes allocated and still in use.
`go_memstats_heap_idle_bytes`	gauge	Number of heap bytes waiting to be used.
`go_memstats_heap_inuse_bytes`	gauge	Number of heap bytes that are in use.
`go_memstats_heap_objects`	gauge	Number of allocated objects.
`go_memstats_heap_released_bytes`	gauge	Number of heap bytes released to OS.
`go_memstats_heap_sys_bytes`	gauge	Number of heap bytes obtained from system.
`go_memstats_last_gc_time_seconds`	gauge	Number of seconds since 1970 of last garbage collection.
`go_memstats_lookups_total`	counter	Total number of pointer lookups.
`go_memstats_mallocs_total`	counter	Total number of mallocs.
`go_memstats_mcache_inuse_bytes`	gauge	Number of bytes in use by mcache structures.
`go_memstats_mcache_sys_bytes`	gauge	Number of bytes used for mcache structures obtained from system.
`go_memstats_mspan_inuse_bytes`	gauge	Number of bytes in use by mspan structures.
`go_memstats_mspan_sys_bytes`	gauge	Number of bytes used for mspan structures obtained from system.
`go_memstats_next_gc_bytes`	gauge	Number of heap bytes when next garbage collection will take place.
`go_memstats_other_sys_bytes`	gauge	Number of bytes used for other system allocations.
`go_memstats_stack_inuse_bytes`	gauge	Number of bytes in use by the stack allocator.
`go_memstats_stack_sys_bytes`	gauge	Number of bytes obtained from system for stack allocator.
`go_memstats_sys_bytes`	gauge	Number of bytes obtained from system.
`go_threads`	gauge	Number of OS threads created.
`process_cpu_seconds_total`	counter	Total user and system CPU time spent in seconds.
`process_max_fds`	gauge	Maximum number of open file descriptors.
`process_open_fds`	gauge	Number of open file descriptors.
`process_resident_memory_bytes`	gauge	Resident memory size in bytes.
`process_start_time_seconds`	gauge	Start time of the process since unix epoch in seconds.
`process_virtual_memory_bytes`	gauge	Virtual memory size in bytes.
`process_virtual_memory_max_bytes`	gauge	Maximum amount of virtual memory available in bytes.
`vertica_cluster_restart_attempted_total`	counter	The number of times we attempted a full cluster restart.
`vertica_cluster_restart_failed_total`	counter	The number of times we failed when attempting a full cluster restart.
`vertica_cluster_restart_seconds`	histogram	The number of seconds it took to do a full cluster restart.
`vertica_nodes_restart_attempted_total`	counter	The number of times we attempted to restart down nodes.
`vertica_nodes_restart_failed_total`	counter	The number of times we failed when trying to restart down nodes.
`vertica_nodes_restart_seconds`	histogram	The number of seconds it took to restart down nodes.
`vertica_running_nodes_count`	gauge	The number of nodes that have a running pod associated with them.
`vertica_subclusters_count`	gauge	The number of subclusters that exist.
`vertica_total_nodes_count`	gauge	The number of nodes that currently exist.
`vertica_up_nodes_count`	gauge	The number of nodes that have Vertica running and can accept connections.
`vertica_upgrade_total`	counter	The number of times the operator performed an upgrade caused by an image change.
`workqueue_adds_total`	counter	Total number of adds handled by workqueue.
`workqueue_depth`	gauge	Current depth of workqueue.
`workqueue_longest_running_processor_seconds`	gauge	How many seconds has the longest running processor for workqueue been running.
`workqueue_queue_duration_seconds`	histogram	How long in seconds an item stays in workqueue before being requested.
`workqueue_retries_total`	counter	Total number of retries handled by workqueue.
`workqueue_unfinished_work_seconds`	gauge	How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.
`workqueue_work_duration_seconds`	histogram	How long in seconds processing an item from workqueue takes.
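
The /metrics endpoint returns these metrics in the Prometheus text exposition format, one sample per line. For quick checks without a Prometheus server, the following Python sketch reads sample lines into a dictionary. It is a simplified reader, not a full parser: it keeps label sets as raw strings and ignores timestamps. The label names in the SAMPLE text are assumptions for illustration and might not match the labels that the operator emits:

```python
SAMPLE = """\
# TYPE vertica_up_nodes_count gauge
vertica_up_nodes_count{namespace="vertica",verticadb="db"} 3
# TYPE vertica_total_nodes_count gauge
vertica_total_nodes_count{namespace="vertica",verticadb="db"} 4
# TYPE go_goroutines gauge
go_goroutines 42
"""


def parse_samples(text):
    """Map each metric name to a list of (raw_labels, value) tuples."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            labels, rest = rest.rsplit("}", 1)
        else:
            name, rest = line.split(None, 1)
            labels = ""
        # The first field after the name and labels is the sample value.
        value = float(rest.split()[0])
        samples.setdefault(name, []).append((labels, value))
    return samples


if __name__ == "__main__":
    parsed = parse_samples(SAMPLE)
    up = parsed["vertica_up_nodes_count"][0][1]
    total = parsed["vertica_total_nodes_count"][0][1]
    print(f"{up:.0f} of {total:.0f} nodes are up")
```

Comparing vertica_up_nodes_count with vertica_total_nodes_count in this way gives a simple indication of whether any nodes are down.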