Prometheus integration
Vertica on Kubernetes integrates with Prometheus to scrape time series metrics about the VerticaDB operator and the Vertica server process. These metrics model your application's behavior over time, which provides performance and troubleshooting insights and facilitates internal and external communications and service discovery in microservice and containerized architectures.
Prometheus requires that you set up targets—metrics that you want to monitor. Each target is exposed on an endpoint, and Prometheus periodically scrapes that endpoint to collect target data. Vertica exports metrics and provides access methods for both the VerticaDB operator and server process.
Server metrics
Vertica exports server metrics on port 8443 at the following endpoint:
https://host-address:8443/api-version/metrics
Only the superuser can authenticate to the HTTPS service, and the service accepts only mutual TLS (mTLS) authentication. The setup for both Vertica on Kubernetes and non-containerized Vertica environments is identical. For details, see HTTPS service.
Vertica on Kubernetes lets you set a custom port for its HTTPS service with the subclusters[i].verticaHTTPNodePort custom resource parameter. This parameter applies to subclusters that use the NodePort serviceType.
For request and response examples, see the /metrics endpoint description. For a list of available metrics, see Prometheus metrics.
Grafana dashboards
You can visualize Vertica server time series metrics with Grafana dashboards. Vertica dashboards that use a Prometheus data source are available at Grafana Dashboards:
- Vertica Overview (Prometheus)
- Vertica Queries (Prometheus)
- Vertica Resource Management (Prometheus)
- Vertica Depot (Prometheus)
You can also download the source for each dashboard from the vertica/grafana-dashboards repository.
Operator metrics
The VerticaDB operator supports the Operator SDK framework, which requires that an authorization proxy impose role-based access control (RBAC) on access to operator metrics over HTTPS. To increase flexibility, Vertica provides the following options to access the Prometheus /metrics endpoint:
- HTTPS access: Meet operator SDK requirements and use a sidecar container as an RBAC proxy to authorize connections.
- HTTP access: Expose the /metrics endpoint to external connections without RBAC. Any client with network access can read from /metrics.
- Disable Prometheus entirely.
Vertica provides Helm chart parameters and YAML manifests to configure each option.
Note
If you installed the VerticaDB operator with OperatorHub.io, you can use the Prometheus integration with the default Helm chart settings. OperatorHub.io installations cannot configure any Helm chart parameters.
Prerequisites
- Complete Installing the VerticaDB operator.
- Install the kubectl command line tool.
HTTPS with RBAC
The operator SDK framework requires that operators use an authorization proxy for metrics access. Because the operator sends metrics to localhost only, Vertica meets these requirements with a sidecar container with localhost access that enforces RBAC.
RBAC rules are cluster-scoped, and the sidecar authorizes connections from clients associated with a service account that has the correct ClusterRole and ClusterRoleBindings. Vertica provides the following example manifests:
- verticadb-operator-proxy-role-cr: ClusterRole that has TokenReviews and SubjectAccessReviews access so that the sidecar can verify privileges on connections.
- verticadb-operator-proxy-rolebinding-crb: ClusterRoleBinding that associates the ClusterRole that verifies sidecar privileges to a service account.
- verticadb-operator-metrics-reader-cr: ClusterRole that allows HTTP GET requests on the /metrics endpoint for non-Kubernetes resources.
- verticadb-operator-metrics-reader-crb: ClusterRoleBinding that associates the metrics reader ClusterRole with a service account.
For additional details about ClusterRoles and ClusterRoleBindings, see the Kubernetes documentation.
Create RBAC rules
Note
This section details how to create RBAC rules for environments that require that you set up ClusterRole and ClusterRoleBinding objects outside of the Helm chart installation.
The following steps create the ClusterRole and ClusterRoleBinding objects that grant access to the /metrics endpoint to a non-Kubernetes resource such as Prometheus. Because RBAC rules are cluster-scoped, you must create a ClusterRoleBinding or add to an existing one:
- Create a ClusterRoleBinding that binds the role for the RBAC sidecar proxy with a service account. Use one of the following options:
  - Create a ClusterRoleBinding:
    $ kubectl create clusterrolebinding verticadb-operator-proxy-rolebinding \
        --clusterrole=verticadb-operator-proxy-role \
        --serviceaccount=namespace:serviceaccount
  - Add a service account to an existing ClusterRoleBinding:
    $ kubectl patch clusterrolebinding verticadb-operator-proxy-rolebinding \
        --type='json' \
        -p='[{"op": "add", "path": "/subjects/-", "value": {"kind": "ServiceAccount", "name": "serviceaccount", "namespace": "namespace"}}]'
- Create a ClusterRoleBinding that binds the role for the non-Kubernetes object to the RBAC sidecar proxy service account. Use one of the following options:
  - Create a ClusterRoleBinding:
    $ kubectl create clusterrolebinding verticadb-operator-metrics-reader \
        --clusterrole=verticadb-operator-metrics-reader \
        --serviceaccount=namespace:serviceaccount \
        --group=system:authenticated
  - Bind the service account to an existing ClusterRoleBinding. The following command adds both the service account and the system:authenticated group:
    $ kubectl patch clusterrolebinding verticadb-operator-metrics-reader \
        --type='json' \
        -p='[{"op": "add", "path": "/subjects/-", "value": {"kind": "ServiceAccount", "name": "serviceaccount", "namespace": "namespace"}}, {"op": "add", "path": "/subjects/-", "value": {"kind": "Group", "name": "system:authenticated"}}]'
    To add only the service account, without the system:authenticated group:
    $ kubectl patch clusterrolebinding verticadb-operator-metrics-reader \
        --type='json' \
        -p='[{"op": "add", "path": "/subjects/-", "value": {"kind": "ServiceAccount", "name": "serviceaccount", "namespace": "namespace"}}]'
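The JSON patches in the steps above are easy to break when you substitute values by hand. As a minimal sketch, the following helper prints a patch that appends one ServiceAccount subject. The function name sa_subject_patch and its argument order are hypothetical and not part of any Vertica tooling. It assumes that the namespace and service account names are valid Kubernetes names, so it performs no JSON escaping:

```shell
# Hypothetical helper: print a JSON Patch that appends a ServiceAccount
# subject to a ClusterRoleBinding.
# Usage: sa_subject_patch NAMESPACE SERVICEACCOUNT
sa_subject_patch() {
  # Kubernetes names contain only lowercase alphanumerics, "-" and ".",
  # so the values are inserted without JSON escaping.
  printf '[{"op":"add","path":"/subjects/-","value":{"kind":"ServiceAccount","name":"%s","namespace":"%s"}}]\n' "$2" "$1"
}

# Example (not executed here):
#   kubectl patch clusterrolebinding verticadb-operator-proxy-rolebinding \
#     --type=json -p "$(sa_subject_patch my-namespace my-serviceaccount)"
```

Passing the patch through command substitution keeps the quoting in one place, which helps avoid unbalanced braces in the -p argument.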
When you install the Helm chart, the ClusterRole and ClusterRoleBindings are created automatically. By default, the prometheus.expose parameter is set to EnableWithProxy, which creates the service object and exposes the operator's /metrics endpoint.
For details about creating a sidecar container, see VerticaDB custom resource definition.
Service object
Vertica provides a service object, verticadb-operator-metrics-service, to access the Prometheus /metrics endpoint. The VerticaDB operator does not manage this service object. By default, the service object uses the ClusterIP service type to support RBAC.
Connect to the /metrics endpoint at port 8443 with the following path:
https://verticadb-operator-metrics-service.namespace.svc.cluster.local:8443/metrics
Bearer token authentication
Kubernetes authenticates requests to the API server with service account credentials. Each pod is associated with a service account and has the following credentials stored in the filesystem of each container in the pod:
- Token at /var/run/secrets/kubernetes.io/serviceaccount/token
- Certificate authority (CA) bundle at /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Use these credentials to authenticate to the /metrics endpoint through the service object. You must use the credentials for the service account that you used to create the ClusterRoleBindings.
For example, the following cURL request accesses the /metrics endpoint. Include the --insecure option only if you do not want to verify the serving certificate:
$ curl --insecure --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://verticadb-operator-metrics-service.vertica:8443/metrics
For additional details about service account credentials, see the Kubernetes documentation.
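The request above can be wrapped in small shell helpers, sketched here under a few assumptions. The function names and the SA_DIR variable are hypothetical. The sketch verifies the serving certificate against the mounted CA bundle, which is likely to fail if the RBAC proxy uses a self-signed certificate. In that case you can pass --insecure as an extra argument, as in the cURL example above:

```shell
# Directory where Kubernetes mounts the service account credentials.
# Overridable (hypothetical variable) so the helpers can be tried outside a pod.
SA_DIR="${SA_DIR:-/var/run/secrets/kubernetes.io/serviceaccount}"

# Print the in-cluster URL of the operator /metrics endpoint for a namespace.
metrics_url() {
  printf 'https://verticadb-operator-metrics-service.%s.svc.cluster.local:8443/metrics\n' "$1"
}

# Print the Authorization header built from the mounted token.
bearer_header() {
  local token
  token="$(cat "${SA_DIR}/token")" || return 1
  printf 'Authorization: Bearer %s\n' "$token"
}

# Fetch the metrics. Extra arguments after the namespace go straight to curl.
# Usage: fetch_operator_metrics NAMESPACE [CURL_OPTIONS...]
fetch_operator_metrics() {
  [ "$#" -ge 1 ] || { echo "usage: fetch_operator_metrics NAMESPACE [CURL_OPTIONS...]" >&2; return 1; }
  local namespace="$1" header
  shift
  header="$(bearer_header)" || return 1
  curl --silent --show-error --fail \
    --cacert "${SA_DIR}/ca.crt" \
    -H "$header" \
    "$@" \
    "$(metrics_url "$namespace")"
}
```

For example, `fetch_operator_metrics vertica --insecure` would request the endpoint in the vertica namespace without verifying the serving certificate.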
TLS client certificate authentication
Some environments might prevent you from authenticating to the /metrics endpoint with the service account token. For example, you might run Prometheus outside of Kubernetes. To allow external client connections to the /metrics endpoint, you have to supply the RBAC proxy sidecar with TLS certificates.
You must create a Secret that contains the certificates, and then use the prometheus.tlsSecret Helm chart parameter to pass the Secret to the RBAC proxy sidecar when you install the Helm chart. The following steps create the Secret and install the Helm chart:
- Create a Secret that contains the certificates:
  $ kubectl create secret generic metrics-tls --from-file=tls.key=/path/to/tls.key --from-file=tls.crt=/path/to/tls.crt --from-file=ca.crt=/path/to/ca.crt
- Install the Helm chart with prometheus.tlsSecret set to the Secret that you just created:
  $ helm install operator-name --namespace namespace --create-namespace vertica-charts/verticadb-operator \
      --set prometheus.tlsSecret=metrics-tls
  The prometheus.tlsSecret parameter forces the RBAC proxy to use the TLS certificates stored in the Secret. Otherwise, the RBAC proxy sidecar generates its own self-signed certificate.
After you install the Helm chart, you can authenticate to the /metrics endpoint with the certificates in the Secret. For example:
$ curl --key tls.key --cert tls.crt --cacert ca.crt https://verticadb-operator-metrics-service.vertica.svc:8443/metrics
HTTP access
You might have an environment that does not require privileged access to Prometheus metrics. For example, you might run Prometheus outside of Kubernetes.
To allow external access to the /metrics endpoint with HTTP, set prometheus.expose to EnableWithoutAuth. For example:
$ helm install operator-name --namespace namespace --create-namespace vertica-charts/verticadb-operator \
--set prometheus.expose=EnableWithoutAuth
Service object
Vertica provides a service object, verticadb-operator-metrics-service, to access the Prometheus /metrics endpoint. The VerticaDB operator does not manage this service object. By default, the service object uses the ClusterIP service type, so you must change the serviceType for external client access. The service object's fully qualified domain name (FQDN) is as follows:
verticadb-operator-metrics-service.namespace.svc.cluster.local
Connect to the /metrics endpoint at port 8443 with the following path:
http://verticadb-operator-metrics-service.namespace.svc.cluster.local:8443/metrics
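To make the endpoint a Prometheus target, you add it to a scrape configuration. As a rough sketch, the following function prints a static scrape configuration for the unauthenticated HTTP option. The function name operator_scrape_config and the job name verticadb-operator are hypothetical, and the sketch assumes that Prometheus runs where the service FQDN resolves:

```shell
# Hypothetical helper: print a Prometheus scrape configuration that targets
# the operator metrics service over plain HTTP.
# Usage: operator_scrape_config NAMESPACE
operator_scrape_config() {
  local namespace="$1"
  cat <<EOF
scrape_configs:
  - job_name: verticadb-operator   # job name chosen for illustration
    scheme: http
    metrics_path: /metrics
    static_configs:
      - targets:
          - verticadb-operator-metrics-service.${namespace}.svc.cluster.local:8443
EOF
}
```

For example, `operator_scrape_config vertica > scrape-config.yaml` writes a fragment that you could merge into an existing Prometheus configuration file.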
Prometheus operator integration (optional)
Vertica on Kubernetes integrates with the Prometheus operator, which provides custom resources (CRs) that simplify targeting metrics. Vertica supports the ServiceMonitor CR, which discovers the VerticaDB operator automatically and authenticates requests with a bearer token.
The ServiceMonitor CR is available as a release artifact in our GitHub repository. See Helm chart parameters for details about the prometheus.createServiceMonitor parameter.
Disabling Prometheus
To disable Prometheus, set the prometheus.expose Helm chart parameter to Disable:
$ helm install operator-name --namespace namespace --create-namespace vertica-charts/verticadb-operator \
--set prometheus.expose=Disable
For details about Helm install commands, see Installing the VerticaDB operator.
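The three access options map to three values of the prometheus.expose parameter. As a small sketch, the following function returns the matching Helm setting for each option. The function name expose_setting and the option keywords https, http, and disable are hypothetical. Only the parameter values come from this page:

```shell
# Hypothetical helper: map an access option to its prometheus.expose setting.
# Usage: expose_setting https|http|disable
expose_setting() {
  case "$1" in
    https)   echo 'prometheus.expose=EnableWithProxy' ;;    # HTTPS with RBAC proxy (default)
    http)    echo 'prometheus.expose=EnableWithoutAuth' ;;  # HTTP without RBAC
    disable) echo 'prometheus.expose=Disable' ;;            # no metrics endpoint
    *)
      echo "unknown option: $1 (expected https, http, or disable)" >&2
      return 1
      ;;
  esac
}

# Example (not executed here):
#   helm install operator-name --namespace namespace --create-namespace \
#     vertica-charts/verticadb-operator --set "$(expose_setting http)"
```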
Metrics
The following table describes the available VerticaDB operator metrics:
Name | Type | Description |
---|---|---|
controller_runtime_active_workers | gauge | Number of currently used workers per controller. |
controller_runtime_max_concurrent_reconciles | gauge | Maximum number of concurrent reconciles per controller. |
controller_runtime_reconcile_errors_total | counter | Total number of reconciliation errors per controller. |
controller_runtime_reconcile_time_seconds | histogram | Length of time per reconciliation per controller. |
controller_runtime_reconcile_total | counter | Total number of reconciliations per controller. |
controller_runtime_webhook_latency_seconds | histogram | Histogram of the latency of processing admission requests. |
controller_runtime_webhook_requests_in_flight | gauge | Current number of admission requests being served. |
controller_runtime_webhook_requests_total | counter | Total number of admission requests by HTTP status code. |
go_gc_duration_seconds | summary | A summary of the pause duration of garbage collection cycles. |
go_goroutines | gauge | Number of goroutines that currently exist. |
go_info | gauge | Information about the Go environment. |
go_memstats_alloc_bytes | gauge | Number of bytes allocated and still in use. |
go_memstats_alloc_bytes_total | counter | Total number of bytes allocated, even if freed. |
go_memstats_buck_hash_sys_bytes | gauge | Number of bytes used by the profiling bucket hash table. |
go_memstats_frees_total | counter | Total number of frees. |
go_memstats_gc_sys_bytes | gauge | Number of bytes used for garbage collection system metadata. |
go_memstats_heap_alloc_bytes | gauge | Number of heap bytes allocated and still in use. |
go_memstats_heap_idle_bytes | gauge | Number of heap bytes waiting to be used. |
go_memstats_heap_inuse_bytes | gauge | Number of heap bytes that are in use. |
go_memstats_heap_objects | gauge | Number of allocated objects. |
go_memstats_heap_released_bytes | gauge | Number of heap bytes released to OS. |
go_memstats_heap_sys_bytes | gauge | Number of heap bytes obtained from system. |
go_memstats_last_gc_time_seconds | gauge | Number of seconds since 1970 of last garbage collection. |
go_memstats_lookups_total | counter | Total number of pointer lookups. |
go_memstats_mallocs_total | counter | Total number of mallocs. |
go_memstats_mcache_inuse_bytes | gauge | Number of bytes in use by mcache structures. |
go_memstats_mcache_sys_bytes | gauge | Number of bytes used for mcache structures obtained from system. |
go_memstats_mspan_inuse_bytes | gauge | Number of bytes in use by mspan structures. |
go_memstats_mspan_sys_bytes | gauge | Number of bytes used for mspan structures obtained from system. |
go_memstats_next_gc_bytes | gauge | Number of heap bytes when next garbage collection will take place. |
go_memstats_other_sys_bytes | gauge | Number of bytes used for other system allocations. |
go_memstats_stack_inuse_bytes | gauge | Number of bytes in use by the stack allocator. |
go_memstats_stack_sys_bytes | gauge | Number of bytes obtained from system for stack allocator. |
go_memstats_sys_bytes | gauge | Number of bytes obtained from system. |
go_threads | gauge | Number of OS threads created. |
process_cpu_seconds_total | counter | Total user and system CPU time spent in seconds. |
process_max_fds | gauge | Maximum number of open file descriptors. |
process_open_fds | gauge | Number of open file descriptors. |
process_resident_memory_bytes | gauge | Resident memory size in bytes. |
process_start_time_seconds | gauge | Start time of the process since unix epoch in seconds. |
process_virtual_memory_bytes | gauge | Virtual memory size in bytes. |
process_virtual_memory_max_bytes | gauge | Maximum amount of virtual memory available in bytes. |
vertica_cluster_restart_attempted_total | counter | The number of times we attempted a full cluster restart. |
vertica_cluster_restart_failed_total | counter | The number of times we failed when attempting a full cluster restart. |
vertica_cluster_restart_seconds | histogram | The number of seconds it took to do a full cluster restart. |
vertica_nodes_restart_attempted_total | counter | The number of times we attempted to restart down nodes. |
vertica_nodes_restart_failed_total | counter | The number of times we failed when trying to restart down nodes. |
vertica_nodes_restart_seconds | histogram | The number of seconds it took to restart down nodes. |
vertica_running_nodes_count | gauge | The number of nodes that have a running pod associated with it. |
vertica_subclusters_count | gauge | The number of subclusters that exist. |
vertica_total_nodes_count | gauge | The number of nodes that currently exist. |
vertica_up_nodes_count | gauge | The number of nodes that have vertica running and can accept connections. |
vertica_upgrade_total | counter | The number of times the operator performed an upgrade caused by an image change. |
workqueue_adds_total | counter | Total number of adds handled by workqueue. |
workqueue_depth | gauge | Current depth of workqueue. |
workqueue_longest_running_processor_seconds | gauge | How many seconds has the longest running processor for workqueue been running. |
workqueue_queue_duration_seconds | histogram | How long in seconds an item stays in workqueue before being requested. |
workqueue_retries_total | counter | Total number of retries handled by workqueue. |
workqueue_unfinished_work_seconds | gauge | How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. |
workqueue_work_duration_seconds | histogram | How long in seconds processing an item from workqueue takes. |
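Metrics from the /metrics endpoint arrive in the Prometheus text format, one sample per line. As a minimal sketch of working with that output outside of Prometheus, the following helpers sum a metric across all of its label sets and derive the number of nodes that are not up from vertica_total_nodes_count and vertica_up_nodes_count. The function names are hypothetical, and the sketch assumes gauge values are whole numbers:

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# the sum of all samples of one metric. Fails if the metric is absent.
# Usage: metric_sum METRIC_NAME < metrics.txt
metric_sum() {
  awk -v name="$1" '
    /^#/ || NF == 0 { next }            # skip HELP/TYPE comments and blank lines
    {
      metric = $0
      sub(/[{ \t].*/, "", metric)       # the name ends at "{" or whitespace
      if (metric != name) next
      sample = $0
      sub(/^[^{ \t]*([{].*[}])?[ \t]+/, "", sample)  # drop name and label set
      split(sample, fields, /[ \t]+/)   # fields[1] is the value; a timestamp may follow
      total += fields[1]
      found = 1
    }
    END { if (!found) exit 1; print total }
  '
}

# Print how many existing nodes are not up, summed over all label sets.
# Usage: down_nodes < metrics.txt
down_nodes() {
  local metrics total up
  metrics="$(cat)"
  total="$(printf '%s\n' "$metrics" | metric_sum vertica_total_nodes_count)" || return 1
  up="$(printf '%s\n' "$metrics" | metric_sum vertica_up_nodes_count)" || return 1
  echo $(( total - up ))
}
```

For example, you could pipe the output of a cURL request to the /metrics endpoint into down_nodes. For continuous monitoring, the same difference is better expressed as a query in Prometheus itself.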