Monitoring Vertica

You can monitor the activity and health of a Vertica database through various log files and system tables. Vertica provides various configuration parameters that control monitoring options. You can also use the Management Console to observe database activity.

1 - Monitoring log files

When a database is running

When a Vertica database is running, each node in the cluster writes messages into a file named vertica.log. For example, the Tuple Mover and the transaction manager write INFO messages into vertica.log at specific intervals even when there is no mergeout activity.

You can configure the location of the vertica.log file. By default, the log file is in:

catalog-path/database-name/node-name_catalog/vertica.log
  • catalog-path is the path shown in the NODES system table minus the Catalog directory at the end.

  • database-name is the name of your database.

  • node-name is the name of the node shown in the NODES system table.

To monitor one node in a running database in real time:

  1. Log in to the database administrator account on any node in the cluster.

  2. In a terminal window enter:

    $ tail -f catalog-path/database-name/node-name_catalog/vertica.log
    

  • catalog-path is the catalog pathname specified when you created the database. See Creating a database.

  • database-name is the database name (case-sensitive).

  • node-name is the node name, as specified in the database definition. See Viewing a database.
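
To check a log for recent problems without following it in real time, you can filter it with grep. This is a minimal sketch: the bracketed severity tags are an assumption based on the [INFO] entries shown later in Event reporting examples.

$ grep -E '\[(WARNING|ERROR)\]' catalog-path/database-name/node-name_catalog/vertica.log | tail -n 20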

When the database/node is starting up

During system startup, before the Vertica log has been initialized to write messages, each node in the cluster writes messages into a file named dbLog. This log is useful for diagnosing situations where the database fails to start before it can write messages into vertica.log. The dbLog file is located at the following path, using catalog-path and database-name as described above:

catalog-path/database-name/dbLog
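
To watch startup messages as they are written, you can tail this file just as you would vertica.log:

$ tail -f catalog-path/database-name/dbLog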

2 - Rotating log files

Most Linux distributions include the logrotate utility. Using this utility simplifies log file administration. By setting up a logrotate configuration file, you can use the utility to complete one or more of these tasks automatically:

  • Compress and rotate log files

  • Remove log files automatically

  • Email log files to named recipients

You can configure logrotate to complete these tasks at specific intervals, or when log files reach a particular size.

If logrotate is present when Vertica is installed, Vertica automatically configures the utility to look for its configuration files in the /opt/vertica/config/logrotate directory on each node.

When you create a database, Vertica creates a database-specific logrotate configuration file on each node in your cluster, under /opt/vertica/config/logrotate/, which the logrotate utility uses.

For information about additional settings, use the man logrotate command.
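
For illustration only, a minimal logrotate entry for vertica.log might look like the following. This is not the configuration Vertica generates; the path, schedule, retention count, and use of copytruncate are assumptions:

/home/dbadmin/VMart/v_vmart_node0001_catalog/vertica.log {
    # rotate weekly and keep 12 compressed logs (about three months)
    weekly
    rotate 12
    compress
    # copytruncate rotates in place without signaling the vertica process
    copytruncate
}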

Executing the Python script through the dbadmin logrotate cron job

During the installation of Vertica, the installer configures a cron job for the dbadmin user. This cron job is configured to execute a Python script that runs the logrotate utility. You can view the details of this cron job by viewing the dbadmin.cron file, which is located in the /opt/vertica/config directory.

If you want to customize a cron job to configure logrotate for your Vertica database, you must create the cron job under the dbadmin user.
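
For example, to inspect the installed job definition (run the second command as root; output varies by installation):

$ cat /opt/vertica/config/dbadmin.cron
# crontab -l -u dbadmin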

Using the administration tools logrotate utility

You can use the admintools logrotate option to help configure logrotate scripts for a database and distribute the scripts across the cluster. The logrotate option allows you to specify:

  • How often to rotate logs

  • How large logs can become before being rotated

  • How long to keep the logs

Example:

The following example shows you how to set up log rotation on a weekly schedule, keeping three months of logs (12 logs):

$ admintools -t logrotate -d <dbname> -r weekly -k 12

See Writing administration tools scripts for more usage information.

Configure logrotate for MC

The Management Console log file is:

/opt/vconsole/log/mc/mconsole.log

To configure logrotate for MC, configure the following file:

/opt/vconsole/temp/webapp/WEB-INF/classes/log4j.xml

Edit the log4j.xml file and set these parameters as follows:

  1. Restrict the size of the log:

    <param name="MaxFileSize" value="1MB"/>
    
  2. Restrict the number of file backups for the log:

    <param name="MaxBackupIndex" value="1"/>
    
  3. Restart MC as the root user:

    # /etc/init.d/vertica-consoled restart
    

Rotating logs manually

To implement a custom log rotation process, follow these steps:

  1. Rename or archive the existing vertica.log file. For example:

    $ mv vertica.log vertica.log.1
    
  2. Send the Vertica process the USR1 signal, using either of the following approaches:

    $ killall -USR1 vertica
    

    or

    $ ps -ef | grep -i vertica
    $ kill -USR1 process-id
    

3 - Monitoring process status (ps)

You can use ps to monitor the database and Spread processes running on each node in the cluster. For example:

$ ps aux | grep /opt/vertica/bin/vertica
$ ps aux | grep /opt/vertica/spread/sbin/spread

You should see one Vertica process and one Spread process on each node for common configurations. To monitor Administration Tools and connector processes:

$ ps aux | grep vertica

There can be many connection processes but only one Administration Tools process.

4 - Monitoring Linux resource usage

You should monitor system resource usage on any or all nodes in the cluster. You can use System Activity Reporting (SAR) to monitor resource usage.

  1. Log in to the database administrator account on any node.

  2. Run the top utility

    $ top
    

    A high CPU percentage in top indicates that Vertica is CPU-bound. For example:

    top - 11:44:28 up 53 days, 23:47,  9 users,  load average: 0.91, 0.97, 0.81
    Tasks: 123 total,   1 running, 122 sleeping,   0 stopped,   0 zombie
    Cpu(s):  26.9%us,  1.3%sy,  0.0%ni, 71.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   4053136 total,  3882020k used,  171116 free,   407688 buffers
    Swap:  4192956 total,   176k used,  4192780 free,  1526436 cached
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    13703 dbadmin    1   0 1374m 678m  55m S 99.9 17.1   6:21.70 vertica
     2606 root      16   0 32152  11m 2508 S  1.0  0.3   0:16.97 X
        1 root      16   0  4748  552  456 S  0.0  0.0   0:01.51 init
        2 root      RT  -5     0    0    0 S  0.0  0.0   0:04.92 migration/0
        3 root      34  19     0    0    0 S  0.0  0.0   0:11.75 ksoftirqd/0
    ...
    

    Some possible reasons for high CPU usage are:

    • The Tuple Mover runs automatically and thus consumes CPU time even if there are no connections to the database.

    • The swappiness kernel parameter may not be set to 0. Execute the following command from the Linux command line to see the value of this parameter:

      $ cat /proc/sys/vm/swappiness
      

      If this value is not 0, change it by following the steps in Check for swappiness.

  3. Run the iostat utility. A high idle time in top at the same time as a high rate of blocks read in iostat indicates that Vertica is disk-bound. For example:

    $ /usr/bin/iostat
    Linux 2.6.18-164.el5 (qa01)     02/05/2011
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.77    2.32    0.76    0.68    0.00   95.47
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    hda               0.37         3.40        10.37    2117723    6464640
    sda               0.46         1.94        18.96    1208130   11816472
    sdb               0.26         1.79        15.69    1114792    9781840
    sdc               0.24         1.80        16.06    1119304   10010328
    sdd               0.22         1.79        15.52    1117472    9676200
    md0               8.37         7.31        66.23    4554834   41284840
    
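Because this section mentions System Activity Reporting (SAR), here is a brief sketch of sampling resource usage with sar, assuming the sysstat package is installed; the commands take ten one-second samples of CPU and memory usage:

$ sar -u 1 10
$ sar -r 1 10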

5 - Monitoring disk space usage

You can use these system tables to monitor disk space usage on your cluster:

System table Description
DISK_STORAGE Monitors the amount of disk storage used by the database on each node.
COLUMN_STORAGE Monitors the amount of disk storage used by each column of each projection on each node.
PROJECTION_STORAGE Monitors the amount of disk storage used by each projection on each node.
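
For example, the following query reports used and free space per storage location; the DISK_STORAGE columns selected here are a representative subset:

=> SELECT node_name, storage_path, disk_space_used_mb, disk_space_free_mb
   FROM disk_storage;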

6 - Monitoring elastic cluster rebalancing

Vertica includes system tables that can be used to monitor the rebalance status of an elastic cluster and gain general insight to the status of elastic cluster on your nodes.

  • REBALANCE_TABLE_STATUS provides general information about a rebalance. It shows, for each table, the amount of data that has been separated, the amount that is currently being separated, and the amount to be separated. It also shows the amount of data transferred, the amount that is currently being transferred, and the remaining amount to be transferred (or an estimate if storage is not separated).

  • REBALANCE_PROJECTION_STATUS can be used to gain more insight into the details for a particular projection that is being rebalanced. It provides the same type of information as above, but in terms of a projection instead of a table.

In each table, the columns SEPARATED_PERCENT and TRANSFERRED_PERCENT can be used to determine overall progress.

Historical rebalance information

Historical information about work completed is retained, so query with the table column IS_LATEST to restrict output to only the most recent or current rebalance activity. Historical data may include information about dropped projections or tables. If a table or projection has been dropped and information about the anchor table is not available, then NULL is displayed for the table ID and <unknown> for table name. Information on dropped tables is still useful, for example, in providing justification for the duration of a task.
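
For example, the following sketch restricts REBALANCE_TABLE_STATUS to the most recent or current rebalance and reports the progress columns described above:

=> SELECT table_name, separated_percent, transferred_percent
   FROM rebalance_table_status
   WHERE is_latest;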

7 - Monitoring events

To help you monitor your database system, Vertica traps and logs significant events that affect database performance and functionality if you do not address their root causes. This section describes where events are logged, the types of events that Vertica logs, how to respond to these events, the information that Vertica provides for these events, and how to configure event monitoring.

7.1 - Event logging mechanisms

Vertica posts events to the following mechanisms:

Mechanism Description
vertica.log All events are automatically posted to vertica.log. See Monitoring the Log Files.
ACTIVE_EVENTS This SQL system table provides information about all open events. See Using system tables and ACTIVE_EVENTS.
SNMP To post traps to SNMP, enable global reporting in addition to each individual event you want trapped. See Configuring event reporting.
Syslog To log events to syslog, enable event reporting for each individual event you want logged. See Configuring event reporting.
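
For example, a quick way to review open events is to query ACTIVE_EVENTS directly; this sketch selects a representative subset of its columns (described in Event data below):

=> SELECT node_name, event_code, event_code_description, event_posted_timestamp
   FROM active_events;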

7.2 - Event codes

The following table lists the event codes that Vertica logs to the events system tables.

Event Code Severity Event Code Description Description/Action
0 Warning Low Disk Space

Warning indicates one of the following issues:

  • Lack of disk space for database

  • Disk failure

  • I/O hardware failure

Action: Add more disk space, or replace the failing disk or hardware as soon as possible.

Check dmesg to see what caused the problem.

Also, use the DISK_RESOURCE_REJECTIONS system table to determine the types of disk space requests that are being rejected and the hosts where they are rejected. See Managing disk space for details.

1 Warning Read Only File System

Database lacks write access to the file system for data or catalog paths. This sometimes occurs if Linux remounts a drive due to a kernel issue.

Action: Give the database write access.

2 Emergency Loss Of K Safety

The database is no longer K-safe because insufficient nodes are functioning within the cluster. Loss of K-safety causes the database to shut down.

Action: Recover the system.

3 Critical Current Fault Tolerance at Critical Level

One or more nodes in the cluster failed. If the database loses one more node, it will no longer be K-safe and shut down.

Action: Restore nodes that failed or shut down.

4 Warning Too Many ROS Containers

Heavy load activity on one or more projections sometimes generates more ROS containers than the Tuple Mover can handle. Vertica allows up to 1024 ROS containers per projection before it rolls back additional load jobs and returns a ROS pushback error message.

Action: The Tuple Mover typically catches up with pending mergeout requests and the Optimizer can resume executing queries on affected tables (see Mergeout).

If this problem does not resolve quickly, or if it occurs frequently, it is probably related to insufficient RAM allocated to MAXMEMORY in the TM resource pool.

5 Informational WOS Over Flow Deprecated
6 Informational Node State Change

The node state changed.

Action: Check node status.

7 Warning Recovery Failure

Database was not restored to a functional state after a hardware or software related failure.

Action: Reasons for the warning can vary, see the event description for details.

8 Warning Recovery Error

Database encountered an error while attempting to recover. If the number of recovery errors exceeds Max Tries, the Recovery Failure event is triggered.

Action: Reasons for the warning can vary, see the event description for details.

9 n/a Recovery Lock Error Unused
10 n/a Recovery Projection Retrieval Error Unused
11 Warning Refresh Error

The database encountered an error while attempting to refresh.

Action: Reasons for the warning can vary, see the event description for details.

12 n/a Refresh Lock Error Unused
13 n/a Tuple Mover Error Deprecated
14 Warning Timer Service Task Error

Error occurred in an internal scheduled task.

Action: None, internal use only

15 Warning Stale Checkpoint Deprecated
16 Notice License Size Compliance

Database size exceeds license size allowance.

Action: See Monitoring database size for license compliance.

17 Notice License Term Compliance

Database is not in compliance with your Vertica license.

Action: Check compliance status with GET_COMPLIANCE_STATUS.

18 Error CRC Mismatch

Cyclic Redundancy Check (CRC) returned an error or errors while fetching data.

Action: Review the vertica.log file or the SNMP trap utility to evaluate CRC errors.

19 Critical/Warning Catalog Sync Exceeds Durability Threshold Severity: Critical when exceeding hard limit, Warning when exceeding soft limit.
20 Critical Cluster Read-only

(Eon Mode) Quorum or primary shard coverage loss forced the database into read-only mode.

Action: Restart down nodes.

7.3 - Event data

To help you interpret and solve the issue that triggered an event, each event provides a variety of data, depending upon the event logging mechanism used.

The following table describes the event data and indicates where it is used.

vertica.log | ACTIVE_EVENTS (column names) | SNMP | Syslog | Description
N/A | NODE_NAME | N/A | N/A | The node where the event occurred.
Event Code | EVENT_CODE | Event Type | Event Code | A numeric ID that indicates the type of event. See the event codes in the previous table for a list of event type codes.
Event Id | EVENT_ID | Event OID | Event Id | A unique numeric ID that identifies the specific event.
Event Severity | EVENT_SEVERITY | Event Severity | Event Severity | The severity of the event from highest to lowest. These events are based on standard syslog severity types: 0 (Emergency), 1 (Alert), 2 (Critical), 3 (Error), 4 (Warning), 5 (Notice), 6 (Info), 7 (Debug).
PostedTimestamp | EVENT_POSTED_TIMESTAMP | N/A | PostedTimestamp | The year, month, day, and time the event was reported. Time is provided as military time.
ExpirationTimestamp | EVENT_EXPIRATION | N/A | ExpirationTimestamp | The time at which this event expires. If the same event is posted again prior to its expiration time, this field gets updated to a new expiration time.
EventCodeDescription | EVENT_CODE_DESCRIPTION | Description | EventCodeDescription | A brief description of the event and details pertinent to the specific situation.
ProblemDescription | EVENT_PROBLEM_DESCRIPTION | Event Short Description | ProblemDescription | A generic description of the event.
N/A | REPORTING_NODE | Node Name | N/A | The name of the node within the cluster that reported the event.
DatabaseName | N/A | Database Name | DatabaseName | The name of the database that is impacted by the event.
N/A | N/A | Host Name | Hostname | The name of the host within the cluster that reported the event.
N/A | N/A | Event Status | N/A | The status of the event. It can be either: 1 (Open) or 2 (Clear).

7.4 - Configuring event reporting

Event reporting is automatically configured for vertica.log, and current events are automatically posted to the ACTIVE_EVENTS system table. You can also configure Vertica to post events to syslog and SNMP.

7.4.1 - Configuring reporting for the simple notification service (SNS)

You can monitor Data collector (DC) components and send new rows to Amazon Web Services (AWS) Simple Notification Service (SNS). SNS notifiers are configured with database-level SNS parameters.

Minimally, to send DC data to SNS topics, you must configure and specify the following:

  • An SNS notifier

  • A Simple Notification Service (SNS) topic

  • An AWS region (SNSRegion)

  • An SNS endpoint (SNSEndpoint) (FIPS only)

  • Credentials to authenticate to AWS

  • Information about how to handle HTTPS

Creating an SNS notifier

To create an SNS notifier, use CREATE NOTIFIER, specifying sns as the ACTION.

SNS configuration

SNS topics and their subscribers should be configured with AWS. For details, see the AWS documentation.

AWS region and endpoint

In most use cases, you only need to set the AWS region with the SNSRegion parameter; if the SNSEndpoint is set to an empty string (default) and the SNSRegion is set, Vertica automatically finds and uses the appropriate endpoint:

=> ALTER DATABASE DEFAULT SET SNSRegion='us-east-1';
=> ALTER DATABASE DEFAULT SET SNSEndpoint='';

If you want to specify an endpoint, its region must match the region specified in SNSRegion:

=> ALTER DATABASE DEFAULT SET SNSEndpoint='sns.us-east-1.amazonaws.com';

If you use FIPS, you should manually set SNSEndpoint to a FIPS-compliant endpoint:

=> ALTER DATABASE DEFAULT SET SNSEndpoint='sns-fips.us-east-1.amazonaws.com';

AWS credentials

AWS credentials can be set with SNSAuth, which takes an access key and secret access key in the following format:

access_key:secret_access_key

To set SNSAuth:

=> ALTER DATABASE DEFAULT SET SNSAuth='VNDDNVOPIUQF917O5PDB:+mcnVONVIbjOnf1ekNis7nm3mE83u9fjdwmlq36Z';

Handling HTTPS

The SNSEnableHttps parameter determines whether the SNS notifier uses TLS to secure the connection between Vertica and AWS. HTTPS is enabled by default and can be manually enabled with:

=> ALTER DATABASE DEFAULT SET SNSEnableHttps=1;

If SNSEnableHttps is enabled, depending on your configuration, you might need to specify a custom set of CA bundles with SNSCAFile or SNSCAPath. Amazon root certificates are typically contained in the set of trusted CA certificates already, so you should not have to set these parameters in most environments:

=> ALTER DATABASE DEFAULT SET SNSCAFile='path/to/ca/bundle.pem'
=> ALTER DATABASE DEFAULT SET SNSCAPath='path/to/ca/bundles/'

HTTPS can be manually disabled with:

=> ALTER DATABASE DEFAULT SET SNSEnableHttps=0;

Examples

The following example creates an SNS topic, subscribes to it with an SQS queue, and then configures an SNS notifier for the DC component LoginFailures:

  1. Create an SNS topic.

  2. Create an SQS queue.

  3. Subscribe the SQS queue to the SNS topic.

  4. Set SNSAuth with your AWS credentials:

    => ALTER DATABASE DEFAULT SET SNSAuth='VNDDNVOPIUQF917O5PDB:+mcnVONVIbjOnf1ekNis7nm3mE83u9fjdwmlq36Z';
    
  5. Set SNSRegion:

    => ALTER DATABASE DEFAULT SET SNSRegion='us-east-1'
    
  6. Enable HTTPS:

    => ALTER DATABASE DEFAULT SET SNSEnableHttps=1;
    
  7. Create an SNS notifier:

    => CREATE NOTIFIER v_sns_notifier ACTION 'sns' MAXPAYLOAD '256K' MAXMEMORYSIZE '10M' CHECK COMMITTED;
    
  8. Verify that the SNS notifier, SNS topic, and SQS queue are properly configured:

    1. Manually send a message from the notifier to the SNS topic with NOTIFY:

      => SELECT NOTIFY('test message', 'v_sns_notifier', 'arn:aws:sns:us-east-1:123456789012:MyTopic')
      
    2. Poll the SQS queue for your message.

  9. Attach the SNS notifier to the LoginFailures component with SET_DATA_COLLECTOR_NOTIFY_POLICY:

    => SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures', 'v_sns_notifier', 'Login failed!', true)
    

To disable an SNS notifier:

=> SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures', 'v_sns_notifier', 'Login failed!', false)

7.4.2 - Configuring reporting for syslog

Syslog is a network-logging utility that issues, stores, and processes log messages. It is a useful way to get heterogeneous data into a single data repository.

To log events to syslog, enable event reporting for each individual event you want logged. Messages are logged, by default, to /var/log/messages.

Configuring event reporting to syslog consists of:

  1. Enabling Vertica to trap events for syslog.

  2. Defining which events Vertica traps for syslog.

    Vertica strongly suggests that you trap the Stale Checkpoint event.

  3. Defining which syslog facility to use.

Enabling Vertica to trap events for syslog

To enable event trapping for syslog, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogEnabled = 1;

To disable event trapping for syslog, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogEnabled = 0;

Defining events to trap for syslog

To define events that generate a syslog entry, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogEvents = 'events-list';

where events-list is a comma-delimited list of events, one or more of the following:

  • Low Disk Space

  • Read Only File System

  • Loss Of K Safety

  • Current Fault Tolerance at Critical Level

  • Too Many ROS Containers

  • Node State Change

  • Recovery Failure

  • Recovery Error

  • Recovery Lock Error

  • Recovery Projection Retrieval Error

  • Refresh Error

  • Refresh Lock Error

  • Tuple Mover Error

  • Timer Service Task Error

  • Stale Checkpoint

The following example generates a syslog entry for low disk space and recovery failure:

=> ALTER DATABASE DEFAULT SET SyslogEvents = 'Low Disk Space, Recovery Failure';

Defining the SyslogFacility to use for reporting

The syslog mechanism allows for several different general classifications of logging messages, called facilities. Typically, all authentication-related messages are logged with the auth (or authpriv) facility. These messages are intended to be secure and hidden from unauthorized eyes. Normal operational messages are logged with the daemon facility, which is the collector that receives and optionally stores messages.

The SyslogFacility directive allows all logging messages to be directed to a different facility than the default. When the directive is used, all logging is done using the specified facility, both authentication (secure) and otherwise.

To define which SyslogFacility Vertica uses, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogFacility = 'Facility_Name';

where Facility_Name is one of the following:

  • auth

  • authpriv (Linux only)

  • cron

  • uucp (UUCP subsystem)

  • daemon

  • ftp (Linux only)

  • lpr (line printer subsystem)

  • mail (mail system)

  • news (network news subsystem)

  • user (default system)

  • local0 (local use 0)

  • local1 (local use 1)

  • local2 (local use 2)

  • local3 (local use 3)

  • local4 (local use 4)

  • local5 (local use 5)

  • local6 (local use 6)

  • local7 (local use 7)

Trapping other event types

To trap events other than the ones listed above, create a syslog notifier and allow it to trap the desired events with SET_DATA_COLLECTOR_NOTIFY_POLICY.

Events monitored by this notifier type are not logged to MONITORING_EVENTS or vertica.log.

The following example creates a notifier that writes a message to syslog when the Data collector (DC) component LoginFailures updates:

  1. Enable syslog notifiers for the current database:

    => ALTER DATABASE DEFAULT SET SyslogEnabled = 1;
    
  2. Create and enable a syslog notifier v_syslog_notifier:

    => CREATE NOTIFIER v_syslog_notifier ACTION 'syslog'
        ENABLE
        MAXMEMORYSIZE '10M'
        IDENTIFIED BY 'f8b0278a-3282-4e1a-9c86-e0f3f042a971'
        PARAMETERS 'eventSeverity = 5';
    
  3. Configure the syslog notifier v_syslog_notifier for updates to the LoginFailures DC component with SET_DATA_COLLECTOR_NOTIFY_POLICY:

    => SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures','v_syslog_notifier', 'Login failed!', true);
    

    This notifier writes the following message to syslog (default location: /var/log/messages) when a user fails to authenticate as the user Bob:

    Apr 25 16:04:58
    vertica_host_01
    vertica:
        Event Posted:
            Event Code:21
            Event Id:0
            Event Severity: Notice [5]
            PostedTimestamp: 2022-04-25 16:04:58.083063
            ExpirationTimestamp: 2022-04-25 16:04:58.083063
            EventCodeDescription: Notifier
            ProblemDescription: (Login failed!)
        {
           "_db":"VMart",
           "_schema":"v_internal",
           "_table":"dc_login_failures",
           "_uuid":"f8b0278a-3282-4e1a-9c86-e0f3f042a971",
           "authentication_method":"Reject",
           "client_authentication_name":"default: Reject",
           "client_hostname":"::1",
           "client_label":"",
           "client_os_user_name":"dbadmin",
           "client_pid":523418,
           "client_version":"",
           "database_name":"dbadmin",
           "effective_protocol":"3.8",
           "node_name":"v_vmart_node0001",
           "reason":"REJECT",
           "requested_protocol":"3.8",
           "ssl_client_fingerprint":"",
           "ssl_client_subject":"",
           "time":"2022-04-25 16:04:58.082568-05",
           "user_name":"Bob"
        }#012
        DatabaseName: VMart
        Hostname: vertica_host_01
    

See also

Event reporting examples

7.4.3 - Configuring reporting for SNMP

Configuring event reporting for SNMP consists of:

  1. Configuring Vertica to enable event trapping for SNMP as described below.

  2. Importing the Vertica Management Information Base (MIB) file into the SNMP monitoring device.

    The Vertica MIB file allows the SNMP trap receiver to understand the traps it receives from Vertica. This, in turn, allows you to configure the actions it takes when it receives traps.

    Vertica supports the SNMP V1 trap protocol. The MIB file is located in /opt/vertica/sbin/VERTICA-MIB. See the documentation for your SNMP monitoring device for more information about importing MIB files.

  3. Configuring the SNMP trap receiver to handle traps from Vertica.

    SNMP trap receiver configuration differs greatly from vendor to vendor. As such, the directions presented here for configuring the SNMP trap receiver to handle traps from Vertica are generic.

    Vertica traps are single, generic traps that contain several fields of identifying information. These fields equate to the event data described in Monitoring events. However, the format used for the field names differs slightly. Under SNMP, the field names contain no spaces. Also, field names are pre-pended with “vert”. For example, Event Severity becomes vertEventSeverity.

    When configuring your trap receiver, be sure to use the same hostname, port, and community string you used to configure event trapping in Vertica.

    Examples of network management providers:

    • Network Node Manager i

    • IBM Tivoli

    • AdventNet

    • Net-SNMP (Open Source)

    • Nagios (Open Source)

    • Open NMS (Open Source)

7.4.4 - Configuring event trapping for SNMP

The following events are trapped by default when you configure Vertica to trap events for SNMP:

  • Low Disk Space

  • Read Only File System

  • Loss of K Safety

  • Current Fault Tolerance at Critical Level

  • Too Many ROS Containers

  • Node State Change

  • Recovery Failure

  • Stale Checkpoint

  • CRC Mismatch

To configure Vertica to trap events for SNMP

  1. Enable Vertica to trap events for SNMP.

  2. Define where Vertica sends the traps.

  3. Optionally redefine which SNMP events Vertica traps.

To enable event trapping for SNMP

Use the following SQL command:

=> ALTER DATABASE DEFAULT SET SnmpTrapsEnabled = 1;

To define where Vertica sends traps

Use the following SQL command, where host_name and port identify the computer on which the SNMP trap receiver resides, and CommunityString acts as a password that controls Vertica's access to the server:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'host_name port CommunityString';

For example:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'localhost 162 public';

You can also specify multiple destinations by specifying a list of destinations, separated by commas:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'host_name1 port1 CommunityString1, hostname2 port2 CommunityString2';

To define which events Vertica traps

Use the following SQL command, where Event_Name is one of the events in the list below the command:

=> ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Event_Name1, Event_Name2';
  • Low Disk Space

  • Read Only File System

  • Loss Of K Safety

  • Current Fault Tolerance at Critical Level

  • Too Many ROS Containers

  • Node State Change

  • Recovery Failure

  • Recovery Error

  • Recovery Lock Error

  • Recovery Projection Retrieval Error

  • Refresh Error

  • Tuple Mover Error

  • Stale Checkpoint

  • CRC Mismatch

The following example specifies two event names:

=> ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Low Disk Space, Recovery Failure';

7.4.5 - Verifying SNMP configuration

To create a set of test events that checks SNMP configuration:

  1. Set up SNMP trap handlers to catch Vertica events.

  2. Test your setup with the following command:

    SELECT SNMP_TRAP_TEST();
        SNMP_TRAP_TEST
    --------------------------
     Completed SNMP Trap Test
    (1 row)
    

7.5 - Event reporting examples

Vertica.log

The following example illustrates a Too Many ROS Containers event posted and cleared within vertica.log:

08/14/15 15:07:59 thr:nameless:0x45a08940 [INFO] Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:29.253729
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com
08/14/15 15:08:54 thr:Ageout Events:0x2aaab0015e70 [INFO] Event Cleared:
Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:53.012669
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com

SNMP

The following example illustrates a Too Many ROS Containers event posted to SNMP:

Version: 1, type: TRAPREQUEST
Enterprise OID: .1.3.6.1.4.1.31207.2.0.1
Trap agent: 72.0.0.0
Generic trap: ENTERPRISESPECIFIC (6)
Specific trap: 0
.1.3.6.1.4.1.31207.1.1 ---> 4
.1.3.6.1.4.1.31207.1.2 ---> 0
.1.3.6.1.4.1.31207.1.3 ---> 2008-08-14 11:30:26.121292
.1.3.6.1.4.1.31207.1.4 ---> 4
.1.3.6.1.4.1.31207.1.5 ---> 1
.1.3.6.1.4.1.31207.1.6 ---> site01
.1.3.6.1.4.1.31207.1.7 ---> suse10-1
.1.3.6.1.4.1.31207.1.8 ---> Too many ROS containers exist on this node.
.1.3.6.1.4.1.31207.1.9 ---> QATESTDB
.1.3.6.1.4.1.31207.1.10 ---> Too Many ROS Containers

Syslog

The following example illustrates a Too Many ROS Containers event posted and cleared within syslog:

Aug 14 15:07:59 fc6-1 vertica: Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp: 2015-08-14 15:07:59.253729 ExpirationTimestamp:
2015-08-14 15:08:29.253729 EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB Hostname: fc6-1.example.com
Aug 14 15:08:54 fc6-1 vertica: Event Cleared: Event Code:4 Event Id:0 Event Severity:
Warning [4] PostedTimestamp: 2015-08-14 15:07:59.253729 ExpirationTimestamp:
2015-08-14 15:08:53.012669 EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB Hostname: fc6-1.example.com

8 - Using system tables

Vertica system tables provide information about system resources, background processes, workload, and performance—for example, load streams, query profiles, and tuple mover operations. Vertica collects and refreshes this information automatically.

You can query system tables using expressions, predicates, aggregates, analytics, subqueries, and joins. You can also save system table query results into a user table for future analysis. For example, the following query creates a table, mynode, selecting three node-related columns from the NODES system table:

=> CREATE TABLE mynode AS SELECT node_name, node_state, node_address FROM nodes;
CREATE TABLE
=> SELECT * FROM mynode;
    node_name     | node_state |  node_address
------------------+------------+----------------
 v_vmart_node0001 | UP         | 192.168.223.11
(1 row)

Where system tables reside

System tables are grouped into two schemas:

  • V_CATALOG, which provides information about persistent objects in the catalog

  • V_MONITOR, which provides information about transient system state

These schemas reside in the default search path. Unless you change the search path to exclude V_MONITOR or V_CATALOG or both, queries can specify a system table name that omits its schema.

You can query the SYSTEM_TABLES table for all Vertica system tables and their schemas. For example:

SELECT * FROM system_tables ORDER BY table_schema, table_name;
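
For example, to find system tables related to a particular topic by name:

=> SELECT table_schema, table_name FROM system_tables
   WHERE table_name ILIKE '%resource%' ORDER BY table_schema, table_name;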

System table categories

Vertica system tables can be grouped into the following areas:

  • System information

  • System resources

  • Background processes

  • Workload and performance

Vertica reserves some memory to help monitor busy systems. Using simple system table queries makes it easier to troubleshoot issues. See also SYSQUERY.

Privileges

You can GRANT and REVOKE privileges on system tables, with the following restrictions:

  • You cannot GRANT privileges on system tables to the SYSMONITOR or PSEUDOSUPERUSER roles.

  • You cannot GRANT privileges on system schemas.
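
For example, granting a user read access to a single system table is allowed; the user name analyst here is hypothetical:

=> GRANT SELECT ON v_monitor.resource_pool_status TO analyst;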

Case-sensitive system table data

Some system table data might be stored in mixed case. For example, Vertica stores mixed-case identifier names the way you specify them in the CREATE statement, even though case is ignored when you reference them in queries. When these object names appear as data in the system tables, you'll encounter errors if you query them with an equality (=) operator because the case must exactly match the stored identifier. In particular, data in columns TABLE_SCHEMA and TABLE_NAME in system table TABLES are case sensitive.

If you don't know how the identifiers are stored, use the case-insensitive operator ILIKE. For example, given the following schema:

=> CREATE SCHEMA SS;
=> CREATE TABLE SS.TT (c1 int);
=> CREATE PROJECTION SS.TTP1 AS SELECT * FROM ss.tt UNSEGMENTED ALL NODES;
=> INSERT INTO ss.tt VALUES (1);

A query that uses the = operator returns 0 rows:

=> SELECT table_schema, table_name FROM v_catalog.tables WHERE table_schema ='ss';
table_schema | table_name
--------------+------------
(0 rows)

A query that uses case-insensitive ILIKE returns the expected results:

=> SELECT table_schema, table_name FROM v_catalog.tables WHERE table_schema ILIKE 'ss';
table_schema | table_name
--------------+------------
 SS           | TT
(1 row)

Examples

The following examples illustrate simple ways to use system tables in queries.


=> SELECT current_epoch, designed_fault_tolerance, current_fault_tolerance FROM SYSTEM;
 current_epoch | designed_fault_tolerance | current_fault_tolerance
---------------+--------------------------+-------------------------
           492 |                        1 |                       1
(1 row)

=> SELECT node_name, total_user_session_count, executed_query_count FROM query_metrics;
    node_name     | total_user_session_count | executed_query_count
------------------+--------------------------+----------------------
 v_vmart_node0001 |                      115 |                  353
 v_vmart_node0002 |                      114 |                   35
 v_vmart_node0003 |                      116 |                   34
(3 rows)

=> SELECT DISTINCT(schema_name), schema_owner FROM schemata;
 schema_name  | schema_owner
--------------+--------------
 v_catalog    | dbadmin
 v_txtindex   | dbadmin
 v_func       | dbadmin
 TOPSCHEMA    | dbadmin
 online_sales | dbadmin
 v_internal   | dbadmin
 v_monitor    | dbadmin
 structs      | dbadmin
 public       | dbadmin
 store        | dbadmin
(10 rows)

9 - Data collector utility

The Data Collector collects and retains history of important system activities, and records essential performance and resource utilization counters.

Data Collector extends system table functionality with minimal overhead, performing the following tasks:

  • Provides a framework for recording events.

  • Propagates information to system tables.

You can query Data Collector information to obtain the past state of system tables and extract aggregate information. It can also help you:

  • See what actions users have taken.

  • Locate performance bottlenecks.

  • Identify potential improvements to Vertica configuration.

Data Collector works with Workload Analyzer, a tool that intelligently monitors the performance of SQL queries and workloads, and recommends tuning actions based on observations of the actual workload history.

Configuring and accessing data collector information

Data Collector retains the data it gathers according to configurable retention policies. Data Collector is on by default; you can disable it by setting the configuration parameter EnableDataCollector to 0 at the database and node levels, with ALTER DATABASE and ALTER NODE, respectively.

You can access metadata on collected data of all components through system table DATA_COLLECTOR. This table includes information about current collection policies on that component, and how much data is retained in memory and on disk.
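
For example, the following sketch checks the retention policy and target table for a single component; the column names follow the policy properties and queries shown later in this section:

=> SELECT component, table_name, memory_buffer_size_kb, disk_size_kb, interval_time
   FROM data_collector
   WHERE component = 'TupleMoverEvents';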

Collected data is logged on disk in the DataCollector directory under the Vertica /catalog path. You can query logged data from component-specific Data Collector tables. You can also manage logged data with Vertica meta-functions; see Managing data collection logs for details.

9.1 - Configuring data retention policies

Data collector maintains retention policies for each Vertica component that it monitors—for example, TupleMoverEvents, or DepotEvictions. You can identify monitored components by querying system table DATA_COLLECTOR. For example, the following query returns partition activity components:

=> SELECT DISTINCT component FROM data_collector WHERE component ILIKE '%partition%';
      component
----------------------
 HiveCustomPartitions
 CopyPartitions
 MovePartitions
 SwapPartitions
(4 rows)

Each component has its own retention policy, which comprises several properties:

  • MEMORY_BUFFER_SIZE_KB specifies in kilobytes the maximum amount of collected data that the Data Collector buffers in memory before moving it to disk.

  • DISK_SIZE_KB specifies in kilobytes the maximum disk space allocated for this component's Data Collector table.

  • INTERVAL_TIME is an INTERVAL data type that specifies how long data of a given component is retained in that component's Data Collector table.

Vertica sets default values on all properties, which you can modify with meta-functions SET_DATA_COLLECTOR_POLICY and SET_DATA_COLLECTOR_TIME_POLICY.

You can view retention policy settings by calling GET_DATA_COLLECTOR_POLICY. For example, the following statement returns the retention policy for the TupleMoverEvents component:

=> SELECT get_data_collector_policy('TupleMoverEvents');
                          get_data_collector_policy
-----------------------------------------------------------------------------
 1000KB kept in memory, 15000KB kept on disk. Time based retention disabled.
(1 row)

Setting retention memory and disk storage

Retention policy properties MEMORY_BUFFER_SIZE_KB and DISK_SIZE_KB combine to determine how much collected data is available at any given time. The two properties have the following dependencies:

  • If MEMORY_BUFFER_SIZE_KB is set to 0, the Data Collector does not retain any data for this component, either in memory or on disk.

  • If DISK_SIZE_KB is set to 0, the Data Collector retains only as much component data as it can buffer in memory, as set by MEMORY_BUFFER_SIZE_KB.

For example, the following statement changes the memory and disk settings for component ResourceAcquisitions from their current values (1,000 KB of memory and 10,000 KB of disk space) to 1,500 KB and 25,000 KB, respectively:

=> SELECT set_data_collector_policy('ResourceAcquisitions', '1500', '25000');
 set_data_collector_policy
---------------------------
 SET
(1 row)

You should consider setting MEMORY_BUFFER_SIZE_KB to a high value in the following cases:

  • Unusually high levels of data collection. If MEMORY_BUFFER_SIZE_KB is set too low, the Data Collector might be unable to flush buffered data to disk fast enough to keep up with the activity level, which can lead to loss of in-memory data.

  • Very large data collector records—for example, records with very long query strings. The Data Collector uses double-buffering, so it cannot retain in memory records that are more than 50 percent larger than MEMORY_BUFFER_SIZE_KB.

Setting time-based retention

By default, all collected data of a given component remain on disk and are accessible in the component's Data Collector table, up to the disk storage limit of that component's retention policy as set by its DISK_SIZE_KB property. You can call SET_DATA_COLLECTOR_TIME_POLICY to limit how long data is retained in a component's Data Collector table. In the following example, SET_DATA_COLLECTOR_TIME_POLICY is called on component TupleMoverEvents and sets its INTERVAL_TIME property to an interval of 30 minutes:

=> SELECT set_data_collector_time_policy('TupleMoverEvents', '30 minutes'::interval);
 set_data_collector_time_policy
--------------------------------
 SET
(1 row)

After this call, the Data Collector table dc_tuple_mover_events only retains records of Tuple Mover activity that occurred in the last 30 minutes. Older Tuple Mover data are automatically dropped from this table. For example, after the previous call to SET_DATA_COLLECTOR_TIME_POLICY, querying dc_tuple_mover_events returns data of Tuple Mover activity that was collected over the last 30 minutes—in this case, since 07:58:21:

=> SELECT current_timestamp(0)  - '30 minutes'::interval AS '30 minutes ago';
   30 minutes ago
---------------------
 2020-08-13 07:58:21
(1 row)

=> SELECT time, node_name, session_id, user_name, transaction_id, operation FROM dc_tuple_mover_events WHERE node_name='v_vmart_node0001' ORDER BY transaction_id;
             time              |    node_name     |           session_id            | user_name |  transaction_id   | operation
-------------------------------+------------------+---------------------------------+-----------+-------------------+-----------
 2020-08-13 08:16:54.360597-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.397346-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.424002-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.425989-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:16:54.456829-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:16:54.485097-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:19:45.8045-04   | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.742-04    | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.684764-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.799796-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:19:45.768856-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:19:45.715424-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:25:20.465604-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.497266-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.518839-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.52099-04  | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.549075-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.569072-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
(18 rows)

After 25 minutes elapse, 12 of these records age out of the 30-minute interval set for TupleMoverEvents and are dropped from dc_tuple_mover_events:

=> SELECT current_timestamp(0)  - '30 minutes'::interval AS '30 minutes ago';
   30 minutes ago
---------------------
 2020-08-13 08:23:33
(1 row)


=> SELECT time, node_name, session_id, user_name, transaction_id, operation FROM dc_tuple_mover_events WHERE node_name='v_vmart_node0001' ORDER BY transaction_id;
             time              |    node_name     |           session_id            | user_name |  transaction_id   | operation
-------------------------------+------------------+---------------------------------+-----------+-------------------+-----------
 2020-08-13 08:25:20.465604-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.497266-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.518839-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.52099-04  | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.549075-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.569072-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
(6 rows)

The meta-function SET_DATA_COLLECTOR_TIME_POLICY also sets a retention policy's INTERVAL_TIME property. Unlike SET_DATA_COLLECTOR_POLICY, this meta-function only sets the INTERVAL_TIME property. It also differs in that you can update INTERVAL_TIME on all components by omitting the component argument. For example:

=> SELECT set_data_collector_time_policy('1 day'::interval);
 set_data_collector_time_policy
--------------------------------
 SET
(1 row)

=> SELECT DISTINCT component, INTERVAL_SET, INTERVAL_TIME FROM DATA_COLLECTOR WHERE component ILIKE '%partition%';
      component       | INTERVAL_SET | INTERVAL_TIME
----------------------+--------------+---------------
 HiveCustomPartitions | t            | 1
 MovePartitions       | t            | 1
 CopyPartitions       | t            | 1
 SwapPartitions       | t            | 1
(4 rows)

To clear the INTERVAL_TIME policy property, call SET_DATA_COLLECTOR_TIME_POLICY with a negative integer argument. For example:

=> SELECT set_data_collector_time_policy('-1');
 set_data_collector_time_policy
--------------------------------
 SET
(1 row)

=> SELECT DISTINCT component, INTERVAL_SET, INTERVAL_TIME FROM DATA_COLLECTOR WHERE component ILIKE '%partition%';
      component       | INTERVAL_SET | INTERVAL_TIME
----------------------+--------------+---------------
 MovePartitions       | f            | 0
 SwapPartitions       | f            | 0
 HiveCustomPartitions | f            | 0
 CopyPartitions       | f            | 0
(4 rows)

9.2 - Querying data collector tables

Data Collector tables (prefixed by dc_) are in the V_INTERNAL schema.

You can obtain component-specific data from Data Collector tables. The Data Collector compiles the component data from its log files in a table format that you can query with standard SQL queries. You can identify Data Collector table names for specific components through the DATA_COLLECTOR system table. For example:

=> SELECT distinct component, table_name FROM data_collector where component ILIKE 'lock%';
  component   |    table_name
--------------+------------------
 LockRequests | dc_lock_requests
 LockReleases | dc_lock_releases
 LockAttempts | dc_lock_attempts
(3 rows)

You can then query the desired Data Collector tables—for example, check for lock delays in dc_lock_attempts:

=> SELECT * from dc_lock_attempts WHERE description != 'Granted immediately';
-[ RECORD 1 ]------+------------------------------
time               | 2020-08-17 00:14:07.187607-04
node_name          | v_vmart_node0001
session_id         | v_vmart_node0001-319647:0x1d
user_id            | 45035996273704962
user_name          | dbadmin
transaction_id     | 45035996273819050
object             | 0
object_name        | Global Catalog
mode               | X
promoted_mode      | X
scope              | TRANSACTION
start_time         | 2020-08-17 00:14:07.184663-04
timeout_in_seconds | 300
result             | granted
description        | Granted after waiting

9.3 - Managing data collection logs

On startup, Vertica creates a DataCollector directory under the database catalog directory of each node.

On startup, Vertica creates a DataCollector directory under the database catalog directory of each node. This directory contains one or more logs for individual components. For example:

[dbadmin@doch01 DataCollector]$ pwd
/home/dbadmin/VMart/v_vmart_node0001_catalog/DataCollector
[dbadmin@doch01 DataCollector]$ ls -1 -g Lock*
-rw------- 1 verticadba 2559879 Aug 17 00:14 LockAttempts_650572441057355.log
-rw------- 1 verticadba  614579 Aug 17 05:28 LockAttempts_650952885486175.log
-rw------- 1 verticadba 2559895 Aug 14 18:31 LockReleases_650306482037650.log
-rw------- 1 verticadba 1411127 Aug 17 05:28 LockReleases_650759468041873.log

The DataCollector directory also contains a pair of SQL template files for each component:

  • CREATE_component_TABLE.sql provides DDL for creating a table where you can load Data Collector logs for a given component—for example, LockAttempts:

    [dbadmin@doch01 DataCollector]$ cat CREATE_LockAttempts_TABLE.sql
    \set dcschema 'echo ${DCSCHEMA:-dc}'
    CREATE TABLE :dcschema.dc_lock_attempts(
      "time" TIMESTAMP WITH TIME ZONE,
      "node_name" VARCHAR(128),
      "session_id" VARCHAR(128),
      "user_id" INTEGER,
      "user_name" VARCHAR(128),
      "transaction_id" INTEGER,
      "object" INTEGER,
      "object_name" VARCHAR(128),
      "mode" VARCHAR(128),
      "promoted_mode" VARCHAR(128),
      "scope" VARCHAR(128),
      "start_time" TIMESTAMP WITH TIME ZONE,
      "timeout_in_seconds" INTEGER,
      "result" VARCHAR(128),
      "description" VARCHAR(64000)
    );
    
  • COPY_component_TABLE.sql contains SQL for loading (with COPY) the data log files into the table that the CREATE script creates. For example:

    [dbadmin@doch01 DataCollector]$ cat COPY_LockAttempts_TABLE.sql
    \set dcpath 'echo ${DCPATH:-$PWD}'
    \set dcschema 'echo ${DCSCHEMA:-dc}'
    \set logfiles '''':dcpath'/LockAttempts_*.log'''
    COPY :dcschema.dc_lock_attempts(
      LockAttempts_start_filler FILLER VARCHAR(64) DELIMITER E'\n',
      "time_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "time" FORMAT '_internal' DELIMITER E'\n',
      "node_name_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "node_name" ESCAPE E'\001' DELIMITER E'\n',
      "session_id_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "session_id" ESCAPE E'\001' DELIMITER E'\n',
      "user_id_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "user_id" FORMAT 'd' DELIMITER E'\n',
      "user_name_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "user_name" ESCAPE E'\001' DELIMITER E'\n',
      "transaction_id_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "transaction_id" FORMAT 'd' DELIMITER E'\n',
      "object_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "object" FORMAT 'd' DELIMITER E'\n',
      "object_name_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "object_name" ESCAPE E'\001' DELIMITER E'\n',
      "mode_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "mode" ESCAPE E'\001' DELIMITER E'\n',
      "promoted_mode_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "promoted_mode" ESCAPE E'\001' DELIMITER E'\n',
      "scope_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "scope" ESCAPE E'\001' DELIMITER E'\n',
      "start_time_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "start_time" FORMAT '_internal' DELIMITER E'\n',
      "timeout_in_seconds_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "timeout_in_seconds" FORMAT 'd' DELIMITER E'\n',
      "result_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "result" ESCAPE E'\001' DELIMITER E'\n',
      "description_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "description" ESCAPE E'\001'
    )  FROM :logfiles RECORD TERMINATOR E'\n.\n' DELIMITER E'\n';
    

Log management meta-functions

You can manage Data Collector logs with Vertica meta-functions FLUSH_DATA_COLLECTOR and CLEAR_DATA_COLLECTOR. Both functions can specify a single component, or execute on all components:

  • FLUSH_DATA_COLLECTOR waits until memory logs are moved to disk and then flushes the Data Collector, synchronizing the log with disk storage. For example, the following statement executes on all components:

    => SELECT flush_data_collector();
     flush_data_collector
    ----------------------
     FLUSH
    (1 row)
    
  • CLEAR_DATA_COLLECTOR clears all memory and disk records from Data Collector tables and logs, and resets collection statistics in system table DATA_COLLECTOR. For example, the following statement executes on data collected for component ResourceAcquisitions:

    => SELECT clear_data_collector('ResourceAcquisitions');
     clear_data_collector
    ----------------------
     CLEAR
    (1 row)
    

10 - Monitoring partition reorganization

When you use ALTER TABLE...REORGANIZE, the operation reorganizes the data in the background.

You can monitor details of the reorganization process by polling the following system tables:

  • PARTITION_STATUS displays the fraction of each table's data that is reorganized.

  • PARTITION_REORGANIZE_ERRORS logs any errors issued by the reorganize process.
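For example, you can poll both tables with queries like the following. This is a sketch only; the exact column sets vary by Vertica version, so SELECT * is used here:

=> SELECT * FROM V_MONITOR.PARTITION_STATUS;
=> SELECT * FROM V_MONITOR.PARTITION_REORGANIZE_ERRORS;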

11 - Monitoring resource pools

You can use system tables such as RESOURCE_POOLS, RESOURCE_POOL_STATUS, and RESOURCE_ACQUISITIONS to find information about resource pools, as the following examples show.

You can also use the Management Console to obtain run-time data on resource pool usage.

Viewing resource pool status

The following example queries RESOURCE_POOL_STATUS for memory size data:

=> SELECT pool_name poolName,
    node_name nodeName,
    max_query_memory_size_kb maxQueryMemSizeKb,
    max_memory_size_kb maxMemSizeKb,
    memory_size_actual_kb memSizeActualKb
    FROM resource_pool_status WHERE pool_name='ceo_pool';
 poolName |     nodeName     | maxQueryMemSizeKb | maxMemSizeKb | memSizeActualKb
----------+------------------+-------------------+--------------+-----------------
 ceo_pool | v_vmart_node0001 |          12179388 |     13532654 |         1843200
 ceo_pool | v_vmart_node0002 |          12191191 |     13545768 |         1843200
 ceo_pool | v_vmart_node0003 |          12191170 |     13545745 |         1843200
(3 rows)

Viewing query resource acquisitions

The following example displays all resources granted to currently running queries. This information is stored in the RESOURCE_ACQUISITIONS system table. In the output, you can see that one query acquired 708504 KB of memory from the GENERAL pool.

=> SELECT pool_name, thread_count, open_file_handle_count, memory_inuse_kb,
   queue_entry_timestamp, acquisition_timestamp
   FROM V_MONITOR.RESOURCE_ACQUISITIONS WHERE node_name ILIKE '%node0001';

-[ RECORD 1 ]----------+------------------------------
pool_name              | sysquery
thread_count           | 4
open_file_handle_count | 0
memory_inuse_kb        | 4103
queue_entry_timestamp  | 2013-12-05 07:07:08.815362-05
acquisition_timestamp  | 2013-12-05 07:07:08.815367-05
-[ RECORD 2 ]----------+------------------------------
...
-[ RECORD 8 ]----------+------------------------------
pool_name              | general
thread_count           | 12
open_file_handle_count | 18
memory_inuse_kb        | 708504
queue_entry_timestamp  | 2013-12-04 12:55:38.566614-05
acquisition_timestamp  | 2013-12-04 12:55:38.566623-05
-[ RECORD 9 ]----------+------------------------------
...

You can determine how long a query waits in the queue before it can run by calculating the difference between acquisition_timestamp and queue_entry_timestamp, as the following example shows:

=> SELECT pool_name, queue_entry_timestamp, acquisition_timestamp,
    (acquisition_timestamp-queue_entry_timestamp) AS 'queue wait'
   FROM V_MONITOR.RESOURCE_ACQUISITIONS WHERE node_name ILIKE '%node0001';

-[ RECORD 1 ]---------+------------------------------
pool_name             | sysquery
queue_entry_timestamp | 2013-12-05 07:07:08.815362-05
acquisition_timestamp | 2013-12-05 07:07:08.815367-05
queue wait            | 00:00:00.000005
-[ RECORD 2 ]---------+------------------------------
pool_name             | sysquery
queue_entry_timestamp | 2013-12-05 07:07:14.714412-05
acquisition_timestamp | 2013-12-05 07:07:14.714417-05
queue wait            | 00:00:00.000005
-[ RECORD 3 ]---------+------------------------------
pool_name             | sysquery
queue_entry_timestamp | 2013-12-05 07:09:57.238521-05
acquisition_timestamp | 2013-12-05 07:09:57.281708-05
queue wait            | 00:00:00.043187
-[ RECORD 4 ]---------+------------------------------
...

Querying user-defined resource pools

The Boolean column IS_INTERNAL in system tables RESOURCE_POOLS and RESOURCE_POOL_STATUS lets you limit query results to user-defined resource pools only. For example:

=> SELECT name, subcluster_oid, subcluster_name, memorysize, maxmemorysize, priority, maxconcurrency
   FROM V_CATALOG.RESOURCE_POOLS WHERE is_internal = 'f';
   name       |  subcluster_oid   | subcluster_name | memorysize | maxmemorysize | priority | maxconcurrency
--------------+-------------------+-----------------+------------+---------------+----------+----------------
 load_pool    | 72947297254957395 | default         | 0%         |               |       10 |
 ceo_pool     | 63570532589529860 | c_subcluster    | 250M       |               |       10 |
 ad hoc_pool  |                 0 |                 | 200M       | 200M          |        0 |
 billing_pool | 45579723408647896 | ar_subcluster   | 0%         |               |        0 |              3
 web_pool     |                 0 | analytics_1     | 25M        |               |       10 |              5
 batch_pool   | 47479274633682648 | default         | 150M       | 150M          |        0 |             10
 dept1_pool   |                 0 |                 | 0%         |               |        5 |
 dept2_pool   |                 0 |                 | 0%         |               |        8 |
 dashboard    | 45035996273843504 | analytics_1     | 0%         |               |        0 |
(9 rows)

12 - Monitoring recovery

When your Vertica database is recovering from a failure, it's important to monitor the recovery process. There are several ways to monitor database recovery:

12.1 - Viewing log files on each node

During database recovery, Vertica adds logging information to the vertica.log on each host. Each message is identified with a [Recover] string.

Use the tail command to monitor recovery progress by viewing the relevant status messages, as follows:

$ tail -f catalog-path/database-name/node-name_catalog/vertica.log
01/23/08 10:35:31 thr:Recover:0x2a98700970 [Recover] <INFO> Changing host v_vmart_node0001 startup state from INITIALIZING to RECOVERING
01/23/08 10:35:31 thr:CatchUp:0x1724b80 [Recover] <INFO> Recovering to specified epoch 0x120b6
01/23/08 10:35:31 thr:CatchUp:0x1724b80 [Recover] <INFO> Running 1 split queries
01/23/08 10:35:31 thr:CatchUp:0x1724b80 [Recover] <INFO> Running query: ALTER PROJECTION proj_tradesquotes_0 SPLIT v_vmart_node0001 FROM 73911;

12.2 - Using system tables to monitor recovery

Use the following system tables to monitor recovery:

  • RECOVERY_STATUS

  • PROJECTION_RECOVERIES

Specifically, the recovery_status system table includes information about the node that is recovering, the epoch being recovered, the current recovery phase, and running status:

=> SELECT node_name, recover_epoch, recovery_phase, current_completed, is_running FROM recovery_status;
 node_name           | recover_epoch | recovery_phase    | current_completed | is_running
---------------------+---------------+-------------------+-------------------+--------------
 v_vmart_node0001    |               |                   | 0                 | f
 v_vmart_node0002    | 0             | historical pass 1 | 0                 | t
 v_vmart_node0003    | 1             | current           | 0                 | f

The projection_recoveries system table maintains a history of projection recoveries. To check recovery status, summarize the data for the recovering node, and run the same query several times to see whether the counts change. Differing counts indicate that the recovery is working and progressing through the missing data.

=> SELECT node_name, status, progress FROM projection_recoveries;
node_name              | status      | progress
-----------------------+-------------+---------
v_vmart_node0001       | running     | 61

To see a single record from the projection_recoveries system table, append LIMIT 1 to the query.
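To watch overall progress rather than individual projections, you can also count projection recoveries by status and rerun the query to see the counts change. The following grouped query is a sketch based on the columns shown above:

=> SELECT node_name, status, COUNT(*) AS projection_count
   FROM projection_recoveries
   GROUP BY node_name, status;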

After a recovery has completed, Vertica continues to store information from the most recent recovery in these tables.

12.3 - Viewing cluster state and recovery status

Use the admintools view_cluster tool from the command line to see the cluster state:

$ /opt/vertica/bin/admintools -t view_cluster
 DB          | Host         | State
-------------+--------------+------------
 <data_base> | 112.17.31.10 | RECOVERING
 <data_base> | 112.17.31.11 | UP
 <data_base> | 112.17.31.12 | UP
 <data_base> | 112.17.31.17 | UP
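You can get a similar view from inside the database by querying the NODES system table, which reports the state of each node. For example:

=> SELECT node_name, node_state FROM NODES;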

12.4 - Monitoring cluster status after recovery

When recovery has completed:

  1. Launch Administration Tools.

  2. From the Main Menu, select View Database Cluster State and click OK.

    The utility reports your node's status as UP.

13 - Clearing projection refresh history

System table PROJECTION_REFRESHES records information about refresh operations, successful and unsuccessful. PROJECTION_REFRESHES retains projection refresh data until one of the following events occurs:

  • Another refresh operation starts on a given projection.

  • CLEAR_PROJECTION_REFRESHES is called and clears data on all projections.

  • The table's storage quota is exceeded.

To immediately purge this information, call CLEAR_PROJECTION_REFRESHES:

=> SELECT clear_projection_refreshes();
clear_projection_refreshes
----------------------------
 CLEAR
(1 row)

See also

PROJECTION_REFRESHES

14 - Monitoring Vertica using notifiers

A Vertica notifier is a push-based mechanism for sending messages from Vertica to endpoints like Apache Kafka or syslog. For example, you can configure a long-running script to send notifications at various stages and then at the completion of a task.

To use a notifier:

  1. Use CREATE NOTIFIER to create one of the following notifier types:

    • Kafka notifier

    • syslog notifier

  2. Send a notification to the NOTIFIER endpoint with any of the following:

    • NOTIFY: Manually sends a message to the NOTIFIER endpoint.

    • SET_DATA_COLLECTOR_NOTIFY_POLICY: Creates a notification policy, which automatically sends a message to the NOTIFIER endpoint when a specified event occurs.
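For example, the following sketch creates a Kafka notifier, sends a manual message to it with NOTIFY, and then sets a notify policy on the Data Collector LoginFailures component. The broker address and topic name are placeholder assumptions:

=> CREATE NOTIFIER requests_issued
   ACTION 'kafka://kafka01.example.com:9092'
   MAXMEMORYSIZE '1M';

=> SELECT NOTIFY('Task complete', 'requests_issued', 'vertica_notifications');

=> SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures', 'requests_issued', 'vertica_notifications', true);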