Monitoring events

1: Event logging mechanisms
2: Event codes
3: Event data
4: Configuring event reporting

4.1: Configuring reporting for syslog
4.2: Configuring reporting for SNMP
4.3: Configuring event trapping for SNMP
4.4: Verifying SNMP configuration

5: Event reporting examples

To help you monitor your database system, Vertica traps and logs significant events that affect database performance and functionality if you do not address their root causes. This section describes where events are logged, the types of events that Vertica logs, how to respond to these events, the information that Vertica provides for these events, and how to configure event monitoring.

1 - Event logging mechanisms

Vertica posts events to the following mechanisms:

Mechanism	Description
`vertica.log`	All events are automatically posted to `vertica.log`. See Monitoring the Log Files.
`ACTIVE_EVENTS`	This SQL system table provides information about all open events. See Using system tables and ACTIVE_EVENTS.
`SNMP`	To post traps to SNMP, enable global reporting in addition to each individual event you want trapped. See Configuring event reporting.
`Syslog`	To log events to syslog, enable event reporting for each individual event you want logged. See Configuring event reporting.

2 - Event codes

The following table lists the event codes that Vertica logs to the events system tables.

Event Code	Event Code Description	Description	Action
0	Low Disk Space	The database is running out of disk space, a disk is failing, or there is an I/O hardware failure.	Add more disk space or replace the failing disk or hardware as soon as possible. Check `dmesg` to see what caused the problem. Also, use the DISK_RESOURCE_REJECTIONS system table to determine the types of disk space requests that are being rejected and the hosts on which they are being rejected. See Managing disk space for more information about low disk space.
1	Read Only File System	The database does not have write access to the file system for the data or catalog paths. This can sometimes occur if Linux remounts a drive due to a kernel issue.	Modify the privileges on the file system to give the database write access.
2	Loss Of K Safety	The database is no longer K-Safe because there are insufficient nodes functioning within the cluster. Loss of K-safety causes the database to shut down. In a four-node cluster, for example, K-safety=1. If one node fails, the fault tolerance is at a critical level. If two nodes fail, the system loses K-safety.	If a system shuts down due to loss of K-safety, you need to recover the system. See Failure recovery.
3	Current Fault Tolerance at Critical Level	One or more nodes in the cluster have failed. If the database loses one more node, it is no longer K-Safe and it shuts down. (For example, a four-node cluster is no longer K-safe if two nodes fail.)	Restore any nodes that have failed or been shut down.
4	Too Many ROS Containers	Heavy load activity on one or more projections can sometimes generate more ROS containers than the Tuple Mover can handle. Vertica allows up to 1024 ROS containers per projection before it rolls back additional load jobs and returns a ROS pushback error message.	Typically, the Tuple Mover catches up with pending mergeout requests and the Optimizer can resume executing queries on the affected tables (see Mergeout). If this problem does not resolve quickly, or if it occurs frequently, it is probably related to insufficient RAM allocated to MAXMEMORY in the TM resource pool.
5	WOS Over Flow	Deprecated
6	Node State Change	The node state has changed.	Check the status of the node.
7	Recovery Failure	The database was not restored to a functional state after a hardware or software related failure.	The reason for recovery failure can vary. See the event description for more information about your specific situation.
8	Recovery Error	The database encountered an error while attempting to recover. If the number of recovery errors exceeds Max Tries, the Recovery Failure event is triggered. See Recovery Failure within this table.	The reason for a recovery error can vary. See the event description for more information about your specific situation.
9	Recovery Lock Error	A recovering node could not obtain an S lock on the table. If you have a continuous stream of COPY commands in progress, recovery might not be able to obtain this lock even after multiple re-tries.	Either momentarily stop the loads or pick a time when the cluster is not busy to restart the node and let recovery proceed.
10	Recovery Projection Retrieval Error	Vertica was unable to retrieve information about a projection.	The reason for a recovery projection retrieval error can vary. See the event description for more information about your specific situation.
11	Refresh Error	The database encountered an error while attempting to refresh.	The reason for a refresh error can vary. See the event description for more information about your specific situation.
12	Refresh Lock Error	The database encountered a locking error during refresh.	The reason for a refresh error can vary. See the event description for more information about your specific situation.
13	Tuple Mover Error	Deprecated
14	Timer Service Task Error	An error occurred in an internal scheduled task.	Internal use only
15	Stale Checkpoint	Deprecated
16	CRC Mismatch	The Cyclic Redundancy Check returned an error or errors while fetching data.	Check the `vertica.log` file or the SNMP trap utility to review the errors. For more information, see Evaluating CRC errors.
20	Cluster Read-only	Cluster cannot perform updates due to quorum loss and can only be queried.	Restart failed nodes. See Recover from Read-Only Mode.

3 - Event data

To help you interpret and solve the issue that triggered an event, each event provides a variety of data, depending upon the event logging mechanism used.

The following table describes the event data and indicates where it is used.

vertica.log	ACTIVE_EVENTS (column names)	SNMP	Syslog	Description
N/A	NODE_NAME	N/A	N/A	The node where the event occurred.
Event Code	EVENT_CODE	Event Type	Event Code	A numeric ID that indicates the type of event. See Event codes for a list of event codes.
Event Id	EVENT_ID	Event OID	Event Id	A unique numeric ID that identifies the specific event.
Event Severity	EVENT_SEVERITY	Event Severity	Event Severity	The severity of the event, from highest to lowest. Severity levels are based on standard syslog severity types: `0 – Emergency` `1 – Alert` `2 – Critical` `3 – Error` `4 – Warning` `5 – Notice` `6 – Info` `7 – Debug`
PostedTimestamp	EVENT_POSTED_TIMESTAMP	N/A	PostedTimestamp	The year, month, day, and time the event was reported. Time is provided in 24-hour format.
ExpirationTimestamp	EVENT_EXPIRATION	N/A	ExpirationTimestamp	The time at which this event expires. If the same event is posted again before its expiration time, this field is updated to a new expiration time.
EventCodeDescription	EVENT_CODE_DESCRIPTION	Description	EventCodeDescription	A brief description of the event and details pertinent to the specific situation.
ProblemDescription	EVENT_PROBLEM_DESCRIPTION	Event Short Description	ProblemDescription	A generic description of the event.
N/A	REPORTING_NODE	Node Name	N/A	The name of the node within the cluster that reported the event.
DatabaseName	N/A	Database Name	DatabaseName	The name of the database that is impacted by the event.
N/A	N/A	Host Name	Hostname	The name of the host within the cluster that reported the event.
N/A	N/A	Event Status	N/A	The status of the event. It can be either: `1 – Open` `2 – Clear`
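
Because lower severity numbers mean more severe events, threshold comparisons are easy to get backwards. The following is a minimal Python sketch of such a comparison, using the severity codes listed in the table above. The names `SEVERITY` and `at_least` are hypothetical helpers for illustration and are not part of Vertica.

```python
# Severity codes as listed in the Event Severity row above.
SEVERITY = {
    0: "Emergency", 1: "Alert", 2: "Critical", 3: "Error",
    4: "Warning", 5: "Notice", 6: "Info", 7: "Debug",
}


def at_least(code: int, threshold: str) -> bool:
    """Return True if `code` is as severe as, or more severe than, `threshold`.

    Lower numbers are more severe, so the comparison is `<=`.
    """
    by_name = {name: number for number, name in SEVERITY.items()}
    return code <= by_name[threshold]


# Example: keep only events that are Warning or worse.
codes = [2, 4, 5, 7]
print([SEVERITY[c] for c in codes if at_least(c, "Warning")])
```

With the sample codes shown, only the Critical and Warning events pass the filter.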

4 - Configuring event reporting

Event reporting is automatically configured for vertica.log, and current events are automatically posted to the ACTIVE_EVENTS system table. You can also configure Vertica to post events to syslog and SNMP.

4.1 - Configuring reporting for syslog

Syslog is a network-logging utility that issues, stores, and processes log messages. It is a useful way to get heterogeneous data into a single data repository.

To log events to syslog, enable event reporting for each individual event you want logged. Messages are logged, by default, to /var/log/messages.

Configuring event reporting to syslog consists of:

Enabling Vertica to trap events for syslog.
Defining which events Vertica traps for syslog. Vertica strongly suggests that you trap the Stale Checkpoint event.
Defining which syslog facility to use.

Enabling Vertica to trap events for syslog

To enable event trapping for syslog, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogEnabled = 1;

To disable event trapping for syslog, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogEnabled = 0;

Defining events to trap for syslog

To define which events generate a syslog entry, issue the following SQL command, specifying one or more of the events listed below the command:

=> ALTER DATABASE DEFAULT SET SyslogEvents = 'events-list';

where events-list is a comma-delimited list of events, one or more of the following:

Low Disk Space
Read Only File System
Loss Of K Safety
Current Fault Tolerance at Critical Level
Too Many ROS Containers
Node State Change
Recovery Failure
Recovery Error
Recovery Lock Error
Recovery Projection Retrieval Error
Refresh Error
Refresh Lock Error
Tuple Mover Error
Timer Service Task Error
Stale Checkpoint

The following example generates a syslog entry for low disk space and recovery failure:

=> ALTER DATABASE DEFAULT SET SyslogEvents = 'Low Disk Space, Recovery Failure';

Defining the SyslogFacility to use for reporting

The syslog mechanism allows for several different general classifications of logging messages, called facilities. Typically, all authentication-related messages are logged with the auth (or authpriv) facility. These messages are intended to be secure and hidden from unauthorized eyes. Normal operational messages are logged with the daemon facility, which is the collector that receives and optionally stores messages.

The SyslogFacility directive allows all logging messages to be directed to a different facility than the default. When the directive is used, all logging is done using the specified facility, both authentication (secure) and otherwise.

To define which SyslogFacility Vertica uses, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogFacility = 'Facility_Name';

where Facility_Name is one of the following facilities:

auth
authpriv (Linux only)
cron
uucp (UUCP subsystem)
daemon
ftp (Linux only)
lpr (line printer subsystem)
mail (mail system)
news (network news subsystem)
user (default system)
local0 (local use 0)
local1 (local use 1)
local2 (local use 2)
local3 (local use 3)
local4 (local use 4)
local5 (local use 5)
local6 (local use 6)
local7 (local use 7)

Trapping other event types

To trap events other than the ones listed above, create a syslog notifier and allow it to trap the desired events with SET_DATA_COLLECTOR_NOTIFY_POLICY.

Events monitored by this notifier type are logged to neither MONITORING_EVENTS nor vertica.log.

The following example creates a notifier that writes a message to syslog when the Data collector (DC) component LoginFailures updates:

Enable syslog notifiers for the current database:

=> ALTER DATABASE DEFAULT SET SyslogEnabled = 1;

Create and enable a syslog notifier v_syslog_notifier:

=> CREATE NOTIFIER v_syslog_notifier ACTION 'syslog'
    ENABLE
    MAXMEMORYSIZE '10M'
    IDENTIFIED BY 'f8b0278a-3282-4e1a-9c86-e0f3f042a971'
    PARAMETERS 'eventSeverity = 5';

Configure the syslog notifier v_syslog_notifier for updates to the LoginFailures DC component with SET_DATA_COLLECTOR_NOTIFY_POLICY:

=> SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures','v_syslog_notifier', 'Login failed!', true);

This notifier writes the following message to syslog (default location: /var/log/messages) when a user fails to authenticate as the user Bob:

Apr 25 16:04:58
vertica_host_01
vertica:
    Event Posted:
        Event Code:21
        Event Id:0
        Event Severity: Notice [5]
        PostedTimestamp: 2022-04-25 16:04:58.083063
        ExpirationTimestamp: 2022-04-25 16:04:58.083063
        EventCodeDescription: Notifier
        ProblemDescription: (Login failed!)
    {
       "_db":"VMart",
       "_schema":"v_internal",
       "_table":"dc_login_failures",
       "_uuid":"f8b0278a-3282-4e1a-9c86-e0f3f042a971",
       "authentication_method":"Reject",
       "client_authentication_name":"default: Reject",
       "client_hostname":"::1",
       "client_label":"",
       "client_os_user_name":"dbadmin",
       "client_pid":523418,
       "client_version":"",
       "database_name":"dbadmin",
       "effective_protocol":"3.8",
       "node_name":"v_vmart_node0001",
       "reason":"REJECT",
       "requested_protocol":"3.8",
       "ssl_client_fingerprint":"",
       "ssl_client_subject":"",
       "time":"2022-04-25 16:04:58.082568-05",
       "user_name":"Bob"
    }#012
    DatabaseName: VMart
    Hostname: vertica_host_01
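
If you post-process these messages, the JSON object embedded in the entry carries the Data Collector fields. The following Python sketch pulls that object out of a message shaped like the example above. It assumes that the payload is the only brace-delimited block in the message and that `#012` is an escaped newline added by the syslog daemon; both are assumptions drawn from this one sample. The function name `notifier_payload` is hypothetical.

```python
import json


def notifier_payload(message: str) -> dict:
    """Extract the embedded JSON object from a syslog notifier message."""
    # Assumption: "#012" is an escaped newline, as in the sample above.
    text = message.replace("#012", "\n")
    start = text.index("{")    # raises ValueError if no payload is present
    end = text.rindex("}")
    return json.loads(text[start:end + 1])


# Shortened version of the message shown above.
sample = (
    "ProblemDescription: (Login failed!)\n"
    "{\n"
    '   "_table":"dc_login_failures",\n'
    '   "client_pid":523418,\n'
    '   "user_name":"Bob"\n'
    "}#012\n"
    "DatabaseName: VMart\n"
)
payload = notifier_payload(sample)
print(payload["user_name"], payload["client_pid"])
```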

4.2 - Configuring reporting for SNMP

Configuring event reporting for SNMP consists of:

Configuring Vertica to enable event trapping for SNMP as described below.
Importing the Vertica Management Information Base (MIB) file into the SNMP monitoring device.

The Vertica MIB file allows the SNMP trap receiver to understand the traps it receives from Vertica. This, in turn, allows you to configure the actions it takes when it receives traps.

Vertica supports the SNMP V1 trap protocol. The MIB file is located in /opt/vertica/sbin/VERTICA-MIB. See the documentation for your SNMP monitoring device for more information about importing MIB files.
Configuring the SNMP trap receiver to handle traps from Vertica.

SNMP trap receiver configuration differs greatly from vendor to vendor. As such, the directions presented here for configuring the SNMP trap receiver to handle traps from Vertica are generic.

Vertica traps are single, generic traps that contain several fields of identifying information. These fields equate to the event data described in Monitoring events. However, the format used for the field names differs slightly. Under SNMP, the field names contain no spaces and are prefixed with “vert”. For example, Event Severity becomes vertEventSeverity.

When configuring your trap receiver, be sure to use the same hostname, port, and community string you used to configure event trapping in Vertica.
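
The field-naming rule described above (remove the spaces, add the "vert" prefix) can be sketched in Python as follows. The function name `snmp_field_name` is hypothetical, and the VERTICA-MIB file remains the authoritative source for the actual field names.

```python
def snmp_field_name(label: str) -> str:
    """Apply the documented rule: drop spaces and add the 'vert' prefix."""
    return "vert" + "".join(label.split())


print(snmp_field_name("Event Severity"))
```

For the label "Event Severity", this yields vertEventSeverity, matching the example given above.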

Examples of network management providers:
- Network Node Manager i
- IBM Tivoli
- AdventNet
- Net-SNMP (Open Source)
- Nagios (Open Source)
- Open NMS (Open Source)

4.3 - Configuring event trapping for SNMP

The following events are trapped by default when you configure Vertica to trap events for SNMP:

Low Disk Space
Read Only File System
Loss Of K Safety
Current Fault Tolerance at Critical Level
Too Many ROS Containers
Node State Change
Recovery Failure
Stale Checkpoint
CRC Mismatch

To configure Vertica to trap events for SNMP

1. Enable Vertica to trap events for SNMP.
2. Define where Vertica sends the traps.
3. Optionally redefine which SNMP events Vertica traps.

Note

After you complete steps 1 and 2 above, Vertica automatically traps the default SNMP events. Only perform step 3 if you want to redefine which SNMP events are trapped. Vertica recommends that you trap the Stale Checkpoint event even if you decide to reduce the number of events Vertica traps for SNMP. The specific settings you define have no effect on traps sent to the log. All events are trapped to the log.

To enable event trapping for SNMP

Use the following SQL command:

=> ALTER DATABASE DEFAULT SET SnmpTrapsEnabled = 1;

To define where Vertica sends traps

Use the following SQL command, where host_name and port identify the computer where SNMP resides, and CommunityString acts like a password to control Vertica's access to the server:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'host_name port CommunityString';

For example:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'localhost 162 public';

You can also specify multiple destinations by specifying a list of destinations, separated by commas:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'host_name1 port1 CommunityString1, host_name2 port2 CommunityString2';

Note

Setting multiple destinations sends any SNMP trap notification to all destinations listed.
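
When the destination list is generated by a script, it helps to build the statement from structured values instead of concatenating strings by hand. The following Python sketch assembles the statement in the format shown above: space-separated host, port, and community string, with destinations separated by commas. The function name `snmp_destinations_sql` and its validation rules are hypothetical and only illustrate the format.

```python
def snmp_destinations_sql(destinations):
    """Build the ALTER DATABASE statement for a list of
    (host_name, port, community_string) tuples."""
    entries = []
    for host, port, community in destinations:
        if not 0 < int(port) < 65536:
            raise ValueError(f"invalid port: {port}")
        for part in (host, community):
            # Spaces, commas, and quotes would break the list format.
            if not part or any(ch in part for ch in " ,'"):
                raise ValueError(f"invalid value: {part!r}")
        entries.append(f"{host} {port} {community}")
    return ("ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = '"
            + ", ".join(entries) + "';")


print(snmp_destinations_sql([("localhost", 162, "public")]))
```

For the single destination shown, the output matches the localhost example above.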

To define which events Vertica traps

Use the following SQL command, where Event_Name is one of the events in the list below the command:

=> ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Event_Name1, Event_Name2';

Low Disk Space
Read Only File System
Loss Of K Safety
Current Fault Tolerance at Critical Level
Too Many ROS Containers
Node State Change
Recovery Failure
Recovery Error
Recovery Lock Error
Recovery Projection Retrieval Error
Refresh Error
Tuple Mover Error
Stale Checkpoint
CRC Mismatch

Note

The above values are case sensitive.

The following example specifies two event names:

=> ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Low Disk Space, Recovery Failure';
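
Because the event names are case sensitive, a misspelled or mis-cased name is easy to miss. The following Python sketch checks names against the list above before building the statement. The names `SNMP_EVENTS` and `snmp_trap_events_sql` are hypothetical, and the set is copied from this page, not read from the database.

```python
# Event names copied from the list above; the values are case sensitive.
SNMP_EVENTS = {
    "Low Disk Space", "Read Only File System", "Loss Of K Safety",
    "Current Fault Tolerance at Critical Level", "Too Many ROS Containers",
    "Node State Change", "Recovery Failure", "Recovery Error",
    "Recovery Lock Error", "Recovery Projection Retrieval Error",
    "Refresh Error", "Tuple Mover Error", "Stale Checkpoint", "CRC Mismatch",
}


def snmp_trap_events_sql(events):
    """Validate event names, then build the SnmpTrapEvents statement."""
    unknown = [name for name in events if name not in SNMP_EVENTS]
    if unknown:
        raise ValueError(f"unknown or mis-cased event names: {unknown}")
    return ("ALTER DATABASE DEFAULT SET SnmpTrapEvents = '"
            + ", ".join(events) + "';")


print(snmp_trap_events_sql(["Low Disk Space", "Recovery Failure"]))
```

A name such as `low disk space` raises a `ValueError` instead of producing a statement.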

4.4 - Verifying SNMP configuration

To create a set of test events that checks SNMP configuration:

Set up SNMP trap handlers to catch Vertica events.

Test your setup with the following command:

=> SELECT SNMP_TRAP_TEST();
    SNMP_TRAP_TEST
--------------------------
 Completed SNMP Trap Test
(1 row)

5 - Event reporting examples

Vertica.log

The following example illustrates a Too Many ROS Containers event posted and cleared within vertica.log:

08/14/15 15:07:59 thr:nameless:0x45a08940 [INFO] Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:29.253729
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com
08/14/15 15:08:54 thr:Ageout Events:0x2aaab0015e70 [INFO] Event Cleared:
Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:53.012669
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com
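
Records like these can be split into their fields for alerting or reporting. The following Python sketch parses one record using the field labels that appear in the example above. It assumes that the labels always appear with this exact spelling and that no field value contains another label followed by a colon. The names `FIELDS` and `parse_event` are hypothetical.

```python
import re

# Field labels as they appear in the vertica.log example above.
FIELDS = [
    "Event Code", "Event Id", "Event Severity", "PostedTimestamp",
    "ExpirationTimestamp", "EventCodeDescription", "ProblemDescription",
    "DatabaseName", "Hostname",
]
LABEL_RE = re.compile("(" + "|".join(re.escape(f) for f in FIELDS) + r"):\s*")


def parse_event(record: str) -> dict:
    """Parse one 'Event Posted' or 'Event Cleared' record into a dict."""
    text = " ".join(record.split())  # rejoin wrapped lines
    match = re.search(r"Event (Posted|Cleared):", text)
    if match is None:
        raise ValueError("not an event record")
    event = {"Status": match.group(1)}
    # split() returns [leading text, label, value, label, value, ...]
    parts = LABEL_RE.split(text[match.end():])
    for label, value in zip(parts[1::2], parts[2::2]):
        event[label] = value.strip()
    return event


record = """08/14/15 15:07:59 thr:nameless:0x45a08940 [INFO] Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:29.253729
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com"""
event = parse_event(record)
print(event["Status"], event["Event Code"], event["EventCodeDescription"])
```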

SNMP

The following example illustrates a Too Many ROS Containers event posted to SNMP:

Version: 1, type: TRAPREQUEST
Enterprise OID: .1.3.6.1.4.1.31207.2.0.1
Trap agent: 72.0.0.0
Generic trap: ENTERPRISESPECIFIC (6)
Specific trap: 0
.1.3.6.1.4.1.31207.1.1 ---> 4
.1.3.6.1.4.1.31207.1.2 ---> 0
.1.3.6.1.4.1.31207.1.3 ---> 2008-08-14 11:30:26.121292
.1.3.6.1.4.1.31207.1.4 ---> 4
.1.3.6.1.4.1.31207.1.5 ---> 1
.1.3.6.1.4.1.31207.1.6 ---> site01
.1.3.6.1.4.1.31207.1.7 ---> suse10-1
.1.3.6.1.4.1.31207.1.8 ---> Too many ROS containers exist on this node.
.1.3.6.1.4.1.31207.1.9 ---> QATESTDB
.1.3.6.1.4.1.31207.1.10 ---> Too Many ROS Containers

Syslog

The following example illustrates a Too Many ROS Containers event posted and cleared within syslog:

Aug 14 15:07:59 fc6-1 vertica: Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp: 2015-08-14 15:07:59.253729 ExpirationTimestamp:
2015-08-14 15:08:29.253729 EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB Hostname: fc6-1.example.com
Aug 14 15:08:54 fc6-1 vertica: Event Cleared: Event Code:4 Event Id:0 Event Severity:
Warning [4] PostedTimestamp: 2015-08-14 15:07:59.253729 ExpirationTimestamp:
2015-08-14 15:08:53.012669 EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB Hostname: fc6-1.example.com
