This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Monitoring events

To help you monitor your database system, Vertica traps and logs significant events that affect database performance and functionality if you do not address their root causes.

To help you monitor your database system, Vertica traps and logs significant events that affect database performance and functionality if you do not address their root causes. This section describes where events are logged, the types of events that Vertica logs, how to respond to these events, the information that Vertica provides for these events, and how to configure event monitoring.

1 - Event logging mechanisms

Vertica posts events to the following mechanisms:.

Vertica posts events to the following mechanisms:

Mechanism Description
vertica.log All events are automatically posted to vertica.log. See Monitoring the Log Files.
ACTIVE_EVENTS This SQL system table provides information about all open events. See Using system tables and ACTIVE_EVENTS.
SNMP To post traps to SNMP, enable global reporting in addition to each individual event you want trapped. See Configuring event reporting.
Syslog To log events to syslog, enable event reporting for each individual event you want logged. See Configuring event reporting.

2 - Event codes

The following table lists the event codes that Vertica logs to the events system tables.

The following table lists the event codes that Vertica logs to the events system tables.

Event Code Event Code Description Description Action
0 Low Disk Space The database is running out of disk space or a disk is failing or there is a I/O hardware failure.

It is imperative that you add more disk space or replace the failing disk or hardware as soon as possible.

Check dmesg to see what caused the problem.

Also, use the DISK_RESOURCE_REJECTIONS system table to determine the types of disk space requests that are being rejected and the hosts on which they are being rejected. See Managing disk space for more information about low disk space.

1 Read Only File System The database does not have write access to the file system for the data or catalog paths. This can sometimes occur if Linux remounts a drive due to a kernel issue. Modify the privileges on the file system to give the database write access.
2 Loss Of K Safety

The database is no longer
K-Safe because there are insufficient nodes functioning within the cluster. Loss of
K-safety causes the database to shut down.

In a four-node cluster, for example, K-safety=1. If one node fails, the fault tolerance is at a critical level. If two nodes fail, the system loses K-safety.

If a system shuts down due to loss of K-safety, you need to recover the system. See Failure recovery.
3 Current Fault Tolerance at Critical Level One or more nodes in the cluster have failed. If the database loses one more node, it is no longer K-Safe and it shuts down. (For example, a four-node cluster is no longer K-safe if two nodes fail.) Restore any nodes that have failed or been shut down.
4 Too Many ROS Containers Heavy load activity on one or more projections can sometimes generate more ROS containers than the Tuple Mover can handle. Vertica allows up to 1024 ROS containers per projection before it rolls back additional load jobs and returns a ROS pushback error message.

Typically, the Tuple Mover catches up with pending mergeout requests and the Optimizer can resume executing queries on the affected tables (see Mergeout).

If this problem does not resolve quickly, or if it occurs frequently, it is probably related to insufficient RAM allocated to MAXMEMORY in the TM resource pool.

5 WOS Over Flow Deprecated
6 Node State Change The node state has changed. Check the status of the node.
7 Recovery Failure The database was not restored to a functional state after a hardware or software related failure. The reason for recovery failure can vary. See the event description for more information about your specific situation.
8 Recovery Error The database encountered an error while attempting to recover. If the number of recovery errors exceeds Max Tries, the Recovery Failure event is triggered. See Recovery Failure within this table. The reason for a recovery error can vary. See the event description for more information about your specific situation.
9 Recovery Lock Error

A recovering node could not obtain an S lock on the table.

If you have a continuous stream of COPY commands in progress, recovery might not be able to obtain this lock even after multiple re-tries.

Either momentarily stop the loads or pick a time when the cluster is not busy to restart the node and let recovery proceed.
10 Recovery Projection Retrieval Error Vertica was unable to retrieve information about a projection. The reason for a recovery projection retrieval error can vary. See the event description for more information about your specific situation.
11 Refresh Error The database encountered an error while attempting to refresh. The reason for a refresh error can vary. See the event description for more information about your specific situation.
12 Refresh Lock Error The database encountered a locking error during refresh. The reason for a refresh error can vary. See the event description for more information about your specific situation.
13 Tuple Mover Error Deprecated
14 Timer Service Task Error An error occurred in an internal scheduled task. Internal use only
15 Stale Checkpoint Deprecated
16 CRC Mismatch The Cyclic Redundancy Check returned an error or errors while fetching data. Review the vertica.log file or the SNMP trap utility to review the errors. For more information see Evaluating CRC errors.
20 Cluster Read-only Cluster cannot perform updates due to quorum loss and can only be queried. Restart failed nodes. See Recover from Read-Only Mode.

3 - Event data

To help you interpret and solve the issue that triggered an event, each event provides a variety of data, depending upon the event logging mechanism used.

To help you interpret and solve the issue that triggered an event, each event provides a variety of data, depending upon the event logging mechanism used.

The following table describes the event data and indicates where it is used.

vertica.log ACTIVE_EVENTS
(column names)
SNMP Syslog Description
N/A NODE_NAME N/A N/A The node where the event occurred.
Event Code EVENT_CODE Event Type Event Code A numeric ID that indicates the type of event. See Event Types in the previous table for a list of event type codes.
Event Id EVENT_ID Event OID Event Id A unique numeric ID that identifies the specific event.
Event Severity

EVENT_

SEVERITY

Event Severity Event Severity

The severity of the event from highest to lowest. These events are based on standard syslog severity types:

0 – Emergency

1 – Alert

2 – Critical

3 – Error

4 – Warning

5 – Notice

6 – Info

7 – Debug

PostedTimestamp

EVENT_

POSTED_

TIMESTAMP

N/A PostedTimestamp The year, month, day, and time the event was reported. Time is provided as military time.
ExpirationTimestamp

EVENT_

EXPIRATION

N/A ExpirationTimestamp The time at which this event expires. If the same event is posted again prior to its expiration time, this field gets updated to a new expiration time.
EventCodeDescription

EVENT_CODE_

DESCRIPTION

Description EventCodeDescription A brief description of the event and details pertinent to the specific situation.
ProblemDescription

EVENT_PROBLEM_

DESCRIPTION

Event Short Description ProblemDescription A generic description of the event.
N/A

REPORTING_

NODE

Node Name N/A The name of the node within the cluster that reported the event.
DatabaseName N/A Database Name DatabaseName The name of the database that is impacted by the event.
N/A N/A Host Name Hostname The name of the host within the cluster that reported the event.
N/A N/A Event Status N/A

The status of the event. It can be either:

1 – Open

2 – Clear

4 - Configuring event reporting

Event reporting is automatically configured for , and current events are automatically posted to the ACTIVE_EVENTS system table.

Event reporting is automatically configured for vertica.log, and current events are automatically posted to the ACTIVE_EVENTS system table. You can also configure Vertica to post events to syslog and SNMP.

4.1 - Configuring reporting for syslog

Syslog is a network-logging utility that issues, stores, and processes log messages.

Syslog is a network-logging utility that issues, stores, and processes log messages. It is a useful way to get heterogeneous data into a single data repository.

To log events to syslog, enable event reporting for each individual event you want logged. Messages are logged, by default, to /var/log/messages.

Configuring event reporting to syslog consists of:

  1. Enabling Vertica to trap events for syslog.

  2. Defining which events Vertica traps for syslog.

    Vertica strongly suggests that you trap the Stale Checkpoint event.

  3. Defining which syslog facility to use.

Enabling Vertica to trap events for syslog

To enable event trapping for syslog, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogEnabled = 1;

To disable event trapping for syslog, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogEnabled = 0;

Defining events to trap for syslog

To define events that generate a syslog entry, issue the following SQL command, one of the events described in the list below the command:

=> ALTER DATABASE DEFAULT SET SyslogEvents = 'events-list';

where events-list is a comma-delimited list of events, one or more of the following:

  • Low Disk Space

  • Read Only File System

  • Loss Of K Safety

  • Current Fault Tolerance at Critical Level

  • Too Many ROS Containers

  • Node State Change

  • Recovery Failure

  • Recovery Error

  • Recovery Lock Error

  • Recovery Projection Retrieval Error

  • Refresh Error

  • Refresh Lock Error

  • Tuple Mover Error

  • Timer Service Task Error

  • Stale Checkpoint

The following example generates a syslog entry for low disk space and recovery failure:

=> ALTER DATABASE DEFAULT SET SyslogEvents = 'Low Disk Space, Recovery Failure';

Defining the SyslogFacility to use for reporting

The syslog mechanism allows for several different general classifications of logging messages, called facilities. Typically, all authentication-related messages are logged with the auth (or authpriv) facility. These messages are intended to be secure and hidden from unauthorized eyes. Normal operational messages are logged with the daemon facility, which is the collector that receives and optionally stores messages.

The SyslogFacility directive allows all logging messages to be directed to a different facility than the default. When the directive is used, all logging is done using the specified facility, both authentication (secure) and otherwise.

To define which SyslogFacility Vertica uses, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogFacility = 'Facility_Name';

Where the facility-level argument <Facility_Name> is one of the following:

  • auth

  • authpriv (Linux only)

  • cron

  • uucp (UUCP subsystem)

  • daemon

  • ftp (Linux only)

  • lpr (line printer subsystem)

  • mail (mail system)

  • news (network news subsystem)

  • user (default system)

  • local0 (local use 0)

  • local1 (local use 1)

  • local2 (local use 2)

  • local3 (local use 3)

  • local4 (local use 4)

  • local5 (local use 5)

  • local6 (local use 6)

  • local7 (local use 7)

Trapping other event types

To trap events other than the ones listed above, create a syslog notifier and allow it to trap the desired events with SET_DATA_COLLECTOR_NOTIFY_POLICY.

Events monitored by this notifier type are not logged to MONITORING_EVENTS nor vertica.log.

The following example creates a notifier that writes a message to syslog when the Data collector (DC) component LoginFailures updates:

  1. Enable syslog notifiers for the current database:

    => ALTER DATABASE DEFAULT SET SyslogEnabled = 1;
    
  2. Create and enable a syslog notifier v_syslog_notifier:

    => CREATE NOTIFIER v_syslog_notifier ACTION 'syslog'
        ENABLE
        MAXMEMORYSIZE '10M'
        IDENTIFIED BY 'f8b0278a-3282-4e1a-9c86-e0f3f042a971'
        PARAMETERS 'eventSeverity = 5';
    
  3. Configure the syslog notifier v_syslog_notifier for updates to the LoginFailures DC component with SET_DATA_COLLECTOR_NOTIFY_POLICY:

    => SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures','v_syslog_notifier', 'Login failed!', true);
    

    This notifier writes the following message to syslog (default location: /var/log/messages) when a user fails to authenticate as the user Bob:

    Apr 25 16:04:58
    vertica_host_01
    vertica:
        Event Posted:
            Event Code:21
            Event Id:0
            Event Severity: Notice [5]
            PostedTimestamp: 2022-04-25 16:04:58.083063
            ExpirationTimestamp: 2022-04-25 16:04:58.083063
            EventCodeDescription: Notifier
            ProblemDescription: (Login failed!)
        {
           "_db":"VMart",
           "_schema":"v_internal",
           "_table":"dc_login_failures",
           "_uuid":"f8b0278a-3282-4e1a-9c86-e0f3f042a971",
           "authentication_method":"Reject",
           "client_authentication_name":"default: Reject",
           "client_hostname":"::1",
           "client_label":"",
           "client_os_user_name":"dbadmin",
           "client_pid":523418,
           "client_version":"",
           "database_name":"dbadmin",
           "effective_protocol":"3.8",
           "node_name":"v_vmart_node0001",
           "reason":"REJECT",
           "requested_protocol":"3.8",
           "ssl_client_fingerprint":"",
           "ssl_client_subject":"",
           "time":"2022-04-25 16:04:58.082568-05",
           "user_name":"Bob"
        }#012
        DatabaseName: VMart
        Hostname: vertica_host_01
    

See also

4.2 - Configuring reporting for SNMP

Configuring event reporting for SNMP consists of:.

Configuring event reporting for SNMP consists of:

  1. Configuring Vertica to enable event trapping for SNMP as described below.

  2. Importing the Vertica Management Information Base (MIB) file into the SNMP monitoring device.

    The Vertica MIB file allows the SNMP trap receiver to understand the traps it receives from Vertica. This, in turn, allows you to configure the actions it takes when it receives traps.

    Vertica supports the SNMP V1 trap protocol, and it is located in /opt/vertica/sbin/VERTICA-MIB. See the documentation for your SNMP monitoring device for more information about importing MIB files.

  3. Configuring the SNMP trap receiver to handle traps from Vertica.

    SNMP trap receiver configuration differs greatly from vendor to vendor. As such, the directions presented here for configuring the SNMP trap receiver to handle traps from Vertica are generic.

    Vertica traps are single, generic traps that contain several fields of identifying information. These fields equate to the event data described in Monitoring events. However, the format used for the field names differs slightly. Under SNMP, the field names contain no spaces. Also, field names are pre-pended with “vert”. For example, Event Severity becomes vertEventSeverity.

    When configuring your trap receiver, be sure to use the same hostname, port, and community string you used to configure event trapping in Vertica.

    Examples of network management providers:

    • Network Node Manager i

    • IBM Tivoli

    • AdventNet

    • Net-SNMP (Open Source)

    • Nagios (Open Source)

    • Open NMS (Open Source)

See also

4.3 - Configuring event trapping for SNMP

The following events are trapped by default when you configure Vertica to trap events for SNMP:.

The following events are trapped by default when you configure Vertica to trap events for SNMP:

  • Low Disk Space

  • Read Only File System

  • Loss of K Safety

  • Current Fault Tolerance at Critical Level

  • Too Many ROS Containers

  • Node State Change

  • Recovery Failure

  • Stale Checkpoint

  • CRC Mismatch

To configure Vertica to trap events for SNMP

  1. Enable Vertica to trap events for SNMP.

  2. Define where Vertica sends the traps.

  3. Optionally redefine which SNMP events Vertica traps.

To enable event trapping for SNMP

Use the following SQL command:

=> ALTER DATABASE DEFAULT SET SnmpTrapsEnabled = 1;

To define where Vertica send traps

Use the following SQL command, where Host_name and port identify the computer where SNMP resides, and CommunityString acts like a password to control Vertica's access to the server:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'host_name port CommunityString';

For example:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'localhost 162 public';

You can also specify multiple destinations by specifying a list of destinations, separated by commas:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'host_name1 port1 CommunityString1, hostname2 port2 CommunityString2';

To define which events Vertica traps

Use the following SQL command, where Event_Name is one of the events in the list below the command:

=> ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Event_Name1, Even_Name2';
  • Low Disk Space

  • Read Only File System

  • Loss Of K Safety

  • Current Fault Tolerance at Critical Level

  • Too Many ROS Containers

  • Node State Change

  • Recovery Failure

  • Recovery Error

  • Recovery Lock Error

  • Recovery Projection Retrieval Error

  • Refresh Error

  • Tuple Mover Error

  • Stale Checkpoint

  • CRC Mismatch

The following example specifies two event names:

=> ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Low Disk Space, Recovery Failure';

4.4 - Verifying SNMP configuration

To create a set of test events that checks SNMP configuration:.

To create a set of test events that checks SNMP configuration:

  1. Set up SNMP trap handlers to catch Vertica events.

  2. Test your setup with the following command:

    SELECT SNMP_TRAP_TEST();
        SNMP_TRAP_TEST
    --------------------------
     Completed SNMP Trap Test
    (1 row)
    

5 - Event reporting examples

The following example illustrates a Too Many ROS Containers event posted and cleared within vertica.log:.

Vertica.log

The following example illustrates a Too Many ROS Containers event posted and cleared within vertica.log:

08/14/15 15:07:59 thr:nameless:0x45a08940 [INFO] Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:29.253729
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com
08/14/15 15:08:54 thr:Ageout Events:0x2aaab0015e70 [INFO] Event Cleared:
Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:53.012669
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com

SNMP

The following example illustrates a Too Many ROS Containers event posted to SNMP:

Version: 1, type: TRAPREQUESTEnterprise OID: .1.3.6.1.4.1.31207.2.0.1
Trap agent: 72.0.0.0
Generic trap: ENTERPRISESPECIFIC (6)
Specific trap: 0
.1.3.6.1.4.1.31207.1.1 ---> 4
.1.3.6.1.4.1.31207.1.2 ---> 0
.1.3.6.1.4.1.31207.1.3 ---> 2008-08-14 11:30:26.121292
.1.3.6.1.4.1.31207.1.4 ---> 4
.1.3.6.1.4.1.31207.1.5 ---> 1
.1.3.6.1.4.1.31207.1.6 ---> site01
.1.3.6.1.4.1.31207.1.7 ---> suse10-1
.1.3.6.1.4.1.31207.1.8 ---> Too many ROS containers exist on this node.
.1.3.6.1.4.1.31207.1.9 ---> QATESTDB
.1.3.6.1.4.1.31207.1.10 ---> Too Many ROS Containers

Syslog

The following example illustrates a Too Many ROS Containers event posted and cleared within syslog:

Aug 14 15:07:59 fc6-1 vertica: Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp: 2015-08-14 15:07:59.253729 ExpirationTimestamp:
2015-08-14 15:08:29.253729 EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB Hostname: fc6-1.example.com
Aug 14 15:08:54 fc6-1 vertica: Event Cleared: Event Code:4 Event Id:0 Event Severity:
Warning [4] PostedTimestamp: 2015-08-14 15:07:59.253729 ExpirationTimestamp:
2015-08-14 15:08:53.012669 EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB Hostname: fc6-1.example.com