This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Monitoring events

To help you monitor your database system, Vertica traps and logs significant events that affect database performance and functionality if you do not address their root causes.

To help you monitor your database system, Vertica traps and logs significant events that affect database performance and functionality if you do not address their root causes. This section describes where events are logged, the types of events that Vertica logs, how to respond to these events, the information that Vertica provides for these events, and how to configure event monitoring.

1 - Event logging mechanisms

Vertica posts events to the following mechanisms:.

Vertica posts events to the following mechanisms:

Mechanism Description
vertica.log All events are automatically posted to vertica.log. See Monitoring the Log Files.
ACTIVE_EVENTS This SQL system table provides information about all open events. See Using system tables and ACTIVE_EVENTS.
SNMP To post traps to SNMP, enable global reporting in addition to each individual event you want trapped. See Configuring event reporting.
Syslog To log events to syslog, enable event reporting for each individual event you want logged. See Configuring event reporting.

2 - Event codes

The following table lists the event codes that Vertica logs to the events system tables.

The following table lists the event codes that Vertica logs to the events system tables.

Event Code Severity Event Code Description Description/Action
0 Warning Low Disk Space

Warning indicates one of the following issues:

  • Lack of disk space for database

  • Disk failure

  • I/O hardware failure

Action: Add more disk space, or replace the failing disk or hardware as soon as possible.

Check dmesg to see what caused the problem.

Also, use the DISK_RESOURCE_REJECTIONS system table to determine the types of disk space requests that are being rejected and the hosts where they are rejected. See Managing disk space for details.

1 Warning Read Only File System

Database lacks write access to the file system for data or catalog paths. This sometimes occurs if Linux remounts a drive due to a kernel issue.

Action: Give the database write access.

2 Emergency Loss Of K Safety

The database is no longer K-safe because insufficient nodes are functioning within the cluster. Loss of K-safety causes the database to shut down.

Action: Recover the system.

3 Critical Current Fault Tolerance at Critical Level

One or more nodes in the cluster failed. If the database loses one more node, it will no longer be K-safe and shut down.

Action: Restore nodes that failed or shut down.

4 Warning Too Many ROS Containers

Heavy load activity on one or more projections sometimes generates more ROS containers than the Tuple Mover can handle. Vertica allows up to 1024 ROS containers per projection before it rolls back additional load jobs and returns a ROS pushback error message.

Action: The Tuple Mover typically catches up with pending mergeout requests and the Optimizer can resume executing queries on affected tables (see Mergeout).

If this problem does not resolve quickly, or if it occurs frequently, it is probably related to insufficient RAM allocated to MAXMEMORY in the TM resource pool.

5 Informational WOS Over Flow Deprecated
6 Informational Node State Change

The node state changed.

Action: Check node status.

7 Warning Recovery Failure

Database was not restored to a functional state after a hardware or software related failure.

Action: Reasons for the warning can vary, see the event description for details.

8 Warning Recovery Error

Database encountered an error while attempting to recover. If the number of recovery errors exceeds Max Tries, the Recovery Failure event is triggered.

Action: Reasons for the warning can vary, see the event description for details.

9 n/a Recovery Lock Error Unused
10 n/a Recovery Projection Retrieval Error Unused
11 Warning Refresh Error

The database encountered an error while attempting to refresh.

Action: Reasons for the warning can vary, see the event description for details.

12 n/a Refresh Lock Error Unused
13 n/a Tuple Mover Error Deprecated
14 Warning Timer Service Task Error

Error occurred in an internal scheduled task.

Action: None, internal use only

15 Warning Stale Checkpoint Deprecated
16 Notice License Size Compliance

Database size exceeds license size allowance.

Action: See Monitoring database size for license compliance.

17 Notice License Term Compliance

Database is not in compliance with your Vertica license.

Action: Check compliance status with GET_COMPLIANCE_STATUS.

18 Error CRC Mismatch

Cyclic Redundancy Check (CRC) returned an error or errors while fetching data.

Action: Review the vertica.log file or the SNMP trap utility to evaluate CRC errors.

19 Critical/Warning Catalog Sync Exceeds Durability Threshold Severity: Critical when exceeding hard limit, Warning when exceeding soft limit.
20 Critical Cluster Read-only

Eon Mode) Quorum or primary shard coverage loss forced database into read-only mode.

Action: Restart down nodes.

3 - Event data

To help you interpret and solve the issue that triggered an event, each event provides a variety of data, depending upon the event logging mechanism used.

To help you interpret and solve the issue that triggered an event, each event provides a variety of data, depending upon the event logging mechanism used.

The following table describes the event data and indicates where it is used.

vertica.log ACTIVE_EVENTS
(column names)
SNMP Syslog Description
N/A NODE_NAME N/A N/A The node where the event occurred.
Event Code EVENT_CODE Event Type Event Code A numeric ID that indicates the type of event. See Event Types in the previous table for a list of event type codes.
Event Id EVENT_ID Event OID Event Id A unique numeric ID that identifies the specific event.
Event Severity

EVENT_

SEVERITY

Event Severity Event Severity

The severity of the event from highest to lowest. These events are based on standard syslog severity types:

0 – Emergency

1 – Alert

2 – Critical

3 – Error

4 – Warning

5 – Notice

6 – Info

7 – Debug

PostedTimestamp

EVENT_

POSTED_

TIMESTAMP

N/A PostedTimestamp The year, month, day, and time the event was reported. Time is provided as military time.
ExpirationTimestamp

EVENT_

EXPIRATION

N/A ExpirationTimestamp The time at which this event expires. If the same event is posted again prior to its expiration time, this field gets updated to a new expiration time.
EventCodeDescription

EVENT_CODE_

DESCRIPTION

Description EventCodeDescription A brief description of the event and details pertinent to the specific situation.
ProblemDescription

EVENT_PROBLEM_

DESCRIPTION

Event Short Description ProblemDescription A generic description of the event.
N/A

REPORTING_

NODE

Node Name N/A The name of the node within the cluster that reported the event.
DatabaseName N/A Database Name DatabaseName The name of the database that is impacted by the event.
N/A N/A Host Name Hostname The name of the host within the cluster that reported the event.
N/A N/A Event Status N/A

The status of the event. It can be either:

1 – Open

2 – Clear

4 - Configuring event reporting

Event reporting is automatically configured for .log, and current events are automatically posted to the ACTIVE_EVENTS system table.

Event reporting is automatically configured for vertica.log, and current events are automatically posted to the ACTIVE_EVENTS system table. You can also configure Vertica to post events to syslog and SNMP.

4.1 - Configuring reporting for the simple notification service (SNS)

You can monitor (DC) components and send new rows to Amazon Web Services (AWS) Simple Notification Service (SNS).

You can monitor Data collector (DC) components and send new rows to Amazon Web Services (AWS) Simple Notification Service (SNS). SNS notifiers are configured with database-level SNS parameters.

Minimally, to send DC data to SNS topics, you must configure and specify the following:

  • An SNS notifier

  • A Simple Notification Service (SNS) topic

  • An AWS region (SNSRegion)

  • An SNS endpoint (SNSEndpoint) (FIPS only)

  • Credentials to authenticate to AWS

  • Information about how to handle HTTPS

Creating an SNS notifier

To create an SNS notifier, use CREATE NOTIFIER, specifying sns as the ACTION.

SNS configuration

SNS topics and their subscribers should be configured with AWS. For details, see the AWS documentation:

AWS region and endpoint

In most use cases, you only need to set the AWS region with the SNSRegion parameter; if the SNSEndpoint is set to an empty string (default) and the SNSRegion is set, Vertica automatically finds and uses the appropriate endpoint:

=> ALTER DATABASE DEFAULT SET SNSRegion='us-east-1';
=> ALTER DATABASE DEFAULT SET SNSEndpoint='';

If you want to specify an endpoint, its region must match the region specified in SNSRegion:

=> ALTER DATABASE DEFAULT SET SNSEndpoint='sns.us-east-1.amazonaws.com';

If you use FIPS, you should manually set SNSEndpoint to a FIPS-compliant endpoint:

=> ALTER DATABASE DEFAULT SET SNSEndpoint='sns-fips.us-east-1.amazonaws.com';

AWS credentials

AWS credentials can be set with SNSAuth, which takes an access key and secret access key in the following format:

access_key:secret_access_key

To set SNSAuth:

=> ALTER DATABASE DEFAULT SET SNSAuth='VNDDNVOPIUQF917O5PDB:+mcnVONVIbjOnf1ekNis7nm3mE83u9fjdwmlq36Z';

Handling HTTPS

The SNSEnableHttps parameter determines whether the SNS notifier uses TLS to secure the connection between Vertica and AWS. HTTPS is enabled by default and can be manually enabled with:

=> ALTER DATABASE DEFAULT SET SNSEnableHttps=1;

If SNSEnableHttps is enabled, depending on your configuration, you might need to specify a custom set of CA bundles with SNSCAFile or SNSCAPath. Amazon root certificates are typically contained in the set of trusted CA certificates already, so you should not have to set these parameters in most environments:

=> ALTER DATABASE DEFAULT SET SNSCAFile='path/to/ca/bundle.pem'
=> ALTER DATABASE DEFAULT SET SNSCAPath='path/to/ca/bundles/'

HTTPS can be manually disabled with:

=> ALTER DATABASE DEFAULT SET SNSEnableHttps=0;

Examples

The following example creates an SNS topic, subscribes to it with an SQS queue, and then configures an SNS notifier for the DC component LoginFailures:

  1. Create an SNS topic.

  2. Create an SQS queue.

  3. Subscribe the SQS queue to the SNS topic.

  4. Set SNSAuth with your AWS credentials:

    => ALTER DATABASE DEFAULT SET SNSAuth='VNDDNVOPIUQF917O5PDB:+mcnVONVIbjOnf1ekNis7nm3mE83u9fjdwmlq36Z';
    
  5. Set SNSRegion:

    => ALTER DATABASE DEFAULT SET SNSRegion='us-east-1'
    
  6. Enable HTTPS:

    => ALTER DATABASE DEFAULT SET SNSEnableHttps=1;
    
  7. Create an SNS notifier:

    => CREATE NOTIFIER v_sns_notifier ACTION 'sns' MAXPAYLOAD '256K' MAXMEMORYSIZE '10M' CHECK COMMITTED;
    
  8. Verify that the SNS notifier, SNS topic, and SQS queue are properly configured:

    1. Manually send a message from the notifier to the SNS topic with NOTIFY:

      => SELECT NOTIFY('test message', 'v_sns_notifier', 'arn:aws:sns:us-east-1:123456789012:MyTopic')
      
    2. Poll the SQS queue for your message.

  9. Attach the SNS notifier to the LoginFailures component with SET_DATA_COLLECTOR_NOTIFY_POLICY:

    => SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures', 'v_sns_notifier', 'Login failed!', true)
    

To disable an SNS notifier:

=> SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures', 'v_sns_notifier', 'Login failed!', false)

4.2 - Configuring reporting for syslog

Syslog is a network-logging utility that issues, stores, and processes log messages.

Syslog is a network-logging utility that issues, stores, and processes log messages. It is a useful way to get heterogeneous data into a single data repository.

To log events to syslog, enable event reporting for each individual event you want logged. Messages are logged, by default, to /var/log/messages.

Configuring event reporting to syslog consists of:

  1. Enabling Vertica to trap events for syslog.

  2. Defining which events Vertica traps for syslog.

    Vertica strongly suggests that you trap the Stale Checkpoint event.

  3. Defining which syslog facility to use.

Enabling Vertica to trap events for syslog

To enable event trapping for syslog, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogEnabled = 1;

To disable event trapping for syslog, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogEnabled = 0;

Defining events to trap for syslog

To define events that generate a syslog entry, issue the following SQL command, one of the events described in the list below the command:

=> ALTER DATABASE DEFAULT SET SyslogEvents = 'events-list';

where events-list is a comma-delimited list of events, one or more of the following:

  • Low Disk Space

  • Read Only File System

  • Loss Of K Safety

  • Current Fault Tolerance at Critical Level

  • Too Many ROS Containers

  • Node State Change

  • Recovery Failure

  • Recovery Error

  • Recovery Lock Error

  • Recovery Projection Retrieval Error

  • Refresh Error

  • Refresh Lock Error

  • Tuple Mover Error

  • Timer Service Task Error

  • Stale Checkpoint

The following example generates a syslog entry for low disk space and recovery failure:

=> ALTER DATABASE DEFAULT SET SyslogEvents = 'Low Disk Space, Recovery Failure';

Defining the SyslogFacility to use for reporting

The syslog mechanism allows for several different general classifications of logging messages, called facilities. Typically, all authentication-related messages are logged with the auth (or authpriv) facility. These messages are intended to be secure and hidden from unauthorized eyes. Normal operational messages are logged with the daemon facility, which is the collector that receives and optionally stores messages.

The SyslogFacility directive allows all logging messages to be directed to a different facility than the default. When the directive is used, all logging is done using the specified facility, both authentication (secure) and otherwise.

To define which SyslogFacility Vertica uses, issue the following SQL command:

=> ALTER DATABASE DEFAULT SET SyslogFacility = 'Facility_Name';

Where the facility-level argument <Facility_Name> is one of the following:

  • auth

  • authpriv (Linux only)

  • cron

  • uucp (UUCP subsystem)

  • daemon

  • ftp (Linux only)

  • lpr (line printer subsystem)

  • mail (mail system)

  • news (network news subsystem)

  • user (default system)

  • local0 (local use 0)

  • local1 (local use 1)

  • local2 (local use 2)

  • local3 (local use 3)

  • local4 (local use 4)

  • local5 (local use 5)

  • local6 (local use 6)

  • local7 (local use 7)

Trapping other event types

To trap events other than the ones listed above, create a syslog notifier and allow it to trap the desired events with SET_DATA_COLLECTOR_NOTIFY_POLICY.

Events monitored by this notifier type are not logged to MONITORING_EVENTS nor vertica.log.

The following example creates a notifier that writes a message to syslog when the Data collector (DC) component LoginFailures updates:

  1. Enable syslog notifiers for the current database:

    => ALTER DATABASE DEFAULT SET SyslogEnabled = 1;
    
  2. Create and enable a syslog notifier v_syslog_notifier:

    => CREATE NOTIFIER v_syslog_notifier ACTION 'syslog'
        ENABLE
        MAXMEMORYSIZE '10M'
        IDENTIFIED BY 'f8b0278a-3282-4e1a-9c86-e0f3f042a971'
        PARAMETERS 'eventSeverity = 5';
    
  3. Configure the syslog notifier v_syslog_notifier for updates to the LoginFailures DC component with SET_DATA_COLLECTOR_NOTIFY_POLICY:

    => SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures','v_syslog_notifier', 'Login failed!', true);
    

    This notifier writes the following message to syslog (default location: /var/log/messages) when a user fails to authenticate as the user Bob:

    Apr 25 16:04:58
    vertica_host_01
    vertica:
        Event Posted:
            Event Code:21
            Event Id:0
            Event Severity: Notice [5]
            PostedTimestamp: 2022-04-25 16:04:58.083063
            ExpirationTimestamp: 2022-04-25 16:04:58.083063
            EventCodeDescription: Notifier
            ProblemDescription: (Login failed!)
        {
           "_db":"VMart",
           "_schema":"v_internal",
           "_table":"dc_login_failures",
           "_uuid":"f8b0278a-3282-4e1a-9c86-e0f3f042a971",
           "authentication_method":"Reject",
           "client_authentication_name":"default: Reject",
           "client_hostname":"::1",
           "client_label":"",
           "client_os_user_name":"dbadmin",
           "client_pid":523418,
           "client_version":"",
           "database_name":"dbadmin",
           "effective_protocol":"3.8",
           "node_name":"v_vmart_node0001",
           "reason":"REJECT",
           "requested_protocol":"3.8",
           "ssl_client_fingerprint":"",
           "ssl_client_subject":"",
           "time":"2022-04-25 16:04:58.082568-05",
           "user_name":"Bob"
        }#012
        DatabaseName: VMart
        Hostname: vertica_host_01
    

See also

Event reporting examples

4.3 - Configuring reporting for SNMP

Configuring event reporting for SNMP consists of:.

Configuring event reporting for SNMP consists of:

  1. Configuring Vertica to enable event trapping for SNMP as described below.

  2. Importing the Vertica Management Information Base (MIB) file into the SNMP monitoring device.

    The Vertica MIB file allows the SNMP trap receiver to understand the traps it receives from Vertica. This, in turn, allows you to configure the actions it takes when it receives traps.

    Vertica supports the SNMP V1 trap protocol, and it is located in /opt/vertica/sbin/VERTICA-MIB. See the documentation for your SNMP monitoring device for more information about importing MIB files.

  3. Configuring the SNMP trap receiver to handle traps from Vertica.

    SNMP trap receiver configuration differs greatly from vendor to vendor. As such, the directions presented here for configuring the SNMP trap receiver to handle traps from Vertica are generic.

    Vertica traps are single, generic traps that contain several fields of identifying information. These fields equate to the event data described in Monitoring events. However, the format used for the field names differs slightly. Under SNMP, the field names contain no spaces. Also, field names are pre-pended with “vert”. For example, Event Severity becomes vertEventSeverity.

    When configuring your trap receiver, be sure to use the same hostname, port, and community string you used to configure event trapping in Vertica.

    Examples of network management providers:

    • Network Node Manager i

    • IBM Tivoli

    • AdventNet

    • Net-SNMP (Open Source)

    • Nagios (Open Source)

    • Open NMS (Open Source)

4.4 - Configuring event trapping for SNMP

The following events are trapped by default when you configure Vertica to trap events for SNMP:.

The following events are trapped by default when you configure Vertica to trap events for SNMP:

  • Low Disk Space

  • Read Only File System

  • Loss of K Safety

  • Current Fault Tolerance at Critical Level

  • Too Many ROS Containers

  • Node State Change

  • Recovery Failure

  • Stale Checkpoint

  • CRC Mismatch

To configure Vertica to trap events for SNMP

  1. Enable Vertica to trap events for SNMP.

  2. Define where Vertica sends the traps.

  3. Optionally redefine which SNMP events Vertica traps.

To enable event trapping for SNMP

Use the following SQL command:

=> ALTER DATABASE DEFAULT SET SnmpTrapsEnabled = 1;

To define where Vertica send traps

Use the following SQL command, where Host_name and port identify the computer where SNMP resides, and CommunityString acts like a password to control Vertica's access to the server:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'host_name port CommunityString';

For example:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'localhost 162 public';

You can also specify multiple destinations by specifying a list of destinations, separated by commas:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'host_name1 port1 CommunityString1, hostname2 port2 CommunityString2';

To define which events Vertica traps

Use the following SQL command, where Event_Name is one of the events in the list below the command:

=> ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Event_Name1, Even_Name2';
  • Low Disk Space

  • Read Only File System

  • Loss Of K Safety

  • Current Fault Tolerance at Critical Level

  • Too Many ROS Containers

  • Node State Change

  • Recovery Failure

  • Recovery Error

  • Recovery Lock Error

  • Recovery Projection Retrieval Error

  • Refresh Error

  • Tuple Mover Error

  • Stale Checkpoint

  • CRC Mismatch

The following example specifies two event names:

=> ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Low Disk Space, Recovery Failure';

4.5 - Verifying SNMP configuration

To create a set of test events that checks SNMP configuration:.

To create a set of test events that checks SNMP configuration:

  1. Set up SNMP trap handlers to catch Vertica events.

  2. Test your setup with the following command:

    SELECT SNMP_TRAP_TEST();
        SNMP_TRAP_TEST
    --------------------------
     Completed SNMP Trap Test
    (1 row)
    

5 - Event reporting examples

The following example illustrates a Too Many ROS Containers event posted and cleared within vertica.log:.

Vertica.log

The following example illustrates a Too Many ROS Containers event posted and cleared within vertica.log:

08/14/15 15:07:59 thr:nameless:0x45a08940 [INFO] Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:29.253729
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com
08/14/15 15:08:54 thr:Ageout Events:0x2aaab0015e70 [INFO] Event Cleared:
Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:53.012669
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com

SNMP

The following example illustrates a Too Many ROS Containers event posted to SNMP:

Version: 1, type: TRAPREQUESTEnterprise OID: .1.3.6.1.4.1.31207.2.0.1
Trap agent: 72.0.0.0
Generic trap: ENTERPRISESPECIFIC (6)
Specific trap: 0
.1.3.6.1.4.1.31207.1.1 ---> 4
.1.3.6.1.4.1.31207.1.2 ---> 0
.1.3.6.1.4.1.31207.1.3 ---> 2008-08-14 11:30:26.121292
.1.3.6.1.4.1.31207.1.4 ---> 4
.1.3.6.1.4.1.31207.1.5 ---> 1
.1.3.6.1.4.1.31207.1.6 ---> site01
.1.3.6.1.4.1.31207.1.7 ---> suse10-1
.1.3.6.1.4.1.31207.1.8 ---> Too many ROS containers exist on this node.
.1.3.6.1.4.1.31207.1.9 ---> QATESTDB
.1.3.6.1.4.1.31207.1.10 ---> Too Many ROS Containers

Syslog

The following example illustrates a Too Many ROS Containers event posted and cleared within syslog:

Aug 14 15:07:59 fc6-1 vertica: Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp: 2015-08-14 15:07:59.253729 ExpirationTimestamp:
2015-08-14 15:08:29.253729 EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB Hostname: fc6-1.example.com
Aug 14 15:08:54 fc6-1 vertica: Event Cleared: Event Code:4 Event Id:0 Event Severity:
Warning [4] PostedTimestamp: 2015-08-14 15:07:59.253729 ExpirationTimestamp:
2015-08-14 15:08:53.012669 EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB Hostname: fc6-1.example.com