To help you monitor your database system, Vertica traps and logs significant events that affect database performance and functionality if you do not address their root causes. This section describes where events are logged, the types of events that Vertica logs, how to respond to these events, the information that Vertica provides for these events, and how to configure event monitoring.
Monitoring events
- 1: Event logging mechanisms
- 2: Event codes
- 3: Event data
- 4: Configuring event reporting
- 4.1: Configuring reporting for syslog
- 4.2: Configuring reporting for SNMP
- 4.3: Configuring event trapping for SNMP
- 4.4: Verifying SNMP configuration
- 5: Event reporting examples
1 - Event logging mechanisms
Vertica posts events to the following mechanisms:
Mechanism | Description |
---|---|
vertica.log | All events are automatically posted to vertica.log. See Monitoring the Log Files. |
ACTIVE_EVENTS | This SQL system table provides information about all open events. See Using system tables and ACTIVE_EVENTS. |
SNMP | To post traps to SNMP, enable global reporting in addition to each individual event you want trapped. See Configuring event reporting. |
Syslog | To log events to syslog, enable event reporting for each individual event you want logged. See Configuring event reporting. |
2 - Event codes
The following table lists the event codes that Vertica logs to the events system tables.
Event Code | Event Code Description | Description | Action |
---|---|---|---|
0 | Low Disk Space | The database is running out of disk space, a disk is failing, or there is an I/O hardware failure. | Add more disk space or replace the failing disk or hardware as soon as possible. Also, use the DISK_RESOURCE_REJECTIONS system table to determine the types of disk space requests that are being rejected and the hosts on which they are being rejected. See Managing disk space for more information about low disk space. |
1 | Read Only File System | The database does not have write access to the file system for the data or catalog paths. This can sometimes occur if Linux remounts a drive due to a kernel issue. | Modify the privileges on the file system to give the database write access. |
2 | Loss Of K Safety | The database is no longer K-safe. In a four-node cluster, for example, K-safety=1. If one node fails, the fault tolerance is at a critical level. If two nodes fail, the system loses K-safety. | If a system shuts down due to loss of K-safety, you need to recover the system. See Failure recovery. |
3 | Current Fault Tolerance at Critical Level | One or more nodes in the cluster have failed. If the database loses one more node, it is no longer K-Safe and it shuts down. (For example, a four-node cluster is no longer K-safe if two nodes fail.) | Restore any nodes that have failed or been shut down. |
4 | Too Many ROS Containers | Heavy load activity on one or more projections can sometimes generate more ROS containers than the Tuple Mover can handle. Vertica allows up to 1024 ROS containers per projection before it rolls back additional load jobs and returns a ROS pushback error message. | Typically, the Tuple Mover catches up with pending mergeout requests and the Optimizer can resume executing queries on the affected tables (see Mergeout). If this problem does not resolve quickly, or if it occurs frequently, it is probably related to insufficient RAM allocated to MAXMEMORY in the TM resource pool. |
5 | WOS Over Flow | Deprecated | |
6 | Node State Change | The node state has changed. | Check the status of the node. |
7 | Recovery Failure | The database was not restored to a functional state after a hardware or software related failure. | The reason for recovery failure can vary. See the event description for more information about your specific situation. |
8 | Recovery Error | The database encountered an error while attempting to recover. If the number of recovery errors exceeds Max Tries, the Recovery Failure event is triggered. See Recovery Failure within this table. | The reason for a recovery error can vary. See the event description for more information about your specific situation. |
9 | Recovery Lock Error | A recovering node could not obtain an S lock on the table. If you have a continuous stream of COPY commands in progress, recovery might not be able to obtain this lock even after multiple retries. | Either momentarily stop the loads or pick a time when the cluster is not busy to restart the node and let recovery proceed. |
10 | Recovery Projection Retrieval Error | Vertica was unable to retrieve information about a projection. | The reason for a recovery projection retrieval error can vary. See the event description for more information about your specific situation. |
11 | Refresh Error | The database encountered an error while attempting to refresh. | The reason for a refresh error can vary. See the event description for more information about your specific situation. |
12 | Refresh Lock Error | The database encountered a locking error during refresh. | The reason for a refresh error can vary. See the event description for more information about your specific situation. |
13 | Tuple Mover Error | Deprecated | |
14 | Timer Service Task Error | An error occurred in an internal scheduled task. | Internal use only |
15 | Stale Checkpoint | Deprecated | |
16 | CRC Mismatch | The Cyclic Redundancy Check returned one or more errors while fetching data. | Check the vertica.log file or the SNMP trap utility to review the errors. For more information, see Evaluating CRC errors. |
20 | Cluster Read-only | Cluster cannot perform updates due to quorum loss and can only be queried. | Restart failed nodes. See Recover from Read-Only Mode. |
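Two of the actions in the table point at catalog data that can be inspected directly. The following is a minimal sketch of those follow-up queries, not a prescribed procedure. DISK_RESOURCE_REJECTIONS is the system table named in the Low Disk Space row; the RESOURCE_POOLS table and its MEMORYSIZE and MAXMEMORYSIZE columns are not mentioned in this section and are assumed here as the place to review the TM pool settings.

```sql
-- Event code 0 (Low Disk Space): see which disk space requests are being
-- rejected and on which nodes.
SELECT * FROM disk_resource_rejections;

-- Event code 4 (Too Many ROS Containers): review the memory settings of the
-- TM resource pool (assumption: RESOURCE_POOLS exposes these columns).
SELECT name, memorysize, maxmemorysize
FROM resource_pools
WHERE name = 'tm';
```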
3 - Event data
To help you interpret and solve the issue that triggered an event, each event provides a variety of data, depending upon the event logging mechanism used.
The following table describes the event data and indicates where it is used.
vertica.log | ACTIVE_EVENTS (column names) | SNMP | Syslog | Description |
---|---|---|---|---|
N/A | NODE_NAME | N/A | N/A | The node where the event occurred. |
Event Code | EVENT_CODE | Event Type | Event Code | A numeric ID that indicates the type of event. See Event codes for a list of event codes. |
Event Id | EVENT_ID | Event OID | Event Id | A unique numeric ID that identifies the specific event. |
Event Severity | EVENT_SEVERITY | Event Severity | Event Severity | The severity of the event. Severities are based on the standard syslog severity types, from highest to lowest: 0 (Emergency), 1 (Alert), 2 (Critical), 3 (Error), 4 (Warning), 5 (Notice), 6 (Informational), 7 (Debug). |
PostedTimestamp | EVENT_POSTED_TIMESTAMP | N/A | PostedTimestamp | The year, month, day, and time the event was reported. Time is given in 24-hour format. |
ExpirationTimestamp | EVENT_EXPIRATION | N/A | ExpirationTimestamp | The time at which this event expires. If the same event is posted again prior to its expiration time, this field is updated to a new expiration time. |
EventCodeDescription | EVENT_CODE_DESCRIPTION | Description | EventCodeDescription | A brief description of the event and details pertinent to the specific situation. |
ProblemDescription | EVENT_PROBLEM_DESCRIPTION | Event Short Description | ProblemDescription | A generic description of the event. |
N/A | REPORTING_NODE | Node Name | N/A | The name of the node within the cluster that reported the event. |
DatabaseName | N/A | Database Name | DatabaseName | The name of the database that is impacted by the event. |
N/A | N/A | Host Name | Hostname | The name of the host within the cluster that reported the event. |
N/A | N/A | Event Status | N/A | The status of the event. It can be either open or cleared. |
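The ACTIVE_EVENTS column names in the table can be queried like any other system table. The sketch below shows one way to list open events and then narrow the result to a single event type. The choice of event code 4 and the sort order are illustrative only.

```sql
-- List open events, most recent first, using the ACTIVE_EVENTS columns
-- described in the table above.
SELECT node_name,
       event_code,
       event_id,
       event_severity,
       event_posted_timestamp,
       event_expiration,
       event_code_description,
       event_problem_description
FROM active_events
ORDER BY event_posted_timestamp DESC;

-- Narrow the result to one event type, for example
-- Too Many ROS Containers (event code 4).
SELECT node_name, event_posted_timestamp, event_problem_description
FROM active_events
WHERE event_code = 4;
```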
4 - Configuring event reporting
Event reporting is automatically configured for vertica.log, and current events are automatically posted to the ACTIVE_EVENTS system table. You can also configure Vertica to post events to syslog and SNMP.
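Before changing anything, it can help to see how syslog and SNMP reporting are currently set. The query below is a sketch that assumes the CONFIGURATION_PARAMETERS system table (not mentioned in this section) exposes the parameters with the column names shown. The parameter names themselves are the ones used in the subsections that follow.

```sql
-- Show the current and default values of the event reporting parameters.
-- Assumption: CONFIGURATION_PARAMETERS provides these columns.
SELECT parameter_name, current_value, default_value
FROM configuration_parameters
WHERE parameter_name IN ('SyslogEnabled',
                         'SyslogEvents',
                         'SyslogFacility',
                         'SnmpTrapsEnabled',
                         'SnmpTrapDestinationsList',
                         'SnmpTrapEvents');
```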
4.1 - Configuring reporting for syslog
Syslog is a network-logging utility that issues, stores, and processes log messages. It is a useful way to get heterogeneous data into a single data repository.
To log events to syslog, enable event reporting for each individual event you want logged. Messages are logged, by default, to /var/log/messages.
Configuring event reporting to syslog consists of:
- Enabling Vertica to trap events for syslog.
- Defining which events Vertica traps for syslog.
  Vertica strongly suggests that you trap the Stale Checkpoint event.
- Defining which syslog facility to use.
Enabling Vertica to trap events for syslog
To enable event trapping for syslog, issue the following SQL command:
=> ALTER DATABASE DEFAULT SET SyslogEnabled = 1;
To disable event trapping for syslog, issue the following SQL command:
=> ALTER DATABASE DEFAULT SET SyslogEnabled = 0;
Defining events to trap for syslog
To define which events generate a syslog entry, issue the following SQL command:
=> ALTER DATABASE DEFAULT SET SyslogEvents = 'events-list';
where events-list is a comma-delimited list of one or more of the following events:
- Low Disk Space
- Read Only File System
- Loss Of K Safety
- Current Fault Tolerance at Critical Level
- Too Many ROS Containers
- Node State Change
- Recovery Failure
- Recovery Error
- Recovery Lock Error
- Recovery Projection Retrieval Error
- Refresh Error
- Refresh Lock Error
- Tuple Mover Error
- Timer Service Task Error
- Stale Checkpoint
The following example generates a syslog entry for low disk space and recovery failure:
=> ALTER DATABASE DEFAULT SET SyslogEvents = 'Low Disk Space, Recovery Failure';
Defining the SyslogFacility to use for reporting
The syslog mechanism allows for several different general classifications of logging messages, called facilities. Typically, all authentication-related messages are logged with the auth (or authpriv) facility. These messages are intended to be secure and hidden from unauthorized eyes. Normal operational messages are logged with the daemon facility, which is the collector that receives and optionally stores messages.
The SyslogFacility directive allows all logging messages to be directed to a different facility than the default. When the directive is used, all logging is done using the specified facility, both authentication (secure) and otherwise.
To define which SyslogFacility Vertica uses, issue the following SQL command:
=> ALTER DATABASE DEFAULT SET SyslogFacility = 'Facility_Name';
Where the facility-level argument Facility_Name is one of the following:
- auth
- authpriv (Linux only)
- cron
- uucp (UUCP subsystem)
- daemon
- ftp (Linux only)
- lpr (line printer subsystem)
- mail (mail system)
- news (network news subsystem)
- user (default system)
- local0 (local use 0)
- local1 (local use 1)
- local2 (local use 2)
- local3 (local use 3)
- local4 (local use 4)
- local5 (local use 5)
- local6 (local use 6)
- local7 (local use 7)
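Taken together, the three syslog steps might look like the following sketch. The selected events and the local0 facility are arbitrary choices for illustration, not recommendations from this section.

```sql
-- 1. Enable event trapping for syslog.
ALTER DATABASE DEFAULT SET SyslogEnabled = 1;

-- 2. Trap a chosen subset of events (names taken from the events list above).
ALTER DATABASE DEFAULT SET SyslogEvents = 'Low Disk Space, Read Only File System, Loss Of K Safety';

-- 3. Log them under the local0 facility (illustrative choice).
ALTER DATABASE DEFAULT SET SyslogFacility = 'local0';
```

Using one of the local facilities makes it possible for a rule in the syslog daemon to route Vertica events separately from other system messages.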
Trapping other event types
To trap events other than the ones listed above, create a syslog notifier and allow it to trap the desired events with SET_DATA_COLLECTOR_NOTIFY_POLICY.
Events monitored by this notifier type are logged to neither MONITORING_EVENTS nor vertica.log.
The following example creates a notifier that writes a message to syslog when the Data Collector (DC) component LoginFailures updates:
- Enable syslog notifiers for the current database:
  => ALTER DATABASE DEFAULT SET SyslogEnabled = 1;
- Create and enable a syslog notifier v_syslog_notifier:
  => CREATE NOTIFIER v_syslog_notifier ACTION 'syslog' ENABLE MAXMEMORYSIZE '10M' IDENTIFIED BY 'f8b0278a-3282-4e1a-9c86-e0f3f042a971' PARAMETERS 'eventSeverity = 5';
- Configure the syslog notifier v_syslog_notifier for updates to the LoginFailures DC component with SET_DATA_COLLECTOR_NOTIFY_POLICY:
  => SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures','v_syslog_notifier', 'Login failed!', true);
  This notifier writes the following message to syslog (default location: /var/log/messages) when a user fails to authenticate as the user Bob:
  Apr 25 16:04:58 vertica_host_01 vertica: Event Posted: Event Code:21 Event Id:0 Event Severity: Notice [5] PostedTimestamp: 2022-04-25 16:04:58.083063 ExpirationTimestamp: 2022-04-25 16:04:58.083063 EventCodeDescription: Notifier ProblemDescription: (Login failed!) { "_db":"VMart", "_schema":"v_internal", "_table":"dc_login_failures", "_uuid":"f8b0278a-3282-4e1a-9c86-e0f3f042a971", "authentication_method":"Reject", "client_authentication_name":"default: Reject", "client_hostname":"::1", "client_label":"", "client_os_user_name":"dbadmin", "client_pid":523418, "client_version":"", "database_name":"dbadmin", "effective_protocol":"3.8", "node_name":"v_vmart_node0001", "reason":"REJECT", "requested_protocol":"3.8", "ssl_client_fingerprint":"", "ssl_client_subject":"", "time":"2022-04-25 16:04:58.082568-05", "user_name":"Bob" }#012 DatabaseName: VMart Hostname: vertica_host_01
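To inspect or undo the notifier example, something like the following sketch may be useful. GET_DATA_COLLECTOR_NOTIFY_POLICY and DROP NOTIFIER are not covered in this section and are assumed here. The disable call reuses the SET_DATA_COLLECTOR_NOTIFY_POLICY function from the example with its last argument set to false.

```sql
-- List the notification policies currently set on the LoginFailures component
-- (assumption: this function is available in your Vertica version).
SELECT GET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures');

-- Stop notifying on LoginFailures updates: same call as in the example,
-- with the final argument set to false.
SELECT SET_DATA_COLLECTOR_NOTIFY_POLICY('LoginFailures', 'v_syslog_notifier', 'Login failed!', false);

-- Remove the notifier once no policy references it (assumed statement).
DROP NOTIFIER v_syslog_notifier;
```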
4.2 - Configuring reporting for SNMP
Configuring event reporting for SNMP consists of:
- Configuring Vertica to enable event trapping for SNMP as described below.
- Importing the Vertica Management Information Base (MIB) file into the SNMP monitoring device.
  The Vertica MIB file allows the SNMP trap receiver to understand the traps it receives from Vertica. This, in turn, allows you to configure the actions it takes when it receives traps.
  Vertica supports the SNMP V1 trap protocol. The MIB file is located in /opt/vertica/sbin/VERTICA-MIB. See the documentation for your SNMP monitoring device for more information about importing MIB files.
- Configuring the SNMP trap receiver to handle traps from Vertica.
  SNMP trap receiver configuration differs greatly from vendor to vendor. As such, the directions presented here for configuring the SNMP trap receiver to handle traps from Vertica are generic.
  Vertica traps are single, generic traps that contain several fields of identifying information. These fields equate to the event data described in Monitoring events. However, the format used for the field names differs slightly. Under SNMP, the field names contain no spaces. Also, field names are prefixed with "vert". For example, Event Severity becomes vertEventSeverity.
  When configuring your trap receiver, be sure to use the same hostname, port, and community string you used to configure event trapping in Vertica.
  Examples of network management providers:
  - IBM Tivoli
  - AdventNet
  - Net-SNMP (Open Source)
  - Nagios (Open Source)
  - Open NMS (Open Source)
4.3 - Configuring event trapping for SNMP
The following events are trapped by default when you configure Vertica to trap events for SNMP:
- Low Disk Space
- Read Only File System
- Loss Of K Safety
- Current Fault Tolerance at Critical Level
- Too Many ROS Containers
- Node State Change
- Recovery Failure
- Stale Checkpoint
- CRC Mismatch
To configure Vertica to trap events for SNMP
1. Enable Vertica to trap events for SNMP.
2. Define where Vertica sends the traps.
3. Optionally redefine which SNMP events Vertica traps.
Note
After you complete steps 1 and 2 above, Vertica automatically traps the default SNMP events. Only perform step 3 if you want to redefine which SNMP events are trapped. Vertica recommends that you trap the Stale Checkpoint event even if you decide to reduce the number of events Vertica traps for SNMP. The specific settings you define have no effect on traps sent to the log. All events are trapped to the log.
To enable event trapping for SNMP
Use the following SQL command:
=> ALTER DATABASE DEFAULT SET SnmpTrapsEnabled = 1;
To define where Vertica sends traps
Use the following SQL command, where host_name and port identify the computer where SNMP resides, and CommunityString acts like a password to control Vertica's access to the server:
=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'host_name port CommunityString';
For example:
=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'localhost 162 public';
You can also specify multiple destinations by specifying a list of destinations, separated by commas:
=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'host_name1 port1 CommunityString1, host_name2 port2 CommunityString2';
Note
Setting multiple destinations sends any SNMP trap notification to all destinations listed.
To define which events Vertica traps
Use the following SQL command, where each Event_Name is one of the events in the list below the command:
=> ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Event_Name1, Event_Name2';
- Low Disk Space
- Read Only File System
- Loss Of K Safety
- Current Fault Tolerance at Critical Level
- Too Many ROS Containers
- Node State Change
- Recovery Failure
- Recovery Error
- Recovery Lock Error
- Recovery Projection Retrieval Error
- Refresh Error
- Tuple Mover Error
- Stale Checkpoint
- CRC Mismatch
Note
The above values are case sensitive.
The following example specifies two event names:
=> ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Low Disk Space, Recovery Failure';
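The SNMP steps can be combined into one short session, sketched below. The receiver host names nms1.example.com and nms2.example.com are hypothetical placeholders. Port 162 and the community string public are taken from the single-destination example above, and the event selection is illustrative.

```sql
-- 1. Enable event trapping for SNMP.
ALTER DATABASE DEFAULT SET SnmpTrapsEnabled = 1;

-- 2. Send traps to two receivers (hypothetical hosts).
ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'nms1.example.com 162 public, nms2.example.com 162 public';

-- 3. Optional: replace the default set of trapped events.
--    Event names are case sensitive.
ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Low Disk Space, Recovery Failure, Stale Checkpoint';

-- 4. Send test traps to confirm that the receivers get them.
SELECT SNMP_TRAP_TEST();
```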
4.4 - Verifying SNMP configuration
To create a set of test events that checks SNMP configuration:
- Set up SNMP trap handlers to catch Vertica events.
- Test your setup with the following command:
  => SELECT SNMP_TRAP_TEST();
        SNMP_TRAP_TEST
  --------------------------
   Completed SNMP Trap Test
  (1 row)
5 - Event reporting examples
Vertica.log
The following example illustrates a Too Many ROS Containers event posted and cleared within vertica.log:
08/14/15 15:07:59 thr:nameless:0x45a08940 [INFO] Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:29.253729
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com
08/14/15 15:08:54 thr:Ageout Events:0x2aaab0015e70 [INFO] Event Cleared:
Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp:
2015-08-14 15:07:59.253729 ExpirationTimestamp: 2015-08-14 15:08:53.012669
EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB
Hostname: fc6-1.example.com
SNMP
The following example illustrates a Too Many ROS Containers event posted to SNMP:
Version: 1, type: TRAPREQUEST
Enterprise OID: .1.3.6.1.4.1.31207.2.0.1
Trap agent: 72.0.0.0
Generic trap: ENTERPRISESPECIFIC (6)
Specific trap: 0
.1.3.6.1.4.1.31207.1.1 ---> 4
.1.3.6.1.4.1.31207.1.2 ---> 0
.1.3.6.1.4.1.31207.1.3 ---> 2008-08-14 11:30:26.121292
.1.3.6.1.4.1.31207.1.4 ---> 4
.1.3.6.1.4.1.31207.1.5 ---> 1
.1.3.6.1.4.1.31207.1.6 ---> site01
.1.3.6.1.4.1.31207.1.7 ---> suse10-1
.1.3.6.1.4.1.31207.1.8 ---> Too many ROS containers exist on this node.
.1.3.6.1.4.1.31207.1.9 ---> QATESTDB
.1.3.6.1.4.1.31207.1.10 ---> Too Many ROS Containers
Syslog
The following example illustrates a Too Many ROS Containers event posted and cleared within syslog:
Aug 14 15:07:59 fc6-1 vertica: Event Posted: Event Code:4 Event Id:0 Event Severity: Warning [4] PostedTimestamp: 2015-08-14 15:07:59.253729 ExpirationTimestamp:
2015-08-14 15:08:29.253729 EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB Hostname: fc6-1.example.com
Aug 14 15:08:54 fc6-1 vertica: Event Cleared: Event Code:4 Event Id:0 Event Severity:
Warning [4] PostedTimestamp: 2015-08-14 15:07:59.253729 ExpirationTimestamp:
2015-08-14 15:08:53.012669 EventCodeDescription: Too Many ROS Containers ProblemDescription:
Too many ROS containers exist on this node. DatabaseName: TESTDB Hostname: fc6-1.example.com