This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Data Collector utility

The Data Collector collects and retains history of important system activities, and records essential performance and resource utilization counters.

The Data Collector collects and retains history of important system activities and records essential performance and resource utilization counters. The Data Collector propagates information to system tables.

You can use the Data Collector in the following ways:

  • Query the past state of system tables and extract aggregate information

  • See what actions users have taken

  • Locate performance bottlenecks

  • Identify potential improvements to Vertica configuration

The Data Collector does not collect data for nodes that are down, so no historical data is available for affected nodes.

The Data Collector works with Workload Analyzer, a tool that intelligently monitors the performance of SQL queries and workloads and recommends tuning actions based on observations of the actual workload history.

Configuring and accessing Data Collector information

The Data Collector retains the data it gathers according to configurable retention policies. The Data Collector is on by default; you can disable it by setting the EnableDataCollector configuration parameter to 0. You can set the parameter at the database level using ALTER DATABASE or the node level using ALTER NODE. You cannot set it at the session or user level.

You can access metadata on collected data of all components using the DATA_COLLECTOR system table. This table includes information for each component about current collection policies and how much data is retained in memory and on disk.

Collected data is logged on disk in the DataCollector directory under the Vertica /catalog path. You can query logged data from component-specific Data Collector tables. You can also manage logged data with Vertica meta-functions; see Managing data collection logs.

1 - Configuring data retention policies

maintains retention policies for each Vertica component that it monitors—for example, TupleMoverEvents, or DepotEvictions.

Data collector maintains retention policies for each Vertica component that it monitors—for example, TupleMoverEvents or DepotEvictions. You can identify monitored components by querying the DATA_COLLECTOR system table. For example, the following query returns partition activity components:

=> SELECT DISTINCT component FROM DATA_COLLECTOR 
   WHERE component ILIKE '%partition%';
      component
----------------------
 HiveCustomPartitions
 CopyPartitions
 MovePartitions
 SwapPartitions
(4 rows)

Each component has its own retention policy, which consists of several properties:

  • MEMORY_BUFFER_SIZE_KB: the maximum amount of collected data that the Data Collector buffers in memory before moving it to disk.

  • DISK_SIZE_KB: the maximum disk space allocated for the component's Data Collector table.

  • INTERVAL_TIME: how long data of a given component is retained in the component's Data Collector table (INTERVAL data type).

Vertica sets default values on all properties, which you can modify with the SET_DATA_COLLECTOR_POLICY (using parameters) function or, alternatively, SET_DATA_COLLECTOR_POLICY and SET_DATA_COLLECTOR_TIME_POLICY.

You can view retention policy settings with GET_DATA_COLLECTOR_POLICY. For example, the following statement returns the retention policy for the TupleMoverEvents component:

=> SELECT GET_DATA_COLLECTOR_POLICY('TupleMoverEvents');
                          GET_DATA_COLLECTOR_POLICY
-----------------------------------------------------------------------------
 1000KB kept in memory, 15000KB kept on disk. Time based retention disabled.
(1 row)

Setting retention memory and disk storage

The MEMORY_BUFFER_SIZE_KB and DISK_SIZE_KB properties combine to determine how much collected data is available at any given time. If MEMORY_BUFFER_SIZE_KB is set to 0, the Data Collector does not retain any data for the component either in memory or on disk. If DISK_SIZE_KB is set to 0, then the Data Collector retains only as much component data as it can buffer, as set by MEMORY_BUFFER_SIZE_KB.

For example, the following statement changes memory and disk setting for the ResourceAcquisitions component from its default setting of 1,000 KB memory and 10,000 KB disk space to 1500 KB and 25000 KB, respectively:

=> SELECT SET_DATA_COLLECTOR_POLICY('ResourceAcquisitions', '1500', '25000');
 SET_DATA_COLLECTOR_POLICY
---------------------------
 SET
(1 row)

Consider setting MEMORY_BUFFER_SIZE_KB to a high value in the following cases:

  • Unusually high levels of data collection. If MEMORY_BUFFER_SIZE_KB is set too low, the Data Collector might be unable to flush buffered data to disk quickly enough to keep up with the activity level, which can lead to loss of in-memory data.

  • Very large data collector records—for example, records with very long query strings. The Data Collector uses double-buffering, so it cannot retain in-memory records that are more than half the size of the memory buffer.

Setting time-based retention

By default, all data collected for a given component remains on disk and is accessible in the component's Data Collector table, up to the disk storage limit of that component's retention policy. You can use SET_DATA_COLLECTOR_POLICY to limit how long data is retained in a component's Data Collector table. The following example sets the INTERVAL_TIME property for the TupleMoverEvents component:

=> SELECT SET_DATA_COLLECTOR_TIME_POLICY('TupleMoverEvents ', '30 minutes'::INTERVAL);
SET_DATA_COLLECTOR_TIME_POLICY
--------------------------------
SET
(1 row)

After this call, the corresponding Data Collector table, DC_TUPLE_MOVER_EVENTS, only retains records of activity that occurred in the last 30 minutes. Older data is automatically dropped from this table. For example, after the previous call to SET_DATA_COLLECTOR_TIME_POLICY, the table contains on 30 minutes' worth of data:

=> SELECT CURRENT_TIMESTAMP(0) - '30 minutes'::INTERVAL AS '30 minutes ago';
   30 minutes ago
---------------------
 2020-08-13 07:58:21
(1 row)

=> SELECT time, node_name, session_id, user_name, transaction_id, operation 
   FROM DC_TUPLE_MOVER_EVENTS WHERE node_name='v_vmart_node0001' 
   ORDER BY transaction_id;
             time              |    node_name     |           session_id            | user_name |  transaction_id   | operation
-------------------------------+------------------+---------------------------------+-----------+-------------------+-----------
 2020-08-13 08:16:54.360597-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.397346-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.424002-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.425989-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:16:54.456829-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:16:54.485097-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:19:45.8045-04   | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.742-04    | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.684764-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.799796-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:19:45.768856-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:19:45.715424-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:25:20.465604-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.497266-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.518839-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.52099-04  | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.549075-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.569072-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
(18 rows)

After 25 minutes elapse, 12 of these records age out of the 30-minute interval and are dropped:

=> SELECT CURRENT_TIMESTAMP(0) - '30 minutes'::INTERVAL AS '30 minutes ago';
   30 minutes ago
---------------------
 2020-08-13 08:23:33
(1 row)

=> SELECT time, node_name, session_id, user_name, transaction_id, operation 
   FROM DC_TUPLE_MOVER_EVENTS WHERE node_name='v_vmart_node0001' 
   ORDER BY transaction_id;
             time              |    node_name     |           session_id            | user_name |  transaction_id   | operation
-------------------------------+------------------+---------------------------------+-----------+-------------------+-----------
 2020-08-13 08:25:20.465604-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.497266-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.518839-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.52099-04  | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.549075-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.569072-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
(6 rows)

You can use SET_DATA_COLLECTOR_TIME_POLICY to update INTERVAL_TIME for all components by omitting the component argument:

=> SELECT SET_DATA_COLLECTOR_TIME_POLICY('1 day'::INTERVAL);
SET_DATA_COLLECTOR_TIME_POLICY
--------------------------------
SET
(1 row)

=> SELECT DISTINCT component, INTERVAL_SET, INTERVAL_TIME 
   FROM DATA_COLLECTOR WHERE component ILIKE '%partition%';
      component       | INTERVAL_SET | INTERVAL_TIME
----------------------+--------------+---------------
 HiveCustomPartitions | t            | 1
 MovePartitions       | t            | 1
 CopyPartitions       | t            | 1
 SwapPartitions       | t            | 1
(4 rows)

To clear the INTERVAL_TIME policy property, call SET_DATA_COLLECTOR_TIME_POLICY with a negative integer argument:

=> SELECT SET_DATA_COLLECTOR_TIME_POLICY('-1');
 SET_DATA_COLLECTOR_TIME_POLICY
--------------------------------
 SET
(1 row)

=> SELECT DISTINCT component, INTERVAL_SET, INTERVAL_TIME 
   FROM DATA_COLLECTOR WHERE component ILIKE '%partition%';
      component       | INTERVAL_SET | INTERVAL_TIME
----------------------+--------------+---------------
 MovePartitions       | f            | 0
 SwapPartitions       | f            | 0
 HiveCustomPartitions | f            | 0
 CopyPartitions       | f            | 0
(4 rows)

Setting INTERVAL_TIME on a retention policy also sets its INTERVAL_SET property to true.

2 - Querying data collector tables

Data Collector tables (prefixed by dc_) are in the V_INTERNAL schema.

You can obtain component-specific data from Data Collector tables. The Data Collector compiles the component data from its log files in a table format that you can query with standard SQL queries. You can identify Data Collector table names for specific components from the DATA_COLLECTOR system table:

=> SELECT DISTINCT component, table_name FROM DATA_COLLECTOR 
   WHERE component ILIKE 'lock%';
  component   |    table_name
--------------+------------------
 LockRequests | dc_lock_requests
 LockReleases | dc_lock_releases
 LockAttempts | dc_lock_attempts
(3 rows)

You can then query those Data Collector tables directly:

=> SELECT * FROM DC_LOCK_ATTEMPTS 
   WHERE description != 'Granted immediately';
-[ RECORD 1 ]------+------------------------------
time               | 2020-08-17 00:14:07.187607-04
node_name          | v_vmart_node0001
session_id         | v_vmart_node0001-319647:0x1d
user_id            | 45035996273704962
user_name          | dbadmin
transaction_id     | 45035996273819050
object             | 0
object_name        | Global Catalog
mode               | X
promoted_mode      | X
scope              | TRANSACTION
start_time         | 2020-08-17 00:14:07.184663-04
timeout_in_seconds | 300
result             | granted
description        | Granted after waiting

3 - Managing data collection logs

On startup, Vertica creates a DataCollector directory under the database catalog directory of each node.

On startup, Vertica creates a DataCollector directory under the database catalog directory of each node. This directory contains one or more logs for individual components. For example:

$ pwd
/home/dbadmin/VMart/v_vmart_node0001_catalog/DataCollector

$ ls -1 -g Lock*
-rw------- 1 verticadba 2559879 Aug 17 00:14 LockAttempts_650572441057355.log
-rw------- 1 verticadba  614579 Aug 17 05:28 LockAttempts_650952885486175.log
-rw------- 1 verticadba 2559895 Aug 14 18:31 LockReleases_650306482037650.log
-rw------- 1 verticadba 1411127 Aug 17 05:28 LockReleases_650759468041873.log

The DataCollector directory also contains a pair of SQL template files for each component:

  • CREATE_component_TABLE.sql provides DDL for creating a table that you can use to load Data Collector logs for a given component. For example:

    $ cat CREATE_LockAttempts_TABLE.sql
    \set dcschema 'echo ${DCSCHEMA:-dc}'
    CREATE TABLE :dcschema.dc_lock_attempts(
      "time" TIMESTAMP WITH TIME ZONE,
      "node_name" VARCHAR(128),
      "session_id" VARCHAR(128),
      "user_id" INTEGER,
      "user_name" VARCHAR(128),
      "transaction_id" INTEGER,
      "object" INTEGER,
      "object_name" VARCHAR(128),
      "mode" VARCHAR(128),
      "promoted_mode" VARCHAR(128),
      "scope" VARCHAR(128),
      "start_time" TIMESTAMP WITH TIME ZONE,
      "timeout_in_seconds" INTEGER,
      "result" VARCHAR(128),
      "description" VARCHAR(64000)
    );
    
  • COPY_component_TABLE.sql contains SQL for loading (with COPY) the data log files into the table that the CREATE script creates. For example:

    $ cat COPY_LockAttempts_TABLE.sql
    \set dcpath 'echo ${DCPATH:-$PWD}'
    \set dcschema 'echo ${DCSCHEMA:-dc}'
    \set logfiles '''':dcpath'/LockAttempts_*.log'''
    COPY :dcschema.dc_lock_attempts(
      LockAttempts_start_filler FILLER VARCHAR(64) DELIMITER E'\n',
      "time_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "time" FORMAT '_internal' DELIMITER E'\n',
      "node_name_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "node_name" ESCAPE E'\001' DELIMITER E'\n',
      "session_id_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "session_id" ESCAPE E'\001' DELIMITER E'\n',
      "user_id_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "user_id" FORMAT 'd' DELIMITER E'\n',
      "user_name_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "user_name" ESCAPE E'\001' DELIMITER E'\n',
      "transaction_id_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "transaction_id" FORMAT 'd' DELIMITER E'\n',
      "object_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "object" FORMAT 'd' DELIMITER E'\n',
      "object_name_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "object_name" ESCAPE E'\001' DELIMITER E'\n',
      "mode_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "mode" ESCAPE E'\001' DELIMITER E'\n',
      "promoted_mode_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "promoted_mode" ESCAPE E'\001' DELIMITER E'\n',
      "scope_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "scope" ESCAPE E'\001' DELIMITER E'\n',
      "start_time_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "start_time" FORMAT '_internal' DELIMITER E'\n',
      "timeout_in_seconds_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "timeout_in_seconds" FORMAT 'd' DELIMITER E'\n',
      "result_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "result" ESCAPE E'\001' DELIMITER E'\n',
      "description_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "description" ESCAPE E'\001'
    )  FROM :logfiles RECORD TERMINATOR E'\n.\n' DELIMITER E'\n';
    

Log management functions

You can use two Data Collector functions to manage logs. Both functions can operate on a single component or all components.

  • FLUSH_DATA_COLLECTOR waits until in-memory logs are moved to disk and then flushes the Data Collector, synchronizing the log with disk storage.

  • CLEAR_DATA_COLLECTOR clears all memory and disk records from Data Collector tables and logs and then resets collection statistics in DATA_COLLECTOR.