This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Data collector utility

The Data Collector collects and retains history of important system activities, and records essential performance and resource utilization counters.

The Data Collector collects and retains history of important system activities, and records essential performance and resource utilization counters.

Data Collector extends system table functionality with minimal overhead, performing the following tasks:

  • Provides a framework for recording events.

  • Propagates information to system tables.

You can query Data Collector information to obtain the past state of system tables and extract aggregate information. It can also help you:

  • See what actions users have taken.

  • Locate performance bottlenecks.

  • Identify potential improvements to Vertica configuration.

Data Collector works with Workload Analyzer, a tool that intelligently monitors the performance of SQL queries and workloads, and recommends tuning actions based on observations of the actual workload history.

Configuring and accessing data collector information

Data Collector retains the data it gathers according to configurable retention policies. Data Collector is on by default; you can disable it by setting set configuration parameter EnableDataCollector to 0, at the database and node levels, with ALTER DATABASE and ALTER NODE, respectively.

You can access metadata on collected data of all components through system table DATA_COLLECTOR. This table includes information about current collection policies on that component, and how much data is retained in memory and on disk.

Collected data is logged on disk in the DataCollector directory under the Vertica /catalog path. You can query logged data from component-specific Data Collector tables. You can also manage logged data with Vertica meta-functions; see Managing data collection logs for details.

1 - Configuring data retention policies

maintains retention policies for each Vertica component that it monitors—for example, TupleMoverEvents, or DepotEvictions.

Data collector maintains retention policies for each Vertica component that it monitors—for example, TupleMoverEvents, or DepotEvictions. You can identify monitored components by querying system table DATA_COLLECTOR. For example, the following query returns partition activity components:

=> SELECT DISTINCT component FROM data_collector WHERE component ILIKE '%partition%';
      component
----------------------
 HiveCustomPartitions
 CopyPartitions
 MovePartitions
 SwapPartitions
(4 rows)

Each component has its own retention policy, which is comprised of several properties:

  • MEMORY_BUFFER_SIZE_KB specifies in kilobytes the maximum amount of collected data that the Data Collector buffers in memory before moving it to disk.

  • DISK_SIZE_KB specifies in kilobytes the maximum disk space allocated for this component's Data Collector table.

  • INTERVAL_TIME is an INTERVAL data type that specifies how long data of a given component is retained in that component's Data Collector table.

Vertica sets default values on all properties, which you can modify with meta-functions SET_DATA_COLLECTOR_POLICY and SET_DATA_COLLECTOR_TIME_POLICY.

You can view retention policy settings by calling GET_DATA_COLLECTOR_POLICY. For example, the following statement returns the retention policy for the TupleMoverEvents component:

=> SELECT get_data_collector_policy('TupleMoverEvents');
                          get_data_collector_policy
-----------------------------------------------------------------------------
 1000KB kept in memory, 15000KB kept on disk. Time based retention disabled.
(1 row)

Setting retention memory and disk storage

Retention policy properties MEMORY_BUFFER_SIZE_KB and DISK_SIZE_KB combine to determine how much collected data is available at any given time. The two properties have the following dependencies: if MEMORY_BUFFER_SIZE_KB is set to 0, the Data Collector does not retain any data for this component either in memory or on disk; and if DISK_SIZE_KB is set to 0, then the Data Collector retains only as much component data as it can buffer, as set by MEMORY_BUFFER_SIZE_KB .

For example, the following statement changes memory and disk setting for component ResourceAcquisitions from its current setting of 1,000 KB memory and 10,000 KB disk space to 1500 KB and 25000 KB, respectively:

=> SELECT set_data_collector_policy('ResourceAcquisitions', '1500', '25000');
 set_data_collector_policy
---------------------------
 SET
(1 row)

You should consider setting MEMORY_BUFFER_SIZE_KB to a high value in the following cases:

  • Unusually high levels of data collection. If MEMORY_BUFFER_SIZE_KB is set too low, the Data Collector might be unable to flush buffered data to disk fast enough to keep up with the activity level, which can lead to loss of in-memory data.

  • Very large data collector records—for example, records with very long query strings. The Data Collector uses double-buffering, so it cannot retain in memory records that are more than 50 percent larger than MEMORY_BUFFER_SIZE_KB.

Setting time-based retention

By default, all collected data of a given component remain on disk and are accessible in the component's Data Collector table, up to the disk storage limit of that component's retention policy as set by its DISK_SIZE_KB property. You can call SET_DATA_COLLECTOR_POLICY to limit how long data is retained in a component's Data Collector table. In the following example, SET_DATA_COLLECTOR_POLICY is called on component TupleMoverEvents and sets its INTERVAL_TIME property to an interval of 30 minutes:

SELECT set_data_collector_policy('TupleMoverEvents ', '30 minutes'::interval);
set_data_collector_time_policy
--------------------------------
SET
(1 row)

After this call, the Data Collector table dc_tuple_mover_events only retains records of Tuple Mover activity that occurred in the last 30 minutes. Older Tuple Mover data are automatically dropped from this table. For example, after the previous call to SET_DATA_COLLECTOR_POLICY, querying dc_tuple_mover_events returns data of Tuple Mover activity that was collected over the last 30 minutes—in this case, since 07:58:21:

=> SELECT current_timestamp(0)  - '30 minutes'::interval AS '30 minutes ago';
   30 minutes ago
---------------------
 2020-08-13 07:58:21
(1 row)

=> SELECT time, node_name, session_id, user_name, transaction_id, operation FROM dc_tuple_mover_events WHERE node_name='v_vmart_node0001' ORDER BY transaction_id;
             time              |    node_name     |           session_id            | user_name |  transaction_id   | operation
-------------------------------+------------------+---------------------------------+-----------+-------------------+-----------
 2020-08-13 08:16:54.360597-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.397346-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.424002-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.425989-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:16:54.456829-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:16:54.485097-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:19:45.8045-04   | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.742-04    | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.684764-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.799796-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:19:45.768856-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:19:45.715424-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:25:20.465604-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.497266-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.518839-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.52099-04  | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.549075-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.569072-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
(18 rows)

After 25 minutes elapse, 12 of these records age out of the 30 minute interval set for TupleMoverEvents., and are dropped from dc_tuple_mover_events:

=> SELECT current_timestamp(0)  - '30 minutes'::interval AS '30 minutes ago';
   30 minutes ago
---------------------
 2020-08-13 08:23:33
(1 row)


=> SELECT time, node_name, session_id, user_name, transaction_id, operation FROM dc_tuple_mover_events WHERE node_name='v_vmart_node0001' ORDER BY transaction_id;
             time              |    node_name     |           session_id            | user_name |  transaction_id   | operation
-------------------------------+------------------+---------------------------------+-----------+-------------------+-----------
 2020-08-13 08:25:20.465604-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.497266-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.518839-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.52099-04  | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.549075-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.569072-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
(6 rows)

The meta-function SET_DATA_COLLECTOR_TIME_POLICY also sets a retention policy's INTERVAL_TIME property. Unlike SET_DATA_COLLECTOR_POLICY, this meta-function only sets the INTERVAL_TIME property . It also differs in that you can use this meta-function to update INTERVAL_TIME on all components, by omitting the component argument. For example:

SELECT set_data_collector_time_policy('1 day'::interval);
set_data_collector_time_policy
--------------------------------
SET
(1 row)

=> SELECT DISTINCT component, INTERVAL_SET, INTERVAL_TIME FROM DATA_COLLECTOR WHERE component ILIKE '%partition%';
      component       | INTERVAL_SET | INTERVAL_TIME
----------------------+--------------+---------------
 HiveCustomPartitions | t            | 1
 MovePartitions       | t            | 1
 CopyPartitions       | t            | 1
 SwapPartitions       | t            | 1
(4 rows)

To clear the INTERVAL_TIME policy property, call SET_DATA_COLLECTOR_TIME_POLICY with a negative integer argument. For example:

=> SELECT set_data_collector_time_policy('-1');
 set_data_collector_time_policy
--------------------------------
 SET
(1 row)

=> SELECT DISTINCT component, INTERVAL_SET, INTERVAL_TIME FROM DATA_COLLECTOR WHERE component ILIKE '%partition%';
      component       | INTERVAL_SET | INTERVAL_TIME
----------------------+--------------+---------------
 MovePartitions       | f            | 0
 SwapPartitions       | f            | 0
 HiveCustomPartitions | f            | 0
 CopyPartitions       | f            | 0
(4 rows)

2 - Querying data collector tables

Data Collector tables (prefixed by dc_) are in the V_INTERNAL schema.

You can obtain component-specific data from Data Collector tables. The Data Collector compiles the component data from its log files in a table format that you can query with standard SQL queries. You can identify Data Collector table names for specific components through system table Data Collector. For example:

=> SELECT distinct component, table_name FROM data_collector where component ILIKE 'lock%';
  component   |    table_name
--------------+------------------
 LockRequests | dc_lock_requests
 LockReleases | dc_lock_releases
 LockAttempts | dc_lock_attempts
(3 rows)

You can then query the desired Data Collector tables—for example, check for lock delays in dc_lock_attempts:

=> SELECT * from dc_lock_attempts WHERE description != 'Granted immediately';
-[ RECORD 1 ]------+------------------------------
time               | 2020-08-17 00:14:07.187607-04
node_name          | v_vmart_node0001
session_id         | v_vmart_node0001-319647:0x1d
user_id            | 45035996273704962
user_name          | dbadmin
transaction_id     | 45035996273819050
object             | 0
object_name        | Global Catalog
mode               | X
promoted_mode      | X
scope              | TRANSACTION
start_time         | 2020-08-17 00:14:07.184663-04
timeout_in_seconds | 300
result             | granted
description        | Granted after waiting

3 - Managing data collection logs

On startup, Vertica creates a DataCollector directory under the database catalog directory of each node.

On startup, Vertica creates a DataCollector directory under the database catalog directory of each node. This directory contains one or more logs for individual components. For example:

[dbadmin@doch01 DataCollector]$ pwd
/home/dbadmin/VMart/v_vmart_node0001_catalog/DataCollector
[dbadmin@doch01 DataCollector]$ ls -1 -g Lock*
-rw------- 1 verticadba 2559879 Aug 17 00:14 LockAttempts_650572441057355.log
-rw------- 1 verticadba  614579 Aug 17 05:28 LockAttempts_650952885486175.log
-rw------- 1 verticadba 2559895 Aug 14 18:31 LockReleases_650306482037650.log
-rw------- 1 verticadba 1411127 Aug 17 05:28 LockReleases_650759468041873.log

The DataCollector directory also contains a pair of SQL template files for each component:

  • CREATE_component_TABLE.sql provides DDL for creating a table where you can load Data Collector logs for a given component—for example, LockAttempts:

    [dbadmin@doch01 DataCollector]$ cat CREATE_LockAttempts_TABLE.sql
    \set dcschema 'echo ${DCSCHEMA:-dc}'
    CREATE TABLE :dcschema.dc_lock_attempts(
      "time" TIMESTAMP WITH TIME ZONE,
      "node_name" VARCHAR(128),
      "session_id" VARCHAR(128),
      "user_id" INTEGER,
      "user_name" VARCHAR(128),
      "transaction_id" INTEGER,
      "object" INTEGER,
      "object_name" VARCHAR(128),
      "mode" VARCHAR(128),
      "promoted_mode" VARCHAR(128),
      "scope" VARCHAR(128),
      "start_time" TIMESTAMP WITH TIME ZONE,
      "timeout_in_seconds" INTEGER,
      "result" VARCHAR(128),
      "description" VARCHAR(64000)
    );
    
  • COPY_component_TABLE.sql contains SQL for loading (with COPY) the data log files into the table that the CREATE script creates. For example:

    [dbadmin@doch01 DataCollector]$ cat COPY_LockAttempts_TABLE.sql
    \set dcpath 'echo ${DCPATH:-$PWD}'
    \set dcschema 'echo ${DCSCHEMA:-dc}'
    \set logfiles '''':dcpath'/LockAttempts_*.log'''
    COPY :dcschema.dc_lock_attempts(
      LockAttempts_start_filler FILLER VARCHAR(64) DELIMITER E'\n',
      "time_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "time" FORMAT '_internal' DELIMITER E'\n',
      "node_name_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "node_name" ESCAPE E'\001' DELIMITER E'\n',
      "session_id_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "session_id" ESCAPE E'\001' DELIMITER E'\n',
      "user_id_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "user_id" FORMAT 'd' DELIMITER E'\n',
      "user_name_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "user_name" ESCAPE E'\001' DELIMITER E'\n',
      "transaction_id_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "transaction_id" FORMAT 'd' DELIMITER E'\n',
      "object_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "object" FORMAT 'd' DELIMITER E'\n',
      "object_name_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "object_name" ESCAPE E'\001' DELIMITER E'\n',
      "mode_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "mode" ESCAPE E'\001' DELIMITER E'\n',
      "promoted_mode_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "promoted_mode" ESCAPE E'\001' DELIMITER E'\n',
      "scope_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "scope" ESCAPE E'\001' DELIMITER E'\n',
      "start_time_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "start_time" FORMAT '_internal' DELIMITER E'\n',
      "timeout_in_seconds_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "timeout_in_seconds" FORMAT 'd' DELIMITER E'\n',
      "result_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "result" ESCAPE E'\001' DELIMITER E'\n',
      "description_nfiller" FILLER VARCHAR(32) DELIMITER ':',
      "description" ESCAPE E'\001'
    )  FROM :logfiles RECORD TERMINATOR E'\n.\n' DELIMITER E'\n';
    

Log management meta-functions

You can manage Data Collector logs with Vertica meta-functions FLUSH_DATA_COLLECTOR and CLEAR_DATA_COLLECTOR. Both functions can specify a single component, or execute on all components:

  • FLUSH_DATA_COLLECTOR waits until memory logs are moved to disk and then flushes the Data Collector, synchronizing the log with disk storage. For example, the following statement executes on all components:

    => SELECT flush_data_collector();
     flush_data_collector
    ----------------------
     FLUSH
    (1 row)
    
  • CLEAR_DATA_COLLECTOR clears all memory and disk records from Data Collector tables and logs, and resets collection statistics in system table DATA_COLLECTOR. For example, the following statement executes on data collected for component ResourceAcquisitions:

    => SELECT clear_data_collector('ResourceAcquisitions');
     clear_data_collector
    ----------------------
     CLEAR
    (1 row)