The Vertica Data Collector is a utility that extends system table functionality by providing a framework for recording events. It gathers and retains monitoring information about your database cluster and makes that information available in system tables with negligible performance impact.
Collected data is stored on disk in the DataCollector directory under the Vertica /catalog path. You can use the information the Data Collector retains in the following ways:
Query the past state of system tables and extract aggregate information
See what actions users have taken
Locate performance bottlenecks
Identify potential improvements to Vertica configuration
Data Collector works in conjunction with an advisor tool called Workload Analyzer, which intelligently monitors the performance of SQL queries and workloads and recommends tuning actions based on observations of the actual workload history.
By default, Data Collector is enabled and retains information for all sessions. If performance issues arise, a superuser can disable Data Collector by setting the EnableDataCollector configuration parameter to 0.
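As a minimal sketch of the step above: the source names only the parameter and the value 0, so the database-level ALTER DATABASE form shown here is an assumption, as is using 1 to turn collection back on.

```sql
-- Disable the Data Collector for the database (run as superuser).
-- Assumption: the parameter is set at the database level.
ALTER DATABASE DEFAULT SET EnableDataCollector = 0;

-- Re-enable collection once the performance issue is resolved.
-- Assumption: 1 is the enabled (default) value.
ALTER DATABASE DEFAULT SET EnableDataCollector = 1;
```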
1 - CLEAR_DATA_COLLECTOR
Clears all memory and disk records from Data Collector tables and logs, and resets collection statistics in the DATA_COLLECTOR system table.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
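A short sketch of how the function might be called. The component name ResourceAcquisitions is used only for illustration. The optional component argument follows the clear_data_collector([<optional component>]) form shown in the DATA_COLLECTOR_HELP output.

```sql
-- Clear memory and disk records for every Data Collector component.
SELECT CLEAR_DATA_COLLECTOR();

-- Clear records for a single component only (illustrative component name).
SELECT CLEAR_DATA_COLLECTOR('ResourceAcquisitions');
```

Both calls are top-level SELECT statements, as meta-functions require.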
The function returns information like the following (exact contents might differ from this example):
=> SELECT DATA_COLLECTOR_HELP();
-----------------------------------------------------------------------------
Usage Data Collector
The data collector retains history of important system activities.
This data can be used as a reference of what actions have been taken
by users, but it can also be used to locate performance bottlenecks,
or identify potential improvements to the Vertica configuration.
This data is queryable via Vertica system tables.
Access a list of data collector components, and some statistics, by running:
SELECT * FROM v_monitor.data_collector;
The amount of data retained by size and time can be controlled with several
functions.
To just set the size amount:
set_data_collector_policy(<component>,
<memory retention (KB)>,
<disk retention (KB)>);
To set both the size and time amounts (the smaller one will dominate):
set_data_collector_policy(<component>,
<memory retention (KB)>,
<disk retention (KB)>,
<interval>);
To set just the time amount:
set_data_collector_time_policy(<component>,
<interval>);
To set the time amount for all tables:
set_data_collector_time_policy(<interval>);
The current retention policy for a component can be queried with:
get_data_collector_policy(<component>);
Data on disk is kept in the "DataCollector" directory under the Vertica
\catalog path. This directory also contains instructions on how to load
the monitoring data into another Vertica database.
To move the data collector logs and instructions to other storage locations,
create labeled storage locations using add_location and then use:
set_data_collector_storage_location(<storage_label>);
Additional commands can be used to configure the data collection logs.
The log can be cleared with:
clear_data_collector([<optional component>]);
The log can be synchronized with the disk storage using:
flush_data_collector([<optional component>]);
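The commands listed in the help text can be combined into a quick inspection routine. The sketch below assumes that the DATA_COLLECTOR system table exposes a component column and that the retention properties named in this document appear as columns. TupleMoverEvents is used only as an illustrative component.

```sql
-- List each component with its current retention settings.
-- Assumption: these column names match the retention policy properties.
SELECT DISTINCT component, memory_buffer_size_kb, disk_size_kb,
       interval_set, interval_time
FROM v_monitor.data_collector
ORDER BY component;

-- Show the retention policy for one component.
SELECT GET_DATA_COLLECTOR_POLICY('TupleMoverEvents');

-- Synchronize that component's in-memory log with disk storage.
SELECT FLUSH_DATA_COLLECTOR('TupleMoverEvents');
```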
Updates retention policy properties for the specified component.
Updates selected retention policy properties for a specified component. SET_DATA_COLLECTOR_POLICY (using parameters) is another version of this function that uses named parameters instead of positional arguments.
Before you change a retention policy, you can view its current settings by querying the DATA_COLLECTOR table or by calling the GET_DATA_COLLECTOR_POLICY function.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
memory-buffer-size
Maximum amount of data, in kilobytes, that is buffered in memory before moving it to disk. The retention policy property MEMORY_BUFFER_SIZE_KB is set from this value. This value must be greater than 0.
Consider setting this parameter to a high value in the following cases:
Unusually high levels of data collection. If the value is too low, the Data Collector might be unable to flush buffered data to disk quickly enough to keep up with the activity level, which can lead to loss of in-memory data.
Very large data collector records—for example, records with very long query strings. The Data Collector uses double-buffering, so it cannot retain in-memory records that are more than half the size of the memory buffer.
disk-size
Maximum disk space, in kilobytes, allocated for this component's Data Collector table. The retention policy property DISK_SIZE_KB is set from this value. If set to 0, the Data Collector retains only as much component data as it can buffer in memory, as specified by memory-buffer-size.
interval-time
How long to retain data in the component's Data Collector table, an INTERVAL. The INTERVAL_TIME retention policy property is set from this value. If the value is positive, it also sets the INTERVAL_SET policy property to true.
For example, if you specify the TupleMoverEvents component and set this value to two days ('2 days'::interval), the DC_TUPLE_MOVER_EVENTS Data Collector table retains records over the last 48 hours. Older Tuple Mover data is automatically dropped from this table.
Setting the INTERVAL_TIME property of a component's policy has no effect on how much data storage the Data Collector retains on disk for that component. Maximum disk storage capacity is determined by the DISK_SIZE_KB property. Setting the INTERVAL_TIME property only affects how long data is retained by the component's Data Collector table. For details, see Configuring data retention policies.
To disable the INTERVAL_TIME policy property, set this value to a negative integer. Doing so reverts two retention policy properties to their default settings:
INTERVAL_SET: false
INTERVAL_TIME: 0
With these two properties thus set, the component's Data Collector table retains data on all component events until it reaches its maximum limit, as set by the DISK_SIZE_KB retention policy property.
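The positional form described above might be used as in the following sketch. The sizes (1500 KB and 25000 KB) are arbitrary illustration values, not recommendations. The argument order follows the set_data_collector_policy usage shown in the DATA_COLLECTOR_HELP output.

```sql
-- Set size limits only: memory buffer of 1500 KB, disk limit of 25000 KB.
SELECT SET_DATA_COLLECTOR_POLICY('TupleMoverEvents', '1500', '25000');

-- Set size limits and a two-day retention interval in one call.
SELECT SET_DATA_COLLECTOR_POLICY('TupleMoverEvents', '1500', '25000',
                                 '2 days'::interval);

-- Confirm the resulting policy.
SELECT GET_DATA_COLLECTOR_POLICY('TupleMoverEvents');
```

Whichever limit, size or time, is reached first determines when older records are dropped.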
Updates selected retention policy properties for a component.
Updates selected retention policy properties for a specified component. SET_DATA_COLLECTOR_POLICY is another version of this function that uses positional arguments instead of named parameters.
Before you change a retention policy, you can view its current settings by querying the DATA_COLLECTOR table or by calling the GET_DATA_COLLECTOR_POLICY function.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Maximum amount of data, in kilobytes, that is buffered in memory before moving it to disk. The retention policy property MEMORY_BUFFER_SIZE_KB is set from this value. This value must be greater than 0.
Consider setting this parameter to a high value in the following cases:
Unusually high levels of data collection. If the value is too low, the Data Collector might be unable to flush buffered data to disk quickly enough to keep up with the activity level, which can lead to loss of in-memory data.
Very large data collector records—for example, records with very long query strings. The Data Collector uses double-buffering, so it cannot retain in-memory records that are more than half the size of the memory buffer.
diskKB (INTEGER)
Maximum disk space, in kilobytes, allocated for this component's Data Collector table. The retention policy property DISK_SIZE_KB is set from this value. If set to 0, the Data Collector retains only as much component data as it can buffer in memory, as specified by memory-buffer-size.
synchronous (BOOLEAN)
Whether to ensure that no data is lost by performing synchronous writes. By default, if the Data Collector cannot keep up with the activity level, it can drop data buffered in memory before writing it to disk. If it is important to retain all activity records, set this parameter to true.
retention (INTERVAL)
How long to retain data in the component's Data Collector table, an INTERVAL. The INTERVAL_TIME retention policy property is set from this value. If the value is positive, it also sets the INTERVAL_SET policy property to true.
For example, if you specify the TupleMoverEvents component and set this value to two days ('2 days'::interval), the DC_TUPLE_MOVER_EVENTS Data Collector table retains records over the last 48 hours. Older Tuple Mover data is automatically dropped from this table.
Setting the INTERVAL_TIME property of a component's policy has no effect on how much data storage the Data Collector retains on disk for that component. Maximum disk storage capacity is determined by the DISK_SIZE_KB property. Setting the INTERVAL_TIME property only affects how long data is retained by the component's Data Collector table. For details, see Configuring data retention policies.
To disable the INTERVAL_TIME policy property, set this value to a negative integer. Doing so reverts two retention policy properties to their default settings:
INTERVAL_SET: false
INTERVAL_TIME: 0
With these two properties thus set, the component's Data Collector table retains data on all component events until it reaches its maximum limit, as set by the DISK_SIZE_KB retention policy property.
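A hedged sketch of the named-parameter form. It assumes that named parameters are passed in a USING PARAMETERS clause after the component name, which this section does not show. Only the diskKB and synchronous parameters named above are used, with arbitrary illustration values.

```sql
-- Change only the disk allocation for one component; other properties
-- are left untouched. Assumption: USING PARAMETERS carries named values.
SELECT SET_DATA_COLLECTOR_POLICY('TupleMoverEvents'
       USING PARAMETERS diskKB=25000);

-- Request synchronous writes so that no activity records are dropped
-- when the Data Collector cannot keep up.
SELECT SET_DATA_COLLECTOR_POLICY('ResourceAcquisitions'
       USING PARAMETERS synchronous=true);
```

Named parameters make it possible to update one property without restating the others, which the positional form cannot do.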
Updates the retention policy property INTERVAL_TIME for the specified component.
Updates the INTERVAL_TIME retention policy property for a specified component or globally. Calling this function has no effect on other properties of the same component.
How long to retain data in the component's Data Collector table, an INTERVAL. The INTERVAL_TIME retention policy property is set from this value. If the value is positive, it also sets the INTERVAL_SET policy property to true.
For example, if you specify the TupleMoverEvents component and set this value to two days ('2 days'::interval), the DC_TUPLE_MOVER_EVENTS Data Collector table retains records over the last 48 hours. Older Tuple Mover data is automatically dropped from this table.
Setting the INTERVAL_TIME property of a component's policy has no effect on how much data storage the Data Collector retains on disk for that component. Maximum disk storage capacity is determined by the DISK_SIZE_KB property. Setting the INTERVAL_TIME property only affects how long data is retained by the component's Data Collector table. For details, see Configuring data retention policies.
To disable the INTERVAL_TIME policy property, set this value to a negative integer. Doing so reverts two retention policy properties to their default settings:
INTERVAL_SET: false
INTERVAL_TIME: 0
With these two properties thus set, the component's Data Collector table retains data on all component events until it reaches its maximum limit, as set by the DISK_SIZE_KB retention policy property.
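Disabling the time policy and setting it globally might look like the sketch below. The call forms follow the set_data_collector_time_policy usage in the DATA_COLLECTOR_HELP output. Passing the negative value as the quoted string '-1' is an assumption about the accepted argument format.

```sql
-- Disable time-based retention for one component; INTERVAL_SET reverts to
-- false and retention is then bounded only by DISK_SIZE_KB.
-- Assumption: the negative integer can be passed as a quoted string.
SELECT SET_DATA_COLLECTOR_TIME_POLICY('TupleMoverEvents', '-1');

-- Set a one-day retention interval for all components at once.
SELECT SET_DATA_COLLECTOR_TIME_POLICY('1 day'::interval);
```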
Privileges
Superuser
Examples
The following example sets a retention time for a single component. Other components are unaffected: