This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Data Collector functions
The Vertica Data Collector is a utility that extends system table functionality by providing a framework for recording events.
The Vertica Data Collector is a utility that extends system table functionality by providing a framework for recording events. It gathers and retains monitoring information about your database cluster and makes that information available in system tables with negligble performance impact.
Collected data is stored on disk in the DataCollector
directory under the Vertica /catalog
path. You can use the information the Data Collector retains in the following ways:
-
Query the past state of system tables and extract aggregate information
-
See what actions users have taken
-
Locate performance bottlenecks
-
Identify potential improvements to Vertica configuration
Data Collector works in conjunction with an advisor tool called Workload Analyzer, which intelligently monitors the performance of SQL queries and workloads and recommends tuning actions based on observations of the actual workload history.
By default, Data Collector is enabled and retains information for all sessions. If performance issues arise, a superuser can disable Data Collector by setting the EnableDataCollector configuration parameter to 0.
1 - CLEAR_DATA_COLLECTOR
Clears all memory and disk records from Data Collector tables and logs, and resets collection statistics in system table DATA_COLLECTOR.
Clears all memory and disk records from Data Collector tables and logs, and resets collection statistics in the DATA_COLLECTOR system table.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
CLEAR_DATA_COLLECTOR( [ 'component' ] )
Arguments
component
- Component to clear. If not specified, the function clears memory and disk records for all components.
Query DATA_COLLECTOR to get a list of components:
=> SELECT DISTINCT component, description FROM DATA_COLLECTOR
WHERE component ILIKE '%Depot%' ORDER BY component;
component | description
----------------+-------------------------------
DepotEvictions | Files evicted from the Depot
DepotFetches | Files fetched to the Depot
DepotUploads | Files Uploaded from the Depot
(3 rows)
Privileges
Superuser
Examples
By default, the function clears data collection for all components:
=> SELECT CLEAR_DATA_COLLECTOR();
CLEAR_DATA_COLLECTOR
----------------------
CLEAR
(1 row)
To clear memory and disk records for only one component, specify the component:
=> SELECT CLEAR_DATA_COLLECTOR('ResourceAcquisitions');
CLEAR_DATA_COLLECTOR
----------------------
CLEAR
(1 row)
See also
Data Collector utility
2 - DATA_COLLECTOR_HELP
Returns online usage instructions about the Data Collector, the V_MONITOR.DATA_COLLECTOR system table, and the Data Collector control functions.
Returns online usage instructions about the Data Collector, the
DATA_COLLECTOR
system table, and the Data Collector control functions.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
DATA_COLLECTOR_HELP()
Privileges
None
Returns
The function returns information like the following (exact contents might differ from this example):
=> SELECT DATA_COLLECTOR_HELP();
-----------------------------------------------------------------------------
Usage Data Collector
The data collector retains history of important system activities.
This data can be used as a reference of what actions have been taken
by users, but it can also be used to locate performance bottlenecks,
or identify potential improvements to the Vertica configuration.
This data is queryable via Vertica system tables.
Acccess a list of data collector components, and some statistics, by running:
SELECT * FROM v_monitor.data_collector;
The amount of data retained by size and time can be controlled with several
functions.
To just set the size amount:
set_data_collector_policy(<component>,
<memory retention (KB)>,
<disk retention (KB)>);
To set both the size and time amounts (the smaller one will dominate):
set_data_collector_policy(<component>,
<memory retention (KB)>,
<disk retention (KB)>,
<interval>);
To set just the time amount:
set_data_collector_time_policy(<component>,
<interval>);
To set the time amount for all tables:
set_data_collector_time_policy(<interval>);
The current retention policy for a component can be queried with:
get_data_collector_policy(<component>);
Data on disk is kept in the "DataCollector" directory under the Vertica
\catalog path. This directory also contains instructions on how to load
the monitoring data into another Vertica database.
To move the data collector logs and instructions to other storage locations,
create labeled storage locations using add_location and then use:
set_data_collector_storage_location(<storage_label>);
Additional commands can be used to configure the data collection logs.
The log can be cleared with:
clear_data_collector([<optional component>]);
The log can be synchronized with the disk storage using:
flush_data_collector([<optional component>]);
See also
3 - FLUSH_DATA_COLLECTOR
Waits until memory logs are moved to disk and then flushes the Data Collector, synchronizing the log with disk storage.
Waits until memory logs are moved to disk and then flushes the Data Collector, synchronizing the log with disk storage.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
FLUSH_DATA_COLLECTOR( [ 'component' ] )
Arguments
component
- Component to flush. If not specified, the function flushes data for all components.
Query DATA_COLLECTOR to get a list of components:
=> SELECT DISTINCT component, description FROM DATA_COLLECTOR
WHERE component ILIKE '%Depot%' ORDER BY component;
component | description
----------------+-------------------------------
DepotEvictions | Files evicted from the Depot
DepotFetches | Files fetched to the Depot
DepotUploads | Files Uploaded from the Depot
(3 rows)
Privileges
Superuser
Examples
By default, the function flushes the Data Collector for all components:
=> SELECT FLUSH_DATA_COLLECTOR();
FLUSH_DATA_COLLECTOR
----------------------
FLUSH
(1 row)
To flush only one component, specify it:
=> SELECT FLUSH_DATA_COLLECTOR('ResourceAcquisitions');
FLUSH_DATA_COLLECTOR
----------------------
FLUSH
(1 row)
See also
Data Collector utility
4 - GET_DATA_COLLECTOR_POLICY
Retrieves a brief statement about the retention policy for the specified component.
Returns a brief statement about the retention policy for the specified component.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
GET_DATA_COLLECTOR_POLICY( 'component' )
Arguments
component
- Component to query.
Query DATA_COLLECTOR to get a list of components:
=> SELECT DISTINCT component, description FROM DATA_COLLECTOR
WHERE component ILIKE '%Depot%' ORDER BY component;
component | description
----------------+-------------------------------
DepotEvictions | Files evicted from the Depot
DepotFetches | Files fetched to the Depot
DepotUploads | Files Uploaded from the Depot
(3 rows)
Privileges
None
Examples
=> SELECT GET_DATA_COLLECTOR_POLICY('ResourceAcquisitions');
GET_DATA_COLLECTOR_POLICY
----------------------------------------------
1000KB kept in memory, 10000KB kept on disk. Synchronous logging disabled. Time based retention disabled.
(1 row)
See also
5 - SET_DATA_COLLECTOR_POLICY
Updates the following retention policy properties for the specified component:.
Updates selected retention policy properties for a specified component. SET_DATA_COLLECTOR_POLICY (using parameters) is another version of this function that uses named parameters instead of positional arguments.
Before you change a retention policy, you can view its current settings by querying the DATA_COLLECTOR table or by calling the GET_DATA_COLLECTOR_POLICY function.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
SET_DATA_COLLECTOR_POLICY('component', 'memory-buffer-size', 'disk-size' [,'interval-time'] )
Arguments
component
- The retention policy to update.
Query DATA_COLLECTOR to get a list of components:
=> SELECT DISTINCT component, description FROM DATA_COLLECTOR
WHERE component ILIKE '%Depot%' ORDER BY component;
component | description
----------------+-------------------------------
DepotEvictions | Files evicted from the Depot
DepotFetches | Files fetched to the Depot
DepotUploads | Files Uploaded from the Depot
(3 rows)
memory-buffer-size
- Maximum amount of data, in kilobytes, that is buffered in memory before moving it to disk. The retention policy property MEMORY_BUFFER_SIZE_KB is set from this value. This value must be positive and greater than 0.
Consider setting this parameter to a high value in the following cases:
-
Unusually high levels of data collection. If the value is too low, the Data Collector might be unable to flush buffered data to disk quickly enough to keep up with the activity level, which can lead to loss of in-memory data.
-
Very large data collector records—for example, records with very long query strings. The Data Collector uses double-buffering, so it cannot retain in-memory records that are more than half the size of the memory buffer.
disk-size
- Maximum disk space, in kilobytes, allocated for this component's Data Collector table. The retention policy property DISK_SIZE_KB is set from this value. If set to 0, the Data Collector retains only as much component data as it can buffer in memory, as specified by
memory-buffer-size
.
interval-time
How long to retain data in the component's Data Collector table, an INTERVAL. The INTERVAL_TIME retention policy property is set from this value. If the value is positive, it also sets the INTERVAL_SET policy property to true.
For example, if you specify the TupleMoverEvents component and set this value to two days ('2 days'::interval
), the DC_TUPLE_MOVER_EVENTS Data Collector table retains records over the last 48 hours. Older Tuple Mover data is automatically dropped from this table.
Setting a component's policy's INTERVAL_TIME property has no effect on how much data storage the Data Collector retains on disk for that component. Maximum disk storage capacity is determined by the DISK_SIZE_KB property. Setting the INTERVAL_TIME property only affects how long data is retained by the component's Data Collector table. For details, see Configuring data retention policies.
To disable the INTERVAL_TIME policy property, set this value to a negative integer. Doing so reverts two retention policy properties to their default settings:
-
INTERVAL_SET: false
-
INTERVAL_TIME: 0
With these two properties thus set, the component's Data Collector table retains data on all component events until it reaches its maximum limit, as set by the DISK_SIZE_KB retention policy property.
Privileges
Superuser
Examples
=> SELECT SET_DATA_COLLECTOR_POLICY('ResourceAcquisitions', '1500', '25000');
SET_DATA_COLLECTOR_POLICY
---------------------------
SET
(1 row)
See also
6 - SET_DATA_COLLECTOR_POLICY (using parameters)
Updates selected retention policy properties for a component.
Updates selected retention policy properties for a specified component. SET_DATA_COLLECTOR_POLICY is another version of this function that uses positional arguments instead of named parameters.
Before you change a retention policy, you can view its current settings by querying the DATA_COLLECTOR table or by calling the GET_DATA_COLLECTOR_POLICY function.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
SET_DATA_COLLECTOR_POLICY('component' USING PARAMETERS param=value[,...] )
Arguments
component
- The retention policy to update.
Query DATA_COLLECTOR to get a list of components:
=> SELECT DISTINCT component, description FROM DATA_COLLECTOR
WHERE component ILIKE '%Depot%' ORDER BY component;
component | description
----------------+-------------------------------
DepotEvictions | Files evicted from the Depot
DepotFetches | Files fetched to the Depot
DepotUploads | Files Uploaded from the Depot
(3 rows)
Parameters
memKB
(INTEGER)
- Maximum amount of data, in kilobytes, that is buffered in memory before moving it to disk. The retention policy property MEMORY_BUFFER_SIZE_KB is set from this value. This value must be positive and greater than 0.
Consider setting this parameter to a high value in the following cases:
-
Unusually high levels of data collection. If the value is too low, the Data Collector might be unable to flush buffered data to disk quickly enough to keep up with the activity level, which can lead to loss of in-memory data.
-
Very large data collector records—for example, records with very long query strings. The Data Collector uses double-buffering, so it cannot retain in-memory records that are more than half the size of the memory buffer.
diskKB
(INTEGER)
- Maximum disk space, in kilobytes, allocated for this component's Data Collector table. The retention policy property DISK_SIZE_KB is set from this value. If set to 0, the Data Collector retains only as much component data as it can buffer in memory, as specified by
memory-buffer-size
.
synchronous
(BOOLEAN)
- Whether to ensure that no data is lost by performing synchronous writes. By default, if the Data Collector cannot keep up with the activity level, it can drop data buffered in memory before writing it to disk. If it is important to retain all activity records, set this parameter to true.
retention
(INTERVAL)
How long to retain data in the component's Data Collector table, an INTERVAL. The INTERVAL_TIME retention policy property is set from this value. If the value is positive, it also sets the INTERVAL_SET policy property to true.
For example, if you specify the TupleMoverEvents component and set this value to two days ('2 days'::interval
), the DC_TUPLE_MOVER_EVENTS Data Collector table retains records over the last 48 hours. Older Tuple Mover data is automatically dropped from this table.
Setting a component's policy's INTERVAL_TIME property has no effect on how much data storage the Data Collector retains on disk for that component. Maximum disk storage capacity is determined by the DISK_SIZE_KB property. Setting the INTERVAL_TIME property only affects how long data is retained by the component's Data Collector table. For details, see Configuring data retention policies.
To disable the INTERVAL_TIME policy property, set this value to a negative integer. Doing so reverts two retention policy properties to their default settings:
-
INTERVAL_SET: false
-
INTERVAL_TIME: 0
With these two properties thus set, the component's Data Collector table retains data on all component events until it reaches its maximum limit, as set by the DISK_SIZE_KB retention policy property.
Privileges
Superuser
Examples
=> SELECT SET_DATA_COLLECTOR_POLICY('ResourceAcquisitions'
USING PARAMETERS synchronous=TRUE, memKB=100, diskKB=50000);
set_data_collector_policy
---------------------------
SET
(1 row)
7 - SET_DATA_COLLECTOR_TIME_POLICY
Updates the retention policy property INTERVAL_TIME for the specified component.
Updates the INTERVAL_TIME retention policy property for a specified component or globally. Calling this function has no effect on other properties of the same component.
You can also set the retention policy using SET_DATA_COLLECTOR_POLICY (using parameters).
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
SET_DATA_COLLECTOR_TIME_POLICY( ['component',] 'interval-time' )
Arguments
component
- Component to update. If not specified, the function updates the retention policy of all Data Collector components.
Query DATA_COLLECTOR to get a list of components:
=> SELECT DISTINCT component, description FROM DATA_COLLECTOR
WHERE component ILIKE '%Depot%' ORDER BY component;
component | description
----------------+-------------------------------
DepotEvictions | Files evicted from the Depot
DepotFetches | Files fetched to the Depot
DepotUploads | Files Uploaded from the Depot
(3 rows)
interval-time
How long to retain data in the component's Data Collector table, an INTERVAL. The INTERVAL_TIME retention policy property is set from this value. If the value is positive, it also sets the INTERVAL_SET policy property to true.
For example, if you specify the TupleMoverEvents component and set this value to two days ('2 days'::interval
), the DC_TUPLE_MOVER_EVENTS Data Collector table retains records over the last 48 hours. Older Tuple Mover data is automatically dropped from this table.
Setting a component's policy's INTERVAL_TIME property has no effect on how much data storage the Data Collector retains on disk for that component. Maximum disk storage capacity is determined by the DISK_SIZE_KB property. Setting the INTERVAL_TIME property only affects how long data is retained by the component's Data Collector table. For details, see Configuring data retention policies.
To disable the INTERVAL_TIME policy property, set this value to a negative integer. Doing so reverts two retention policy properties to their default settings:
-
INTERVAL_SET: false
-
INTERVAL_TIME: 0
With these two properties thus set, the component's Data Collector table retains data on all component events until it reaches its maximum limit, as set by the DISK_SIZE_KB retention policy property.
Privileges
Superuser
Examples
The following example sets a retention time for a single component. Other components are unaffected:
=> SELECT SET_DATA_COLLECTOR_TIME_POLICY('TupleMoverEvents ', '30 minutes'::INTERVAL);
SET_DATA_COLLECTOR_TIME_POLICY
--------------------------------
SET
(1 row)
To set a retention time for all components, omit the first argument:
=> SELECT SET_DATA_COLLECTOR_TIME_POLICY('1 day'::INTERVAL);
SET_DATA_COLLECTOR_TIME_POLICY
--------------------------------
SET
(1 row)