This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Data Collector functions

The Vertica Data Collector is a utility that extends system table functionality by providing a framework for recording events.

The Vertica Data Collector is a utility that extends system table functionality by providing a framework for recording events. It gathers and retains monitoring information about your database cluster and makes that information available in system tables with negligble performance impact.

Collected data is stored on disk in the DataCollector directory under the Vertica /catalog path. You can use the information the Data Collector retains in the following ways:

  • Query the past state of system tables and extract aggregate information

  • See what actions users have taken

  • Locate performance bottlenecks

  • Identify potential improvements to Vertica configuration

Data Collector works in conjunction with an advisor tool called Workload Analyzer, which intelligently monitors the performance of SQL queries and workloads and recommends tuning actions based on observations of the actual workload history.

By default, Data Collector is enabled and retains information for all sessions. If performance issues arise, a superuser can disable Data Collector by setting the EnableDataCollector configuration parameter to 0.

1 - CLEAR_DATA_COLLECTOR

Clears all memory and disk records from Data Collector tables and logs, and resets collection statistics in system table DATA_COLLECTOR.

Clears all memory and disk records from Data Collector tables and logs, and resets collection statistics in the DATA_COLLECTOR system table.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

CLEAR_DATA_COLLECTOR( [ 'component' ] )

Arguments

component
Component to clear. If not specified, the function clears memory and disk records for all components.

Query DATA_COLLECTOR to get a list of components:

=> SELECT DISTINCT component, description FROM DATA_COLLECTOR 
   WHERE component ILIKE '%Depot%' ORDER BY component;
   component    |          description
----------------+-------------------------------
 DepotEvictions | Files evicted from the Depot
 DepotFetches   | Files fetched to the Depot
 DepotUploads   | Files Uploaded from the Depot
(3 rows)

Privileges

Superuser

Examples

By default, the function clears data collection for all components:

=> SELECT CLEAR_DATA_COLLECTOR();
 CLEAR_DATA_COLLECTOR
----------------------
 CLEAR
(1 row)

To clear memory and disk records for only one component, specify the component:

=> SELECT CLEAR_DATA_COLLECTOR('ResourceAcquisitions');
 CLEAR_DATA_COLLECTOR
----------------------
 CLEAR
(1 row)

See also

Data Collector utility

2 - DATA_COLLECTOR_HELP

Returns online usage instructions about the Data Collector, the V_MONITOR.DATA_COLLECTOR system table, and the Data Collector control functions.

Returns online usage instructions about the Data Collector, the DATA_COLLECTOR system table, and the Data Collector control functions.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

DATA_COLLECTOR_HELP()

Privileges

None

Returns

The function returns information like the following (exact contents might differ from this example):

=> SELECT DATA_COLLECTOR_HELP();

-----------------------------------------------------------------------------
Usage Data Collector
The data collector retains history of important system activities.
   This data can be used as a reference of what actions have been taken
      by users, but it can also be used to locate performance bottlenecks,
      or identify potential improvements to the Vertica configuration.
   This data is queryable via Vertica system tables.
Acccess a list of data collector components, and some statistics, by running:
   SELECT * FROM v_monitor.data_collector;

The amount of data retained by size and time can be controlled with several
functions.
   To just set the size amount:
      set_data_collector_policy(<component>,
                                <memory retention (KB)>,
                                <disk retention (KB)>);

   To set both the size and time amounts (the smaller one will dominate):
      set_data_collector_policy(<component>,
                                <memory retention (KB)>,
                                <disk retention (KB)>,
                                <interval>);

   To set just the time amount:
      set_data_collector_time_policy(<component>,
                                     <interval>);

   To set the time amount for all tables:
      set_data_collector_time_policy(<interval>);

The current retention policy for a component can be queried with:
   get_data_collector_policy(<component>);

Data on disk is kept in the "DataCollector" directory under the Vertica
\catalog path. This directory also contains instructions on how to load
the monitoring data into another Vertica database.

To move the data collector logs and instructions to other storage locations,
create labeled storage locations using add_location and then use:

   set_data_collector_storage_location(<storage_label>);

Additional commands can be used to configure the data collection logs.
The log can be cleared with:
clear_data_collector([<optional component>]);
The log can be synchronized with the disk storage using:
flush_data_collector([<optional component>]);

See also

3 - FLUSH_DATA_COLLECTOR

Waits until memory logs are moved to disk and then flushes the Data Collector, synchronizing the log with disk storage.

Waits until memory logs are moved to disk and then flushes the Data Collector, synchronizing the log with disk storage.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

FLUSH_DATA_COLLECTOR( [ 'component' ] )

Arguments

component
Component to flush. If not specified, the function flushes data for all components.

Query DATA_COLLECTOR to get a list of components:

=> SELECT DISTINCT component, description FROM DATA_COLLECTOR 
   WHERE component ILIKE '%Depot%' ORDER BY component;
   component    |          description
----------------+-------------------------------
 DepotEvictions | Files evicted from the Depot
 DepotFetches   | Files fetched to the Depot
 DepotUploads   | Files Uploaded from the Depot
(3 rows)

Privileges

Superuser

Examples

By default, the function flushes the Data Collector for all components:

=> SELECT FLUSH_DATA_COLLECTOR();
 FLUSH_DATA_COLLECTOR
----------------------
 FLUSH
(1 row)

To flush only one component, specify it:

=> SELECT FLUSH_DATA_COLLECTOR('ResourceAcquisitions');
 FLUSH_DATA_COLLECTOR
----------------------
 FLUSH
(1 row)

See also

Data Collector utility

4 - GET_DATA_COLLECTOR_POLICY

Retrieves a brief statement about the retention policy for the specified component.

Returns a brief statement about the retention policy for the specified component.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

GET_DATA_COLLECTOR_POLICY( 'component' )

Arguments

component
Component to query.

Query DATA_COLLECTOR to get a list of components:

=> SELECT DISTINCT component, description FROM DATA_COLLECTOR 
   WHERE component ILIKE '%Depot%' ORDER BY component;
   component    |          description
----------------+-------------------------------
 DepotEvictions | Files evicted from the Depot
 DepotFetches   | Files fetched to the Depot
 DepotUploads   | Files Uploaded from the Depot
(3 rows)

Privileges

None

Examples

=> SELECT GET_DATA_COLLECTOR_POLICY('ResourceAcquisitions');
          GET_DATA_COLLECTOR_POLICY
----------------------------------------------
 1000KB kept in memory, 10000KB kept on disk. Synchronous logging disabled. Time based retention disabled.
(1 row)

See also

5 - SET_DATA_COLLECTOR_POLICY

Updates the following retention policy properties for the specified component:.

Updates selected retention policy properties for a specified component. SET_DATA_COLLECTOR_POLICY (using parameters) is another version of this function that uses named parameters instead of positional arguments.

Before you change a retention policy, you can view its current settings by querying the DATA_COLLECTOR table or by calling the GET_DATA_COLLECTOR_POLICY function.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

SET_DATA_COLLECTOR_POLICY('component', 'memory-buffer-size', 'disk-size' [,'interval-time']  )

Arguments

component
The retention policy to update.

Query DATA_COLLECTOR to get a list of components:

=> SELECT DISTINCT component, description FROM DATA_COLLECTOR 
   WHERE component ILIKE '%Depot%' ORDER BY component;
   component    |          description
----------------+-------------------------------
 DepotEvictions | Files evicted from the Depot
 DepotFetches   | Files fetched to the Depot
 DepotUploads   | Files Uploaded from the Depot
(3 rows)
memory-buffer-size
Maximum amount of data, in kilobytes, that is buffered in memory before moving it to disk. The retention policy property MEMORY_BUFFER_SIZE_KB is set from this value. This value must be positive and greater than 0.

Consider setting this parameter to a high value in the following cases:

  • Unusually high levels of data collection. If the value is too low, the Data Collector might be unable to flush buffered data to disk quickly enough to keep up with the activity level, which can lead to loss of in-memory data.

  • Very large data collector records—for example, records with very long query strings. The Data Collector uses double-buffering, so it cannot retain in-memory records that are more than half the size of the memory buffer.

disk-size
Maximum disk space, in kilobytes, allocated for this component's Data Collector table. The retention policy property DISK_SIZE_KB is set from this value. If set to 0, the Data Collector retains only as much component data as it can buffer in memory, as specified by memory-buffer-size.
interval-time

How long to retain data in the component's Data Collector table, an INTERVAL. The INTERVAL_TIME retention policy property is set from this value. If the value is positive, it also sets the INTERVAL_SET policy property to true.

For example, if you specify the TupleMoverEvents component and set this value to two days ('2 days'::interval), the DC_TUPLE_MOVER_EVENTS Data Collector table retains records over the last 48 hours. Older Tuple Mover data is automatically dropped from this table.

Setting a component's policy's INTERVAL_TIME property has no effect on how much data storage the Data Collector retains on disk for that component. Maximum disk storage capacity is determined by the DISK_SIZE_KB property. Setting the INTERVAL_TIME property only affects how long data is retained by the component's Data Collector table. For details, see Configuring data retention policies.

To disable the INTERVAL_TIME policy property, set this value to a negative integer. Doing so reverts two retention policy properties to their default settings:

  • INTERVAL_SET: false

  • INTERVAL_TIME: 0

With these two properties thus set, the component's Data Collector table retains data on all component events until it reaches its maximum limit, as set by the DISK_SIZE_KB retention policy property.

Privileges

Superuser

Examples

=> SELECT SET_DATA_COLLECTOR_POLICY('ResourceAcquisitions', '1500', '25000');
 SET_DATA_COLLECTOR_POLICY
---------------------------
 SET
(1 row)

See also

6 - SET_DATA_COLLECTOR_POLICY (using parameters)

Updates selected retention policy properties for a component.

Updates selected retention policy properties for a specified component. SET_DATA_COLLECTOR_POLICY is another version of this function that uses positional arguments instead of named parameters.

Before you change a retention policy, you can view its current settings by querying the DATA_COLLECTOR table or by calling the GET_DATA_COLLECTOR_POLICY function.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

SET_DATA_COLLECTOR_POLICY('component' USING PARAMETERS param=value[,...] )

Arguments

component
The retention policy to update.

Query DATA_COLLECTOR to get a list of components:

=> SELECT DISTINCT component, description FROM DATA_COLLECTOR 
   WHERE component ILIKE '%Depot%' ORDER BY component;
   component    |          description
----------------+-------------------------------
 DepotEvictions | Files evicted from the Depot
 DepotFetches   | Files fetched to the Depot
 DepotUploads   | Files Uploaded from the Depot
(3 rows)

Parameters

memKB (INTEGER)
Maximum amount of data, in kilobytes, that is buffered in memory before moving it to disk. The retention policy property MEMORY_BUFFER_SIZE_KB is set from this value. This value must be positive and greater than 0.

Consider setting this parameter to a high value in the following cases:

  • Unusually high levels of data collection. If the value is too low, the Data Collector might be unable to flush buffered data to disk quickly enough to keep up with the activity level, which can lead to loss of in-memory data.

  • Very large data collector records—for example, records with very long query strings. The Data Collector uses double-buffering, so it cannot retain in-memory records that are more than half the size of the memory buffer.

diskKB (INTEGER)
Maximum disk space, in kilobytes, allocated for this component's Data Collector table. The retention policy property DISK_SIZE_KB is set from this value. If set to 0, the Data Collector retains only as much component data as it can buffer in memory, as specified by memory-buffer-size.
synchronous (BOOLEAN)
Whether to ensure that no data is lost by performing synchronous writes. By default, if the Data Collector cannot keep up with the activity level, it can drop data buffered in memory before writing it to disk. If it is important to retain all activity records, set this parameter to true.
retention (INTERVAL)

How long to retain data in the component's Data Collector table, an INTERVAL. The INTERVAL_TIME retention policy property is set from this value. If the value is positive, it also sets the INTERVAL_SET policy property to true.

For example, if you specify the TupleMoverEvents component and set this value to two days ('2 days'::interval), the DC_TUPLE_MOVER_EVENTS Data Collector table retains records over the last 48 hours. Older Tuple Mover data is automatically dropped from this table.

Setting a component's policy's INTERVAL_TIME property has no effect on how much data storage the Data Collector retains on disk for that component. Maximum disk storage capacity is determined by the DISK_SIZE_KB property. Setting the INTERVAL_TIME property only affects how long data is retained by the component's Data Collector table. For details, see Configuring data retention policies.

To disable the INTERVAL_TIME policy property, set this value to a negative integer. Doing so reverts two retention policy properties to their default settings:

  • INTERVAL_SET: false

  • INTERVAL_TIME: 0

With these two properties thus set, the component's Data Collector table retains data on all component events until it reaches its maximum limit, as set by the DISK_SIZE_KB retention policy property.

Privileges

Superuser

Examples

=> SELECT SET_DATA_COLLECTOR_POLICY('ResourceAcquisitions' 
   USING PARAMETERS synchronous=TRUE, memKB=100, diskKB=50000);
 set_data_collector_policy 
---------------------------
 SET
(1 row)

7 - SET_DATA_COLLECTOR_TIME_POLICY

Updates the retention policy property INTERVAL_TIME for the specified component.

Updates the INTERVAL_TIME retention policy property for a specified component or globally. Calling this function has no effect on other properties of the same component.

You can also set the retention policy using SET_DATA_COLLECTOR_POLICY (using parameters).

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

SET_DATA_COLLECTOR_TIME_POLICY( ['component',] 'interval-time' )

Arguments

component
Component to update. If not specified, the function updates the retention policy of all Data Collector components.

Query DATA_COLLECTOR to get a list of components:

=> SELECT DISTINCT component, description FROM DATA_COLLECTOR 
   WHERE component ILIKE '%Depot%' ORDER BY component;
   component    |          description
----------------+-------------------------------
 DepotEvictions | Files evicted from the Depot
 DepotFetches   | Files fetched to the Depot
 DepotUploads   | Files Uploaded from the Depot
(3 rows)
interval-time

How long to retain data in the component's Data Collector table, an INTERVAL. The INTERVAL_TIME retention policy property is set from this value. If the value is positive, it also sets the INTERVAL_SET policy property to true.

For example, if you specify the TupleMoverEvents component and set this value to two days ('2 days'::interval), the DC_TUPLE_MOVER_EVENTS Data Collector table retains records over the last 48 hours. Older Tuple Mover data is automatically dropped from this table.

Setting a component's policy's INTERVAL_TIME property has no effect on how much data storage the Data Collector retains on disk for that component. Maximum disk storage capacity is determined by the DISK_SIZE_KB property. Setting the INTERVAL_TIME property only affects how long data is retained by the component's Data Collector table. For details, see Configuring data retention policies.

To disable the INTERVAL_TIME policy property, set this value to a negative integer. Doing so reverts two retention policy properties to their default settings:

  • INTERVAL_SET: false

  • INTERVAL_TIME: 0

With these two properties thus set, the component's Data Collector table retains data on all component events until it reaches its maximum limit, as set by the DISK_SIZE_KB retention policy property.

Privileges

Superuser

Examples

The following example sets a retention time for a single component. Other components are unaffected:

=> SELECT SET_DATA_COLLECTOR_TIME_POLICY('TupleMoverEvents ', '30 minutes'::INTERVAL);
SET_DATA_COLLECTOR_TIME_POLICY
--------------------------------
SET
(1 row)

To set a retention time for all components, omit the first argument:

=> SELECT SET_DATA_COLLECTOR_TIME_POLICY('1 day'::INTERVAL);
SET_DATA_COLLECTOR_TIME_POLICY
--------------------------------
SET
(1 row)