Data collector table queries

The Vertica Data Collector extends system table functionality by gathering and retaining information about your database cluster.

The Vertica Data Collector extends system table functionality by gathering and retaining information about your database cluster. The Data Collector makes this information available in system tables.

Vertica Analytic Database stores Data Collection data in the Data Collector directory under the Vertica or catalog path. Use Data Collector information to query the past state of system tables and extract aggregate information.

In general, queries on Data Collector tables are more efficient when they include only the columns that contain the desired data. Queries are also more efficient when they:

Avoiding resegmentation

You can avoid resegmentation when you join the following DC tables on session_id or transaction_id, because all data is local:

  • dc_session_starts

  • dc_session_ends

  • dc_requests_issued

  • dc_requests_completed

Resegmentation is not required when a query includes the node_name column. For example:

=> SELECT dri.transaction_id, dri.request, drc.processed_row_count
    FROM dc_requests_issued dri
    JOIN dc_requests_completed drc
    USING (node_name, session_id, request_id)
    WHERE dri.time between 'April 7,2015'::timestamptz and 'April 8,2015'::timestamptz
    AND drc.time between 'April 7,2015'::timestamptz and 'April 8,2015'::timestamptz;

This query runs efficiently because:

  • The initiator node writes only to dc_requests_issued and dc_requests_completed.

  • Columns session_id and node_name are correlated.

Using time predicates

Use non-volatile functions and TIMESTAMP for the time range predicates. Vertica Analytic Database optimizes SQL performance for DC tables that use the time predicate.

Each DC table has a time column. Use this column to enter the time range as the query predicate.

For example, this query returns data for dates between September 1 and September 10: select * from dc_foo where time > 'Sept 1, 2015::timestamptz and time < 'Sept 10 2015':: timestamptz; You can change the minimum and maximum time values to adjust the time for which you want to retrieve data.

You must use non-volatile functions as time predicates. Volatile functions cause queries to run inefficiently. This example returns all queries that started and ended on April 7, 2015. However, the query runs at less than optimal performance because trunc and timestamp are volatile:

=> SELECT dri.transaction_id, dri.request, drc.processed_row_count
    FROM dc_requests_issued dri
    LEFT JOIN dc_requests_completed drc
    USING (session_id, request_id)
    WHERE trunc(dri.time, ‘DDD’) > 'April 7,2015'::timestamp
    AND trunc(drc.time, ‘DDD’) < 'April 8,2015'::timestamp;