Data collector table queries
The Vertica Data Collector extends system table functionality by gathering and retaining information about your database cluster. The Data Collector makes this information available in system tables.
Vertica Analytic Database stores Data Collection data in the Data Collector directory under the Vertica or catalog path. Use Data Collector information to query the past state of system tables and extract aggregate information.
In general, queries on Data Collector tables are more efficient when they include only the columns that contain the desired data. Queries are also more efficient when they:
Avoiding resegmentation
You can avoid resegmentation when you join the following DC tables on session_id
or transaction_id
, because all data is local:
-
dc_session_starts
-
dc_session_ends
-
dc_requests_issued
-
dc_requests_completed
Resegmentation is not required when a query includes the node_name
column. For example:
=> SELECT dri.transaction_id, dri.request, drc.processed_row_count
FROM dc_requests_issued dri
JOIN dc_requests_completed drc
USING (node_name, session_id, request_id)
WHERE dri.time between 'April 7,2015'::timestamptz and 'April 8,2015'::timestamptz
AND drc.time between 'April 7,2015'::timestamptz and 'April 8,2015'::timestamptz;
This query runs efficiently because:
-
The initiator node writes only to
dc_requests_issued
anddc_requests_completed
. -
Columns
session_id
andnode_name
are correlated.
Using time predicates
Use non-volatile functions and
TIMESTAMP
for the time range predicates. Vertica Analytic Database optimizes SQL performance for DC tables that use the time predicate.
Each DC table has a time
column. Use this column to enter the time range as the query predicate.
For example, this query returns data for dates between September 1 and September 10:
select * from dc_foo where time > 'Sept 1, 2015::timestamptz and time < 'Sept 10 2015':: timestamptz;
You can change the minimum and maximum time values to adjust the time for which you want to retrieve data.
You must use non-volatile functions as time predicates. Volatile functions cause queries to run inefficiently. This example returns all queries that started and ended on April 7, 2015. However, the query runs at less than optimal performance because trunc
and timestamp
are volatile:
=> SELECT dri.transaction_id, dri.request, drc.processed_row_count
FROM dc_requests_issued dri
LEFT JOIN dc_requests_completed drc
USING (session_id, request_id)
WHERE trunc(dri.time, ‘DDD’) > 'April 7,2015'::timestamp
AND trunc(drc.time, ‘DDD’) < 'April 8,2015'::timestamp;