Configuring data retention policies

maintains retention policies for each Vertica component that it monitors—for example, TupleMoverEvents, or DepotEvictions.

Data collector maintains retention policies for each Vertica component that it monitors—for example, TupleMoverEvents or DepotEvictions. You can identify monitored components by querying the DATA_COLLECTOR system table. For example, the following query returns partition activity components:

=> SELECT DISTINCT component FROM DATA_COLLECTOR 
   WHERE component ILIKE '%partition%';
      component
----------------------
 HiveCustomPartitions
 CopyPartitions
 MovePartitions
 SwapPartitions
(4 rows)

Each component has its own retention policy, which consists of several properties:

  • MEMORY_BUFFER_SIZE_KB: the maximum amount of collected data that the Data Collector buffers in memory before moving it to disk.

  • DISK_SIZE_KB: the maximum disk space allocated for the component's Data Collector table.

  • INTERVAL_TIME: how long data of a given component is retained in the component's Data Collector table (INTERVAL data type).

Vertica sets default values on all properties, which you can modify with the SET_DATA_COLLECTOR_POLICY (using parameters) function or, alternatively, SET_DATA_COLLECTOR_POLICY and SET_DATA_COLLECTOR_TIME_POLICY.

You can view retention policy settings with GET_DATA_COLLECTOR_POLICY. For example, the following statement returns the retention policy for the TupleMoverEvents component:

=> SELECT GET_DATA_COLLECTOR_POLICY('TupleMoverEvents');
                          GET_DATA_COLLECTOR_POLICY
-----------------------------------------------------------------------------
 1000KB kept in memory, 15000KB kept on disk. Time based retention disabled.
(1 row)

Setting retention memory and disk storage

The MEMORY_BUFFER_SIZE_KB and DISK_SIZE_KB properties combine to determine how much collected data is available at any given time. If MEMORY_BUFFER_SIZE_KB is set to 0, the Data Collector does not retain any data for the component either in memory or on disk. If DISK_SIZE_KB is set to 0, then the Data Collector retains only as much component data as it can buffer, as set by MEMORY_BUFFER_SIZE_KB.

For example, the following statement changes memory and disk setting for the ResourceAcquisitions component from its default setting of 1,000 KB memory and 10,000 KB disk space to 1500 KB and 25000 KB, respectively:

=> SELECT SET_DATA_COLLECTOR_POLICY('ResourceAcquisitions', '1500', '25000');
 SET_DATA_COLLECTOR_POLICY
---------------------------
 SET
(1 row)

Consider setting MEMORY_BUFFER_SIZE_KB to a high value in the following cases:

  • Unusually high levels of data collection. If MEMORY_BUFFER_SIZE_KB is set too low, the Data Collector might be unable to flush buffered data to disk quickly enough to keep up with the activity level, which can lead to loss of in-memory data.

  • Very large data collector records—for example, records with very long query strings. The Data Collector uses double-buffering, so it cannot retain in-memory records that are more than half the size of the memory buffer.

Setting time-based retention

By default, all data collected for a given component remains on disk and is accessible in the component's Data Collector table, up to the disk storage limit of that component's retention policy. You can use SET_DATA_COLLECTOR_POLICY to limit how long data is retained in a component's Data Collector table. The following example sets the INTERVAL_TIME property for the TupleMoverEvents component:

=> SELECT SET_DATA_COLLECTOR_TIME_POLICY('TupleMoverEvents ', '30 minutes'::INTERVAL);
SET_DATA_COLLECTOR_TIME_POLICY
--------------------------------
SET
(1 row)

After this call, the corresponding Data Collector table, DC_TUPLE_MOVER_EVENTS, only retains records of activity that occurred in the last 30 minutes. Older data is automatically dropped from this table. For example, after the previous call to SET_DATA_COLLECTOR_TIME_POLICY, the table contains on 30 minutes' worth of data:

=> SELECT CURRENT_TIMESTAMP(0) - '30 minutes'::INTERVAL AS '30 minutes ago';
   30 minutes ago
---------------------
 2020-08-13 07:58:21
(1 row)

=> SELECT time, node_name, session_id, user_name, transaction_id, operation 
   FROM DC_TUPLE_MOVER_EVENTS WHERE node_name='v_vmart_node0001' 
   ORDER BY transaction_id;
             time              |    node_name     |           session_id            | user_name |  transaction_id   | operation
-------------------------------+------------------+---------------------------------+-----------+-------------------+-----------
 2020-08-13 08:16:54.360597-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.397346-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.424002-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807826 | Mergeout
 2020-08-13 08:16:54.425989-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:16:54.456829-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:16:54.485097-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807829 | Mergeout
 2020-08-13 08:19:45.8045-04   | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.742-04    | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.684764-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x37b08 | dbadmin   | 45035996273807855 | Mergeout
 2020-08-13 08:19:45.799796-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:19:45.768856-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:19:45.715424-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807865 | Mergeout
 2020-08-13 08:25:20.465604-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.497266-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.518839-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.52099-04  | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.549075-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.569072-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
(18 rows)

After 25 minutes elapse, 12 of these records age out of the 30-minute interval and are dropped:

=> SELECT CURRENT_TIMESTAMP(0) - '30 minutes'::INTERVAL AS '30 minutes ago';
   30 minutes ago
---------------------
 2020-08-13 08:23:33
(1 row)

=> SELECT time, node_name, session_id, user_name, transaction_id, operation 
   FROM DC_TUPLE_MOVER_EVENTS WHERE node_name='v_vmart_node0001' 
   ORDER BY transaction_id;
             time              |    node_name     |           session_id            | user_name |  transaction_id   | operation
-------------------------------+------------------+---------------------------------+-----------+-------------------+-----------
 2020-08-13 08:25:20.465604-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.497266-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.518839-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807890 | Mergeout
 2020-08-13 08:25:20.52099-04  | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.549075-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
 2020-08-13 08:25:20.569072-04 | v_vmart_node0001 | v_vmart_node0001-190508:0x375db | dbadmin   | 45035996273807893 | Mergeout
(6 rows)

You can use SET_DATA_COLLECTOR_TIME_POLICY to update INTERVAL_TIME for all components by omitting the component argument:

=> SELECT SET_DATA_COLLECTOR_TIME_POLICY('1 day'::INTERVAL);
SET_DATA_COLLECTOR_TIME_POLICY
--------------------------------
SET
(1 row)

=> SELECT DISTINCT component, INTERVAL_SET, INTERVAL_TIME 
   FROM DATA_COLLECTOR WHERE component ILIKE '%partition%';
      component       | INTERVAL_SET | INTERVAL_TIME
----------------------+--------------+---------------
 HiveCustomPartitions | t            | 1
 MovePartitions       | t            | 1
 CopyPartitions       | t            | 1
 SwapPartitions       | t            | 1
(4 rows)

To clear the INTERVAL_TIME policy property, call SET_DATA_COLLECTOR_TIME_POLICY with a negative integer argument:

=> SELECT SET_DATA_COLLECTOR_TIME_POLICY('-1');
 SET_DATA_COLLECTOR_TIME_POLICY
--------------------------------
 SET
(1 row)

=> SELECT DISTINCT component, INTERVAL_SET, INTERVAL_TIME 
   FROM DATA_COLLECTOR WHERE component ILIKE '%partition%';
      component       | INTERVAL_SET | INTERVAL_TIME
----------------------+--------------+---------------
 MovePartitions       | f            | 0
 SwapPartitions       | f            | 0
 HiveCustomPartitions | f            | 0
 CopyPartitions       | f            | 0
(4 rows)

Setting INTERVAL_TIME on a retention policy also sets its INTERVAL_SET property to true.