This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Purging deleted data
In Vertica, delete operations do not remove rows from physical storage.
In Vertica, delete operations do not remove rows from physical storage. DELETE marks rows as deleted, as does UPDATE, which combines delete and insert operations. In both cases, Vertica retains discarded rows as historical data, which remains accessible to historical queries until it is purged.
The cost of retaining historical data is twofold:
-
Disk space is allocated to deleted rows and delete markers.
-
Typical (non-historical) queries must read and skip over deleted data, which can impact performance.
A purge operation permanently removes historical data from physical storage and frees disk space for reuse. Only historical data that precedes the Ancient History Mark (AHM) is eligible to be purged.
You can purge data in two ways:
In both cases, Vertica purges all historical data up to and including the AHM epoch and resets the AHM. See Epochs for additional information about how Vertica uses epochs.
Caution
Large delete and purge operations can take a long time to complete, so use them sparingly. If your application requires deleting data on a regular basis, such as by month or year, consider designing tables that take advantage of
table partitioning. If partitioning is not suitable, consider
rebuilding the table.
1 - Setting a purge policy
The preferred method for purging data is to establish a policy that determines which deleted data is eligible to be purged.
The preferred method for purging data is to establish a policy that determines which deleted data is eligible to be purged. Eligible data is automatically purged when the Tuple Mover performs mergeout operations.
Vertica provides two methods for determining when deleted data is eligible to be purged:
Specifying the time for which delete data is saved
Specifying the time for which delete data is saved is the preferred method for determining which deleted data can be purged. By default, Vertica saves historical data only when nodes are down.
To change the specified time for saving deleted data, use the HistoryRetentionTime
configuration parameter:
=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = {seconds | -1};
In the above syntax:
-
seconds is the amount of time (in seconds) for which to save deleted data.
-
-1 indicates that you do not want to use the HistoryRetentionTime
configuration parameter to determine which deleted data is eligible to be purged. Use this setting if you prefer to use the other method (HistoryRetentionEpochs
) for determining which deleted data can be purged.
The following example sets the history epoch retention level to 240 seconds:
=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = 240;
Specifying the number of epochs that are saved
Unless you have a reason to limit the number of epochs, Vertica recommends that you specify the time over which delete data is saved.
To specify the number of historical epoch to save through the HistoryRetentionEpochs
configuration parameter:
-
Turn off the HistoryRetentionTime
configuration parameter:
=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = -1;
-
Set the history epoch retention level through the HistoryRetentionEpochs
configuration parameter:
=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = {num_epochs | -1};
-
num_epochs is the number of historical epochs to save.
-
-1 indicates that you do not want to use the HistoryRetentionEpochs
configuration parameter to trim historical epochs from the epoch map. By default, HistoryRetentionEpochs
is set to -1.
The following example sets the number of historical epochs to save to 40:
=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = 40;
Modifications are immediately implemented across all nodes within the database cluster. You do not need to restart the database.
Note
If both HistoryRetentionTime
and HistoryRetentionEpochs
are specified, HistoryRetentionTime
takes precedence.
See Epoch management parameters for additional details. See Epochs for information about how Vertica uses epochs.
Disabling purge
If you want to preserve all historical data, set the value of both historical epoch retention parameters to -1, as follows:
=> ALTER DABABASE mydb SET HistoryRetentionTime = -1;
=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = -1;
2 - Manually purging data
You manually purge deleted data as follows:.
You manually purge deleted data as follows:
-
Set the cut-off date for purging deleted data. First, call one of the following functions to verify the current ancient history mark (AHM):
-
Set the AHM to the desired cut-off date with one of the following functions:
If you call SET_AHM_TIME
, keep in mind that the timestamp you specify is mapped to an epoch, which by default has a three-minute granularity. Thus, if you specify an AHM time of 2008-01-01 00:00:00.00
, Vertica might purge data from the first three minutes of 2008, or retain data from last three minutes of 2007.
Note
You cannot advance the AHM beyond a point where Vertica is unable to recover data for a down node.
-
Purge deleted data from the desired projections with one of the following functions:
The tuple mover performs a mergeout operation to purge the data. Vertica periodically invokes the tuple mover to perform mergeout operations, as configured by tuple mover parameters. You can manually invoke the tuple mover by calling the function
DO_TM_TASK
.
Caution
Manual purge operations can take a long time.
See Epochs for additional information about how Vertica uses epochs.