This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Removing table data

Vertica provides several ways to remove data from a table:.

Vertica provides several ways to remove data from a table:

Delete operation Description
Drop a table Permanently remove a table and its definition, optionally remove associated views and projections.
Delete table rows Mark rows with delete vectors and store them so data can be rolled back to a previous epoch. The data must be purged to reclaim disk space.
Truncate table data Remove all storage and history associated with a table. The table structure is preserved for future use.
Purge data Permanently remove historical data from physical storage and free disk space for reuse.
Drop partitions Remove one more partitions from a table. Each partition contains a related subset of data in the table. Dropping partitioned data is efficient, and provides query performance benefits.

1 - Data removal operations compared

/need to include purge operations? or is that folded into DELETE operations?/.

The following table summarizes differences between various data removal operations.

Operations and options Performance Auto commits Saves history
DELETE FROM ``table Normal No Yes
DELETE FROM ``temp-table High No No
DELETE FROM table where-clause Normal No Yes
DELETE FROM temp-table where-clause Normal No Yes
DELETE FROM temp-table where-clause 
ON COMMIT PRESERVE ROWS
Normal No Yes
DELETE FROM temp-table where-clause 
ON COMMIT DELETE ROWS
High Yes No
DROP table High Yes No
TRUNCATE table High Yes No
TRUNCATE temp-table High Yes No
SELECT DROP_PARTITIONS (...) High Yes No

Choosing the best operation

The following table can help you decide which operation is best for removing table data:

If you want to... Use...
Delete both table data and definitions and start from scratch. DROP TABLE
Quickly drop data while preserving table definitions, and reload data. TRUNCATE TABLE
Regularly perform bulk delete operations on logical sets of data. DROP_PARTITIONS
Occasionally perform small deletes with the option to roll back or review history. DELETE

2 - Optimizing DELETE and UPDATE

Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not achieve the same level of performance as other queries.

Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not achieve the same level of performance as other queries. A DELETE and UPDATE operation must update all projections, so the operation can only be as fast as the slowest projection.

To improve the performance of DELETE and UPDATE queries, consider the following issues:

  • Query performance after large DELETE operations: Vertica's implementation of DELETE differs from traditional databases: it does not delete data from disk storage; rather, it marks rows as deleted so they are available for historical queries. Deletion of 10% or more of the total rows in a table can adversely affect queries on that table. In that case, consider purging those rows to improve performance.
  • Recovery performance: Recovery is the action required for a cluster to restore K-safety after a crash. Large numbers of deleted records can degrade the performance of a recovery. To improve recovery performance, purge the deleted rows.
  • Concurrency: DELETE and UPDATE take exclusive locks on the table. Only one DELETE or UPDATE transaction on a table can be in progress at a time and only when no load operations are in progress. Delete and update operations on different tables can run concurrently.

Projection column requirements for optimized delete

A projection is optimized for delete and update operations if it contains all columns required by the query predicate. In general, DML operations are significantly faster when performed on optimized projections than on non-optimized projections.

For example, consider the following table and projections:

=> CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
=> CREATE PROJECTION p1 (a, b, c) AS SELECT * FROM t ORDER BY a;
=> CREATE PROJECTION p2 (a, c) AS SELECT a, c FROM t ORDER BY c, a;

In the following query, both p1 and p2 are eligible for DELETE and UPDATE optimization because column a is available:

=> DELETE from t WHERE a = 1;

In the following example, only projection p1 is eligible for DELETE and UPDATE optimization because the b column is not available in p2:

=> DELETE from t WHERE b = 1;

Optimized DELETE in subqueries

To be eligible for DELETE optimization, all target table columns referenced in a DELETE or UPDATE statement's WHERE clause must be in the projection definition.

For example, the following simple schema has two tables and three projections:

=> CREATE TABLE tb1 (a INT, b INT, c INT, d INT);
=> CREATE TABLE tb2 (g INT, h INT, i INT, j INT);

The first projection references all columns in tb1 and sorts on column a:

=> CREATE PROJECTION tb1_p AS SELECT a, b, c, d FROM tb1 ORDER BY a;

The buddy projection references and sorts on column a in tb1:

=> CREATE PROJECTION tb1_p_2 AS SELECT a FROM tb1 ORDER BY a;

This projection references all columns in tb2 and sorts on column i:

=> CREATE PROJECTION tb2_p AS SELECT g, h, i, j FROM tb2 ORDER BY i;

Consider the following DML statement, which references tb1.a in its WHERE clause. Since both projections on tb1 contain column a, both are eligible for the optimized DELETE:

=> DELETE FROM tb1 WHERE tb1.a IN (SELECT tb2.i FROM tb2);

Restrictions

Optimized DELETE operations are not supported under the following conditions:

  • With replicated projections if subqueries reference the target table. For example, the following syntax is not supported:

    => DELETE FROM tb1 WHERE tb1.a IN (SELECT e FROM tb2, tb2 WHERE tb2.e = tb1.e);
    
  • With subqueries that do not return multiple rows. For example, the following syntax is not supported:

    => DELETE FROM tb1 WHERE tb1.a = (SELECT k from tb2);
    

Projection sort order for optimizing DELETE

Design your projections so that frequently-used DELETE or UPDATE predicate columns appear in the sort order of all projections for large DELETE and UPDATE operations.

For example, suppose most of the DELETE queries you perform on a projection look like the following:

=> DELETE from t where time_key < '1-1-2007'

To optimize the delete operations, make time_key appear in the ORDER BY clause of all projections. This schema design results in better performance of the delete operation.

In addition, add sort columns to the sort order such that each combination of the sort key values uniquely identifies a row or a small set of rows. For more information, see Choosing sort order: best practices. To analyze projections for sort order issues, use the EVALUATE_DELETE_PERFORMANCE function.

3 - Purging deleted data

In Vertica, delete operations do not remove rows from physical storage.

In Vertica, delete operations do not remove rows from physical storage. DELETE marks rows as deleted, as does UPDATE, which combines delete and insert operations. In both cases, Vertica retains discarded rows as historical data, which remains accessible to historical queries until it is purged.

The cost of retaining historical data is twofold:

  • Disk space is allocated to deleted rows and delete markers.

  • Typical (non-historical) queries must read and skip over deleted data, which can impact performance.

A purge operation permanently removes historical data from physical storage and frees disk space for reuse. Only historical data that precedes the Ancient History Mark (AHM) is eligible to be purged.

You can purge data in two ways:

In both cases, Vertica purges all historical data up to and including the AHM epoch and resets the AHM. See Epochs for additional information about how Vertica uses epochs.

3.1 - Setting a purge policy

The preferred method for purging data is to establish a policy that determines which deleted data is eligible to be purged.

The preferred method for purging data is to establish a policy that determines which deleted data is eligible to be purged. Eligible data is automatically purged when the Tuple Mover performs mergeout operations.

Vertica provides two methods for determining when deleted data is eligible to be purged:

  • Specifying the time for which delete data is saved

  • Specifying the number of epochs that are saved

Specifying the time for which delete data is saved

Specifying the time for which delete data is saved is the preferred method for determining which deleted data can be purged. By default, Vertica saves historical data only when nodes are down.

To change the specified time for saving deleted data, use the HistoryRetentionTime configuration parameter:

=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = {seconds | -1};

In the above syntax:

  • seconds is the amount of time (in seconds) for which to save deleted data.

  • -1 indicates that you do not want to use the HistoryRetentionTime configuration parameter to determine which deleted data is eligible to be purged. Use this setting if you prefer to use the other method (HistoryRetentionEpochs) for determining which deleted data can be purged.

The following example sets the history epoch retention level to 240 seconds:

=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = 240;

Specifying the number of epochs that are saved

Unless you have a reason to limit the number of epochs, Vertica recommends that you specify the time over which delete data is saved.

To specify the number of historical epoch to save through the HistoryRetentionEpochs configuration parameter:

  1. Turn off the HistoryRetentionTime configuration parameter:

    => ALTER DATABASE DEFAULT SET HistoryRetentionTime = -1;
    
  2. Set the history epoch retention level through the HistoryRetentionEpochs configuration parameter:

    => ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = {num_epochs | -1};
    
    • num_epochs is the number of historical epochs to save.

    • -1 indicates that you do not want to use the HistoryRetentionEpochs configuration parameter to trim historical epochs from the epoch map. By default, HistoryRetentionEpochs is set to -1.

The following example sets the number of historical epochs to save to 40:

=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = 40;

Modifications are immediately implemented across all nodes within the database cluster. You do not need to restart the database.

See Epoch management parameters for additional details. See Epochs for information about how Vertica uses epochs.

Disabling purge

If you want to preserve all historical data, set the value of both historical epoch retention parameters to -1, as follows:

=> ALTER DABABASE mydb SET HistoryRetentionTime = -1;
=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = -1;

3.2 - Manually purging data

You manually purge deleted data as follows:.

You manually purge deleted data as follows:

  1. Set the cut-off date for purging deleted data. First, call one of the following functions to verify the current ancient history mark (AHM):

    • GET_AHM_TIME returns a TIMESTAMP value of the AHM.

    • GET_AHM_EPOCH returns the number of the epoch in which the AHM is located.

  2. Set the AHM to the desired cut-off date with one of the following functions:

    If you call SET_AHM_TIME, keep in mind that the timestamp you specify is mapped to an epoch, which by default has a three-minute granularity. Thus, if you specify an AHM time of 2008-01-01 00:00:00.00, Vertica might purge data from the first three minutes of 2008, or retain data from last three minutes of 2007.

  3. Purge deleted data from the desired projections with one of the following functions:

    The tuple mover performs a mergeout operation to purge the data. Vertica periodically invokes the tuple mover to perform mergeout operations, as configured by tuple mover parameters. You can manually invoke the tuple mover by calling the function DO_TM_TASK.

See Epochs for additional information about how Vertica uses epochs.

4 - Truncating tables

TRUNCATE TABLE removes all storage associated with the target table and its projections.

TRUNCATE TABLE removes all storage associated with the target table and its projections. Vertica preserves the table and the projection definitions. If the truncated table has out-of-date projections, those projections are cleared and marked up-to-date when TRUNCATE TABLE returns.

TRUNCATE TABLE commits the entire transaction after statement execution, even if truncating the table fails. You cannot roll back a TRUNCATE TABLE statement.

Use TRUNCATE TABLE for testing purposes. You can use it to remove all data from a table and load it with fresh data, without recreating the table and its projections.

Table locking

TRUNCATE TABLE takes an O (owner) lock on the table until the truncation process completes. The savepoint is then released.

If the operation cannot obtain an O lock on the target table, Vertica tries to close any internal Tuple Mover sessions that are running on that table. If successful, the operation can proceed. Explicit Tuple Mover operations that are running in user sessions do not close. If an explicit Tuple Mover operation is running on the table, the operation proceeds only when the operation is complete.

Restrictions

You cannot truncate an external table.

Examples

=> INSERT INTO sample_table (a) VALUES (3);
=> SELECT * FROM sample_table;
a
---
3
(1 row)
=> TRUNCATE TABLE sample_table;
TRUNCATE TABLE
=> SELECT * FROM sample_table;
a
---
(0 rows)