Optimizing DELETE and UPDATE

Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not achieve the same level of performance as other queries.

Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not achieve the same level of performance as other queries. A DELETE and UPDATE operation must update all projections, so the operation can only be as fast as the slowest projection.

To improve the performance of DELETE and UPDATE queries, consider the following issues:

  • Query performance after large DELETE operations: Vertica's implementation of DELETE differs from traditional databases: it does not delete data from disk storage; rather, it marks rows as deleted so they are available for historical queries. Deletion of 10% or more of the total rows in a table can adversely affect queries on that table. In that case, consider purging those rows to improve performance.
  • Recovery performance: Recovery is the action required for a cluster to restore K-safety after a crash. Large numbers of deleted records can degrade the performance of a recovery. To improve recovery performance, purge the deleted rows.
  • Concurrency: DELETE and UPDATE take exclusive locks on the table. Only one DELETE or UPDATE transaction on a table can be in progress at a time and only when no load operations are in progress. Delete and update operations on different tables can run concurrently.

Projection column requirements for optimized delete

A projection is optimized for delete and update operations if it contains all columns required by the query predicate. In general, DML operations are significantly faster when performed on optimized projections than on non-optimized projections.

For example, consider the following table and projections:

=> CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
=> CREATE PROJECTION p1 (a, b, c) AS SELECT * FROM t ORDER BY a;
=> CREATE PROJECTION p2 (a, c) AS SELECT a, c FROM t ORDER BY c, a;

In the following query, both p1 and p2 are eligible for DELETE and UPDATE optimization because column a is available:

=> DELETE from t WHERE a = 1;

In the following example, only projection p1 is eligible for DELETE and UPDATE optimization because the b column is not available in p2:

=> DELETE from t WHERE b = 1;

Optimized DELETE in subqueries

To be eligible for DELETE optimization, all target table columns referenced in a DELETE or UPDATE statement's WHERE clause must be in the projection definition.

For example, the following simple schema has two tables and three projections:

=> CREATE TABLE tb1 (a INT, b INT, c INT, d INT);
=> CREATE TABLE tb2 (g INT, h INT, i INT, j INT);

The first projection references all columns in tb1 and sorts on column a:

=> CREATE PROJECTION tb1_p AS SELECT a, b, c, d FROM tb1 ORDER BY a;

The buddy projection references and sorts on column a in tb1:

=> CREATE PROJECTION tb1_p_2 AS SELECT a FROM tb1 ORDER BY a;

This projection references all columns in tb2 and sorts on column i:

=> CREATE PROJECTION tb2_p AS SELECT g, h, i, j FROM tb2 ORDER BY i;

Consider the following DML statement, which references tb1.a in its WHERE clause. Since both projections on tb1 contain column a, both are eligible for the optimized DELETE:

=> DELETE FROM tb1 WHERE tb1.a IN (SELECT tb2.i FROM tb2);

Restrictions

Optimized DELETE operations are not supported under the following conditions:

  • With replicated projections if subqueries reference the target table. For example, the following syntax is not supported:

    => DELETE FROM tb1 WHERE tb1.a IN (SELECT e FROM tb2, tb2 WHERE tb2.e = tb1.e);
    
  • With subqueries that do not return multiple rows. For example, the following syntax is not supported:

    => DELETE FROM tb1 WHERE tb1.a = (SELECT k from tb2);
    

Projection sort order for optimizing DELETE

Design your projections so that frequently-used DELETE or UPDATE predicate columns appear in the sort order of all projections for large DELETE and UPDATE operations.

For example, suppose most of the DELETE queries you perform on a projection look like the following:

=> DELETE from t where time_key < '1-1-2007'

To optimize the delete operations, make time_key appear in the ORDER BY clause of all projections. This schema design results in better performance of the delete operation.

In addition, add sort columns to the sort order such that each combination of the sort key values uniquely identifies a row or a small set of rows. For more information, see Choosing sort order: best practices. To analyze projections for sort order issues, use the EVALUATE_DELETE_PERFORMANCE function.