Epochs

An epoch represents a cutoff point of historical data within the database. The timestamps of all commits within a given epoch are equal to or less than the epoch's timestamp. Understanding epochs is useful when you need to perform the following operations:

  • Database recovery: Vertica uses epochs to determine the last time data was consistent across all nodes in a database cluster.

  • Historical queries: A SELECT statement that includes an AT EPOCH clause returns only data that was committed on or before the specified epoch (see the sketch after this list).

  • Purging deleted data: Deleted data is not removed from physical storage until it is purged from the database. You can purge deleted data only if it precedes the ancient history mark (AHM) epoch.
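
For example, the following historical query returns only the rows of a table that were committed on or before epoch 91. This is a minimal sketch; the orders table and the epoch number are borrowed from examples later in this section:

=> AT EPOCH 91 SELECT * FROM orders;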

Vertica has one open epoch and any number of closed epochs, depending on your system configuration. New and updated data is written into the open epoch, and each closed epoch represents a previous commit to your database. When data is committed with a DML operation (INSERT, UPDATE, MERGE, COPY, or DELETE), Vertica writes the data, closes the open epoch, and opens a new epoch. Each row committed to the database is associated with the epoch in which it was written.

The EPOCHS system table contains information about each available closed epoch. The epoch_close_time column stores the date and time of the commit. The epoch_number column stores the corresponding epoch number:

=> SELECT * FROM EPOCHS;
       epoch_close_time        | epoch_number
-------------------------------+--------------
 2020-07-27 14:29:49.687106-04 |           91
 2020-07-28 12:51:53.291795-04 |           92
(2 rows)

Epoch milestones

As an epoch progresses through its life cycle, it reaches milestones that Vertica uses to perform a variety of operations and maintain the state of the database.

Vertica defines each milestone as follows:

  • Current epoch (CE): The current, open epoch that you are presently writing data to.

  • Latest epoch (LE): The most recently closed epoch.

  • Checkpoint epoch (CPE): Enterprise Mode only. A node-level epoch that is the latest epoch in which data is consistent across all projections on that node.

  • Last good epoch (LGE): The minimum checkpoint epoch in which data is consistent across all nodes.

  • Ancient history mark (AHM): The oldest epoch that contains data that is accessible by historical queries.

See Epoch life cycle for detailed information about each stage.

1 - Epoch life cycle

The epoch life cycle consists of a sequence of milestones that enable you to perform a variety of operations and manage the state of your database.

Vertica provides epoch management parameters and functions so that you can retrieve and adjust epoch values. Additionally, see Configuring epochs for recommendations on how to set epochs for specific use cases.

Current epoch (CE)

The open epoch that contains all uncommitted changes that you are presently writing to the database. The current epoch is stored in the SYSTEM system table:

=> SELECT CURRENT_EPOCH FROM SYSTEM;
 CURRENT_EPOCH
---------------
         71
(1 row)

The following example demonstrates how the current epoch advances when you commit data:

  1. Query the SYSTEM system table to return the current epoch:

    => SELECT CURRENT_EPOCH FROM SYSTEM;
     CURRENT_EPOCH
    ---------------
             71
    (1 row)
    

    The current epoch is open, which means it is the epoch that you are presently writing data to.

  2. Insert a row into the orders table:

    => INSERT INTO orders VALUES ('123456789', 323426, 'custacct@example.com');
     OUTPUT
    --------
          1
    (1 row)
    

    Each row of data has an implicit epoch column that stores that row's commit epoch. The row that you just inserted into the table was not committed, so the epoch column is blank:

    => SELECT epoch, orderkey, custkey, email_addrs FROM orders;
     epoch | orderkey  | custkey |     email_addrs
    -------+-----------+---------+----------------------
           | 123456789 |  323426 | custacct@example.com
    (1 row)
    
  3. Commit the data, then query the table again. The committed data is associated with epoch 71, the current epoch that was previously returned from the SYSTEM system table:

    => COMMIT;
    COMMIT
    => SELECT epoch, orderkey, custkey, email_addrs FROM orders;
     epoch | orderkey  | custkey |     email_addrs
    -------+-----------+---------+----------------------
        71 | 123456789 |  323426 | custacct@example.com
    (1 row)
    
  4. Query the SYSTEM system table again to return the current epoch. The current epoch has advanced by 1:

    => SELECT CURRENT_EPOCH FROM SYSTEM;
     CURRENT_EPOCH
    ---------------
             72
    (1 row)
    

Latest epoch (LE)

The most recently closed epoch. The current epoch becomes the latest epoch after a commit operation.

The LE is the most recent epoch stored in the EPOCHS system table:

=> SELECT * FROM EPOCHS;
       epoch_close_time        | epoch_number
-------------------------------+--------------
 2020-07-27 14:29:49.687106-04 |           91
 2020-07-28 12:51:53.291795-04 |           92
(2 rows)

Checkpoint epoch (CPE)

Valid in Enterprise Mode only. Each node has a checkpoint epoch, which is the most recent epoch in which the data on that node is consistent across all projections. When the database runs optimally, the checkpoint epoch is equal to the LE, which is always one epoch older than the current epoch.

The checkpoint epoch is used during node failure and recovery. When a single node fails, that node attempts to rebuild data beyond its checkpoint epoch from other nodes. If the failed node cannot recover data using any of those epochs, then it recovers data using the checkpoint epoch.

Use PROJECTION_CHECKPOINT_EPOCHS to query information about the checkpoint epochs. The following query returns information about the checkpoint epoch on nodes that store the orders projection:

=> SELECT checkpoint_epoch, node_name, projection_name, is_up_to_date, would_recover, is_behind_ahm
      FROM PROJECTION_CHECKPOINT_EPOCHS WHERE projection_name ILIKE 'orders_b%';
 checkpoint_epoch |    node_name     | projection_name | is_up_to_date | would_recover | is_behind_ahm
------------------+------------------+-----------------+---------------+---------------+---------------
               92 | v_vmart_node0001 | orders_b1       | t             | f             | f
               92 | v_vmart_node0001 | orders_b0       | t             | f             | f
               92 | v_vmart_node0003 | orders_b1       | t             | f             | f
               92 | v_vmart_node0003 | orders_b0       | t             | f             | f
               92 | v_vmart_node0002 | orders_b0       | t             | f             | f
               92 | v_vmart_node0002 | orders_b1       | t             | f             | f
(6 rows)

This query confirms that the database epochs are advancing correctly. The would_recover column displays an f when the last good epoch (LGE) is equal to the CPE because Vertica gives precedence to the LGE for recovery when possible. The is_behind_ahm column shows whether the checkpoint epoch is behind the AHM. Any data in an epoch that precedes the ancient history mark (AHM) is unrecoverable in case of a database or node failure.
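
To surface projections that need attention, you can filter on these columns directly. The following sketch returns only projections that would recover from their CPE or whose CPE has fallen behind the AHM; on a healthy database it returns no rows:

=> SELECT node_name, projection_name, checkpoint_epoch
      FROM PROJECTION_CHECKPOINT_EPOCHS
      WHERE would_recover OR is_behind_ahm;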

Last good epoch (LGE)

The minimum checkpoint epoch in which data is consistent across all nodes in the cluster. Each node has an LGE, and Vertica evaluates the LGE for each node to determine the cluster LGE. The cluster's LGE is stored in the SYSTEM system table:

=> SELECT LAST_GOOD_EPOCH FROM SYSTEM;
 LAST_GOOD_EPOCH
-----------------
              70
(1 row)

You can retrieve the LGE for each node by querying the expected recovery epoch:

=> SELECT GET_EXPECTED_RECOVERY_EPOCH();
INFO 4544:  Recovery Epoch Computation:
Node Dependencies:
011 - cnt: 21
101 - cnt: 21
110 - cnt: 21
111 - cnt: 9

001 - name: v_vmart_node0001
010 - name: v_vmart_node0002
100 - name: v_vmart_node0003
Nodes certainly in the cluster:
        Node 0(v_vmart_node0001), epoch 70
        Node 1(v_vmart_node0002), epoch 70
Filling more nodes to satisfy node dependencies:
Data dependencies fulfilled, remaining nodes LGEs don't matter:
        Node 2(v_vmart_node0003), epoch 70
--
 GET_EXPECTED_RECOVERY_EPOCH
-----------------------------
                          70
(1 row)

Because the LGE is a snapshot of all of the most recent data on the disk, it is used to recover from database failure. Administration Tools uses the LGE to manually reset the database. If you are recovering from database failure after an unclean shutdown, Vertica prompts you to accept recovery using the LGE during restart.

Ancient history mark (AHM)

The oldest epoch that contains data that is accessible by historical queries. The AHM is stored in the SYSTEM system table:

=> SELECT AHM_EPOCH FROM SYSTEM;
 AHM_EPOCH
-----------
        70
(1 row)

Epochs that precede the AHM are unavailable for historical queries. The following example returns the AHM, and then returns an error when executing a historical query that precedes the AHM:

=> SELECT GET_AHM_EPOCH();
 GET_AHM_EPOCH
---------------
            93
(1 row)

=> AT EPOCH 92 SELECT * FROM orders;
ERROR 3183:  Epoch number out of range
HINT:  Epochs prior to [93] do not exist. Epochs [94] and later have not yet closed

The AHM advances according to your HistoryRetentionTime, HistoryRetentionEpochs, and AdvanceAHMInterval parameter settings. By default, the AHM advances every 180 seconds until it is equal to the LGE. This helps reduce the number of epochs saved to the epoch map, which reduces the catalog size. The AHM cannot advance beyond the LGE.
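
You can check the current values of these parameters in the CONFIGURATION_PARAMETERS system table. A sketch of such a query:

=> SELECT parameter_name, current_value FROM CONFIGURATION_PARAMETERS
      WHERE parameter_name IN ('HistoryRetentionTime', 'HistoryRetentionEpochs', 'AdvanceAHMInterval');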

The AHM serves as the cutoff epoch for purging data from physical disk. As the AHM advances, the Tuple Mover mergeout process purges any deleted data that belongs to an epoch that precedes the AHM. See Purging deleted data for details about automated or manual purges.

2 - Managing epochs

Epochs are stored in the epoch map, a catalog object that contains a list of closed epochs beginning at the ancient history mark (AHM) epoch and ending at the latest epoch (LE). As the epoch map increases in size, the catalog uses more memory. Additionally, the AHM is used to determine what data is purged from disk. To optimize database performance, monitor database epochs to verify that they are advancing correctly.

Monitoring epochs

When Vertica is running properly with the default settings, the ancient history mark (AHM), last good epoch (LGE), and checkpoint epoch (CPE, Enterprise Mode only) are equal to the latest epoch, which is 1 less than the current epoch. This keeps the epoch map and catalog small and ensures that disk space is not wasted storing data that is eligible for purging. The SYSTEM system table stores the current epoch, last good epoch, and ancient history mark:

=> SELECT CURRENT_EPOCH, LAST_GOOD_EPOCH, AHM_EPOCH FROM SYSTEM;
 CURRENT_EPOCH | LAST_GOOD_EPOCH | AHM_EPOCH
---------------+-----------------+-----------
            88 |              87 |        87
(1 row)

Vertica provides GET_AHM_EPOCH, GET_AHM_TIME, GET_CURRENT_EPOCH, and GET_LAST_GOOD_EPOCH to retrieve these epochs individually.
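
For example, a call to one of these functions. The output is illustrative and consistent with the preceding example:

=> SELECT GET_LAST_GOOD_EPOCH();
 GET_LAST_GOOD_EPOCH
---------------------
                  87
(1 row)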

In Enterprise Mode, you can query the PROJECTION_CHECKPOINT_EPOCHS table to return the checkpoint epoch for each node in your cluster. The following query returns the CPE for any node that stores the orders projection:

=> SELECT checkpoint_epoch, node_name, projection_name
     FROM PROJECTION_CHECKPOINT_EPOCHS WHERE projection_name ILIKE 'orders_b%';
 checkpoint_epoch |    node_name     | projection_name
------------------+------------------+-----------------
               87 | v_vmart_node0001 | orders_b1
               87 | v_vmart_node0001 | orders_b0
               87 | v_vmart_node0003 | orders_b1
               87 | v_vmart_node0003 | orders_b0
               87 | v_vmart_node0002 | orders_b0
               87 | v_vmart_node0002 | orders_b1
(6 rows)

Troubleshooting the ancient history mark

A properly functioning AHM is critical in determining how well your database utilizes disk space and executes queries. When you commit a DELETE or UPDATE (a combination of DELETE and INSERT) operation, the data is not deleted from disk immediately. Instead, Vertica marks the data for deletion so that you can retrieve it with historical queries. Deleted data takes up space on disk and impacts query performance because Vertica must read the deleted data during non-historical queries.
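
For example, the following sketch deletes a row and then retrieves it with a historical query. It assumes the orders table from earlier examples, that the DELETE commits in epoch 95, and that epoch 94 has not yet passed the AHM:

=> DELETE FROM orders WHERE orderkey = '123456789';
=> COMMIT;
=> AT EPOCH 94 SELECT orderkey, custkey FROM orders;

Because the delete was committed in epoch 95, the query at epoch 94 still returns the deleted row.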

Epochs advance as you commit data, and any data that is marked for deletion is automatically purged by the Tuple Mover mergeout process when its epoch advances past the AHM. You can create an automated purge policy or manually purge any deleted data that was committed in an epoch that precedes the AHM. See Setting a purge policy for additional information.

By default, the AHM advances every 180 seconds until it is equal to the LGE. Monitor the SYSTEM system table to ensure that the AHM is advancing properly:

=> SELECT CURRENT_EPOCH, LAST_GOOD_EPOCH, AHM_EPOCH FROM SYSTEM;
 CURRENT_EPOCH | LAST_GOOD_EPOCH | AHM_EPOCH
---------------+-----------------+-----------
            94 |              93 |        86
(1 row)

If you notice that the AHM is not advancing correctly, it might be due to one or more of the following:

  • Your database contains unrefreshed projections. This occurs when you create a projection for a table that already contains data. See Refreshing projections for details on how to refresh projections.

  • A node is DOWN. When a node is DOWN, the AHM cannot advance. See Restarting Vertica on a host for information on how to resolve this issue.

  • The AHMBackupManagement parameter is set to 1. In this case, the AHM does not advance beyond the most recent full backup. Confirm that this parameter is set to 0:

    => SELECT node_name, parameter_name, current_value FROM CONFIGURATION_PARAMETERS WHERE parameter_name='AHMBackupManagement';
     node_name |   parameter_name    | current_value
    -----------+---------------------+---------------
     ALL       | AHMBackupManagement | 0
    (1 row)
    

3 - Configuring epochs

Epoch configuration impacts how your database recovers from failure, handles historical data, and purges data from disk. Vertica provides epoch management parameters for system-wide epoch configuration. Epoch management functions enable you to make ad hoc adjustments to epoch values.

Historical query and recovery precision

When you execute a historical query, Vertica returns data from an epoch within the amount of time specified by the EpochMapInterval configuration parameter. For example, when you execute a historical query using the AT TIME clause, Vertica resolves the specified timestamp to an epoch within the parameter setting. By default, EpochMapInterval is set to 180 seconds. You must set EpochMapInterval to a value greater than or equal to the AdvanceAHMInterval parameter:

=> SELECT node_name, parameter_name, current_value FROM CONFIGURATION_PARAMETERS
     WHERE parameter_name='EpochMapInterval' OR parameter_name='AdvanceAHMInterval';
 node_name |   parameter_name   | current_value
-----------+--------------------+---------------
 ALL       | EpochMapInterval   | 180
 ALL       | AdvanceAHMInterval | 180
(2 rows)
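
For example, a sketch of a historical query that uses the AT TIME clause; the timestamp and table are illustrative. Vertica resolves the timestamp to an epoch that is accurate to within the EpochMapInterval setting:

=> AT TIME '2020-07-28 12:30:00' SELECT * FROM orders;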

During failure recovery, Vertica uses the EpochMapInterval setting to determine which epoch is reported as the last good epoch (LGE).

History retention and purge workflows

Vertica recommends that you configure your epoch parameters to create a purge policy that determines when deleted data is purged from disk. If you use historical queries often, then you need to find a balance between retaining deleted historical data and purging it from disk. An aggressive purge policy frees disk space and improves query performance, but also limits your recovery options and narrows the window of data available for historical queries.

There are two strategies to creating a purge policy:

  • Set HistoryRetentionTime to specify how long deleted data is saved (in seconds) as a historical reference.

  • Set HistoryRetentionEpochs to specify the number of historical epochs to save.

See Setting a purge policy for details about configuring each workflow.

Setting HistoryRetentionTime is the preferred method for creating a purge policy. By default, Vertica sets this value to 0, so the AHM is 1 less than the current epoch when the database is running properly. You cannot execute historical queries on epochs that precede the AHM, so you might want to adjust this setting to retain more data between the present time and the AHM. Another reason to adjust this parameter is if you use the Roll Back Database to Last Good Epoch option for manual rollbacks. For example, the following command sets HistoryRetentionTime to 1 day (in seconds) to provide a wider range of epoch rollback options:

=> ALTER DATABASE vmart SET HistoryRetentionTime = 86400;

Vertica checks the status of your retention settings using the AdvanceAHMInterval setting and advances the AHM as necessary. After the AHM advances, any deleted data in an epoch that precedes the AHM is purged automatically by the Tuple Mover mergeout process.

If you want to disable any purge policy and preserve all historical data, set both HistoryRetentionTime and HistoryRetentionEpochs to -1:

=> ALTER DATABASE vmart SET HistoryRetentionTime = -1;
=> ALTER DATABASE vmart SET HistoryRetentionEpochs = -1;

If you do not set a purge policy, you can use epoch management functions to adjust the AHM to manually purge deleted data as needed. Manual purges are useful if you need to update or delete data uploaded by mistake. See Manually purging data for details.
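
For example, a minimal sketch of a manual purge, assuming the orders table from earlier examples. MAKE_AHM_NOW advances the AHM to the greatest allowable value, and PURGE_TABLE then removes the table's deleted rows that precede the AHM:

=> SELECT MAKE_AHM_NOW();
=> SELECT PURGE_TABLE('public.orders');

Keep in mind that advancing the AHM makes the epochs behind it unavailable for historical queries and rollback.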