This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Creating backups

You should perform full backups of your database regularly.

You should perform full backups of your database regularly. You should also perform a full backup under the following circumstances:

Before…

  • Upgrading Vertica to another release.

  • Dropping a partition.

  • Adding, removing, or replacing nodes in the database cluster.

After…

  • Loading a large volume of data.

  • Adding, removing, or replacing nodes in the database cluster.

  • Recovering a cluster from a crash.

If…

  • The epoch of the latest backup predates the current ancient history mark.

Ideally, schedule ongoing backups to back up your data. You can run the Vertica vbr from a cron job or other task scheduler.

You can also back up selected objects. Use object backups to supplement full backups, not to replace them. Backup types are described in Types of backups.

Running vbr does not affect active database applications. vbr supports creating backups while concurrently running applications that execute DML statements, including COPY, INSERT, UPDATE, DELETE, and SELECT.

Backup locations and contents

Full and object-level backups reside on backup hosts, the computer systems on which backups and archives are stored.

Vertica saves backups in a specific backup location, the directory on a backup host. This location can contain multiple backups, both full and object-level, including associated archives. The backups are also compatible, allowing you to restore any objects from a full database backup. Backup locations for Eon Mode databases must be on S3.

Before beginning a backup, you must prepare your backup locations using the vbr init task, as in the following example:

$ vbr -t init -c full_backup.ini

For more information about backup locations, see Setting up backup locations.

Backups contain all committed data for the backed-up objects as of the start time of the backup. Backups do not contain uncommitted data or data committed during the backup. Backups do not delay mergeout or load activity.

Backing up HDFS storage locations

If your Vertica cluster uses HDFS storage locations, you must do some additional configuration before you can perform backups. See Requirements for backing up and restoring HDFS storage locations.

HDFS storage locations support only full backup and restore. You cannot perform object backup or restore on a cluster that uses HDFS storage locations.

Impact of backups on Vertica nodes

While a backup is taking place, the backup process can consume additional storage. The amount of space consumed depends on the size of your catalog and any objects that you drop during the backup. The backup process releases this storage when the backup is complete.

Best practices for creating backups

When creating backup configuration files:

  • Create separate configuration files to create full and object-level backups.

  • Use a unique snapshot name in each configuration file.

  • Use the same backup host directory location for both kinds of backups:

    • Because the backups share disk space, they are compatible when performing a restore.

    • Each cluster node must also use the same directory location on its designated backup host.

  • For best network performance, use one backup host per cluster node.

  • Use one directory on each backup node to store successive backups.

  • For future reference, append the major Vertica version number to the configuration file name (mybackup9x).

The selected objects of a backup can include one or more schemas or tables, or a combination of both. For example, you can include schema S1 and tables T1 and T2 in an object-level backup. Multiple backups can be combined into a single backup. A schema-level backup can be integrated with a database backup (and a table backup integrated with a schema-level backup, and so on).

1 - Types of backups

vbr supports the following kinds of backups:.

vbr supports the following kinds of backups:

The vbr configuration file includes the snapshotName parameter. Use different snapshot names for different types of backups, including different combinations of objects in object-level backups. Backups with the same snapshot name form a time sequence limited by restorePointLimit,. Avoid giving all backups the same snapshot name; otherwise, they eventually interfere with each other.

Full backups

A full backup is a complete copy of the database catalog, its schemas, tables, and other objects. This type of backup provides a consistent image of the database at the time the backup occurred. You can use a full backup for disaster recovery to restore a damaged or incomplete database. You can also restore individual objects from a full backup.

When a full backup already exists, vbr performs incremental backups, whose scope is confined to data that is new or changed since the last full backup occurred. You can specify the number of historical backups to keep.

Archives contain a collection of same-name backups. Each archive can have a different retention policy. For example, TBak might be the name of an object-level backup of table T. If you create a daily backup each week, the seven backups of a given week become part of the TBak archive. Keeping a backup archive lets you revert back to any one of the saved backups.

Object-level backups

An object-level backup consists of one or more schemas or tables or a group of such objects. The conglomerate parts of the object-level backup do not contain the entire database. When an object-level backup exists, you can restore all of its contents or individual objects.

Object-level backups contain the following object types:

Object Type Description
Selected objects Objects you choose to be part of an object-level backup. For example, if you specify tables T1 and T2 to include in an object-level backup, they are the selected objects.
Dependent objects Objects that must be included as part of an object-level backup, due to dependencies. Suppose you want to create an object-level backup that includes a table with a foreign key. To do so, table constraints require that you include the primary key table, and vbr enforces this requirement. Projections anchored on a table in the selected objects are also dependent objects.
Principal objects The objects on which both selected and dependent objects depend are called principal objects. For example, each table and projection has an owner, and each is a principal object.

Valid only for Enterprise Mode, hard-link local backups are saved directly on the database nodes, and can be performed on the entire database or specific objects. Typically you use this kind of backup temporarily before performing a disruptive operation. Do not rely on this kind of backup for long-term use; it cannot protect you from node failures because data and backups are on the same nodes.

A checkpoint backup is a hard-link local backup that comprises a complete copy of the database catalog, and a set of hard file links to corresponding data files. You must save a hard-link local backup on the same file system that is used by the catalog and database files.

2 - Creating full backups

Before you create a database backup, verify the following:.

Before you create a database backup, verify the following:

  • You have prepared your backup directory with the vbr init task:

    $ vbr -t init -c full_backup.ini
    
  • Your database is running. It is unnecessary for all nodes to be up in a K-safe database. However, any nodes that are DOWN are not backed up.

  • All of the backup hosts are up and available.

  • The backup host (either on the database cluster or elsewhere) has sufficient disk space to store the backups.

  • The user account of the user who starts vbr has write access to the target directories on the host backup location. This user can be dbadmin or another assigned role. However, you cannot run vbr as root.

  • Each backup has a unique file name.

  • If you want to keep earlier backups, restorePointLimit is set to a number greater than 1 in the configuration file.

  • If you are backing up an Eon Mode database, you have met the Eon Mode database requirements.

Run vbr from a terminal. Use the database administrator account from an initiator node in your database cluster. The command requires only the --task backup and --config-file arguments (or their short forms, -t and -c).

If your configuration file does not contain the database administrator password, vbr prompts you to enter the password. It does not display what you type.

vbr requires no further interaction after you invoke it.

The following example shows a full backup:

$ vbr -t backup -c full_backup.ini
Starting backup of database VTDB.
Participating nodes: v_vmart_node0001, v_vmart_node0002, v_vmart_node0003, v_vmart_node0004.
Snapshotting database.
Snapshot complete.
Approximate bytes to copy: 2315056043 of 2356089422 total.
[==================================================] 100%
Copying backup metadata.
Finalizing backup.
Backup complete!

By default, no output is displayed, other than the progress bar. To include additional progress information, use the --debug option, with a value of 1, 2, or 3.

3 - Creating object-level backups

Use object-level backups to back up individual schemas or tables.

Use object-level backups to back up individual schemas or tables. Object-level backups are especially useful for multi-tenanted database sites. For example, an international airport could use a multi-tenanted database to represent different airlines in its schemas. Then, tables could maintain various types of information for the airline, including ARRIVALS, DEPARTURES, and PASSENGER information. With such an organization, creating object-level backups of the specific schemas would let you restore by airline tenant, or any other important data segment.

To create one or more object-level backups, create a configuration file specifying the backup location, the object-level backup name, and a list of objects to include (one or more schemas and tables). You can use the includeObjects and excludeObjects parameters together with wildcards to specify the objects of interest. For more information about specifying the objects to include, see Including and excluding objects.

For more information about configuration files for full or object-level backups, see Sample vbr configuration files and vbr configuration file reference.

While not required, Vertica recommends that you first create a full backup before creating any object-level backups.

Performing the backup

Before you can create a backup, you must prepare your backup directory with the vbr -init task. You must also create a configuration file specifying which objects to back up.

Run vbr from a terminal using the database administrator account from a node in your database cluster. You cannot run vbr as root.

You can create an object-level backup as in the following example.

$ vbr --task backup --config-file objectbak.ini
Preparing...
Found Database port:  5433
Copying...
[==================================================] 100%
All child processes terminated successfully.
Committing changes on all backup sites...
backup done!

Naming conventions

Give each object-level backup configuration file a distinct and descriptive name. For instance, at an airport terminal, schema-based backup configuration files use a naming convention with an airline prefix, followed by further description, such as:

AIR1_daily_arrivals_backup

AIR2_hourly_arrivals_backup

AIR2_hourly_departures_backup

AIR3_daily_departures_backup

When database and object-level backups exist, you can recover the backup of your choice.

Understanding object-level backup contents

Object-level backups comprise only the elements necessary to restore the schema or table, including the selected, dependent, and principal objects. An object-level backup includes the following contents:

  • Storage: Data files belonging to any specified objects

  • Metadata: Including the cluster topology, timestamp, epoch, AHM, and so on

  • Catalog snippet: Persistent catalog objects serialized into the principal and dependent objects

Some of the elements that AIR2 comprises, for instance, are its parent schema, tables, named sequences, primary key and foreign key constraints, and so on. To create such a backup, vbr saves the objects directly associated with the table. It also saves any dependencies, such as foreign key (FK) tables, and creates an object map from which to restore the backup.

Making changes after an object-level backup

Be aware how changes made after an object-level backup affect subsequent backups. Suppose you create an object-level backup and later drop schemas and tables from the database. In this case, the objects you dropped are also dropped from subsequent backups. If you do not save an archive of the object backup, such objects could be lost permanently.

Changing a table name after creating a table backup does not persist after restoring the backup. Suppose that, after creating a backup, you drop a user who owns any selected or dependent objects in that backup. In this case, restoring the backup re-creates the object and assigns ownership to the user performing the restore. If the owner of a restored object still exists, that user retains ownership of the restored object.

To restore a dropped table from a backup:

  1. Rename the newly created table from t1 to t2.

  2. Restore the backup containing t1.

  3. Restore t1. Tables t1 and t2 now coexist.

For information on how Vertica handles object overwrites, refer to the objectRestoreMode parameter in [misc].

K-safety can increase after an object backup. Restoration of a backup fails if both of the following conditions occur:

  • An increase in K-safety occurs.

  • Any table in the backup has insufficient projections.

Changing principal and dependent objects

If you create a backup and then drop a principal object, restoring the backup restores that principal object. If the owner of the restored object has also been dropped, Vertica assigns the restored object to the current dbadmin.

You can specify how Vertica handles object overwrites in the vbr configuration file. For more information, refer to the objectRestoreMode parameter in [misc].

IDENTITY sequences are dependent objects because they cannot exist without their tables. An object-level backup includes such objects, along with the tables on which they depend.

Named sequences are not dependent objects because they exist autonomously. A named sequence remains after you drop the table in which the sequence is used. In this case, the named sequence is a principal object. Thus, you must back up the named sequence with the table. Then you can regenerate it, if it does not already exist when you restore the table. If the sequence does exist, vbr uses it, unmodified. Sequence values could repeat, if you restore the full database and then restore a table backup to a newer epoch.

Considering constraint references

When database objects are related through constraints, you must back them up together. For example, a schema with tables whose constraints reference only tables in the same schema can be backed up. However, a schema containing a table with an FK/PK constraint on a table in another schema cannot. To back up the second table, you must include the other schema in the list of selected objects.

Configuration files for object-level backups

vbr automatically associates configurations with different backup names but uses the same backup location.

Always create a cluster-wide configuration file and one or more object-level configuration files pointing to the same backup location. Storage between backups is shared, preventing multiple copies of the same data. For object-level backups, using the same backup location causes vbr to encounter fewer OID conflict prevention techniques. Avoiding OID conflict prevention results in fewer problems when restoring the backup.

When using cluster and object configuration files with the same backup location, vbr includes additional provisions to ensure that the object-level backups can be used following a full cluster restore. One approach to restoring a full cluster is to use a full database backup to bootstrap the cluster. After the cluster is operational again, you can restore the most recent object-level backups for schemas and tables.

Attempting to restore a full database using an object-level configuration file fails, resulting in this error:

VMart=> /tmp/vbr --config-file=Table2.ini -t restore
Preparing...
Invalid metadata file. Cannot restore.
restore failed!

See Restoring all objects from an object-level backup for more information.

Backup epochs

Each backup includes the epoch to which its contents can be restored. When vbr restores data, Vertica updates to the current epoch.

vbr attempts to create an object-level backup five times before an error occurs and the backup fails.

4 - Creating hard-link local backups

You can use the hardLinkLocal option to create a full or object-level backup with hard file links on a local database host.

You can use the hardLinkLocal option to create a full or object-level backup with hard file links on a local database host.

Creating hard-link local backups can provide the following advantages over a remote host backup:

  • Speed: A hard-link local backup is significantly faster than a remote host backup. When backing up, vbr does not copy files if the backup directory exists on the same file system as the database directory.

  • Reduced network activities: The hard-link local backup minimizes network load because it does not require rsync to copy files to a remote backup host.

  • Less disk space: The backup includes a copy of the catalog and hard file links. Therefore, the local backup uses significantly less disk space than a backup with copies of database data files. However, a hard-link local backup saves a full copy of the catalog each time you run vbr. Thus, the disk size increases with the catalog size over time.

Hard-link local backups can help you during experimental designs and development cycles. Database designers and developers can create hard-link local object backups of schemas and tables on a regular schedule during design and development phases. If any new developments are unsuccessful, developers can restore one or more objects from the backup.

If you plan to use hard-link local backups as a standard site procedure, design your database and hardware configuration appropriately. Consider storing all of the data files on one file system per node. Such a configuration has the advantage of being set up automatically for hard-link local backups.

Specifying backup directory locations

The backupDir parameter of the configuration file specifies the location of the top-level backup directory. Hard-link local backups require that the backup directory be located on the same Linux file system as the database data. The Linux operating system cannot create hard file links to another file system.

Do not create the hard-link local backup directory in a database data storage location. For example, as a best practice, the database data directory should not be at the top level of the file system, as it is in the following example:

/home/dbadmin/data/VMart/v_vmart_node0001

Instead, Vertica recommends adding another subdirectory for data above the database level, such as in this example:

/home/dbadmin/data/dbdata/VMart/v_vmart_node0001

You can then create the hard-link local backups subdirectory as a peer of the data directory you just created, such as in this example:

/home/dbadmin/data/backups
/home/dbadmin/data/dbdata

When you specify the hard-link backup location, be sure to avoid these common errors when adding the hardLinkLocal=True parameter to the configuration file:

If ... Then... Solution
You specify a backup directory on a different node vbr issues an error message and aborts the backup. Change the configuration file to include a backup directory on the same host and file system as the database files. Then, run vbr again.
You specify a backup location on the same node, but a backup destination directory on a different file system from the database and catalog files. vbr issues a warning message and performs the backup by copying (not linking) the files from one file system to the other. No action required, but copying consumes more disk space and takes longer than linking.

Creating the backup

Before creating a full hard-link local database backup of an Enterprise Mode database, verify the following:

  • Your database is running. All nodes need not be up in a K-safe database for vbr to run. However, be aware that any nodes that are DOWN are not backed up.

  • The user account that starts vbr (dbadmin or other) has write access to the target backup directories.

Hard-link backups are not supported in Eon Mode.

When you create a full or object-level hard link local backup, that backup contains the following:

Backup Catalog Database files
Full backup Full copy Hard file links to all database files
Object-level backup Full copy Hard file links for all objects listed in the configuration file, and any of their dependent objects

Run the vbr script from a terminal using the database administrator account from a node in your database cluster. You cannot run vbr as root.

Hard-link backups use the same vbr arguments as other backups. Configuring a backup as a hard-link backup is done entirely in the configuration file. The following example shows the syntax:

$ vbr --task backup --config fullbak.ini

You can use hard-link local backups as a staging mechanism to back up to tape or other forms of storage media. The following steps present a simplified approach to saving, and then restoring, hard-link local backups from tape storage:

  1. Create a configuration file by copying an existing one or one of the samples described in Sample vbr configuration files.

  2. Edit the configuration file (localbak.ini in this example) to include the hardLinkLocal=True parameter in the [Transmission] section.

  3. Run vbr with the configuration file:

    $ vbr --task backup --config-file localbak.ini
    
  4. Copy the hard-link local backup directory with a separate process (not vbr) to tape or other external media.

  5. If the database becomes corrupted, transfer the backup files from tape to their original backup directory and restore as explained in Restoring hard-link local backups.

Restoring hard-link local backups requires some additional (manual) steps. Do not use them as a substitute for regular full backups (Creating full backups).

Hard-link local backups are only as reliable as the disk on which they are stored. If the local disk becomes corrupt, so does the hard-link local backup. In this case, you are unable to restore the database from the hard-link local backup because it is also corrupt.

All sites should maintain full backups externally for disaster recovery because hard-link local backups do not actually copy any database files.

5 - Incremental or repeated backups

As a best practice, Vertica recommends that you take frequent backups if database contents diverge in significant ways.

As a best practice, Vertica recommends that you take frequent backups if database contents diverge in significant ways. Always take backups after any event that significantly modifies the database, such as performing a rebalance. Mixing many backups with significant differences can weaken data K-safety. For example, taking backups both before and after a rebalance is not a recommended practice in cases where the backups are all part of one archive.

Each time you back up your database with the same configuration file, vbr creates an additional backup and might remove the oldest backup. The backup operation copies new storage containers, which can include:

  • Data that existed the last time you performed a database backup

  • New and changed data since the last full backup

Use the restorePointLimit parameter in the configuration file to increase the number of stored backups. If a backup task would cause this limit to be exceeded, vbr deletes the oldest backup after a successful backup.

When you run a backup task, vbr first creates the new backup in the specified location, which might temporarily exceed the limit. It then checks whether the number of backups exceeds the value of restorePointLimit, and, if necessary, deletes the oldest backups until only restorePointLimit remain. If the requested backup fails or is interrupted, vbr does not delete any backups.

When you restore a database, you can choose to restore from any retained backup rather than the most recent, so raise the limit if you expect to need access to older backups.