Backing up and restoring the database

1: Common use cases
2: Sample vbr configuration files

2.1: External full backup/restore
2.2: Backup/restore to cloud storage
2.3: Full hard-link backup/restore
2.4: Full local backup/restore
2.5: Object-level local backup/restore in Enterprise Mode
2.6: Restore object from backup to an alternate cluster
2.7: Object replication to an alternate database
2.8: Database copy to an alternate cluster
2.9: Password file

3: Eon Mode database requirements
4: Requirements for backing up and restoring HDFS storage locations
5: Setting up backup locations

5.1: Configuring backup hosts and connections
5.2: Configuring hard-link local backup hosts
5.3: Configuring cloud storage backups
5.4: Additional considerations for cloud storage
5.5: Configuring backups to and from HDFS

6: Creating backups

6.1: Types of backups
6.2: Creating full backups
6.3: Creating object-level backups
6.4: Creating hard-link local backups
6.5: Incremental or repeated backups

7: Restoring backups

7.1: Restoring a database from a full backup
7.2: Restoring a database to an alternate cluster
7.3: Restoring all objects from an object-level backup
7.4: Restoring individual objects
7.5: Restoring objects to an alternate cluster
7.6: Restoring hard-link local backups
7.7: Ownership of restored objects

8: Copying the database to another cluster
9: Replicating objects to another database cluster
10: Including and excluding objects
11: Managing backups

11.1: Viewing backups
11.2: Checking backup integrity
11.3: Repairing backups
11.4: Removing backups
11.5: Estimating log file disk requirements
11.6: Allocating resources

12: Troubleshooting backup and restore
13: vbr reference
14: vbr configuration file reference

14.1: [CloudStorage]
14.2: [database]
14.3: [mapping]
14.4: [misc]
14.5: [NodeMapping]
14.6: [transmission]
14.7: Password configuration file

Important

Inadequate security on backups can compromise overall database security. Be sure to secure backup locations and strictly limit access to backups only to users who already have permissions to access all database data.

Creating regular database backups is an important part of basic maintenance tasks. Vertica supplies a comprehensive utility, vbr, for this purpose. vbr lets you perform the following operations. Unless otherwise noted, operations are supported in both Enterprise Mode and Eon Mode:

Back up a database.
Back up specific objects (schemas or tables) in a database.
Restore a database or individual objects from backup.
Copy a database to another cluster, for example to promote a test cluster to production (Enterprise Mode only).
Replicate individual objects (schemas or tables) to another cluster.
List available backups.

When you run vbr, you specify a configuration (.ini) file. This file holds all of the configuration parameters for the operation: what to back up, where to back it up, how many backups to keep, whether to encrypt transmissions, and much more. Vertica provides several sample vbr configuration files that you can use as templates.

You can use vbr to restore a backup created by vbr. Typically, you use the same configuration file for both operations. Common use cases introduces the most common vbr operations.

When performing a backup, you can save your data to one of the following locations:

Local directory on each node
Remote file system
Different Vertica cluster (effectively cloning your database)
Cloud storage

You cannot back up an Enterprise Mode database and restore it in Eon Mode, or vice versa.

Supported cloud storage

Vertica supports backup and restore operations in the following cloud storage locations:

Amazon Web Services (AWS) S3
S3-compatible private cloud storage, such as Pure Storage or Minio
Google Cloud Storage (GCS)
Azure Blob Storage

If you are backing up an Eon Mode database, you must use a supported cloud storage location.

You cannot perform backup or restore operations between different cloud providers. For example, you cannot back up or restore from GCS to an S3 location.

Additional considerations for HDFS storage locations

If your database has any storage locations on HDFS, additional configuration is required to enable those storage locations for backup operations. See Requirements for backing up and restoring HDFS storage locations.

1 - Common use cases

You can use vbr to perform many tasks related to backup and restore. The vbr reference describes all of the tasks in detail. This section summarizes common use cases. For each of these cases, there are additional requirements not covered here. Be sure to read the linked topics for details.

This is not a complete list of Backup/Restore capabilities.

Routine backups in Enterprise Mode

A full backup stores a copy of your data in another location—ideally a location that is separated from your database location, such as on different hardware or in the cloud. You give the backup a name (the snapshot name), which allows you to have different backups and backup types without interference. In your configuration file, you can map database nodes to backup locations and set some other parameters.
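
The sample configuration files list the characters that are valid in a snapshot name (a-z A-Z 0-9 - _). As a minimal sketch, a name can be checked against that set before it goes into a configuration file. The function name and the example names are hypothetical; this check is not part of vbr.

```python
import re

# Character set taken from the "Valid characters: a-z A-Z 0-9 - _" comment
# in the sample configuration files.
_SNAPSHOT_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_snapshot_name(name: str) -> bool:
    """Return True if the name uses only the characters the samples list as valid."""
    return _SNAPSHOT_NAME_RE.fullmatch(name) is not None


# Hypothetical names: the first two pass, the last two contain other characters.
for candidate in ("backup_snapshot", "nightly-full-01", "weekly backup", "objs/sales"):
    print(candidate, is_valid_snapshot_name(candidate))
```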

Before your first backup, run the vbr init task.

Use the vbr backup task to perform a full backup. The External full backup/restore example provides a starting point for your configuration. For complete documentation of full backups, see Creating full backups.
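
The order described above (init once, then backup) can be sketched as a small wrapper. This assumes that vbr is on the PATH and that it accepts the --task and --config-file options described in the vbr reference. The file name full_backup.ini and both function names are hypothetical. The example runs in dry-run mode and only prints the commands.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence


def vbr_command(task: str, config_file: str) -> List[str]:
    """Build the argument list for one vbr task (assumes vbr is on PATH)."""
    return ["vbr", "--task", task, "--config-file", config_file]


def first_full_backup(
    config_file: str,
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> List[List[str]]:
    """Run the init task, then the backup task, with the same configuration file.

    With the default runner, a failing init raises CalledProcessError and
    the backup is never attempted.
    """
    commands = [vbr_command("init", config_file), vbr_command("backup", config_file)]
    for cmd in commands:
        run(cmd)
    return commands


# Dry run: print the commands instead of executing them.
first_full_backup("full_backup.ini", run=lambda cmd: print(shlex.join(cmd)))
```

For later backups to the same location, only the backup task is needed.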

Routine backups in Eon Mode

For the most part, backups in Eon Mode work the same way as backups in Enterprise Mode. Eon Mode has some additional requirements described in Eon Mode database requirements, and some configuration parameters are different for backups to cloud storage. You can back up or restore Eon Mode databases that run in the cloud or on-premises using a supported cloud storage location.

Use the vbr backup task to perform a full backup. The Backup/restore to cloud storage example provides a starting point for your configuration. For complete documentation of full backups, see Creating full backups.

Checkpoint backups: backing up before a major operation

It is a good idea to back up your database before performing destructive operations such as dropping tables, or before major operations such as upgrading Vertica to a new version.

You can perform a regular full backup for this purpose, but a faster way is to create a hard-link local backup. This kind of backup copies your catalog and links your data files to another location on the local file system on each node. (You can also do a hard-link backup of specific objects rather than the whole database.) A hard-link local backup does not provide the same protection as a backup stored externally. For example, it does not protect you from local system failures. However, for a backup that you expect to need only temporarily, a hard-link local backup is an expedient option. Do not use hard-link local backups as substitutes for regular backups to other nodes.

Hard-link backups use the same vbr backup task as other backups, but with a different configuration. The Full hard-link backup/restore example provides a starting point for your configuration. See Creating hard-link local backups for more information.

Restoring selected objects

Sometimes you need to restore specific objects, such as a table you dropped, rather than the entire database. You can restore individual tables or schemas from any backup that contains them, whether a full backup or an object backup.

Use the vbr restore task and the --restore-objects parameter to specify what to restore. Usually you use the same configuration file that you used to create the backup. See Restoring individual objects for more information.
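
As an illustration, the following sketch assembles such a restore command. It assumes that --restore-objects takes a comma-separated list of names and that vbr accepts --task and --config-file; check the vbr reference before relying on either. The configuration file name and object names are hypothetical.

```python
import shlex
from typing import Iterable, List


def restore_objects_command(config_file: str, objects: Iterable[str]) -> List[str]:
    """Build a vbr restore command limited to the named tables or schemas."""
    names = [name.strip() for name in objects if name.strip()]
    if not names:
        raise ValueError("name at least one table or schema to restore")
    return [
        "vbr", "--task", "restore",
        "--config-file", config_file,
        # Assumption: the object names are passed as one comma-separated value.
        "--restore-objects", ",".join(names),
    ]


cmd = restore_objects_command("full_backup.ini", ["public.sales", "store"])
print(shlex.join(cmd))
```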

Restoring an entire database

You can restore both Enterprise Mode and Eon Mode databases from complete backups. You cannot use restore to change the mode of your database. In Eon Mode, you can restore to the primary subcluster without regard to secondary subclusters.

Use the vbr restore task to restore a database. As when restoring selected objects, you usually use the same configuration file that you used to create the backup. See Restoring a database from a full backup and Restoring hard-link local backups for more information.

Copying a cluster

You might need to copy a database to another cluster of computers, such as when you are promoting a database from a staging environment to production. Copying a database to another cluster is essentially a simultaneous backup and restore operation. The data is backed up from the source database cluster and restored to the destination cluster in a single operation.

Use the vbr copycluster task to copy a cluster. The Database copy to an alternate cluster example provides a starting point for your configuration. See Copying the database to another cluster for more information.

Replicating selected objects to another database

You might want to replicate specific tables or schemas from one database to another. For example, you might copy data from a production database to a test database to investigate a problem in isolation. As another example, after you complete a large data load in one database, replicating the data to another database might be more efficient than repeating the load there.

Use the vbr replicate task to replicate objects. You specify the objects to replicate in the configuration file. The Object replication to an alternate database example provides a starting point for your configuration. See Replicating objects to another database cluster for more information.

2 - Sample vbr configuration files

The vbr utility uses configuration files to provide the information it needs to back up and restore a full or object-level backup or copy a cluster. No default configuration file exists. You must always specify a configuration file with the vbr command.

Vertica includes sample configuration files that you can copy, edit, and deploy for various vbr tasks. Vertica automatically installs these files at:

/opt/vertica/share/vbr/example_configs

2.1 - External full backup/restore

backup_restore_full_external.ini

An external (distributed) backup backs up each database node to a distinct backup host. Nodes are mapped to hosts in the [Mapping] section.

To restore, use the same configuration file that you used to create the backup.

; This sample vbr configuration file shows full or object backup and restore to a separate remote backup-host for each respective database host.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.

; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;

[Mapping]
; !!Mandatory!! This section defines what host and directory will store the backup for each node.
; node_name = backup_host:backup_dir
; In this "parallel backup" configuration, each node backs up to a distinct external host.
; To back up all database nodes to a single external host, use that single hostname/IP address in each entry below.
v_exampledb_node0001 = 10.20.100.156:/home/dbadmin/backups
v_exampledb_node0002 = 10.20.100.157:/home/dbadmin/backups
v_exampledb_node0003 = 10.20.100.158:/home/dbadmin/backups
v_exampledb_node0004 = 10.20.100.159:/home/dbadmin/backups

[Misc]
; !!Recommended!! Snapshot name.  Object and full backups should always have different snapshot names.
; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
; Valid characters: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot

[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database

; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True

; If true, vbr attempts to connect to the database using a local connection.
; dbUseLocalConnection = False

; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;

[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr

; Specifies the number of historical backups to retain in addition to the most recent backup.
; 1 current + n historical backups
; restorePointLimit = 1

; Full path to the password configuration file
; Store this file in a directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt

; When enabled, Vertica confirms that the specified backup locations contain
; sufficient free space and inodes to allow a successful backup. If a backup
; location has insufficient resources, Vertica displays an error message explaining the shortage and
; cancels the backup. If Vertica cannot determine the amount of available space
; or number of inodes in the backupDir, it displays a warning and continues
; with the backup.
; enableFreeSpaceCheck = True

[Transmission]
; Specifies the default port number for the rsync protocol.
; port_rsync = 50000

; Total bandwidth limit for all backup connections in KBPS, 0 for unlimited. Vertica distributes
; this bandwidth evenly among the number of connections set in concurrency_backup.
; total_bwlimit_backup = 0

; The maximum number of backup TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_backup = 1

; The total bandwidth limit for all restore connections in KBPS, 0 for unlimited
; total_bwlimit_restore = 0

; The maximum number of restore TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_restore = 1

; The maximum number of delete TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_delete = 16

[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator
; dbUser = current_username
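
The comment on total_bwlimit_backup says the total is split evenly among the connections set in concurrency_backup. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the sketch takes the comment at face value: it does not model how vbr applies the limit across nodes.

```python
from typing import Optional


def per_connection_limit_kbps(total_bwlimit_kbps: int, concurrency: int) -> Optional[float]:
    """Return each connection's share of the limit in KBPS.

    A total of 0 means "unlimited", reported here as None.
    """
    if total_bwlimit_kbps < 0:
        raise ValueError("bandwidth limit cannot be negative")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if total_bwlimit_kbps == 0:
        return None
    return total_bwlimit_kbps / concurrency


print(per_connection_limit_kbps(8000, 4))  # 2000.0
print(per_connection_limit_kbps(0, 4))     # None (unlimited)
```

Under this reading, raising concurrency_backup without raising total_bwlimit_backup gives each connection a smaller share.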

2.2 - Backup/restore to cloud storage

backup_restore_cloud_storage.ini

You can back up and restore Enterprise Mode and Eon Mode databases to a cloud storage location. You must back up Eon Mode databases to a supported cloud storage location. Configuration settings in the [CloudStorage] section are identical for both Enterprise Mode and Eon Mode.

There are one-time configurations that you must complete before your first backup to a new cloud storage location. See Additional considerations for cloud storage for more information.

Backups to on-premises cloud storage destinations require additional configuration for both Enterprise Mode and Eon databases. For details about the additional requirements, see Configuring cloud storage backups.

To restore, use the same configuration file that you used to create the backup. To restore selected objects rather than the entire database, specify the objects to restore on the vbr command line using --restore-objects.

; This sample vbr configuration file shows backup to cloud storage, e.g. AWS S3, GCS, HDFS, or on-premises storage (e.g. Pure Storage).
; This can be used for Vertica databases in Enterprise or Eon mode.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; Options and values are separated by an equal sign.
; Only arguments marked as '!!Mandatory!!' must be specified explicitly.
; All commented parameters are set to their default value.

; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;

[CloudStorage]
; This section replaces the [Mapping] section and is required to back up to cloud storage.

; !!Mandatory!! Backup location on Cloud or HDFS (no default).
cloud_storage_backup_path = gs://backup_bucket/database_backup_path/
; cloud_storage_backup_path = s3://backup_bucket/database_backup_path/
; cloud_storage_backup_path = webhdfs://backup_nameservice/database_backup_path/
; cloud_storage_backup_path = azb://backup_account/backup_container/

; !!Mandatory!! directory used to manage locking during a backup (no default).  If the directory is mounted on the initiator host, you
; should use "[]" instead of the local host name.  The file system must support POSIX fcntl flock.
cloud_storage_backup_file_system_path = []:/home/dbadmin/backup_locks_dir/

[Misc]
; !!Recommended!! Snapshot name
; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
; Valid values: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot

; Specifies how Vertica handles objects of the same name when restoring schema or table backups.
; objectRestoreMode = createOrReplace

; Specifies which tables and/or schemas to copy. For tables, the containing schema defaults to public.
; Note: 'objects' is incompatible with 'includeObjects' and 'excludeObjects'.
; (no default)
; objects = mytable, myschema, myothertable

; Specifies the set of objects to backup/restore; wildcards may be used.
; Note: 'includeObjects' is incompatible with 'objects'.
; includeObjects = public.mytable, customer*, s?

; Subtracts from the set of objects to backup/restore; wildcards may be used
; Note: 'excludeObjects' is incompatible with 'objects'.
; excludeObjects = public.*temp, etl.phase?

[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database

; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True

; If true, vbr attempts to connect to the database using a local connection.
; dbUseLocalConnection = False

; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;
[CloudStorage]
; Specifies encryption-at-rest on S3
; cloud_storage_encrypt_at_rest = sse
; cloud_storage_sse_kms_key_id = <key_id>

; Specifies SSL encrypted transfer.
; cloud_storage_encrypt_transport = True

; Specifies the number of threads for upload/download - backup
; cloud_storage_concurrency_backup = 10

; Specifies the number of threads for upload/download - restore
; cloud_storage_concurrency_restore = 10

; Specifies the number of threads for deleting objects from the backup location
; cloud_storage_concurrency_delete = 10

; Specifies the path to a custom SSL server certificate bundle
; cloud_storage_ca_bundle = /home/user/ssl_folder/ca_bundle.pem

[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr

; Specifies the number of historical backups to retain in addition to the most recent backup.
; 1 current + n historical backups
; restorePointLimit = 1

; Full path to the password configuration file
; Store this file in a directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt

; Specifies the service name of the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_service_name = vertica

; Specifies the realm (authentication domain) of the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_realm = your_auth_domain

; Specifies the location of the keytab file which contains the credentials for the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_keytab_file = /path/to/keytab_file

; Specifies the location of the Hadoop XML configuration files of the HDFS clusters. Only set this when your cluster is on HA. This only applies to HDFS.
; If you have multiple conf directories, please separate them with ':'.
; hadoop_conf_dir = /path/to/conf or /path/to/conf1:/path/to/conf2

[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator
; dbUser = current_username
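
Before running a backup, it can help to confirm that an edited copy of this file still sets the two parameters marked !!Mandatory!!. The sketch below reads the file with Python's configparser. This is an assumption for illustration only and says nothing about how vbr itself parses the file. Because the samples repeat the [Misc] and [Database] headings, the parser is created with strict=False so that repeated sections merge instead of raising an error. The function names are hypothetical.

```python
import configparser
from typing import List

SAMPLE = """\
[CloudStorage]
cloud_storage_backup_path = s3://backup_bucket/database_backup_path/
cloud_storage_backup_file_system_path = []:/home/dbadmin/backup_locks_dir/

[Misc]
; snapshotName = backup_snapshot

[Misc]
restorePointLimit = 1
"""

MANDATORY = ("cloud_storage_backup_path", "cloud_storage_backup_file_system_path")


def load_vbr_config(text: str) -> configparser.ConfigParser:
    """Parse vbr-style INI text, merging sections that appear more than once."""
    parser = configparser.ConfigParser(
        strict=False,        # tolerate the repeated [Misc] and [Database] headings
        delimiters=("=",),   # values such as "[]:/path" contain colons
        interpolation=None,  # treat "%" in values literally
    )
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(text)
    return parser


def missing_cloud_settings(parser: configparser.ConfigParser) -> List[str]:
    """Return the mandatory [CloudStorage] parameters that are absent or empty."""
    if not parser.has_section("CloudStorage"):
        return list(MANDATORY)
    return [
        key for key in MANDATORY
        if not parser.get("CloudStorage", key, fallback="").strip()
    ]


config = load_vbr_config(SAMPLE)
print(missing_cloud_settings(config))  # [] when both parameters are set
```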

2.3 - Full hard-link backup/restore

backup_restore_full_hardlink.ini

The following requirements apply to configuring hard-link local backups:

Under the [Transmission] section, add the parameter hardLinkLocal:
```
hardLinkLocal = True
```
The backup directory must be in the same file system as the database data directory.
Omit the encrypt parameter. If the configuration file sets both parameters encrypt and hardLinkLocal to true, then vbr issues a warning and ignores the encrypt parameter.

; This sample vbr configuration file shows backup and restore using hard-links to data files on each database host for that host's backup.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.

; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;

[Mapping]
; For each database node there must be one [Mapping] entry to indicate the directory to store the backup.
; !!Mandatory!! Backup host name (no default) and Backup directory (no default).
; node_name = backup_host:backup_dir
; Must use [] for hardlink backups
v_exampledb_node0001 = []:/home/dbadmin/backups
v_exampledb_node0002 = []:/home/dbadmin/backups
v_exampledb_node0003 = []:/home/dbadmin/backups
v_exampledb_node0004 = []:/home/dbadmin/backups

[Misc]
; !!Recommended!! Snapshot name.  Object and full backups should always have different snapshot names.
; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
; Valid characters: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot

[Transmission]
; !!Mandatory!! Identifies the backup as a hardlink style backup.
hardLinkLocal = True
; If copyOnHardLinkFailure is True, when a hard-link local backup cannot create links the data is copied instead.
copyOnHardLinkFailure = False

; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;

[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database

; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True

[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr

; Full path to the password configuration file
; Store this file in a directory readable only by the dbadmin.
; (no default)
; passwordFile =

; Specifies the number of historical backups to retain in addition to the most recent backup.
; 1 current + n historical backups
; restorePointLimit = 1

; When enabled, Vertica confirms that the specified backup locations contain
; sufficient free space and inodes to allow a successful backup. If a backup
; location has insufficient resources, Vertica displays an error message explaining the shortage and
; cancels the backup. If Vertica cannot determine the amount of available space
; or number of inodes in the backupDir, it displays a warning and continues
; with the backup.
; enableFreeSpaceCheck = True

[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator.
; dbUser = current_username
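
Some of the hard-link requirements can be checked from the configuration text alone. The following is a minimal sketch, not part of vbr. It assumes that encrypt sits in the [Transmission] section next to hardLinkLocal, and it uses Python's configparser as a stand-in reader. It does not verify that the backup directory shares a file system with the data directory. The function name is hypothetical.

```python
import configparser
from typing import List


def hardlink_config_problems(text: str) -> List[str]:
    """List ways the configuration text departs from the hard-link requirements."""
    parser = configparser.ConfigParser(
        strict=False, delimiters=("=",), interpolation=None
    )
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(text)

    problems = []
    if not parser.getboolean("Transmission", "hardLinkLocal", fallback=False):
        problems.append("[Transmission] hardLinkLocal is not set to True")
    # Assumption: encrypt, when present, is set in [Transmission].
    if parser.getboolean("Transmission", "encrypt", fallback=False):
        problems.append("encrypt is set; vbr warns and ignores it for hard-link backups")
    if parser.has_section("Mapping"):
        for node, target in parser.items("Mapping"):
            if not target.startswith("[]:"):
                problems.append(f"{node} does not use [] as the backup host")
    return problems


EXAMPLE = """\
[Mapping]
v_exampledb_node0001 = []:/home/dbadmin/backups

[Transmission]
hardLinkLocal = True
"""
print(hardlink_config_problems(EXAMPLE))  # [] when no problems are found
```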

2.4 - Full local backup/restore

backup_restore_full_local.ini

; This is a sample vbr configuration file for backup and restore using a file system on each database host for that host's backup.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.

; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;

[Mapping]
; !!Mandatory!! For each database node there must be one [Mapping] entry to indicate the directory to store the backup.
; node_name = backup_host:backup_dir
; [] indicates backup to localhost
v_exampledb_node0001 = []:/home/dbadmin/backups
v_exampledb_node0002 = []:/home/dbadmin/backups
v_exampledb_node0003 = []:/home/dbadmin/backups
v_exampledb_node0004 = []:/home/dbadmin/backups

[Misc]
; !!Recommended!! Snapshot name
; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
; Valid values: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot

[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database

; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True

; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;

[Misc]

; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr

; Specifies the number of historical backups to retain in addition to the most recent backup.
; 1 current + n historical backups
; restorePointLimit = 1

; Full path to the password configuration file
; Store this file in a directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt

; When enabled, Vertica confirms that the specified backup locations contain
; sufficient free space and inodes to allow a successful backup. If a backup
; location has insufficient resources, Vertica displays an error message explaining the shortage and
; cancels the backup. If Vertica cannot determine the amount of available space
; or number of inodes in the backupDir, it displays a warning and continues
; with the backup.
; enableFreeSpaceCheck = True

[Transmission]
; The total bandwidth limit for all restore connections in KBPS, 0 for unlimited
; total_bwlimit_restore = 0

; The maximum number of restore TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_restore = 1

; Total bandwidth limit for all backup connections in KBPS, 0 for unlimited. Vertica distributes
; this bandwidth evenly among the number of connections set in concurrency_backup.
; total_bwlimit_backup = 0

; The maximum number of backup TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_backup = 1

; The maximum number of delete TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_delete = 16

[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator
; dbUser = current_username
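
Each [Mapping] value has the form backup_host:backup_dir, with [] standing for the database host itself. As a rough sketch, a value can be split as shown below. The type and function names are hypothetical, and the split on the first colon assumes a host name or IPv4 address; it would not handle an IPv6 literal.

```python
from typing import NamedTuple, Optional


class BackupTarget(NamedTuple):
    host: Optional[str]  # None means "[]": back up to the database host itself
    directory: str


def parse_mapping_value(value: str) -> BackupTarget:
    """Split a [Mapping] value of the form backup_host:backup_dir."""
    host, sep, directory = value.strip().partition(":")
    if not sep or not host or not directory:
        raise ValueError(f"expected backup_host:backup_dir, got {value!r}")
    return BackupTarget(None if host == "[]" else host, directory)


print(parse_mapping_value("[]:/home/dbadmin/backups"))
print(parse_mapping_value("10.20.100.156:/home/dbadmin/backups"))
```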

2.5 - Object-level local backup/restore in Enterprise Mode

backup_restore_object_local.ini

An object backup backs up only the schemas or tables that are specified in the [Misc] section by the parameter objects, or parameters includeObjects and excludeObjects.

For an object restore, use the same configuration file that you used to create the backup, and specify the objects to restore with the vbr command-line parameter --restore-objects.
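
The relationship between the three parameters can be sketched as follows: objects cannot be combined with the other two, includeObjects selects a set, and excludeObjects subtracts from it. This is an approximation that uses Python's fnmatch in place of vbr's own wildcard matching and works only on schema-qualified names. The exact matching rules are covered in Including and excluding objects. The function names and table names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Sequence


def check_object_settings(misc: Dict[str, str]) -> None:
    """Reject a [Misc] section that mixes 'objects' with include/exclude patterns."""
    if "objects" in misc and (misc.keys() & {"includeObjects", "excludeObjects"}):
        raise ValueError("'objects' cannot be combined with includeObjects or excludeObjects")


def select_objects(
    names: Iterable[str], include: Sequence[str], exclude: Sequence[str] = ()
) -> List[str]:
    """Keep names matching any include pattern, then drop those matching any exclude pattern."""
    included = [n for n in names if any(fnmatchcase(n, p) for p in include)]
    return [n for n in included if not any(fnmatchcase(n, p) for p in exclude)]


tables = ["public.mytable", "public.load_temp", "etl.phase1", "etl.phase12"]
selected = select_objects(
    tables,
    include=["public.*", "etl.*"],
    exclude=["public.*temp", "etl.phase?"],
)
print(selected)  # ['public.mytable', 'etl.phase12']
```

In this sketch, ? matches exactly one character, so etl.phase? excludes etl.phase1 but not etl.phase12.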

; This sample vbr configuration file shows object-level backup and restore
; using a file system on each database host for that host's backup.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; Options and values are separated by an equal sign.
; Only arguments marked as '!!Mandatory!!' must be specified explicitly.
; All commented parameters are set to their default value.

; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;

[Mapping]
; There must be one [Mapping] section for all of the nodes in your database cluster.
; !!Mandatory!! Backup host name (no default) and Backup directory (no default)
; node_name = backup_host:backup_dir
; [] indicates backup to localhost
v_exampledb_node0001 = []:/home/dbadmin/backups
v_exampledb_node0002 = []:/home/dbadmin/backups
v_exampledb_node0003 = []:/home/dbadmin/backups
v_exampledb_node0004 = []:/home/dbadmin/backups

[Misc]
; !!Recommended!! Snapshot name.  Object and full backups should always have different snapshot names.
; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
; Valid values: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot

; Specifies how Vertica handles objects of the same name when restoring schema or table backups.
; objectRestoreMode = createOrReplace

; Specifies which tables and/or schemas to copy. For tables, the containing schema defaults to public.
; Note: 'objects' is incompatible with 'includeObjects' and 'excludeObjects'.
; (no default)
objects = mytable, myschema, myothertable

; Specifies the set of objects to backup/restore; wildcards may be used.
; Note: 'includeObjects' is incompatible with 'objects'.
; includeObjects = public.mytable, customer*, s?

; Subtracts from the set of objects to backup/restore; wildcards may be used
; Note: 'excludeObjects' is incompatible with 'objects'.
; excludeObjects = public.*temp, etl.phase?

[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database

; If this parameter is True, vbr prompts the user for the database password every time.
; If set to False, specify location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True

; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;

[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr

; Specifies the number of historical backups to retain in addition to the most recent backup.
; 1 current + n historical backups
; restorePointLimit = 1

; Full path to the password configuration file
; Store this file in a directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt

; When enabled, Vertica confirms that the specified backup locations contain
; sufficient free space and inodes to allow a successful backup. If a backup
; location has insufficient resources, Vertica displays an error message explaining the shortage and
; cancels the backup. If Vertica cannot determine the amount of available space
; or number of inodes in the backupDir, it displays a warning and continues
; with the backup.
; enableFreeSpaceCheck = True

[Transmission]
; The total bandwidth limit for all restore connections in KBPS, 0 for unlimited
; total_bwlimit_restore = 0

; The maximum number of restore TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_restore = 1

; Total bandwidth limit for all backup connections in KBPS, 0 for unlimited. Vertica distributes
; this bandwidth evenly among the number of connections set in concurrency_backup.
; total_bwlimit_backup = 0

; The maximum number of backup TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_backup = 1

; The maximum number of delete TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_delete = 16

[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator.
; dbUser = current_username
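
The `[Mapping]` entries in a sample like the one above can be read programmatically. The following Python sketch is one possible approach, not part of vbr: it assumes the standard `configparser` module, and the helper names `read_vbr_config` and `parse_mapping` are hypothetical. Because the sample files repeat the `[Misc]` and `[Database]` headings, the parser is created with `strict=False`, which merges repeated sections instead of rejecting them.

```python
import configparser


def read_vbr_config(text):
    """Parse vbr-style configuration text (illustrative helper, not part of vbr)."""
    parser = configparser.ConfigParser(
        strict=False,        # the samples repeat [Misc] and [Database]; merge them
        interpolation=None,  # treat '%' in values literally
        delimiters=("=",),   # values such as host:/path contain ':'
    )
    parser.optionxform = str  # keep names such as snapshotName case-sensitive
    parser.read_string(text)
    return parser


def parse_mapping(parser):
    """Return {node_name: (backup_host, backup_dir)} from the [Mapping] section."""
    mapping = {}
    for node, value in parser["Mapping"].items():
        host, _, directory = value.partition(":")
        host = host.strip()
        if host == "[]":
            # The sample comments state that [] indicates backup to localhost.
            host = "localhost"
        mapping[node] = (host, directory.strip())
    return mapping
```

This sketch splits each value at the first colon, so it does not handle host values that themselves contain colons.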

2.6 - Restore object from backup to an alternate cluster

object_restore_to_other_cluster.ini

; This sample vbr configuration file shows object restore to another cluster from an existing full or object backup.
; To restore objects from an existing backup (object or full), you must use the "--restore-objects" vbr command line option.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.

; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;

[Mapping]
; There must be one [Mapping] section for all of the nodes in your database cluster.
; !!Mandatory!! Backup host name (no default) and Backup directory (no default)
; node_name = backup_host:backup_dir
v_exampledb_node0001 = backup_host0001:/home/dbadmin/backups
v_exampledb_node0002 = backup_host0002:/home/dbadmin/backups
v_exampledb_node0003 = backup_host0003:/home/dbadmin/backups
v_exampledb_node0004 = backup_host0004:/home/dbadmin/backups

[NodeMapping]
; !!Recommended!! This section is required when performing an object restore from a full/object backup to a different cluster and node names are different between source (backup) and destination (restoring) databases.
v_sourcedb_node0001 = v_exampledb_node0001
v_sourcedb_node0002 = v_exampledb_node0002
v_sourcedb_node0003 = v_exampledb_node0003
v_sourcedb_node0004 = v_exampledb_node0004

[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database

; If this parameter is True, vbr prompts the user for database password every time.
; If False, specify location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True

; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;

[Misc]
; !!Recommended!! Snapshot name.
; SnapshotName is useful for monitoring and troubleshooting.
; Valid characters: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot

; Specifies how Vertica handles objects of the same name when restoring schema or table backups.  Options are coexist, createOrReplace or create.
; objectRestoreMode = createOrReplace

; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr

; Full path to the password configuration file.
; Store this file in a directory only readable by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt

; When enabled, Vertica confirms that the specified backup locations contain
; sufficient free space and inodes to allow a successful backup. If a backup
; location has insufficient resources, Vertica displays an error message and
; cancels the backup. If Vertica cannot determine the amount of available space
; or number of inodes in the backupDir, it displays a warning and continues
; with the backup.
; enableFreeSpaceCheck = True

[Transmission]
; Sets options for transmitting the data when using backup hosts.

; Specifies the default port number for the rsync protocol.
; port_rsync = 50000

; The total bandwidth limit for all restore connections in KBPS, 0 for unlimited
; total_bwlimit_restore = 0

; The maximum number of restore TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_restore = 1

[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator.
; dbUser = current_username

2.7 - Object replication to an alternate database

replicate.ini

; This sample vbr configuration file shows the replicate vbr task.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.

; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;

[Mapping]
; There must be one [Mapping] section for all of the nodes in your database cluster.
; !!Mandatory!! Target host name (no default)
; node_name = new_host
v_exampledb_node0001 = destination_host0001
v_exampledb_node0002 = destination_host0002
v_exampledb_node0003 = destination_host0003
v_exampledb_node0004 = destination_host0004

[Misc]
; !!Recommended!! Snapshot name.
; SnapshotName is useful for monitoring and troubleshooting.
; Valid characters: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot


; Specifies which tables and/or schemas to replicate. For tables, the containing schema defaults to public.
; Specify only one of 'objects' or 'includeObjects'.
; Use a comma-separated list for multiple objects.
; (no default)
objects = mytable, myschema, myothertable

; Specifies the set of objects to replicate; wildcards may be used.
; Note: 'includeObjects' is incompatible with 'objects'.
; includeObjects = public.mytable, customer*, s?

; Subtracts from the set of objects to replicate; wildcards may be used
; Note: 'excludeObjects' is incompatible with 'objects'.
; excludeObjects = public.*temp, etl.phase?

; Specifies how Vertica handles objects of the same name when copying schema or tables.
; objectRestoreMode = createOrReplace

[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to replicate.
; dbName = current_database

; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True

; !!Mandatory!! These settings are all mandatory for replication. None of them has a default.
dest_dbName = target_db
dest_dbUser = dbadmin
dest_dbPromptForPassword = True

; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;

[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr

; Full path to the password configuration file containing database password credentials.
; Store this file in a directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt

; Specifies the service name of the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_service_name = vertica

; Specifies the realm (authentication domain) of the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_realm = your_auth_domain

; Specifies the location of the keytab file which contains the credentials for the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_keytab_file = /path/to/keytab_file

; Specifies the location of the Hadoop XML configuration files of the HDFS clusters. Only set this when your cluster is on HA. This only applies to HDFS.
; If you have multiple conf directories, please separate them with ':'.
; hadoop_conf_dir = /path/to/conf or /path/to/conf1:/path/to/conf2

[Transmission]
; Specifies the default port number for the rsync protocol.
; port_rsync = 50000

; Total bandwidth limit for all backup connections in KBPS, 0 for unlimited. Vertica distributes
; this bandwidth evenly among the number of connections set in concurrency_backup.
; total_bwlimit_backup = 0

; The maximum number of replication TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_backup = 1

; The maximum number of restore TCP rsync connection threads per node.
; Results vary depending on environment, but values between 2 and 16 are sometimes quite helpful.
; concurrency_restore = 1

; The maximum number of delete TCP rsync connection threads per node.
; Results vary depending on environment, but values between 2 and 16 are sometimes quite helpful.
; concurrency_delete = 16

[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator.
; dbUser = current_username

2.8 - Database copy to an alternate cluster

copycluster.ini

; This sample vbr configuration file is configured for the copycluster vbr task.
; Copycluster supports full database copies only, not specific objects.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.

; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;

[Mapping]
; For each node of the source database, there must be a [Mapping] entry specifying the corresponding hostname of the destination database node.
; !!Mandatory!!  node_name = new_host/ip  (no defaults)
v_exampledb_node0001 = destination_host1.example
v_exampledb_node0002 = destination_host2.example
v_exampledb_node0003 = destination_host3.example
v_exampledb_node0004 = destination_host4.example
; v_exampledb_node0001 = 10.0.90.17
; v_exampledb_node0002 = 10.0.90.18
; v_exampledb_node0003 = 10.0.90.19
; v_exampledb_node0004 = 10.0.90.20

[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to copy.
; dbName = current_database

; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True

; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;

[Misc]
; !!Recommended!! Snapshot name.
; SnapshotName is used for monitoring and troubleshooting.
; Valid characters: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot

; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr

; Full path to the password configuration file containing database password credentials.
; Store this file in a directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt

[Transmission]
; Specifies the default port number for the rsync protocol.
; port_rsync = 50000

; Total bandwidth limit for all copycluster connections in KBPS, 0 for unlimited. Vertica distributes
; this bandwidth evenly among the number of connections set in concurrency_backup.
; total_bwlimit_backup = 0

; The maximum number of backup TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_backup = 1

; The maximum number of restore TCP rsync connection threads per node.
; Results vary depending on environment, but values between 2 and 16 are sometimes quite helpful.
; concurrency_restore = 1

; The maximum number of delete TCP rsync connection threads per node.
; Results vary depending on environment, but values between 2 and 16 are sometimes quite helpful.
; concurrency_delete = 16

[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator.
; dbUser = current_username

2.9 - Password file

Unlike other configuration (.ini) files, the password configuration file must be referenced by another configuration file, through its passwordFile parameter.

password.ini

; This is a sample password configuration file.
; Point to this file in the 'passwordFile' parameter of the [Misc] section.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; Options and values are separated by an equal sign.

[Passwords]
; The database administrator's password, used if dbPromptForPassword is False.
; dbPassword=myDBsecret

; The password for the rsync user account.
; serviceAccessPass=myrsyncpw

; The password for the dest_dbUser Vertica account, for replication tasks only.
; dest_dbPassword=destDBsecret
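
As a rough illustration of how such a file might be sanity-checked before it is referenced from `passwordFile`, the following Python sketch reads the `[Passwords]` section with the standard `configparser` module and reports whether anyone other than the file owner has permissions on the file. The function name `load_vbr_passwords` and the per-file permission check are assumptions made for this example; vbr reads the file itself, and the sample comments only ask that the containing directory be readable solely by the dbadmin.

```python
import configparser
import os
import stat


def load_vbr_passwords(path):
    """Return (passwords, owner_only) for a vbr-style password file.

    Hypothetical helper for illustration only.
    """
    # interpolation=None so that a '%' inside a password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names such as dbPassword case-sensitive;
    # configparser lowercases them by default.
    parser.optionxform = str
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)

    if parser.has_section("Passwords"):
        passwords = dict(parser["Passwords"])
    else:
        passwords = {}

    # True when neither the group nor other users have any permission bits set.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    owner_only = (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return passwords, owner_only
```

A caller could refuse to continue when `owner_only` is false, for example after someone has copied the file with default permissions.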

3 - Eon Mode database requirements

Eon Mode databases perform the same backup and restore operations as Enterprise Mode databases. Additional requirements pertain to Eon Mode because it uses a different architecture.

Eon Mode databases also support saving in-db restore points, which are copy-free backups that enable you to roll back a database to a previous state. Unlike vbr-based backups, restore points are stored in-database and do not require additional data copies to be stored externally. However, because restore points are in-database, they are lost if the database's communal storage is compromised. For more information about restore points, see Revive an Eon DB.

Note

These requirements are for cloud storage locations listed in Backing up and restoring the database, and on-premises with communal storage on HDFS.

Cloud storage requirements

Eon Mode databases must be backed up to supported cloud storage locations. The following [CloudStorage] configuration parameters must be set:

cloud_storage_backup_path
cloud_storage_backup_file_system_path

A backup path is valid for one database only. You cannot use the same path to store backups for multiple databases.

Eon Mode databases that use S3-compatible on-premises cloud storage can back up to Amazon Web Services (AWS) S3.

Cloud storage access

In addition to having access to the cloud storage bucket used for the database's communal storage, you must have access to the cloud storage backup location. Verify that the credential you use to access communal storage also has access to the backup location. For more information about configuring cloud storage access for Vertica, see Configuring cloud storage backups.

Note

While an AWS backup location can be in a different region, backup and restore operations across different S3 regions are incompatible with virtual private cloud (VPC) endpoints.

Eon on-premises and private cloud storage

If an Eon database runs on-premises, then communal storage is not on AWS but on another storage platform that uses the S3 or GS protocol. This means there can be two endpoints and two sets of credentials, depending on where you back up. This additional information is stored in environment variables, and not in vbr configuration parameters.

Backups of Eon Mode on-premises databases do not support AWS IAM profiles.

HDFS on-premises storage

To back up an Eon Mode database that uses HDFS on-premises storage, the communal storage and backup location must use the same HDFS credentials and domain. All vbr operations are supported, except copycluster.

Vertica supports Kerberos authentication, High Availability Name Node, and wire encryption for vbr operations. Vertica does not support at-rest encryption for Hadoop storage.

For details, see Configuring backups to and from HDFS.

Database restore requirements

When restoring a backup of an Eon Mode database, the target database must satisfy the following requirements:

Share the same name as the source database.
Have at least as many nodes as the primary subcluster(s) in the source database.
Have the same node names as the nodes of the source database.
Use the same catalog directory location as the source database.
Use the same port numbers as the source database.
For object-level restore, if you restore to an existing target namespace, the target namespace and the objects' source namespace must have the same shard count, shard boundaries, and node subscriptions. For details, see object-level tasks with multiple namespaces.

You can restore a full or object backup that was taken from a database with primary and secondary subclusters to the primary subclusters in the target database. The database can have only primary subclusters, or it can also have any number of secondary subclusters. Secondary subclusters do not need to match the backup database. The same is true for replicating a database; only the primary subclusters are required. The requirements are similar to those for Revive with communal storage.

Use the [Mapping] section in the configuration file to specify the mappings for the primary subcluster.

Object-level tasks with multiple namespaces

Eon Mode databases group schemas and tables into one or more namespaces. By default, Eon databases contain only one namespace, default_namespace, which is created during database creation. Unless you have created additional namespaces, the default_namespace contains all schemas and tables. If you do not specify the namespace of an object, vbr assumes the object belongs to the default_namespace. Full database vbr tasks are unaffected by the number of namespaces.

Important

For vbr tasks, namespaces are prefixed with a period. For example, .n.s.t refers to table t in schema s in namespace n.

For object-level backups, you can specify the included objects in the objects parameter of your vbr configuration file. For example, to create an object-level backup of all objects in the orders and customers schemas in the store_1 namespace, add the following line to your configuration file:

objects = .store_1.orders.*, .store_1.customers.*
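
The period-prefix convention can be illustrated with a short Python sketch. It is an assumption-laden illustration, not vbr's own parser: the helper names `split_namespace` and `parse_objects` are hypothetical, and the sketch only separates the namespace from the rest of the name, leaving the schema and table portion untouched.

```python
DEFAULT_NAMESPACE = "default_namespace"


def split_namespace(object_name):
    """Split a vbr object name into (namespace, remainder).

    A leading period marks a namespace-qualified name; any other name is
    assumed to belong to default_namespace.
    """
    name = object_name.strip()
    if name.startswith("."):
        namespace, _, remainder = name[1:].partition(".")
        return namespace, remainder
    return DEFAULT_NAMESPACE, name


def parse_objects(value):
    """Parse a comma-separated objects value into (namespace, remainder) pairs."""
    return [split_namespace(item) for item in value.split(",") if item.strip()]
```

For example, `split_namespace(".n.s.t")` returns `("n", "s.t")`, while `split_namespace("public.mytable")` falls back to `default_namespace`.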

Alternatively, you can specify the included and excluded objects using the includeObjects and excludeObjects parameters. If you set these parameters, the objects parameter must be empty.

For object-level restore and replicate vbr tasks, you can use the --target-namespace argument to specify the namespace to which the objects are restored or replicated.

vbr behaves differently depending on whether the target namespace exists:

Exists: vbr attempts to restore or replicate the objects to the existing namespace, which must have the same shard count, shard boundaries, and node subscriptions as the source namespace. If these conditions are not met, the vbr task fails.
Nonexistent: vbr creates a namespace in the target database with the name specified in --target-namespace and the shard count of the source namespace, and then replicates or restores the objects to that namespace.

If no target namespace is specified, vbr attempts to restore or replicate objects to a namespace with the same name as the source namespace.
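
The decision described above can be summarized in a small Python sketch. It models only the documented rules; the `NamespaceLayout` class, the `plan_target_namespace` function, and the returned action strings are hypothetical names chosen for this example and do not correspond to vbr internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceLayout:
    """Hypothetical summary of the properties that must match."""
    shard_count: int
    shard_boundaries: tuple
    node_subscriptions: frozenset


def plan_target_namespace(source_name, source_layout, existing, target_name=None):
    """Return (action, namespace) for an object-level restore or replicate.

    existing maps namespace names in the target database to NamespaceLayout.
    """
    # Without --target-namespace, the source namespace name is used.
    namespace = target_name if target_name is not None else source_name

    if namespace not in existing:
        # The namespace would be created with the shard count of the source.
        return "create", namespace

    if existing[namespace] != source_layout:
        # Shard count, shard boundaries, or node subscriptions differ,
        # which is the case in which the vbr task fails.
        raise ValueError(f"namespace {namespace!r} does not match the source layout")

    return "use-existing", namespace
```

Comparing the layouts up front in this way mirrors the documented failure condition, so a mismatch is reported before any data would be copied.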

You can specify how restore operations handle duplicate objects with the objectRestoreMode parameter in the vbr configuration file.

The following command restores the store_1.orders schema of the source database to the store_2 namespace in the target database:

$ vbr --task restore --config-file=db.ini --restore-objects=.store_1.orders.* --target-namespace=store_2

For example, you can omit the --target-namespace=store_1 argument when restoring the store_1.orders schema to the store_1 namespace, because vbr defaults to a namespace with the same name as the source namespace:

$ vbr --task restore --config-file=db.ini --restore-objects=.store_1.orders.* 

Restoring a database with multiple communal storage locations

You can back up and restore Eon Mode databases that have multiple communal storage locations. Both object-level and full database restore operations are supported:

Full database restore: the result of the restore operation depends on whether you are restoring to the same communal storage locations from which you performed the backup:
- Same communal storage locations: vbr attempts to copy all data to the communal storage locations from which they were backed up. If a storage location has been dropped since the backup was taken, the restore operation attempts to reinstate the dropped location before restoring the data. If the dropped storage location cannot be reinstated, its associated data is copied to the main communal storage location.
- Different communal storage location: all data is copied to the communal storage location specified in the vbr configuration file. Regardless of how many communal storage locations existed before the restore, there will be only one communal storage location after the full restore.
Object restore: the location to which an object is restored depends on whether it has an existing storage policy in the target database:
- Storage policy: vbr restores the object to the communal storage location specified by the object's highest priority storage policy, which is determined by the following hierarchy, listed from highest priority to lowest:
  1. Table-level policy
  2. Schema-level policy
  3. Database-level policy
  When the communal storage location specified by the highest priority policy does not exist, vbr attempts to execute the policy with the next highest priority. If none of the policies are valid, the object is restored to the main communal storage location.
- No storage policy: the object is copied to the main communal storage location.
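
The object-restore rules above amount to a priority lookup with a fallback. The following Python sketch shows one way to express that lookup; the function name `resolve_restore_location`, the level labels, and the dictionary representation of policies are assumptions for illustration rather than vbr's implementation.

```python
# Policy levels from highest to lowest priority, as listed above.
POLICY_PRIORITY = ("table", "schema", "database")


def resolve_restore_location(policies, existing_locations, main_location):
    """Pick the communal storage location for a restored object.

    policies maps a level ("table", "schema", "database") to a location name;
    a missing level means that no policy exists at that level.
    """
    for level in POLICY_PRIORITY:
        location = policies.get(level)
        # Skip levels without a policy or whose location no longer exists.
        if location is not None and location in existing_locations:
            return location
    # No policy, or no valid policy: use the main communal storage location.
    return main_location
```

With this shape, an object that has no storage policy is represented by an empty dictionary and resolves directly to the main location.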

For details on creating and configuring storage policies for multiple communal storage locations, see Configuring your Vertica cluster for Eon Mode.

4 - Requirements for backing up and restoring HDFS storage locations

There are several considerations for backing up and restoring HDFS storage locations:

The HDFS directory for the storage location must have snapshotting enabled. You can either directly configure this yourself or enable the database administrator’s Hadoop account to do it for you automatically. See Hadoop configuration for backup and restore for more information.
If the Hadoop cluster uses Kerberos, Vertica nodes must have access to certain Hadoop configuration files. See Configuring Kerberos below.
To restore an HDFS storage location, your Vertica cluster must be able to run the Hadoop distcp command. See Configuring distcp on a Vertica Cluster below.
HDFS storage locations do not support object-level backups. You must perform a full database backup to back up the data in your HDFS storage locations.
Data in an HDFS storage location is backed up to HDFS. This backup guards against accidental deletion or corruption of data. It does not prevent data loss in the case of a catastrophic failure of the entire Hadoop cluster. To prevent data loss, you must have a backup and disaster recovery plan for your Hadoop cluster.

Data stored on the Linux native file system is still backed up to the location you specify in the backup configuration file. It and the data in HDFS storage locations are handled separately by the vbr backup script.

Configuring Kerberos

If HDFS uses Kerberos, then to back up your HDFS storage locations you must take the following additional steps:

Grant Hadoop superuser privileges to the Kerberos principals for each Vertica node.
Copy Hadoop configuration files to your database nodes as explained in Accessing Hadoop Configuration Files. Vertica needs access to core-site.xml, hdfs-site.xml, and yarn-site.xml for backup and restore. If your Vertica nodes are co-located on HDFS nodes, these files are already present.
Set the HadoopConfDir parameter to the location of the directory containing these files. If the files are in multiple directories, the value can be a colon-separated path. For example:
```
=> ALTER DATABASE exampledb SET HadoopConfDir = '/etc/hadoop/conf:/etc/hadoop/test';
```
All three configuration files must be present on this path on every database node.
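
A quick way to check a node against this requirement is to look for the three files along the configured path. The Python sketch below is a hypothetical helper, not a Vertica tool: it assumes that a file counts as present when at least one directory on the colon-separated path contains it, and it would need to be run on every database node.

```python
import os

# The three files named above as required for backup and restore.
REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def missing_hadoop_conf_files(hadoop_conf_dir):
    """Return the required files not found in any directory on the path."""
    directories = [d for d in hadoop_conf_dir.split(":") if d]
    return [
        name
        for name in REQUIRED_FILES
        if not any(os.path.isfile(os.path.join(d, name)) for d in directories)
    ]
```

An empty result means that all three files were found somewhere on the path for the node where the check ran.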

If your Vertica nodes are co-located on HDFS nodes and you are using Kerberos, you must also change some Hadoop configuration parameters. These changes are needed in order for restoring from backups to work. In yarn-site.xml on every Vertica node, set the following parameters:

Parameter	Value
`yarn.resourcemanager.proxy-user-privileges.enabled`	true
`yarn.resourcemanager.proxyusers.*.groups`
`yarn.resourcemanager.proxyusers.*.hosts`
`yarn.resourcemanager.proxyusers.*.users`
`yarn.timeline-service.http-authentication.proxyusers.*.groups`
`yarn.timeline-service.http-authentication.proxyusers.*.hosts`
`yarn.timeline-service.http-authentication.proxyusers.*.users`

No changes are needed on HDFS nodes that are not also Vertica nodes.

Configuring distcp on a Vertica cluster

Your Vertica cluster must be able to run the Hadoop distcp command to restore a backup of an HDFS storage location. The easiest way to enable your cluster to run this command is to install several Hadoop packages on each node. These packages must be from the same distribution and version of Hadoop that is running on your Hadoop cluster.

The steps you need to take depend on:

The distribution and version of Hadoop running on the Hadoop cluster containing your HDFS storage location.
The distribution of Linux running on your Vertica cluster.

Note

Installing the Hadoop packages necessary to run distcp does not turn your Vertica database into a Hadoop cluster. This process installs just enough of the Hadoop support files on your cluster to run the distcp command. There is no additional overhead placed on the Vertica cluster, aside from a small amount of additional disk space consumed by the Hadoop support files.

Configuration overview

The steps for configuring your Vertica cluster to restore backups for HDFS storage locations are:

If necessary, install and configure a Java runtime on the hosts in the Vertica cluster.
Find the location of your Hadoop distribution's package repository.
Add the Hadoop distribution's package repository to the Linux package manager on all hosts in your cluster.
Install the necessary Hadoop packages on your Vertica hosts.
Set two configuration parameters in your Vertica database related to Java and Hadoop.
Confirm that the Hadoop distcp command runs on your Vertica hosts.

The following sections describe these steps in greater detail.

Installing a Java runtime

Your Vertica cluster must have a Java Virtual Machine (JVM) installed to run the Hadoop distcp command. It already has a JVM installed if you have configured it to:

Execute user-defined extensions developed in Java. See Developing user-defined extensions (UDxs) for more information.
Access Hadoop data using the HCatalog Connector. See Using the HCatalog Connector for more information.

If your Vertica database has a JVM installed, verify that your Hadoop distribution supports it. See your Hadoop distribution's documentation to determine which JVMs it supports.

If the JVM installed on your Vertica cluster is not supported by your Hadoop distribution, you must uninstall it and then install a JVM that is supported by both Vertica and your Hadoop distribution. See Vertica SDKs for a list of the JVMs compatible with Vertica.

If your Vertica cluster does not have a JVM (or its existing JVM is incompatible with your Hadoop distribution), follow the instructions in Installing the Java runtime on your Vertica cluster.

Finding your Hadoop distribution's package repository

Many Hadoop distributions have their own installation system, such as Cloudera Manager or Ambari. However, they also support manual installation using native Linux packages such as RPM and .deb files. These package files are maintained in a repository. You can configure your Vertica hosts to access this repository to download and install Hadoop packages.

Consult your Hadoop distribution's documentation to find the location of its Linux package repository. This information is often located in the portion of the documentation covering manual installation techniques.

Each Hadoop distribution maintains separate repositories for each of the major Linux package management systems. Find the specific repository for the Linux distribution running your Vertica cluster. Be sure that the package repository that you select matches the version used by your Hadoop cluster.

Configuring Vertica nodes to access the Hadoop distribution's package repository

Configure the nodes in your Vertica cluster so they can access your Hadoop distribution's package repository. Your Hadoop distribution's documentation should explain how to add the repositories to your Linux platform. If the documentation does not explain how to add the repository to your packaging system, refer to your Linux distribution's documentation.

The steps you need to take depend on the package management system your Linux platform uses. Usually, the process involves:

Downloading a configuration file.
Adding the configuration file to the package management system's configuration directory.
For Debian-based Linux distributions, adding the Hadoop repository encryption key to the root account keyring.
Updating the package management system's index to have it discover new packages.

You must add the Hadoop repository to all hosts in your Vertica cluster.

Installing the required Hadoop packages

After configuring the repository, you are ready to install the Hadoop packages. The packages you need to install are:

hadoop
hadoop-hdfs
hadoop-client

The names of the packages are usually the same across all Hadoop and Linux distributions. These packages often have additional dependencies. Always accept any additional packages that the Linux package manager asks to install.

To install these packages, use the package manager command for your Linux distribution:

On Red Hat and CentOS, the package manager command is yum.
On Debian and Ubuntu, the package manager command is apt-get.
On SUSE, the package manager command is zypper.

Consult your Linux distribution's documentation for instructions on installing packages.
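
As a rough sketch of the mapping above, the following shell function prints (but does not run) the install command for a given distribution. The function name hadoop_install_cmd and the distribution IDs it accepts (the ID values found in /etc/os-release, such as rhel, centos, debian, ubuntu, and sles) are illustrative assumptions and are not part of Vertica or Hadoop. Confirm the exact command against your distribution's documentation.

```shell
# Hypothetical helper: print (do not run) the package install command
# for the distribution ID passed as $1.
hadoop_install_cmd() {
    packages="hadoop hadoop-hdfs hadoop-client"
    case "$1" in
        rhel|centos)    echo "yum install $packages" ;;
        debian|ubuntu)  echo "apt-get install $packages" ;;
        sles|opensuse*) echo "zypper install $packages" ;;
        *)              echo "unsupported distribution: $1" >&2
                        return 1 ;;
    esac
}
```

For example, hadoop_install_cmd centos prints yum install hadoop hadoop-hdfs hadoop-client. Run the printed command as root on every host in the cluster.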

Setting configuration parameters

You must set two Hadoop configuration parameters to enable Vertica to restore HDFS data:

JavaBinaryForUDx is the path to the Java executable. You may have already set this value to use Java UDxs or the HCatalog Connector. You can find the path for the default Java executable from the Bash command shell using the command:
```
$ which java
```
HadoopHome is the directory that contains bin/hadoop (the bin directory containing the Hadoop executable file). The default value for this parameter is /usr. The default value is correct if your Hadoop executable is located at /usr/bin/hadoop.

The following example shows how to set and then review the values of these parameters:

=> ALTER DATABASE DEFAULT SET PARAMETER JavaBinaryForUDx = '/usr/bin/java';
=> SELECT current_value FROM configuration_parameters WHERE parameter_name = 'JavaBinaryForUDx';
 current_value
---------------
 /usr/bin/java
(1 row)
=> ALTER DATABASE DEFAULT SET HadoopHome = '/usr';
=> SELECT current_value FROM configuration_parameters WHERE parameter_name = 'HadoopHome';
 current_value
---------------
 /usr
(1 row)
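
Because HadoopHome is the directory that contains bin/hadoop, one way to find a candidate value is to strip the trailing /bin/hadoop from the path of the hadoop executable. The following is a minimal sketch; the function name hadoop_home_for is hypothetical.

```shell
# Hypothetical helper: derive a HadoopHome candidate from the full path
# of the hadoop executable by removing the trailing /bin/hadoop.
hadoop_home_for() {
    dirname "$(dirname "$1")"
}
```

For example, hadoop_home_for /usr/bin/hadoop prints /usr, which matches the default. You could pass it the output of which hadoop, keeping in mind that if that path is a symbolic link, the result reflects the location of the link rather than the actual Hadoop installation.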

You can also set the following parameters:

HadoopFSReadRetryTimeout and HadoopFSWriteRetryTimeout specify how long to wait before failing. The default value for each is 180 seconds. If you are confident that your file system will fail more quickly, you can improve performance by lowering these values.
HadoopFSReplication specifies the number of replicas HDFS makes. By default, the Hadoop client chooses this; Vertica uses the same value for all nodes.
Caution: Do not change this setting unless directed otherwise by Vertica support.
HadoopFSBlockSizeBytes is the block size to write to HDFS; larger files are divided into blocks of this size. The default is 64MB.

Confirming that distcp runs

After the packages are installed on all hosts in your cluster, your database should be able to run the Hadoop distcp command. To test it:

Log into any host in your cluster as the database superuser.
At the Bash shell, enter the command:
```
$ hadoop distcp
```

The command should print a message similar to the following:

usage: distcp OPTIONS [source_path...] <target_path>
              OPTIONS
 -async                 Should distcp execution be blocking
 -atomic                Commit all changes or none
 -bandwidth <arg>       Specify bandwidth per map in MB
 -delete                Delete from target, files missing in source
 -f <arg>               List of files that need to be copied
 -filelimit <arg>       (Deprecated!) Limit number of files copied to <= n
 -i                     Ignore failures during copy
 -log <arg>             Folder on DFS where distcp execution logs are
                        saved
 -m <arg>               Max number of concurrent maps to use for copy
 -mapredSslConf <arg>   Configuration for ssl config file, to use with
                        hftps://
 -overwrite             Choose to overwrite target files unconditionally,
                        even if they exist.
 -p <arg>               preserve status (rbugpc)(replication, block-size,
                        user, group, permission, checksum-type)
 -sizelimit <arg>       (Deprecated!) Limit number of files copied to <= n
                        bytes
 -skipcrccheck          Whether to skip CRC checks between source and
                        target paths.
 -strategy <arg>        Copy strategy to use. Default is dividing work
                        based on file sizes
 -tmp <arg>             Intermediate work path to be used for atomic
                        commit
 -update                Update target, copying only missingfiles or
                        directories

Repeat these steps on the other hosts in your database to verify that all of the hosts can run distcp.
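
Checking every host by hand can be tedious on a large cluster. The following sketch shows one way to recognize the usage message automatically. The function name is_distcp_usage and the host names in the comment are hypothetical, and the sketch assumes only that the output contains a line beginning with "usage: distcp", as in the sample above.

```shell
# Hypothetical helper: succeed if the text on stdin contains the
# distcp usage banner shown above.
is_distcp_usage() {
    grep -q '^usage: distcp'
}

# Possible use from one host, with placeholder host names. 2>&1 is used
# because this sketch does not assume which stream carries the message:
#   for h in host01 host02 host03; do
#       ssh "$h" hadoop distcp 2>&1 | is_distcp_usage ||
#           echo "$h: distcp check failed"
#   done
```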

Troubleshooting

If you cannot run the distcp command, try the following steps:

If Bash cannot find the hadoop command, you may need to manually add Hadoop's bin directory to the system search path. An alternative is to create a symbolic link in an existing directory in the search path (such as /usr/bin) to the hadoop binary.
Ensure the version of Java installed on your Vertica cluster is compatible with your Hadoop distribution.
Review the Linux package installation tool's logs for errors. In some cases, packages may not be fully installed, or may not have been downloaded due to network issues.
Ensure that the database administrator account has permission to execute the hadoop command. You might need to add the account to a specific group in order to allow it to run the necessary commands.

5 - Setting up backup locations

Important

Full and object-level backups reside on backup hosts, the computer systems on which backups and archives are stored. On the backup hosts, Vertica saves backups in a specific backup location (directory).

You must set up your backup hosts before you can create backups.

The storage format type at your backup locations must support fcntl lockf (POSIX) file locking.

5.1 - Configuring backup hosts and connections

You use vbr to back up your database to one or more hosts (known as backup hosts) that can be outside of your database cluster.

You can use one or more backup hosts or a single cloud storage bucket to back up your database. Use the vbr configuration file to specify which backup host each node in your cluster should use.

Before you back up to hosts outside of the local cluster, configure the target backup locations to work with vbr. The backup hosts you use must:

Have sufficient backup disk space.
Be accessible from your database cluster through SSH.
Have passwordless SSH access for the Database Administrator account.
Have either the Vertica rpm installed, or both Python 3.7 and rsync 3.0.5 or later.
If you are using a stateful firewall, configure your tcp_keepalive_time and tcp_keepalive_intvl sysctl settings to use values less than your firewall timeout value.
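
The keepalive requirement can be checked with a small comparison. The following is a sketch only: the function name keepalive_below_timeout is hypothetical, and the firewall timeout is a value that you must supply from your own firewall configuration.

```shell
# Hypothetical helper: succeed only if both keepalive values (seconds)
# are less than the firewall idle timeout (seconds).
#   $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = firewall timeout
keepalive_below_timeout() {
    keepalive_time=$1
    keepalive_intvl=$2
    firewall_timeout=$3
    [ "$keepalive_time" -lt "$firewall_timeout" ] &&
        [ "$keepalive_intvl" -lt "$firewall_timeout" ]
}
```

On Linux, you can read the current values with sysctl -n net.ipv4.tcp_keepalive_time and sysctl -n net.ipv4.tcp_keepalive_intvl and pass them as the first two arguments.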

Configuring TCP forwarding on database hosts

vbr depends on TCP forwarding to forward connections from database hosts to backup hosts. For copycluster and replication tasks, you must enable TCP forwarding on both sets of hosts. SSH connections to backup hosts do not require SSH forwarding.

If it is not already set by default, set AllowTcpForwarding = Yes in /etc/ssh/sshd_config and then send a SIGHUP signal to sshd on each host. See the Linux sshd documentation for more information.
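
To spot a host where forwarding has been turned off, you can search the sshd configuration file for an explicit "no" setting. The following sketch uses a hypothetical function name, tcp_forwarding_disabled. It inspects a single file only, so it does not account for Match blocks or included files; running sshd -T as root reports the effective configuration.

```shell
# Hypothetical helper: succeed if the sshd_config file in $1 (default
# /etc/ssh/sshd_config) explicitly sets AllowTcpForwarding to "no".
# Commented-out lines are ignored because the keyword must start the line.
tcp_forwarding_disabled() {
    grep -Eiq '^[[:space:]]*AllowTcpForwarding[[:space:]=]+no[[:space:]]*$' \
        "${1:-/etc/ssh/sshd_config}"
}
```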

If TCP forwarding is not enabled, tasks requiring it fail with the following message: "Errors connecting to remote hosts: Check SSH settings, and that the same Vertica version is installed on all nodes."

On a single-node cluster, vbr uses a random high-number port to create a local SSH tunnel. This fails if PermitOpen is set to restrict the port. If so, comment out the PermitOpen line in sshd_config.

Creating configuration files for backup hosts

Create separate configuration files for full and object-level backups, and give each file a distinct name. Use the same node, backup host, and directory location pairs in both files. Specify a different backup directory location for each database.

Note

For optimal network performance when creating a backup, Vertica recommends that you give each node in the cluster its own dedicated backup host.

Preparing backup host directories

Before vbr can back up a database, you must prepare the target backup directory. Run vbr with a task type of init to create the necessary manifests for the backup process. You need to perform the init process only once. After that, Vertica maintains the manifests automatically.

Estimating backup host disk requirements

Wherever you plan to save data backups, consider the disk requirements for historical backups at your site. Also, if you use more than one archive, multiple archives potentially require more disk space. Vertica recommends that each backup host have space for at least twice the database node footprint size. Follow this recommendation regardless of the specifics of your site's backup schedule and retention requirements.

To estimate the database size, use the used_bytes column of the storage_containers system table as in the following example:

=> SELECT SUM(used_bytes) AS total_size FROM storage_containers WHERE node_name='v_mydb_node0001';
total_size
------------
  302135743
(1 row)
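
Applying the recommendation of at least twice the node footprint to the result of this query is simple arithmetic. The following sketch uses a hypothetical function name, min_backup_bytes.

```shell
# Hypothetical helper: given a node footprint in bytes, print the minimum
# recommended backup host space (twice the footprint).
min_backup_bytes() {
    echo $(( $1 * 2 ))
}
```

For the node in the example above, min_backup_bytes 302135743 prints 604271486, so the backup host for that node should have at least that many bytes available.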

Making backup hosts accessible

You must verify that any firewalls between the source database nodes and the target backup hosts allow connections for SSH and rsync on port 50000.

The backup hosts must run the same versions of rsync and Python as those supplied in the Vertica installation package.

Setting up passwordless SSH access

For vbr to access a backup host, the database superuser must meet two requirements:

Have an account on each backup host, with write permissions to the backup directory.
Have passwordless SSH access from each database cluster host to the corresponding backup host.

How you fulfill these requirements depends on your platform and infrastructure.

SSH access among the backup hosts and access from the backup host to the database node is not necessary.

If your site does not use a centralized login system (such as LDAP), you can usually add a user with the useradd command or through a GUI administration tool. See the documentation for your Linux distribution for details.

If your platform supports it, you can enable passwordless SSH logins using the ssh-copy-id command to copy a database administrator's SSH identity file to the backup location from one of your database nodes. For example, to copy the SSH identity file from a node to a backup host named backup01:

$ ssh-copy-id -i dbadmin@backup01
Password:

Try logging into the machine with "ssh dbadmin@backup01". Then, check the contents of the ~/.ssh/authorized_keys file to verify that you have not added extra keys that you did not intend to include.

$ ssh backup01
Last login: Mon May 23 11:44:23 2011 from host01

Repeat the steps to copy a database administrator's SSH identity to all backup hosts you use to back up your database.

After copying a database administrator's SSH identity, you should be able to log in to the backup host from any of the nodes in the cluster without being prompted for a password.
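
One way to confirm this for several backup hosts at once is to attempt a non-interactive login to each. The following is a sketch under stated assumptions: the function name check_passwordless_ssh and the SSH_CMD override are hypothetical, and the host names you pass are your own. The BatchMode=yes option makes ssh fail rather than prompt for a password.

```shell
# SSH_CMD can be overridden so the loop can be exercised without a network.
SSH_CMD=${SSH_CMD:-ssh}

# Hypothetical helper: try a non-interactive login to each host in "$@".
# Returns nonzero if any host refuses a passwordless login.
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: passwordless login failed" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Run it as the database administrator on each database node, for example check_passwordless_ssh backup01 backup02.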

Increasing the SSH maximum connection settings for a backup host

If your configuration requires backing up multiple nodes to one backup host (n:1), increase the number of concurrent SSH connections to the SSH daemon (sshd). By default, the number of concurrent SSH connections on each host is 10, as set in the sshd_config file with the MaxStartups keyword. The MaxStartups value for each backup host should be greater than the total number of hosts being backed up to this backup host. For more information on configuring MaxStartups, refer to the man page for that parameter.

5.2 - Configuring hard-link local backup hosts

When specifying the backupHost parameter for your hard-link local configuration files, use the database host names (or IP addresses) as known to admintools. Do not use the node names. Host names (or IP addresses) are what you used when setting up the cluster. Do not use localhost for the backupHost parameter.

Listing host names

To query node names and host names:

=> SELECT node_name, host_name FROM node_resources;
    node_name     |   host_name 
------------------+----------------
 v_vmart_node0001 | 192.168.223.11
 v_vmart_node0002 | 192.168.223.22
 v_vmart_node0003 | 192.168.223.33
(3 rows)

Because you are creating a local backup, use square brackets [ ] to map the host to the local host. For more information, refer to [mapping].

[Mapping]
v_vmart_node0001 = []:/home/dbadmin/data/backups
v_vmart_node0002 = []:/home/dbadmin/data/backups
v_vmart_node0003 = []:/home/dbadmin/data/backups
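
For clusters with many nodes, the [Mapping] section can be generated rather than typed. The following sketch prints one line per node in the format shown above. The function name print_local_mapping is hypothetical, and the node names and directory are whatever you pass in.

```shell
# Hypothetical helper: print a [Mapping] section that maps each node name
# (arguments after the first) to the local backup directory in $1.
print_local_mapping() {
    backup_dir=$1
    shift
    echo "[Mapping]"
    for node in "$@"; do
        echo "$node = []:$backup_dir"
    done
}
```

For example, print_local_mapping /home/dbadmin/data/backups v_vmart_node0001 v_vmart_node0002 prints the header followed by the first two lines of the sample above. The node names could come from the node_resources query, for instance through vsql -At -c "SELECT node_name FROM node_resources".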

5.3 - Configuring cloud storage backups

Backing up an Enterprise Mode or Eon Mode database to a supported cloud storage location requires that you add parameters to the backup configuration file. You can create these backups from the local cluster or from your cloud provider's virtual servers. Additional cloud storage configuration is required to configure authentication and encryption.

Configuration file requirements

To back up any Eon Mode or Enterprise Mode cluster to a cloud storage destination, the backup configuration file must include a [CloudStorage] section. Vertica provides a sample cloud storage configuration file that you can copy and edit.

Environment variable requirements

Environment variables securely pass credentials for backup locations. Eon and Enterprise Mode databases require environment variables in the following backup scenarios:

Vertica on Google Cloud Platform (GCP) to Google Cloud Storage (GCS).

For backups to GCS, you must have a hash-based message authentication code (HMAC) key that contains an access ID and a secret. See Eon Mode on GCP prerequisites for instructions on how to create your HMAC key.
On-premises databases to any of the following storage locations:
- Amazon Web Services (AWS)
- Any S3-compatible storage
- Azure Blob Storage (Enterprise Mode only)
On-premises database backups require you to pass your credentials with environment variables. You cannot use other methods of credentialing with cross-endpoint backups.
Any Azure user environment that does not manage resources with Azure managed identities.

The vbr log captures when you send an environment variable. For security purposes, the value that the environment variable represents is not logged. For details about checking vbr logs, see Troubleshooting backup and restore.

Enterprise Mode and Eon Mode

All Enterprise Mode and Eon Mode databases require the following environment variables:

Environment Variable	Description
`VBR_BACKUP_STORAGE_ACCESS_KEY_ID`	Credentials for the backup location.
`VBR_BACKUP_STORAGE_SECRET_ACCESS_KEY`	Credentials for the backup location.
`VBR_BACKUP_STORAGE_ENDPOINT_URL`	The endpoint for the on-premises S3 backup location, including the scheme (HTTP or HTTPS). Important: Do not set this variable for backup locations on AWS or GCS.

Eon Mode only

Eon Mode databases require the following environment variables:

Environment Variable	Description
`VBR_COMMUNAL_STORAGE_ACCESS_KEY_ID`	Credentials for the communal storage location.
`VBR_COMMUNAL_STORAGE_SECRET_ACCESS_KEY`	Credentials for the communal storage location.
`VBR_COMMUNAL_STORAGE_ENDPOINT_URL`	The endpoint for the communal storage, including the scheme (HTTP or HTTPS). Important: Do not set this variable for backup locations on GCS.
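
Before starting a backup, it can help to confirm that the credential variables are exported in the current session. The following sketch lists any that are missing. The function name missing_vbr_vars and its "eon" argument are hypothetical. The endpoint URL variables are deliberately left out of the check because they must not be set for some locations.

```shell
# Hypothetical helper: print the name of each required credential variable
# that is not exported with a non-empty value. Pass "eon" as $1 to include
# the communal storage variables.
missing_vbr_vars() {
    names="VBR_BACKUP_STORAGE_ACCESS_KEY_ID VBR_BACKUP_STORAGE_SECRET_ACCESS_KEY"
    if [ "${1:-}" = "eon" ]; then
        names="$names VBR_COMMUNAL_STORAGE_ACCESS_KEY_ID"
        names="$names VBR_COMMUNAL_STORAGE_SECRET_ACCESS_KEY"
    fi
    for name in $names; do
        # env lists exported variables only; "=." requires a non-empty value
        env | grep -q "^${name}=." || echo "$name"
    done
}
```

If the function prints nothing, all of the variables it checks are present. Only the variable names are printed, never their values.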

Azure Blob Storage only

If the user environment does not manage resources with Azure-managed identities, you must provide credentials with environment variables. If you set environment variables in an environment that uses Azure-managed identities, credentials set with environment variables take precedence over Azure-managed identity credentials.

You can back up and restore between two separate Azure accounts. Cross-account operations require a credential configuration JSON object and an endpoint configuration JSON object for each account. Each environment variable accepts a collection of one or more comma-separated JSON objects.

Cross-account and cross-region backup and restore operations might result in decreased performance. For details about performance and cost, see the Azure documentation.

The Azure Blob Storage environment variables are described in the following table:

Environment Variable	Description
`VbrCredentialConfig`	Credentials for the backup location. Each JSON object requires values for the following keys: `accountName`: Name of the storage account. `blobEndpoint`: Host address and optional port for the endpoint to use as the backup location. `accountKey`: Access key for the account. `sharedAccessSignature`: A token that provides access to the backup endpoint.
`VbrEndpointConfig`	The endpoint for the backup location. To back up and restore between two separate Azure accounts, provide each set of endpoint information as a JSON object. Each JSON object requires values for the following keys: `accountName`: Name of the storage account. `blobEndpoint`: Host address and optional port for the endpoint to use as the backup location. `protocol`: HTTPS (default) or HTTP. `isMultiAccountEndpoint`: Boolean (by default false), indicates whether `blobEndpoint` supports multiple accounts.

The following commands export the Azure Blob Storage environment variables to the current shell session:

$ export VbrCredentialConfig='[{"accountName": "account1","blobEndpoint": "host[:port]","accountKey": "account-key1","sharedAccessSignature": "sas-token1"}]'
$ export VbrEndpointConfig='[{"accountName": "account1", "blobEndpoint": "host[:port]", "protocol": "http"}]'

5.4 - Additional considerations for cloud storage

If you are backing up to a supported cloud storage location, you need to do some additional one-time configuration. You must also take additional steps if the cluster you are backing up is running on instances in the cloud. For Amazon Web Services (AWS), you might choose to encrypt your backups, which requires additional steps.

By default, bucket access is restricted to the communal storage bucket. For one-time operations with other buckets like backing up and restoring the database, use the appropriate credentials. See Google Cloud Storage parameters and S3 parameters for additional information.

Configuring cloud storage for backups

As with any storage location, you must initialize a cloud storage location with the vbr task init.

Because cloud storage does not support file locking, Vertica uses either your local file system or the cloud storage file system to handle file locks during a backup. You identify this location by setting the cloud_storage_backup_file_system_path parameter in your vbr configuration file. During a backup, Vertica creates a locked identity file on your local or cloud instance, and a duplicate file in your cloud storage backup location. If the files match, Vertica proceeds with the backup, releasing the lock when the backup is complete. As long as the files remain identical, you can use the cloud storage location for backup and restore tasks.

Reinitializing cloud backup storage

If the files in your locking location become out of sync with the files in your backup location, backup and restore tasks fail with an error message. You can resolve locking inconsistencies by rerunning the init task qualified by --cloud-force-init:

$ /opt/vertica/bin/vbr --task init --cloud-force-init -c filename.ini

Note

If a backup fails, confirm that your Vertica cluster has permission to access your cloud storage location.

Configuring authentication for Google Cloud Storage

If you are backing up to Google Cloud Storage (GCS) from a Google Cloud Platform-based cluster, you must provide authentication to the GCS communal storage location. Set the environment variables as detailed in Configuring cloud storage backups to authenticate to GCS storage.

See Eon Mode on GCP prerequisites for additional authentication information, including how to create your hash-based message authentication code (HMAC) key.

Configuring EC2 authentication for Amazon S3

If you are backing up to S3 from an EC2-based cluster, you must provide authentication to your S3 host. Regardless of the authentication type you choose, your credentials do not leave your EC2 cluster. Vertica supports the following authentication types:

AWS credential file
Environment variables
IAM role

AWS credential file - You can manually create a configuration file on your EC2 initiator host at ~/.aws/credentials.

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

For more information on credential files, refer to Amazon Web Services documentation.

Environment variables - Amazon Web Services provides the following environment variables:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

Use these variables on your initiator to provide authentication to your S3 host. Variables exported in a shell session last only until that session ends. For more information, refer to the AWS documentation.

IAM role - Create an AWS IAM role and grant that role permission to access your EC2 cluster and S3 resources. This method is recommended for managing long-term access. For more information, refer to Amazon Web Services documentation.

Encrypting backups on Amazon S3

Backups made to Amazon S3 can be encrypted using native server-side S3 encryption capability. For more information on Amazon S3 encryption, refer to Amazon documentation.

Note

Vertica supports server-side encryption only. Client-side encryption is not supported.

Vertica supports the following forms of S3 encryption:

Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
- Encrypts backups with AES-256
- Amazon manages encryption keys
Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)
- Encrypts backups with AES-256
- Requires an encryption key from Amazon Key Management Service
- Your S3 bucket must be from the same region as your encryption key
- Allows auditing of user activity

When you enable encryption of your backups, Vertica encrypts backups as it creates them. If you enable encryption after creating an initial backup, only increments added after you enabled encryption are encrypted. To ensure that your backup is entirely encrypted, create new backups after enabling encryption.

To enable encryption, add the following settings to your configuration file:

cloud_storage_encrypt_transport: Encrypts your backups during transmission. You must enable this parameter if you are using SSE-KMS encryption.
cloud_storage_encrypt_at_rest: Enables encryption of your backups. If you enable encryption and do not provide a KMS key, Vertica uses SSE-S3 encryption.
cloud_storage_sse_kms_key_id: If you are using KMS encryption, use this parameter to provide your key ID.

See [CloudStorage] for more information on these settings.

The following example shows a typical configuration for KMS encryption of backups.


[CloudStorage]
cloud_storage_encrypt_transport = True
cloud_storage_encrypt_at_rest = sse
cloud_storage_sse_kms_key_id = 6785f412-1234-4321-8888-6a774ba2aaaa

5.5 - Configuring backups to and from HDFS

Eon Mode only

To back up an Eon Mode database that uses HDFS on-premises storage, the communal storage and backup location must use the same HDFS credentials and domain. All vbr operations are supported, except copycluster.

Vertica supports Kerberos authentication, High Availability Name Node, and TLS (wire encryption) for vbr operations.

Creating a cloud storage configuration file

To back up Eon Mode on-premises with communal storage on HDFS, you must provide a backup configuration file. In the [CloudStorage] section, provide the cloud_storage_backup_path and cloud_storage_backup_file_system_path values.

If you use Kerberos authentication or High Availability NameNode with your Hadoop cluster, the vbr utility requires access to the same values set in the bootstrapping file that you created during the database install. Include these values in the [Misc] section of the backup configuration file.

The following table maps each vbr configuration option to its associated bootstrap file parameter:

vbr Configuration Option	Bootstrap File Parameter
kerberos_service_name	KerberosServiceName
kerberos_realm	KerberosRealm
kerberos_keytab_file	KerberosKeytabFile
hadoop_conf_dir	HadoopConfDir

For example, if KerberosServiceName is set to principal-name in the bootstrap file, set kerberos_service_name to principal-name in the [Misc] section of your configuration file.

Encryption between communal storage and backup locations

Vertica supports vbr operations using wire encryption between your communal storage and backup locations. Use the cloud_storage_encrypt_transport parameter in the [CloudStorage] section of your backup configuration file to configure encryption.

To enable encryption:

Set cloud_storage_encrypt_transport to true.
Use the swebhdfs:// protocol for cloud_storage_backup_path.

If you do not use encryption:

Set cloud_storage_encrypt_transport to false.
Use the webhdfs:// protocol for cloud_storage_backup_path.
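
The two settings must agree: an encrypted transport goes with swebhdfs:// and an unencrypted one with webhdfs://. The following sketch checks that pairing before you run vbr. The function name hdfs_transport_consistent is hypothetical, and you supply the two values from your own configuration file.

```shell
# Hypothetical helper: succeed if the cloud_storage_encrypt_transport
# value ($1, true or false in any letter case) agrees with the scheme of
# the cloud_storage_backup_path value ($2).
hdfs_transport_consistent() {
    setting=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$setting:$2" in
        true:swebhdfs://*)  return 0 ;;
        false:webhdfs://*)  return 0 ;;
        *)                  return 1 ;;
    esac
}
```

For example, hdfs_transport_consistent true webhdfs://namenode.example.com/backups fails because encryption is enabled but the path uses the unencrypted scheme. The host name is a placeholder.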

Vertica does not support at-rest encryption for Hadoop storage.

6 - Creating backups

Important

You should perform full backups of your database regularly. You should also perform a full backup under the following circumstances:

Before…

Upgrading Vertica to another release.
Dropping a partition.
Adding, removing, or replacing nodes in the database cluster.

After…

Loading a large volume of data.
Adding, removing, or replacing nodes in the database cluster.
Recovering a cluster from a crash.

If…

The epoch of the latest backup predates the current ancient history mark.

Ideally, schedule ongoing backups to back up your data. You can run the Vertica vbr from a cron job or other task scheduler.
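
A scheduler usually calls a small wrapper script rather than vbr directly, so that failures are reported clearly. The following is a minimal sketch and not a Vertica-supplied script: the function name run_scheduled_backup and the VBR override are hypothetical, the vbr path is the one used elsewhere in this section, and the sketch assumes the backup task name together with the --task and -c options shown in the init examples.

```shell
# VBR can be overridden so the wrapper can be tried without a database.
VBR=${VBR:-/opt/vertica/bin/vbr}

# Hypothetical wrapper for a scheduler: run a backup with the vbr
# configuration file in $1 and report the outcome.
run_scheduled_backup() {
    if "$VBR" --task backup -c "$1"; then
        echo "backup succeeded: $1"
    else
        status=$?
        echo "backup failed with status $status: $1" >&2
        return "$status"
    fi
}
```

A script that defines this function and then calls run_scheduled_backup full_backup.ini could be scheduled from the database administrator's crontab, for example nightly. Run it as the same account that normally runs vbr so that the passwordless SSH setup applies.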

You can also back up selected objects. Use object backups to supplement full backups, not to replace them. Backup types are described in Types of backups.

Running vbr does not affect active database applications. vbr supports creating backups while concurrently running applications that execute DML statements, including COPY, INSERT, UPDATE, DELETE, and SELECT.

Backup locations and contents

Full and object-level backups reside on backup hosts, the computer systems on which backups and archives are stored.

Vertica saves backups in a specific backup location, the directory on a backup host. This location can contain multiple backups, both full and object-level, including associated archives. The backups are also compatible, allowing you to restore any objects from a full database backup. Backup locations for Eon Mode databases must be on S3.

Note

Vertica does not recommend concurrent backups. If you must run multiple backups concurrently, use separate backup and temp directories for each. Having separate backup directories detracts from the advantage of sharing data among historical backups.

Before beginning a backup, you must prepare your backup locations using the vbr init task, as in the following example:

$ vbr -t init -c full_backup.ini

For more information about backup locations, see Setting up backup locations.

Backups contain all committed data for the backed-up objects as of the start time of the backup. Backups do not contain uncommitted data or data committed during the backup. Backups do not delay mergeout or load activity.

Backing up HDFS storage locations

If your Vertica cluster uses HDFS storage locations, you must do some additional configuration before you can perform backups. See Requirements for backing up and restoring HDFS storage locations.

HDFS storage locations support only full backup and restore. You cannot perform object backup or restore on a cluster that uses HDFS storage locations.

Impact of backups on Vertica nodes

While a backup is taking place, the backup process can consume additional storage. The amount of space consumed depends on the size of your catalog and any objects that you drop during the backup. The backup process releases this storage when the backup is complete.

Best practices for creating backups

When creating backup configuration files:

Create separate configuration files to create full and object-level backups.
Use a unique snapshot name in each configuration file.
Use the same backup host directory location for both kinds of backups:
- Because the backups share disk space, they are compatible when performing a restore.
- Each cluster node must also use the same directory location on its designated backup host.
For best network performance, use one backup host per cluster node.
Use one directory on each backup node to store successive backups.
For future reference, append the major Vertica version number to the configuration file name (mybackup9x).

The selected objects of a backup can include one or more schemas or tables, or a combination of both. For example, you can include schema S1 and tables T1 and T2 in an object-level backup. Multiple backups can be combined into a single backup. A schema-level backup can be integrated with a database backup (and a table backup integrated with a schema-level backup, and so on).

6.1 - Types of backups

vbr supports the following kinds of backups:

Full backups
Object-level backups
Hard-link local backups

The vbr configuration file includes the snapshotName parameter. Use different snapshot names for different types of backups, including different combinations of objects in object-level backups. Backups with the same snapshot name form a time sequence limited by restorePointLimit. Avoid giving all backups the same snapshot name; otherwise, they eventually interfere with each other.

Full backups

A full backup is a complete copy of the database catalog, its schemas, tables, and other objects. This type of backup provides a consistent image of the database at the time the backup occurred. You can use a full backup for disaster recovery to restore a damaged or incomplete database. You can also restore individual objects from a full backup.

When a full backup already exists, vbr performs incremental backups, whose scope is confined to data that is new or changed since the last full backup occurred. You can specify the number of historical backups to keep.

Archives contain a collection of same-name backups. Each archive can have a different retention policy. For example, TBak might be the name of an object-level backup of table T. If you create a backup every day, the seven backups of a given week become part of the TBak archive. Keeping a backup archive lets you revert to any one of the saved backups.

Object-level backups

An object-level backup consists of one or more schemas or tables, or a group of such objects. It does not contain the entire database. When an object-level backup exists, you can restore all of its contents or individual objects.

Note

Object-level backups are not supported for Enterprise Mode databases that use a Hadoop File System (HDFS) storage location.

Object-level backups contain the following object types:

Object Type	Description
Selected objects	Objects you choose to be part of an object-level backup. For example, if you specify tables `T1` and `T2` to include in an object-level backup, they are the selected objects.
Dependent objects	Objects that must be included as part of an object-level backup, due to dependencies. Suppose you want to create an object-level backup that includes a table with a foreign key. To do so, table constraints require that you include the primary key table, and `vbr` enforces this requirement. Projections anchored on a table in the selected objects are also dependent objects.
Principal objects	The objects on which both selected and dependent objects depend are called principal objects. For example, each table and projection has an owner, and each owner is a principal object.

Hard-link local backups

Valid only for Enterprise Mode, hard-link local backups are saved directly on the database nodes, and can be performed on the entire database or specific objects. Typically you use this kind of backup temporarily before performing a disruptive operation. Do not rely on this kind of backup for long-term use; it cannot protect you from node failures because data and backups are on the same nodes.

A checkpoint backup is a hard-link local backup that comprises a complete copy of the database catalog, and a set of hard file links to corresponding data files. You must save a hard-link local backup on the same file system that is used by the catalog and database files.

6.2 - Creating full backups

Before you create a database backup, verify the following:

You have prepared your backup directory with the vbr init task:
```
$ vbr -t init -c full_backup.ini
```
Your database is running. It is unnecessary for all nodes to be up in a K-safe database. However, any nodes that are DOWN are not backed up.
All of the backup hosts are up and available.
The backup host (either on the database cluster or elsewhere) has sufficient disk space to store the backups.
The user account of the user who starts vbr has write access to the target directories on the host backup location. This user can be dbadmin or another assigned role. However, you cannot run vbr as root.
Each backup has a unique file name.
If you want to keep earlier backups, restorePointLimit is set to a number greater than 1 in the configuration file.
If you are backing up an Eon Mode database, you have met the Eon Mode database requirements.

Run vbr from a terminal. Use the database administrator account from an initiator node in your database cluster. The command requires only the --task backup and --config-file arguments (or their short forms, -t and -c).

If your configuration file does not contain the database administrator password, vbr prompts you to enter the password. It does not display what you type.

vbr requires no further interaction after you invoke it.

The following example shows a full backup:

$ vbr -t backup -c full_backup.ini
Starting backup of database VTDB.
Participating nodes: v_vmart_node0001, v_vmart_node0002, v_vmart_node0003, v_vmart_node0004.
Snapshotting database.
Snapshot complete.
Approximate bytes to copy: 2315056043 of 2356089422 total.
[==================================================] 100%
Copying backup metadata.
Finalizing backup.
Backup complete!

By default, no output is displayed, other than the progress bar. To include additional progress information, use the --debug option, with a value of 1, 2, or 3.
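If you start backups from a script, a small wrapper can assemble the command line described above. The following Python sketch is one possible approach, not part of vbr: the function names vbr_command and run_vbr are hypothetical, and the sketch assumes that vbr is on the PATH of the database administrator account. It uses only the --task, --config-file, and --debug options mentioned in this section.

```python
import os
import subprocess


def vbr_command(task, config_file, debug=None):
    """Build the argument list for a vbr invocation."""
    if debug is not None and debug not in (1, 2, 3):
        raise ValueError("debug must be 1, 2, or 3")
    cmd = ["vbr", "--task", task, "--config-file", config_file]
    if debug is not None:
        cmd += ["--debug", str(debug)]
    return cmd


def run_vbr(task, config_file, debug=None):
    """Run vbr and raise CalledProcessError if it exits with an error."""
    if os.geteuid() == 0:
        # vbr cannot be run as root
        raise RuntimeError("run vbr as the database administrator, not root")
    return subprocess.run(vbr_command(task, config_file, debug), check=True)


if __name__ == "__main__":
    # Print the command instead of running it
    print(" ".join(vbr_command("backup", "full_backup.ini", debug=1)))
```

Passing the arguments as a list avoids shell quoting problems if a configuration file path contains spaces.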

6.3 - Creating object-level backups

Use object-level backups to back up individual schemas or tables. Object-level backups are especially useful for multi-tenanted database sites. For example, an international airport could use a multi-tenanted database to represent different airlines in its schemas. Then, tables could maintain various types of information for the airline, including ARRIVALS, DEPARTURES, and PASSENGER information. With such an organization, creating object-level backups of the specific schemas would let you restore by airline tenant, or any other important data segment.

To create one or more object-level backups, create a configuration file specifying the backup location, the object-level backup name, and a list of objects to include. You can use the includeObjects and excludeObjects parameters together with wildcards to specify the objects of interest. For more information about specifying the objects to include, see Including and excluding objects.

Important

If your Eon Mode database has multiple namespaces, you must specify the namespace to which the objects belong. For vbr tasks, namespace names are prefixed with a period. For example, .n.s.t refers to table t in schema s in namespace n. See Eon Mode database requirements for more information.
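The naming rule above can be sketched in a few lines of Python. The helper vbr_object_name is hypothetical and covers only the table form shown in the example (.n.s.t); see Eon Mode database requirements for the authoritative rules.

```python
def vbr_object_name(schema, table, namespace=None):
    """Return a table name in the form used for vbr tasks.

    With a namespace, the name gets a leading period: .namespace.schema.table
    """
    name = f"{schema}.{table}"
    return f".{namespace}.{name}" if namespace else name


if __name__ == "__main__":
    print(vbr_object_name("s", "t", namespace="n"))
    print(vbr_object_name("s", "t"))
```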

For more information about configuration files for full or object-level backups, see Sample vbr configuration files and vbr configuration file reference.

While not required, Vertica recommends that you first create a full backup before creating any object-level backups.

Note

Apache Kafka uses internal configuration settings to maintain the integrity of your data. When backing up your Kafka data, Vertica recommends that you perform a full database backup rather than an object-level backup.

Performing the backup

Before you can create a backup, you must prepare your backup directory with the vbr init task. You must also create a configuration file specifying which objects to back up.

Run vbr from a terminal using the database administrator account from a node in your database cluster. You cannot run vbr as root.

You can create an object-level backup as in the following example.

$ vbr --task backup --config-file objectbak.ini
Preparing...
Found Database port:  5433
Copying...
[==================================================] 100%
All child processes terminated successfully.
Committing changes on all backup sites...
backup done!

Naming conventions

Give each object-level backup configuration file a distinct and descriptive name. For instance, at an airport terminal, schema-based backup configuration files use a naming convention with an airline prefix, followed by further description, such as:

AIR1_daily_arrivals_backup

AIR2_hourly_arrivals_backup

AIR2_hourly_departures_backup

AIR3_daily_departures_backup

When database and object-level backups exist, you can recover the backup of your choice.

Caution

Do not change object names in an object-level configuration file if a backup already exists. Doing so overwrites the original configuration file, and you cannot restore it from the earlier backup. Instead, create a different configuration file.

Understanding object-level backup contents

Object-level backups comprise only the elements necessary to restore the schema or table, including the selected, dependent, and principal objects. An object-level backup includes the following contents:

Storage: Data files belonging to any specified objects
Metadata: Including the cluster topology, timestamp, epoch, AHM, and so on
Catalog snippet: Persistent catalog objects serialized into the principal and dependent objects

Some of the elements that AIR2 comprises, for instance, are its parent schema, tables, named sequences, primary key and foreign key constraints, and so on. To create such a backup, vbr saves the objects directly associated with the table. It also saves any dependencies, such as foreign key (FK) tables, and creates an object map from which to restore the backup.

Note

Because the data in local temp tables persists only within a session, local temporary tables are excluded when you create an object-level backup. For global temporary tables, vbr stores the table's definition.

Making changes after an object-level backup

Be aware how changes made after an object-level backup affect subsequent backups. Suppose you create an object-level backup and later drop schemas and tables from the database. In this case, the objects you dropped are also dropped from subsequent backups. If you do not save an archive of the object backup, such objects could be lost permanently.

A table name change made after you create a table backup does not persist after you restore the backup. Suppose that, after creating a backup, you drop a user who owns any selected or dependent objects in that backup. In this case, restoring the backup re-creates the object and assigns ownership to the user performing the restore. If the owner of a restored object still exists, that user retains ownership of the restored object.

To restore a dropped table from a backup:

Rename the newly created table from t1 to t2.
Restore the backup containing t1.
Restore t1. Tables t1 and t2 now coexist.

For information on how Vertica handles object overwrites, refer to the objectRestoreMode parameter in [misc].

K-safety can increase after an object backup. Restoration of a backup fails if both of the following conditions occur:

An increase in K-safety occurs.
Any table in the backup has insufficient projections.

Changing principal and dependent objects

If you create a backup and then drop a principal object, restoring the backup restores that principal object. If the owner of the restored object has also been dropped, Vertica assigns the restored object to the current dbadmin.

You can specify how Vertica handles object overwrites in the vbr configuration file. For more information, refer to the objectRestoreMode parameter in [misc].

IDENTITY sequences are dependent objects because they cannot exist without their tables. An object-level backup includes such objects, along with the tables on which they depend.

Named sequences are not dependent objects because they exist autonomously. A named sequence remains after you drop the table in which the sequence is used. In this case, the named sequence is a principal object. Thus, you must back up the named sequence with the table. Then you can regenerate it, if it does not already exist when you restore the table. If the sequence does exist, vbr uses it, unmodified. Sequence values could repeat, if you restore the full database and then restore a table backup to a newer epoch.

Considering constraint references

When database objects are related through constraints, you must back them up together. For example, a schema with tables whose constraints reference only tables in the same schema can be backed up. However, a schema containing a table with an FK/PK constraint on a table in another schema cannot. To back up the second table, you must include the other schema in the list of selected objects.
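To work out which schemas belong in the list of selected objects, you can follow the constraint references transitively. The following Python sketch shows the idea under stated assumptions: the function schemas_to_select and the fk_refs mapping are hypothetical, and you must supply the mapping yourself (for example, from your own knowledge of the schema design). It does not query the database.

```python
def schemas_to_select(selected, fk_refs):
    """Return the selected schemas plus every schema they reference.

    selected: iterable of schema names you want to back up
    fk_refs:  dict mapping a schema to the set of other schemas that its
              tables reference through FK/PK constraints
    """
    required = set(selected)
    pending = list(required)
    while pending:
        schema = pending.pop()
        for referenced in fk_refs.get(schema, ()):
            if referenced not in required:
                required.add(referenced)
                pending.append(referenced)  # follow references of references
    return required


if __name__ == "__main__":
    refs = {"sales": {"customers"}, "customers": {"geo"}}
    print(sorted(schemas_to_select(["sales"], refs)))
```

A schema whose constraints stay inside the schema produces no extra entries, which matches the first case described above.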

Configuration files for object-level backups

vbr automatically associates configurations with different backup names but uses the same backup location.

Always create a cluster-wide configuration file and one or more object-level configuration files pointing to the same backup location. Storage is shared between backups, which prevents multiple copies of the same data. For object-level backups, using the same backup location means that vbr needs to apply fewer OID conflict prevention techniques, which results in fewer problems when restoring the backup.

When using cluster and object configuration files with the same backup location, vbr includes additional provisions to ensure that the object-level backups can be used following a full cluster restore. One approach to restoring a full cluster is to use a full database backup to bootstrap the cluster. After the cluster is operational again, you can restore the most recent object-level backups for schemas and tables.

Attempting to restore a full database using an object-level configuration file fails, resulting in this error:

VMart=> /tmp/vbr --config-file=Table2.ini -t restore
Preparing...
Invalid metadata file. Cannot restore.
restore failed!

See Restoring all objects from an object-level backup for more information.

Backup epochs

Each backup includes the epoch to which its contents can be restored. When vbr restores data, Vertica updates to the current epoch.

vbr attempts to create an object-level backup five times before an error occurs and the backup fails.

6.4 - Creating hard-link local backups

You can use the hardLinkLocal option to create a full or object-level backup with hard file links on a local database host.

Creating hard-link local backups can provide the following advantages over a remote host backup:

Speed: A hard-link local backup is significantly faster than a remote host backup. When backing up, vbr does not copy files if the backup directory exists on the same file system as the database directory.
Reduced network activities: The hard-link local backup minimizes network load because it does not require rsync to copy files to a remote backup host.
Less disk space: The backup includes a copy of the catalog and hard file links. Therefore, the local backup uses significantly less disk space than a backup with copies of database data files. However, a hard-link local backup saves a full copy of the catalog each time you run vbr. Thus, the disk size increases with the catalog size over time.
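The disk-space advantage comes from how hard links work in the file system, independent of vbr. The following Python sketch illustrates the mechanism with a temporary file; the function name hard_link_demo and the file names are made up for the illustration, and the sketch assumes a Linux file system that supports hard links.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file and a hard link to it, then remove the original name."""
    with tempfile.TemporaryDirectory() as root:
        data_file = os.path.join(root, "data.dat")
        backup_link = os.path.join(root, "backup.dat")
        with open(data_file, "wb") as f:
            f.write(b"x" * 4096)
        os.link(data_file, backup_link)  # second name, no data copied
        a, b = os.stat(data_file), os.stat(backup_link)
        same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
        link_count = a.st_nlink
        os.remove(data_file)  # the content stays reachable through the link
        with open(backup_link, "rb") as f:
            surviving_bytes = len(f.read())
        return same_inode, link_count, surviving_bytes


if __name__ == "__main__":
    print(hard_link_demo())
```

Both names refer to the same inode, so the data blocks are stored once and remain available until the last name is removed.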

Hard-link local backups can help you during experimental designs and development cycles. Database designers and developers can create hard-link local object backups of schemas and tables on a regular schedule during design and development phases. If any new developments are unsuccessful, developers can restore one or more objects from the backup.

Planning hard-link local backups

If you plan to use hard-link local backups as a standard site procedure, design your database and hardware configuration appropriately. Consider storing all of the data files on one file system per node. Such a configuration has the advantage of being set up automatically for hard-link local backups.

Specifying backup directory locations

The backupDir parameter of the configuration file specifies the location of the top-level backup directory. Hard-link local backups require that the backup directory be located on the same Linux file system as the database data. The Linux operating system cannot create hard file links to another file system.

Do not create the hard-link local backup directory in a database data storage location. For example, as a best practice, the database data directory should not be at the top level of the file system, as it is in the following example:

/home/dbadmin/data/VMart/v_vmart_node0001

Instead, Vertica recommends adding another subdirectory for data above the database level, such as in this example:

/home/dbadmin/data/dbdata/VMart/v_vmart_node0001

You can then create the hard-link local backups subdirectory as a peer of the data directory you just created, such as in this example:

/home/dbadmin/data/backups
/home/dbadmin/data/dbdata

When you specify the hard-link backup location, be sure to avoid these common errors when adding the hardLinkLocal=True parameter to the configuration file:

If ...	Then...	Solution
You specify a backup directory on a different node	`vbr` issues an error message and aborts the backup.	Change the configuration file to include a backup directory on the same host and file system as the database files. Then, run `vbr` again.
You specify a backup location on the same node, but a backup destination directory on a different file system from the database and catalog files.	`vbr` issues a warning message and performs the backup by copying (not linking) the files from one file system to the other.	No action required, but copying consumes more disk space and takes longer than linking.
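Before you run a hard-link local backup, you can check both placement rules on a node with a short script. The following Python sketch is an assumption-laden illustration: the function names are hypothetical, and comparing st_dev values is a common way to test whether two paths are on the same Linux file system, not something vbr documents.

```python
import os


def same_file_system(path_a, path_b):
    """True if both existing paths are on the same file system (device)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def is_inside(path, parent):
    """True if path is parent itself or lies below it."""
    path, parent = os.path.abspath(path), os.path.abspath(parent)
    return os.path.commonpath([path, parent]) == parent


if __name__ == "__main__":
    data_dir = "/home/dbadmin/data/dbdata"
    backup_dir = "/home/dbadmin/data/backups"
    # The backup directory must not be inside the data storage location
    print("backup inside data dir:", is_inside(backup_dir, data_dir))
    if os.path.isdir(data_dir) and os.path.isdir(backup_dir):
        print("same file system:", same_file_system(data_dir, backup_dir))
```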

Creating the backup

Before creating a full hard-link local database backup of an Enterprise Mode database, verify the following:

Your database is running. All nodes need not be up in a K-safe database for vbr to run. However, be aware that any nodes that are DOWN are not backed up.
The user account that starts vbr (dbadmin or other) has write access to the target backup directories.

Hard-link backups are not supported in Eon Mode.

When you create a full or object-level hard link local backup, that backup contains the following:

Backup	Catalog	Database files
Full backup	Full copy	Hard file links to all database files
Object-level backup	Full copy	Hard file links for all objects listed in the configuration file, and any of their dependent objects

Run the vbr script from a terminal using the database administrator account from a node in your database cluster. You cannot run vbr as root.

Hard-link backups use the same vbr arguments as other backups. Configuring a backup as a hard-link backup is done entirely in the configuration file. The following example shows the syntax:

$ vbr --task backup --config-file fullbak.ini

Creating hard-link local backups for external media storage

You can use hard-link local backups as a staging mechanism to back up to tape or other forms of storage media. The following steps present a simplified approach to saving, and then restoring, hard-link local backups from tape storage:

Create a configuration file by copying an existing one or one of the samples described in Sample vbr configuration files.
Edit the configuration file (localbak.ini in this example) to include the hardLinkLocal=True parameter in the [Transmission] section.

Run vbr with the configuration file:

$ vbr --task backup --config-file localbak.ini

Copy the hard-link local backup directory with a separate process (not vbr) to tape or other external media.
If the database becomes corrupted, transfer the backup files from tape to their original backup directory and restore as explained in Restoring hard-link local backups.

Note

Vertica recommends that you preserve the directory containing the hard-link backup after copying it to other media. If you delete the directory and later copy the files back from external media, the copied files will no longer be links. Instead, they will use as much disk space as if you had done a full (not hard-link) backup.

Restoring hard-link local backups requires some additional (manual) steps. Do not use them as a substitute for regular full backups (Creating full backups).

Hard-link local backups and disaster recovery

Hard-link local backups are only as reliable as the disk on which they are stored. If the local disk becomes corrupt, so does the hard-link local backup. In this case, you are unable to restore the database from the hard-link local backup because it is also corrupt.

All sites should maintain full backups externally for disaster recovery because hard-link local backups do not actually copy any database files.

6.5 - Incremental or repeated backups

As a best practice, Vertica recommends that you take frequent backups if database contents diverge in significant ways. Always take backups after any event that significantly modifies the database, such as performing a rebalance. Mixing many backups with significant differences can weaken data K-safety. For example, taking backups both before and after a rebalance is not a recommended practice in cases where the backups are all part of one archive.

Each time you back up your database with the same configuration file, vbr creates an additional backup and might remove the oldest backup. The backup operation copies new storage containers, which can include:

Data that existed the last time you performed a database backup
New and changed data since the last full backup

Use the restorePointLimit parameter in the configuration file to increase the number of stored backups. If a backup task would cause this limit to be exceeded, vbr deletes the oldest backup after a successful backup.

When you run a backup task, vbr first creates the new backup in the specified location, which might temporarily exceed the limit. It then checks whether the number of backups exceeds the value of restorePointLimit, and, if necessary, deletes the oldest backups until only restorePointLimit remain. If the requested backup fails or is interrupted, vbr does not delete any backups.

When you restore a database, you can choose to restore from any retained backup rather than the most recent, so raise the limit if you expect to need access to older backups.
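The retention behavior described above can be modeled in a few lines. The following Python sketch is a simplified model of that description, not vbr code: the function run_backup and the backup names are hypothetical, and the list stands in for the backups stored under one snapshot name, oldest first.

```python
def run_backup(existing, new_backup, restore_point_limit, succeeded=True):
    """Return the backups retained after one backup task.

    existing: list of backup names, oldest first
    """
    if not succeeded:
        # A failed or interrupted backup deletes nothing
        return list(existing)
    backups = list(existing) + [new_backup]  # may temporarily exceed the limit
    while len(backups) > restore_point_limit:
        backups.pop(0)  # delete the oldest backup
    return backups


if __name__ == "__main__":
    retained = []
    for day in range(1, 6):
        retained = run_backup(retained, f"bak_day{day}", restore_point_limit=3)
        print(day, retained)
```

With a limit of 3, the fourth successful backup removes the first one, so only the three most recent backups remain available for restore.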

7 - Restoring backups

You can use the vbr restore task to restore your full database or selected objects from backups created by vbr. Typically you use the same configuration file for both operations. The minimal restore command is:

$ vbr --task restore --config-file config-file

You must log in using the database administrator's account (not root).

For full restores, the database must be DOWN. For object restores, the database must be UP.

Usually you restore to the cluster that you backed up, but you can also restore to an alternate cluster if the original one is no longer available.

Restoring must be done on the same architecture as the backup from which you are restoring. You cannot back up an Enterprise Mode database and restore it in Eon Mode or vice versa.

You can perform restore tasks on Permanent node types. You cannot restore data on Ephemeral, Execute, or Standby nodes. To restore or replicate to these nodes, you must first change the destination node type to PERMANENT. For more information, refer to Setting node type.

Restoring objects to a higher Vertica version

Vertica supports restoration to a database that is no more than one minor version higher than the current database version. For example, you can restore objects from a 12.0.x database to a 12.1.x database.

If restored objects require a UDx library that is not present in the later-version database, Vertica displays the following error:

ERROR 2858:  Could not find function definition

You can resolve this issue by installing compatible libraries in the target database.

Restoring HDFS storage locations

If your Vertica cluster uses HDFS storage locations, you must do some additional configuration before you can restore. See Requirements for backing up and restoring HDFS storage locations.

HDFS storage locations support only full backup and restore. You cannot perform object backup or restore on a cluster that uses HDFS storage locations.

7.1 - Restoring a database from a full backup

You can restore a full database backup to the database that was backed up, or to an alternate cluster with the same architecture. One reason to restore to an alternate cluster is to set up a test cluster to investigate a problem in your production cluster.

To restore a full database backup, you must verify that:

Database is DOWN. You cannot restore a full backup when the database is running.
All backup hosts are available.
Backup directory exists and contains backups of the data to restore.
Cluster to which you are restoring the backup has:
- Same number of nodes as used to create the backup (Enterprise Mode), or at least as many nodes as the primary subclusters (Eon Mode)
- Same architecture as the one used to create the backup
- Identical node names
Target database already exists on the cluster where you are restoring data.
- Database can be completely empty, without any data or schema.
- Database name must match the name in the backup
- All node names in the database must match the names of the nodes in the configuration file.
The user performing the restore is the database administrator.
If you are restoring an Eon Mode database, you have met the Eon Mode database requirements.

You can use only a full database backup to restore a complete database. If you have saved multiple backup archives, you can restore from either the last backup or a specific archive.

When your Eon Mode database has multiple communal storage locations, vbr attempts to copy each database object to its associated storage location. If a storage location has been dropped since the backup was taken, the restore operation attempts to reinstate the dropped location before restoring the data. If the dropped storage location cannot be reinstated, its associated data is copied to the main communal storage location.

Restoring from a full database backup injects the OIDs from each backup into the restored catalog of the full database backup. The catalog also receives all archives. Additionally, the OID generator and current epoch are set to the current epoch.

You can also restore a full backup to a different database than the one you backed up. See Restoring a database to an alternate cluster.

Important

When you restore an Eon Mode database to another database, the restore operation copies the source database's communal storage. The original communal storage is unaffected.

Restoring the most recent backup

Usually, when a node or cluster is DOWN, you want to return the cluster to its most-recent state. Doing so requires restoring a full database backup. You can restore any full database backup from the archive by identifying the name in the configuration file.

To restore from the most recent backup, use the vbr restore task with the configuration file. If your password configuration file does not contain the database superuser password, vbr prompts you to enter it.

The following example shows how you can use the db.ini configuration file for restoration:

$ vbr --task restore --config-file db.ini
Copying...
1871652633 out of 1871652633, 100%
All child processes terminated successfully.
restore done!

Restoring an archive

If you saved multiple backups, you can specify an archive to restore. To list the existing archives so that you can choose one to restore, use the vbr listbackup task with a specific configuration file. See Viewing backups.

To restore from an archive, add the --archive parameter to the command line. The value is the date_timestamp suffix of the directory name that identifies the archive to restore. For example:

$ vbr --task restore --config-file fullbak.ini --archive=20121111_205841

The --archive parameter identifies the archive created on 11-11-2012 (_archive20121111), at time 205841 (20:58:41). You need to specify only the _archive suffix, because the configuration file identifies the backup name of the subdirectory, and the OID identifier indicates the backup is an archive.
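When you choose among several archives in a script, it can help to convert the date_timestamp value into a real timestamp. The following Python sketch assumes the YYYYMMDD_HHMMSS layout shown in the example; the function names are hypothetical.

```python
from datetime import datetime

ARCHIVE_FORMAT = "%Y%m%d_%H%M%S"  # layout of the example value 20121111_205841


def parse_archive_id(archive_id):
    """Convert an archive date_timestamp suffix into a datetime."""
    return datetime.strptime(archive_id, ARCHIVE_FORMAT)


def latest_archive(archive_ids):
    """Return the most recent archive identifier from a list."""
    return max(archive_ids, key=parse_archive_id)


if __name__ == "__main__":
    print(parse_archive_id("20121111_205841"))
    print(latest_archive(["20121110_093000", "20121111_205841"]))
```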

Restore failures in Eon Mode

When a restore operation fails, vbr can leave extra files in the communal storage location. If you use communal storage in the cloud, those extra files cost you money. To remove them, restart the database and call CLEAN_COMMUNAL_STORAGE with an argument of true.

7.2 - Restoring a database to an alternate cluster

Vertica supports restoring a full backup to an alternate cluster.

Requirements

The process is similar to the process for Restoring a database from a full backup, with the following additional requirements.

The destination database must:

Be DOWN.
Share the same name as the source database.
Have the same number of nodes as the source database.
Have the same node names as the source database.
Use the same catalog directory location as the source database.
Use the same port numbers as the source database.
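You can turn this checklist into a simple pre-flight comparison. The following Python sketch is illustrative only: the function alternate_cluster_problems and the dictionary keys are hypothetical, and you must fill in the values for both clusters yourself.

```python
def alternate_cluster_problems(source, destination):
    """Return a list of requirement violations (empty if none were found).

    Each argument is a dict with the keys db_name, nodes, catalog_dir, port;
    destination also carries is_up.
    """
    problems = []
    if destination.get("is_up"):
        problems.append("destination database must be DOWN")
    if source["db_name"] != destination["db_name"]:
        problems.append("database names differ")
    if sorted(source["nodes"]) != sorted(destination["nodes"]):
        problems.append("node count or node names differ")
    if source["catalog_dir"] != destination["catalog_dir"]:
        problems.append("catalog directory locations differ")
    if source["port"] != destination["port"]:
        problems.append("port numbers differ")
    return problems


if __name__ == "__main__":
    src = {"db_name": "VMart", "nodes": ["v_vmart_node0001"],
           "catalog_dir": "/home/dbadmin/catalog", "port": 5433}
    dst = dict(src, is_up=True)
    print(alternate_cluster_problems(src, dst))
```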

Procedure

Copy the vbr configuration file that you used to create the backup to any node on the destination cluster.
If you are using a stored password, copy the password configuration file to the same location as the vbr configuration file.
From the destination node, issue a vbr restore command, such as:
```
$ vbr -t restore -c full.ini
```
After the restore has completed, start the restored database.

7.3 - Restoring all objects from an object-level backup

To restore everything in an object-level backup to the database from which it was taken, use the vbr restore task with the configuration file you used to create the backup, as in the following example:

$ vbr --task restore --config-file MySchema.ini
Copying...
1871652633 out of 1871652633, 100%
All child processes terminated successfully.
restore done!

The database must be UP.

You can specify how Vertica reacts to duplicate objects by setting the objectRestoreMode parameter in the configuration file.

Object-level backup and restore are not supported for HDFS storage locations.

Restoring objects to a changed cluster

Unlike restoring from a full database backup, vbr supports restoring object-level backups after adding nodes to the cluster. Any nodes that were not in the cluster when you created the object-level backup do not participate in the restore. You can rebalance your cluster after the restore to distribute data among the new nodes.

You cannot restore an object-level backup after removing nodes, altering node names, or changing IP addresses. Trying to restore an object-level backup after such changes causes vbr to fail and display this message:

Preparing...
Topology changed after backup; cannot restore.
restore failed!
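The rule can be summarized as: every node recorded in the backup must still exist with the same name and address, while additional nodes are allowed. The following Python sketch models that rule only; the function name and the name-to-address dictionaries are hypothetical, and vbr performs its own topology check.

```python
def object_restore_allowed(backup_nodes, current_nodes):
    """True if the current topology still contains every backed-up node.

    Both arguments map a node name to its IP address.
    """
    return all(current_nodes.get(name) == address
               for name, address in backup_nodes.items())


if __name__ == "__main__":
    backup = {"v_vmart_node0001": "10.0.0.1", "v_vmart_node0002": "10.0.0.2"}
    grown = dict(backup, v_vmart_node0003="10.0.0.3")
    print(object_restore_allowed(backup, grown))
```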

Projection epoch after restore

All object-level backup and restore events are treated as DDL events. If a table does not participate in an object-level backup, possibly because a node is down, restoring the backup affects the projection in the following ways:

Its epoch is reset to 0.
It must recover any data that it does not have by comparing epochs and other recovery procedures.

Catalog locks during restore

As with other databases, Vertica transactions follow strict locking protocols to maintain data integrity.

When restoring an object-level backup into a cluster that is UP, vbr begins by copying data and managing storage containers. If necessary, vbr splits the containers. This process does not require any database locks.

After completing data-copying tasks, vbr first requires a table object lock (O-lock) and then a global catalog lock (GCLX).

In some circumstances, other database operations, such as DML statements, are in progress when the process attempts to get an O-lock on the table. In such cases, vbr is blocked from progress until the DML statement completes and releases the lock. After securing an O-lock first, and then a GCLX lock, vbr blocks other operations that require a lock on the same table.

While vbr holds its locks, concurrent table modifications are blocked. Database system operations, such as the Tuple Mover (TM) transferring data from memory to disk, are canceled to permit the object-level restore to complete.

Catalog restore events

Each object-level backup includes a section of the database catalog, or a snippet. A snippet contains the selected objects, their dependent objects, and principal objects. A catalog snippet is similar in structure to the database catalog but consists of a subset representing the object information. Objects being restored can be read from the catalog snippet and used to update both global and local catalogs.

Each object from a restored backup is updated in the catalog. If the object no longer exists, vbr drops the object from the catalog. Any dependent objects that are not in the backup are also dropped from the catalog.

vbr uses existing dependency verification methods to check the catalog and adds a restore event to the catalog for each restored table. That event also includes the epoch at which the event occurred. If a node misses the restore table event, it recovers projections anchored on the given table.

Reverting object DDL changes

If you restore the database to an epoch that precedes changes to an object's DDL, the restore operation reverts the object to its earlier definition. For example, if you change a table column's data type from CHAR(8) to CHAR(16) in epoch 10, and then restore the database from epoch 5, the column reverts to CHAR(8) data type.

Restoring objects to a higher Vertica version

If restored objects require a UDx library that is not present in the later-version database, Vertica displays the following error:

ERROR 2858:  Could not find function definition

You can resolve this issue by installing compatible libraries in the target database.

Catalog size limitations

Object-level restores can fail if your catalog size is greater than five percent of the total memory available in the node performing the restore. In this situation, Vertica recommends restoring individual objects from the backup. For more information, refer to Restoring individual objects.
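
As a rough illustration of the five percent guideline, the following sketch compares a catalog size against a node's total memory. The function name and the byte values are hypothetical; take the real figures from your own monitoring, and treat the result as a hint rather than a prediction of whether a restore will succeed.

```python
def catalog_exceeds_guideline(catalog_bytes, node_memory_bytes, limit=0.05):
    """Return True if the catalog is larger than `limit` (5% by default) of node memory."""
    if node_memory_bytes <= 0:
        raise ValueError("node_memory_bytes must be positive")
    return catalog_bytes > limit * node_memory_bytes


GIB = 1024 ** 3

# Hypothetical node with 64 GiB of memory: the guideline threshold is 3.2 GiB.
print(catalog_exceeds_guideline(2 * GIB, 64 * GIB))  # False
print(catalog_exceeds_guideline(4 * GIB, 64 * GIB))  # True
```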

7.4 - Restoring individual objects

You can use vbr to restore individual tables and schemas from a full or object-level backup: qualify the restore task with --restore-objects, and specify the objects to restore as a comma-delimited list:

$ vbr --task restore --config-file=filename --restore-objects='objectname[,...]' [--archive=archive-id] [--target-namespace=namespace-name]

The following requirements and restrictions apply:

The database must be running, and nodes must be UP.
Tables must include their schema names.
Do not embed spaces before or after comma delimiters of the --restore-objects list; otherwise, vbr interprets the space as part of the object name.
Object-level restore is not supported for HDFS storage locations. To restore an HDFS storage location you must do a full restore.

If the schema has a disk quota and restoring the table would exceed the quota, the operation fails.

By default, --restore-objects restores the specified objects from the most recent backup. You can restore from an earlier backup with the --archive parameter.

The --target-namespace parameter is only valid for Eon Mode databases with multiple namespaces. The parameter specifies the namespace in the target cluster to which objects are restored. For more information, see Eon Mode database requirements.

The following example uses the db.ini configuration file, which includes the database administrator's password:

$ vbr --task restore --config-file=db.ini --restore-objects=salesschema,public.sales_table,public.customer_info
Preparing...
Found Database port:  5433
Copying...
[==================================================] 100%
All child processes terminated successfully.
All extract object child processes terminated successfully.
Copying...
[==================================================] 100%
All child processes terminated successfully.
restore done!
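
When a script assembles the object list, it is easy to introduce a space around a comma delimiter. The following sketch builds the argument list for the restore task and rejects names with surrounding whitespace. The helper name build_restore_args is hypothetical; it uses only the vbr parameters shown above and does not run vbr itself.

```python
def build_restore_args(config_file, objects, archive=None):
    """Build an argument list for `vbr --task restore` with --restore-objects.

    `objects` is a list of schema names or schema-qualified table names.
    `archive` is an optional archive ID for restoring from an earlier backup.
    """
    for name in objects:
        # vbr treats a space next to a comma as part of the object name,
        # so refuse names that carry leading or trailing whitespace.
        if not name or name != name.strip():
            raise ValueError(f"empty name or surrounding whitespace: {name!r}")
    args = [
        "vbr", "--task", "restore",
        f"--config-file={config_file}",
        "--restore-objects=" + ",".join(objects),
    ]
    if archive is not None:
        args.append(f"--archive={archive}")
    return args


print(build_restore_args(
    "db.ini", ["salesschema", "public.sales_table", "public.customer_info"]))
```

Passing a list like this to subprocess.run without a shell avoids any shell quoting of the object names.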

Object dependencies

When you restore an object, Vertica does not always restore dependent objects. For example, if you restore a schema containing views, Vertica does not automatically restore the tables of those views. One exception applies: if database tables are linked through foreign keys, you must restore them together, unless drop_foreign_constraints is set in the vbr configuration file to true.

Note

You must also set objectRestoreMode to coexist, otherwise Vertica ignores drop_foreign_constraints.

Duplicate objects

You can specify how restore operations handle duplicate objects by configuring objectRestoreMode. By default, it is set to createOrReplace, so if a duplicate object exists, the restore operation overwrites it with the archived version.

Interactions with data loaders

When doing a restore with objectRestoreMode set to coexist, vbr creates new data loaders and their corresponding state tables, but does not change the table names in the loader COPY clauses. After the restore, you can use ALTER DATA LOADER to update the COPY statement in the restored data loader to use the new table name.

Eon Mode considerations

Restoring objects to an Eon Mode database can leave unneeded files in cloud storage. These files have no effect on database performance or data integrity. However, they can incur extra cloud storage expenses. To remove these files, restart the database and call CLEAN_COMMUNAL_STORAGE with an argument of true.

7.5 - Restoring objects to an alternate cluster

You can use the restore task to copy objects from one database to another. You might do this to "promote" tables from a development environment to a production environment, for example. All restrictions described in Restoring individual objects apply when restoring to an alternate cluster.

To restore to an alternate database, you must make changes to a copy of the configuration file that was used to create the backup. The changes are in the [Mapping] and [NodeMapping] sections. Essentially, you create a configuration file for the restore operation that looks to vbr like a backup of the target database, but it actually describes the backup from the source database. See Restore object from backup to an alternate cluster for an example configuration file.

The following example uses two databases, named source and target. The source database contains a table named sales. The following source_snapshot.ini configuration file is used to back up the source database:

[Misc]
snapshotName = source_snapshot
restorePointLimit = 2
objectRestoreMode = createOrReplace

[Database]
dbName = source
dbUser = dbadmin
dbPromptForPassword = True

[Transmission]

[Mapping]
v_source_node0001 = 192.168.50.168:/home/dbadmin/backups/

The target_snapshot.ini file starts as a copy of source_snapshot.ini. Because the [Mapping] section describes the database that vbr operates on, we must change the node names to point to the target nodes. We must also add the [NodeMapping] section and change the database name:

[Misc]
snapshotName = source_snapshot
restorePointLimit = 2
objectRestoreMode = createOrReplace

[Database]
dbName = target
dbUser = dbadmin
dbPromptForPassword = True

[Transmission]

[Mapping]
v_target_node0001 = 192.168.50.151:/home/dbadmin/backups/

[NodeMapping]
v_source_node0001 = v_target_node0001

As far as vbr is concerned, we are restoring objects from a backup of the target database. In reality, we are restoring from the source database.

The following command restores the sales table from the source backup into the target database:

$ vbr --task restore --config-file target_snapshot.ini --restore-objects sales
Starting object restore of database target.
Participating nodes: v_target_node0001.
Objects to restore: sales.
Enter vertica password:
Restoring from restore point: source_snapshot_20160204_191920
Loading snapshot catalog from backup.
Extracting objects from catalog.
Syncing data from backup to cluster nodes.
[==================================================] 100%
Finalizing restore.
Restore complete!
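
If you maintain many configuration files, you might derive the restore configuration from the backup configuration with a script instead of editing it by hand. The following is a minimal sketch that uses Python's configparser module and the values from the example above. The function name derive_target_config and its arguments are hypothetical, and the script only rewrites text; it does not validate the result against vbr.

```python
import configparser
import io

SOURCE_INI = """\
[Misc]
snapshotName = source_snapshot
restorePointLimit = 2
objectRestoreMode = createOrReplace

[Database]
dbName = source
dbUser = dbadmin
dbPromptForPassword = True

[Transmission]

[Mapping]
v_source_node0001 = 192.168.50.168:/home/dbadmin/backups/
"""


def derive_target_config(source_text, target_db, target_mapping, node_mapping):
    """Return config text that restores a source backup into `target_db`.

    target_mapping: target node name -> backup host and path
    node_mapping:   source node name -> target node name
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep parameter names such as snapshotName as written
    cfg.read_string(source_text)

    cfg["Database"]["dbName"] = target_db

    # Replace the source node mappings with the target node mappings.
    cfg.remove_section("Mapping")
    cfg.add_section("Mapping")
    for node, location in target_mapping.items():
        cfg["Mapping"][node] = location

    # Tell vbr which target node receives each source node's data.
    cfg.add_section("NodeMapping")
    for source_node, target_node in node_mapping.items():
        cfg["NodeMapping"][source_node] = target_node

    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


target_text = derive_target_config(
    SOURCE_INI,
    target_db="target",
    target_mapping={"v_target_node0001": "192.168.50.151:/home/dbadmin/backups/"},
    node_mapping={"v_source_node0001": "v_target_node0001"},
)
print(target_text)
```

The snapshotName value is left unchanged on purpose, because the restore must still find the backup that the source configuration created.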

7.6 - Restoring hard-link local backups

You restore from hard-link local backups the same way that you restore from full backups, using the restore task. If you used hard-link local backups to back up to external media, you need to take some additional steps.

Transferring backups to and from remote storage

When a full hard-link local backup exists, you can transfer the backup to other storage media, such as tape or a locally-mounted NFS directory. Transferring hard-link local backups to other storage media may copy the data files associated with the hard file links.

You can use a different directory when you return the backup files to the hard-link local backup host. However, you must also change the backupDir parameter value in the configuration file before restoring the backup.

Complete the following steps to restore hard-link local backups from external media:

If the original backup directory no longer exists on one or more local backup host nodes, re-create the directory.

The directory structure into which you restore hard-link backup files must be identical to what existed when the backup was created. For example, if you created hard-link local backups at the following backup directory, you can then re-create that directory structure:
```
/home/dbadmin/backups/localbak
```
Copy the backup files to their original backup directory, as specified for each node in the configuration file. For more information, refer to [Mapping].
Restore the backup, using one of the following options:
1. To restore the latest version of the backup, move the backup files to the following directory:
```
/home/dbadmin/backups/localbak/node_name/snapshotname
```
2. To restore a different backup version, move the backup files to this directory:
```
/home/dbadmin/backups/localbak/node_name/snapshotname_archivedate_timestamp
```
When the backup files are returned to their original backup directory, use the original configuration file to invoke vbr. Verify that the configuration file specifies hardLinkLocal = true. Then restore the backup as follows:
```
$ vbr --task restore --config-file localbak.ini
```
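
Before you run the restore, you can check the configuration file for the hardLinkLocal setting with a short script. The sketch below is one way to do that; the function name is hypothetical, and it searches every section for the parameter rather than assuming which section sets it.

```python
import configparser


def hard_link_local_enabled(config_text):
    """Return True if any section of the config sets hardLinkLocal to true."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # do not lowercase parameter names while parsing
    cfg.read_string(config_text)
    for section in cfg.sections():
        for key, value in cfg[section].items():
            if key.lower() == "hardlinklocal":
                return value.strip().lower() == "true"
    return False


# Hypothetical excerpt of localbak.ini
sample = """\
[Transmission]
hardLinkLocal = True
"""
print(hard_link_local_enabled(sample))  # True
```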

7.7 - Ownership of restored objects

For a full restore, objects have the owners that they had in the backed-up database.

When performing an object restore, Vertica inserts data into existing database objects. By default, the restore does not affect the ownership, storage policies, or permissions of the restored objects. However, if the restored object does not already exist, Vertica re-creates it. In this situation, the restored object is owned by the user performing the restore. Vertica does not restore dependent grants, roles, or client authentications with restored objects.

If the storage policies of a restored object are not valid, vbr applies the default storage policy. Restored storage policies can become invalid due to HDFS storage locations, table incompatibility, and unavailable min-max values at restore time.

Sometimes, Vertica encounters a catalog object that it does not need to restore. When this situation occurs, Vertica generates a warning message for that object and the restore continues.

Examples

Suppose you have a full backup, including Schema1, owned by the user Alice. Schema1 contains Table1, owned by Bob, who eventually passes ownership to Chris. The user dbadmin performs the restore. The following scenarios might occur that affect ownership of these objects.

Scenario 1:

Schema1.Table1 has been dropped at some point since the backup was created. When dbadmin performs the restore, Vertica re-creates Schema1.Table1. As the user performing the restore, dbadmin takes ownership of Schema1.Table1. Because Schema1 still exists, Alice retains ownership of the schema.

Scenario 2:

Schema1 is dropped, along with all contained objects. When dbadmin performs the restore, Vertica re-creates the schema and all contained objects. dbadmin takes ownership of Schema1 and Schema1.Table1.

Scenario 3:

Schema1 and Schema1.Table1 both exist in the current database. When dbadmin rolls back to an earlier backup, the ownership of the objects remains unchanged. Alice owns Schema1, and Bob owns Schema1.Table1.

Scenario 4:

Schema1.Table1 exists and dbadmin wants to roll back to an earlier version. In the time since the backup was made, ownership of Schema1.Table1 has changed to Chris. When dbadmin restores Schema1.Table1, Alice remains owner of Schema1 and Chris remains owner of Schema1.Table1. The restore does not revert ownership of Schema1.Table1 from Chris to Bob.
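
The ownership rule that these scenarios illustrate can be condensed into a few lines. The following is only a model of the behavior described on this page, not Vertica code; the function name is hypothetical, and None stands for an object that no longer exists in the database at restore time.

```python
def owner_after_object_restore(current_owner, restoring_user):
    """Model the owner of an object after an object-level restore.

    current_owner is the owner in the running database, or None if the
    object was dropped and the restore must re-create it.
    """
    return restoring_user if current_owner is None else current_owner


# Scenario 1: the table was dropped, the schema still exists.
print(owner_after_object_restore(None, "dbadmin"))      # dbadmin owns Schema1.Table1
print(owner_after_object_restore("Alice", "dbadmin"))   # Alice keeps Schema1

# Scenario 4: ownership changed to Chris after the backup was made.
print(owner_after_object_restore("Chris", "dbadmin"))   # Chris keeps Schema1.Table1
```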

8 - Copying the database to another cluster

The vbr task copycluster combines two other vbr tasks, backup and restore, into a single operation, enabling you to back up all data from one Enterprise Mode database cluster and then restore it on another. This can facilitate routine operations, such as copying a database between development and production environments.

Caution

copycluster overwrites all existing data in the destination database. To preserve that data, back up the destination database before launching the copycluster task.

Restrictions

copycluster is not supported for Eon Mode databases. It is also incompatible with HDFS storage locations; Vertica does not transfer data to a remote HDFS cluster as it does for a Linux cluster.

Prerequisites

copycluster requires that the target and source database clusters be identical in the following respects:

Vertica hotfix version—for example, 12.0.1-1

Number of nodes and node names, as shown in the system table NODES:

=> SELECT node_name FROM nodes;
  node_name
------------------
 v_vmart_node0001
 v_vmart_node0002
 v_vmart_node0003
(3 rows)

Database name

Vertica catalog, data, and temp directory paths as shown in the system table DISK_STORAGE:

=> SELECT node_name,storage_path,storage_usage FROM disk_storage;
    node_name     |                     storage_path                     | storage_usage
------------------+------------------------------------------------------+---------------
 v_vmart_node0001 | /home/dbadmin/VMart/v_vmart_node0001_catalog/Catalog | CATALOG
 v_vmart_node0001 | /home/dbadmin/VMart/v_vmart_node0001_data            | DATA,TEMP
 v_vmart_node0001 | /home/dbadmin/verticadb                              | DEPOT
 v_vmart_node0002 | /home/dbadmin/VMart/v_vmart_node0002_catalog/Catalog | CATALOG
...

Note

Directory paths for the catalog, data, and temp storage are the same on all nodes.

Database administrator accounts

The following requirements also apply:

The target cluster has adequate disk space for copycluster to complete.
The source cluster's database administrator must be able to log in to all target cluster nodes through SSH without a password.

Note
Passwordless access within the cluster is not the same as passwordless access between clusters. The SSH ID of the administrator account on the source cluster and the target cluster are likely not the same. You must configure each host in the target cluster to accept the SSH authentication of the source cluster.
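
You can compare the two clusters against these prerequisites before you start. The following sketch assumes that you have already collected the version, database name, and the results of the NODES and DISK_STORAGE queries above from each cluster into plain dictionaries. The dictionary layout and the function name are hypothetical, and the sketch does not check disk space or SSH access.

```python
def copycluster_mismatches(source, target):
    """Return a list of differences that the copycluster prerequisites do not allow.

    Each argument is a dict (hypothetical layout) with these keys:
      version       - Vertica hotfix version, for example "12.0.1-1"
      db_name       - database name
      node_names    - node names from SELECT node_name FROM nodes
      storage_paths - (node_name, storage_path, storage_usage) rows from disk_storage
    """
    problems = []
    if source["version"] != target["version"]:
        problems.append(f"version: {source['version']} vs {target['version']}")
    if source["db_name"] != target["db_name"]:
        problems.append(f"database name: {source['db_name']} vs {target['db_name']}")
    if sorted(source["node_names"]) != sorted(target["node_names"]):
        problems.append("node count or node names differ")
    if sorted(source["storage_paths"]) != sorted(target["storage_paths"]):
        problems.append("catalog, data, or temp directory paths differ")
    return problems


source = {
    "version": "12.0.1-1",
    "db_name": "vmart",
    "node_names": ["v_vmart_node0001", "v_vmart_node0002", "v_vmart_node0003"],
    "storage_paths": [
        ("v_vmart_node0001", "/home/dbadmin/VMart/v_vmart_node0001_data", "DATA,TEMP"),
    ],
}
# Hypothetical target that differs only in its hotfix version.
target = dict(source, version="12.0.1-0")
print(copycluster_mismatches(source, target))
```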

Copycluster procedure

Create a configuration file for the copycluster operation. The Vertica installation includes a sample configuration file:
```
/opt/vertica/share/vbr/example_configs/copycluster.ini
```
For each node in the source database, create a [Mapping] entry that specifies the host name of each destination database node. Unlike other vbr tasks such as restore and backup, mappings for copycluster only require the destination host name. copycluster always stores backup data in the catalog and data directories of the destination database.

The following example configures vbr to copy the vmart database from its three-node v_vmart cluster to the test-host cluster:
```
[Misc]
snapshotName = CopyVmart
tempDir = /tmp/vbr

[Database]
dbName = vmart
dbUser = dbadmin
dbPassword = password
dbPromptForPassword = False

[Transmission]
encrypt = False
port_rsync = 50000

[Mapping]
; backupDir is not used for cluster copy
v_vmart_node0001= test-host01
v_vmart_node0002= test-host02
v_vmart_node0003= test-host03
```
Stop the target cluster.

As database administrator, invoke the vbr task copycluster from a source database node:

$ vbr -t copycluster -c copycluster.ini
Starting copy of database VMART.
Participating nodes: vmart_node0001, vmart_node0002, vmart_node0003, vmart_node0004.
Enter vertica password:
Snapshotting database.
Snapshot complete.
Determining what data to copy.
[==================================================] 100%
Approximate bytes to copy: 987394852 of 987394852 total.
Syncing data to destination cluster.
[==================================================] 100%
Reinitializing destination catalog.
Copycluster complete!

Important

If the copycluster task is interrupted, the destination cluster retains data files that already transferred. If you retry the operation, Vertica does not resend these files.

9 - Replicating objects to another database cluster

The vbr task replicate supports replication of tables and schemas from one database cluster to another. You might consider replication for the following reasons:

Copy tables and schemas between test, staging, and production clusters.
Replicate certain objects immediately after an important change, such as a large table data load, instead of waiting until the next scheduled backup.

In both cases, replicating objects is generally more efficient than exporting and importing them. The first replication of an object replicates the entire object. Subsequent replications copy only data that has changed since the last replication. Vertica replicates data as of the current epoch on the target database. By running the replicate task from a cron job, you can replicate key objects on a schedule to create a backup database.

Replicate versus copycluster

replicate only supports tables, schemas, and—in Eon Mode databases—namespaces. In situations where the target database is down, or you plan to replicate the entire database, Vertica recommends that you use the copycluster task to copy the database to another cluster. Thereafter, you can use replicate to update individual objects.

Replication procedure

To replicate objects to another database, perform these actions from the source database:

Verify replication requirements.
Identify the objects to replicate and target database in the vbr configuration file.
Replicate objects.

Verify replication requirements

The following requirements apply to the source and target databases and their respective clusters:

All nodes in both databases are UP. If any nodes are DOWN, they are handled as described in Handling DOWN nodes.
Versions of the two databases must be compatible. Vertica supports object replication to a target database up to one minor version higher than the current database version. For example, you can replicate objects from a 12.0.x database to a 12.1.x database.
The same Linux user is associated with the dbadmin account of both databases.
The source cluster database administrator can log on to all target nodes through SSH without a password.

Note
The SSH ID of the administrator account on the source cluster and the target cluster are likely not the same. You must configure each host in the target cluster to accept the SSH authentication of the source cluster.
Enterprise Mode: The following requirements apply:
- Both databases have the same number of nodes.
- Clusters of both databases have the same number of fault groups, where corresponding fault groups in each cluster have the same number of nodes.
Eon Mode: The following requirements apply:
- The primary subclusters of both databases have the same node subscriptions.
- Primary subclusters of the target database have as many or more nodes as primary subclusters of the source database.
- For databases with multiple namespaces, the target and source namespaces must satisfy the requirements described in Eon Mode database requirements.
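
The version requirement above can be expressed as a small check. The sketch below is one reading of that rule and makes two assumptions that this page does not state: that both databases share the same major version, and that a target older than the source is not supported. The function name is hypothetical.

```python
def replication_versions_compatible(source_version, target_version):
    """Return True if the target is at most one minor version above the source.

    Assumptions (not stated in the requirements): the major versions must
    match, and the target must not be older than the source.
    """
    s_major, s_minor = (int(part) for part in source_version.split(".")[:2])
    t_major, t_minor = (int(part) for part in target_version.split(".")[:2])
    return s_major == t_major and 0 <= t_minor - s_minor <= 1


print(replication_versions_compatible("12.0.4", "12.1.0"))  # True
print(replication_versions_compatible("12.0.4", "12.2.0"))  # False
```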

Edit vbr configuration file

Tip

As a best practice, create a separate configuration file for each replication task.

Edit the vbr configuration file to use for the replicate task as follows:

In the [misc] section, set the objects parameter to the objects to be replicated:
```
; Identify the objects that you want to replicate
objects = schema.objectName    
```
Important
If your Eon Mode database has multiple namespaces, you must specify the namespace to which the objects belong. For vbr tasks, namespace names are prefixed with a period. For example, .n.s.t refers to table t in schema s in namespace n. See Eon Mode database requirements for more information.
In the [misc] section, set the snapshotName parameter to a unique snapshot identifier. Multiple replicate tasks can run concurrently with each other and with backup tasks, but only if their snapshot names are different.
```
snapshotName = name
```
In the [database] section, set the following parameters:
```
; parameters used to replicate objects between databases
dest_dbName =
dest_dbUser =
dest_dbPromptForPassword =
```
If you use a stored password, be sure to configure the dest_dbPassword parameter in your password configuration file.

In the [mapping] section, map source nodes to target hosts:

[Mapping]
v_source_node0001 = targethost01
v_source_node0002 = targethost02
v_source_node0003 = targethost03

Replicate objects

Run vbr with the replicate task:

vbr -t replicate -c configfile.ini

The replicate task can run concurrently with backup and other replicate tasks in either direction, provided all tasks have unique snapshot names. replicate cannot run concurrently with other vbr tasks.
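
Because concurrent tasks require unique snapshot names, it can help to check a set of configuration files for collisions before you schedule them. The following sketch does this with Python's configparser module. The function name and the sample file names are hypothetical, and the comparison is an exact string match.

```python
import configparser
from collections import defaultdict


def find_duplicate_snapshot_names(configs):
    """Map each snapshotName used more than once to the configs that use it.

    `configs` maps a label, such as a file name, to the configuration text.
    """
    users = defaultdict(list)
    for label, text in configs.items():
        cfg = configparser.ConfigParser(interpolation=None)
        cfg.optionxform = str  # keep snapshotName as written
        cfg.read_string(text)
        for section in cfg.sections():
            # Accept both [Misc] and [misc] as the section name.
            if section.lower() == "misc" and "snapshotName" in cfg[section]:
                users[cfg[section]["snapshotName"]].append(label)
    return {name: labels for name, labels in users.items() if len(labels) > 1}


configs = {
    "sales_a.ini": "[Misc]\nsnapshotName = replicate_sales\n",
    "sales_b.ini": "[Misc]\nsnapshotName = replicate_sales\n",
    "nightly.ini": "[Misc]\nsnapshotName = nightly\n",
}
print(find_duplicate_snapshot_names(configs))
# {'replicate_sales': ['sales_a.ini', 'sales_b.ini']}
```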

Handling DOWN nodes

You can replicate objects if some nodes are down in either the source or target database, provided the nodes are visible on the network.

The effect of DOWN nodes on a replication task depends on whether they are present in the source or target database.

Location	Effect on replication
DOWN source nodes	Vertica can replicate objects from a source database containing DOWN nodes. If nodes in the source database are DOWN, set the corresponding nodes in the target database to DOWN as well.
DOWN target nodes	Vertica can replicate objects when the target database has DOWN nodes. If nodes in the target database are DOWN, exclude the corresponding source database nodes using the `--nodes` parameter on the `vbr` command line.

Monitoring object replication

You can monitor object replication in the following ways:

View vbr logs on the source database
Check database logs on the source and target databases
Query REMOTE_REPLICATION_STATUS on the source database

10 - Including and excluding objects

You specify objects to include in backup, restore, and replicate operations with the vbr configuration and command-line parameters includeObjects and --include-objects, respectively. You can optionally modify the set of included objects with the vbr configuration and command line parameters excludeObjects and --exclude-objects, respectively. Both parameters support wildcard expressions to include and exclude groups of objects.

For example, you might back up all tables in the schema store, and then exclude from the backup the table store.orders and all tables in the same schema whose name includes the string account:

vbr --task=backup --config-file=db.ini --include-objects 'store.*' --exclude-objects 'store.orders,store.*account*'

Wildcard characters

Character	Description
?	Matches any single character. Case-insensitive.
*	Matches 0 or more characters. Case-insensitive.
\	Escapes the next character. To include a literal ? or * in your table or schema name, use the \ character immediately before the escaped character. To escape the \ character itself, use a double \.
"	Escapes the . character. To include a literal . in your table or schema name, wrap the character in double quotation marks.

Matching schemas

Any string pattern without a period (.) character represents a schema. For example, the following includeObjects list can match any schema name that starts with the string customer, and any two-character schema name that starts with the letter s:

includeObjects = customer*,s?

When a vbr operation specifies a schema that is unqualified by table references, the operation includes all tables of that schema. In this case, you cannot exclude individual tables from the same schema. For example, the following vbr.ini entries are invalid:

; invalid:
includeObjects = VMart
excludeObjects = VMart.?table?

You can exclude tables from an included schema by identifying the schema with the pattern schemaname.*. In this case, the wildcard * explicitly includes all tables in that schema. In the following example, the --include-objects parameter includes all tables in the VMart schema, and the --exclude-objects parameter then excludes the table VMart.sales and all VMart tables whose names include the string account:


--include-objects 'VMart.*'
--exclude-objects 'VMart.sales,VMart.*account*'

Matching tables

Any pattern that includes a period (.) represents a table. For example, in a configuration file, the following includeObjects list matches the table name sales.newclients, and any two-character table name in the same schema:

includeObjects = sales.newclients,sales.??

You can also match all schemas and tables in a database or backup by using the pattern *.*. For example, you can restore all tables and schemas in a backup using this command:

--include-objects '*.*'

Because a vbr parameter is evaluated on the command line, you must enclose the wildcards in single quote marks to prevent the Linux shell from expanding them.
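
If a script generates the command line as a string, you can let Python's shlex.quote add the single quotes. The sketch below only builds and prints the command; the configuration file name and patterns are taken from the examples on this page.

```python
import shlex

include = "VMart.*"
exclude = "VMart.sales,VMart.*account*"

# shlex.quote wraps any value that contains shell-special characters,
# such as * or ?, in single quotes.
command = " ".join([
    "vbr", "--task=backup", "--config-file=db.ini",
    "--include-objects", shlex.quote(include),
    "--exclude-objects", shlex.quote(exclude),
])
print(command)
# vbr --task=backup --config-file=db.ini --include-objects 'VMart.*' --exclude-objects 'VMart.sales,VMart.*account*'
```

If the script runs vbr through subprocess.run with an argument list and no shell, no quoting is needed, because no shell expansion takes place.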

Testing wildcard patterns

You can test the results of any pattern by using the --dry-run parameter with a backup or restore command. Commands that include --dry-run do not affect your database. Instead, vbr displays the result of the command without executing it. For more information on --dry-run, refer to the vbr reference.

Using wildcards with backups

You can identify objects to include in your object backup tasks using the includeObjects and excludeObjects parameters in your configuration file. A typical configuration file might include the following content:

[Misc]
snapshotName = dbobjects
restorePointLimit = 1
enableFreeSpaceCheck = True
includeObjects = VMart.*,online_sales.*
excludeObjects = *.*temp*

In this example, the backup would include all tables from the VMart and online_sales schemas, while excluding any table containing the string 'temp' in its name belonging to any schema.

After it evaluates included objects, vbr evaluates excluded objects and removes excluded objects from the included set. For example, if you included schema1.table1 and then excluded schema1.table1, that object would be excluded. If no other objects were included in the task, the task would fail. The same is true for wildcards. If an exclusion pattern removes all included objects, the task fails.
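
The evaluation order described above can be modeled in a few lines of Python. This is an approximation written from the rules on this page, not vbr's implementation. The function names are hypothetical, patterns without a period are treated as schema patterns, and double-quoted periods are not handled.

```python
import re


def pattern_to_regex(pattern):
    """Translate one wildcard pattern (schema or table part) into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped literal character
            i += 2
            continue
        if ch == "?":
            parts.append(".")       # any single character
        elif ch == "*":
            parts.append(".*")      # zero or more characters
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def _matches(patterns, schema, table):
    for pattern in patterns:
        if "." in pattern:
            # A pattern with a period names tables: match schema and table parts.
            schema_pat, table_pat = pattern.split(".", 1)
            if (pattern_to_regex(schema_pat).fullmatch(schema)
                    and pattern_to_regex(table_pat).fullmatch(table)):
                return True
        elif pattern_to_regex(pattern).fullmatch(schema):
            # A pattern without a period names a schema and all of its tables.
            return True
    return False


def select_tables(tables, include, exclude=()):
    """Apply include patterns first, then remove anything that exclude matches."""
    selected = [
        (schema, table) for schema, table in tables
        if _matches(include, schema, table) and not _matches(exclude, schema, table)
    ]
    if not selected:
        raise ValueError("no objects remain after applying exclusions")
    return selected


# Hypothetical table inventory as (schema, table) pairs
tables = [
    ("store", "orders"), ("store", "store_sales"), ("store", "account_info"),
    ("store", "Customer_Accounts"), ("public", "orders"),
]
print(select_tables(tables, ["store.*"], ["store.orders", "store.*account*"]))
# [('store', 'store_sales')]
```

To see what vbr itself selects, use the --dry-run parameter; a model like this is only useful for reasoning about patterns offline.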

Using wildcards with restore

You can identify objects to include in your restore tasks using the --include-objects and --exclude-objects parameters.

Note

Take extra care when using wildcard patterns to restore database objects. Depending on your object restore mode settings, restored objects can overwrite existing objects. Test the impact of a wildcard restore with the --dry-run vbr parameter before performing the actual task.

As with backups, vbr evaluates excluded objects after it evaluates included objects and removes excluded objects from the included set. If no objects remain, the task fails.

A typical restore command might include this content. (Line wrapped in the documentation for readability, but this is one command.)

$ vbr -t restore -c verticaconfig --include-objects 'customers.*,sales??' \
    --exclude-objects 'customers.199?,customers.200?'

This example includes all tables in the schema customers, except tables whose names consist of 199 or 200 followed by one character, as well as any schema whose name consists of sales followed by two characters.

Another typical restore command might include this content.

$ vbr -t restore -c replicateconfig --include-objects '*.transactions,flights.*' \
    --exclude-objects 'flights.DTW*,flights.LAS*,flights.LAX*'

This example includes any table named transactions, regardless of schema, and all tables in the schema flights except those whose names begin with DTW, LAS, or LAX. Although these three-letter airport codes are capitalized in the example, vbr pattern matching is case-insensitive.

11 - Managing backups

vbr provides several tasks related to managing backups: listing them, checking their integrity, selectively deleting them, and more. In addition, vbr has parameters to allow you to restrict its use of system resources.

11.1 - Viewing backups

You can view backups in three ways:

vbr listbackup task: List backups on the local or remote backup host.
DATABASE_BACKUPS system table: Query for historical information about backups.
vbr log file: Check the status of a backup. The log file resides on the node where you ran vbr, in the directory specified by the vbr configuration parameter tempDir, by default set to /tmp/vbr.

vbr listbackup

The vbr task listbackup returns a list of all backups on backup hosts, whether local or remote. If unqualified by task options, listbackup returns the list to standard output in columnar format.

The following example lists two full backups of a three-node cluster, where each node is mapped to the same backup host, bkhost. Backups are listed in reverse chronological order:

$ vbr -t listbackup -c fullbackup.ini
backup                            backup_type   epoch   objects   include_patterns   exclude_patterns   nodes(hosts)                                                                                        version            file_system_type
backup_snapshot_20220912_131918   full          3915                                                    v_vmart_node0001(10.20.100.247), v_vmart_node0002(10.20.100.248), v_vmart_node0003(10.20.100.249)   v12.0.2-20220911   [Linux]
backup_snapshot_20220909_122300   full          3910                                                    v_vmart_node0001(10.20.100.247), v_vmart_node0002(10.20.100.248), v_vmart_node0003(10.20.100.249)   v12.0.2-20220911   [Linux]

The following table contains information about output columns that are returned from a vbr listbackup task:

Column	Description
`backup`	Identifies a backup by concatenating the configured snapshot name with the backup timestamp, in the format `snapshot-name_YYYYMMDD_HHMMSS`. For example, `monthlyBackup_20220414_134452` identifies a backup generated on April 14, 2022, at 13:44:52 by a configuration file that sets snapshotName to `monthlyBackup`. Use the timestamp portion of this identifier (`20220414_134452`) to specify the archived backup you wish to restore.
`backup_type`	Type of backup, full or object.
`epoch`	Epoch when the backup was created.
`objects`	Objects that were backed up, blank if a full backup.
`include_patterns`	Wildcard patterns included in object backup tasks using the `includeObjects` parameter in your configuration file, blank for full backups.
`exclude_patterns`	Wildcard patterns excluded from object backup tasks using the `excludeObjects` parameter in your configuration file, blank for full backups.
`nodes(hosts)`	(Enterprise Mode only) Names of database nodes and hosts that received the backup.
`version`	Version of Vertica used to create the backup.
`file_system_type`	Storage location file system of the Vertica hosts that comprise this backup—for example, Linux or GCS.
`communal_storage`	(Eon Mode only) Communal storage location for the backup.
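
As an illustration of the `backup` column format, the following shell sketch extracts the timestamp portion from a backup identifier. The function and variable names are hypothetical, and the sketch assumes the identifier ends with a `YYYYMMDD_HHMMSS` timestamp as described above.

```shell
# Hypothetical identifier, as shown in the backup column of listbackup output.
backup_id="monthlyBackup_20220414_134452"

extract_archive_timestamp() {
  # Keep only the trailing YYYYMMDD_HHMMSS portion. Snapshot names can
  # themselves contain underscores, so match the timestamp at the end of
  # the string rather than splitting on the first underscore.
  printf '%s\n' "$1" | sed -E 's/^.*_([0-9]{8}_[0-9]{6})$/\1/'
}

archive_ts="$(extract_archive_timestamp "$backup_id")"
echo "$archive_ts"
```

If the input does not end with a timestamp in this form, sed prints it unchanged, so check the result before passing it to a restore or remove task.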

Important

If you try to list backups on a local cluster with no database, the backup configuration node-host mappings must provide full paths. If the configuration maps to local hosts using the [] shortcut, the listbackup task fails.

Listbackup options

You can qualify the listbackup task with one or more options:

vbr --task listbackup [--list-all] [--json] [--list-output-file filepath] --config-file filepath

Option	Description
`--list-all`	Generate a list of all snapshots stored on the hosts and paths listed in the specified configuration file.
`--json`	Use JSON delimited format.
`--list-output-file`	Redirect output to the specified file.

The following example qualifies the listbackup task with the --list-all option. The output shows three nightly backups from nodes vmart_1, vmart_2, and vmart_3, which the configuration file nightly.ini maps to their respective hosts doca01, doca02, and doca03. The listbackup output shows that these locations contain not only the object backups generated with nightly.ini, but also full backups created with a second configuration file, weekly.ini, which maps to the same nodes and hosts:

$ vbr --task listbackup --list-all --config-file /home/dbadmin/nightly.ini

backup                   backup_type  epoch  objects      include_patterns  exclude_patterns  nodes(hosts)                                      version  file_system_type
weekly_20220508_183249   full         1720                                       vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1  [Linux]
weekly_20220501_182816   full         1403                                       vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1  [Linux]
weekly_20220424_192754   full         1109                                       vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1  [Linux]
nightly_20220507_183034  object       1705   sales_schema                        vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1  [Linux]
nightly_20220506_181808  object       1692   sales_schema                        vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1  [Linux]
nightly_20220505_193906  object       1632   sales_schema                        vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1  [Linux]

Query backup history

You can query the system table DATABASE_BACKUPS to get historical information about backups. The objects column lists which objects were included in object-level backups.

Important

Do not use the backup_timestamp value to restore an archive. Instead, use the values provided by the vbr listbackup task.

=> SELECT * FROM v_monitor.database_backups;
-[ RECORD 1 ]----+------------------------------
backup_timestamp | 2013-05-10 14:41:12.673381-04
node_name        | v_vmart_node0003
snapshot_name    | schemabak
backup_epoch     | 174
node_count       | 3
file_system_type | [Linux]
objects          | public, store, online_sales
-[ RECORD 2 ]----+------------------------------
backup_timestamp | 2013-05-13 11:17:30.913176-04
node_name        | v_vmart_node0003
snapshot_name    | kantibak
backup_epoch     | 175
node_count       | 3
file_system_type | [Linux]
objects          |
-[ RECORD 13 ]---+------------------------------
backup_timestamp | 2013-05-16 07:02:23.721657-04
node_name        | v_vmart_node0003
snapshot_name    | objectbak
backup_epoch     | 180
node_count       | 3
file_system_type | [Linux]
objects          | test, test2
-[ RECORD 14 ]---+------------------------------
backup_timestamp | 2013-05-16 07:19:44.952884-04
node_name        | v_vmart_node0003
snapshot_name    | table1bak
backup_epoch     | 180
node_count       | 3
file_system_type | [Linux]
objects          | test
-[ RECORD 15 ]---+------------------------------
backup_timestamp | 2013-05-16 07:20:18.585076-04
node_name        | v_vmart_node0003
snapshot_name    | table2bak
backup_epoch     | 180
node_count       | 3
file_system_type | [Linux]
objects          | test2

11.2 - Checking backup integrity

Vertica can confirm the integrity of your backup files and the manifest that identifies them. By default, backup integrity checks output their results to the command line.

Quick check

The quick-check task gathers all backup metadata from the backup location specified in the configuration file and compares that metadata to the backup manifest. A quick check does not verify the objects themselves. Instead, this task outputs an exceptions list of any discrepancies between objects in the backup location and objects listed in the backup manifest.

Use the following template to perform a quick check task:

$ vbr -t quick-check -c configfile.ini

For example:

$ vbr -t quick-check -c backupconfig.ini

Full check

The full-check task verifies all objects listed in the backup manifest against filesystem metadata. A full check includes the same steps as a quick check. You can include the optional --report-file parameter to output results to a delimited JSON file. This task outputs an exceptions list that identifies the following inconsistencies:

Incomplete restore points
Damaged restore points
Missing backup files
Unreferenced files

Use the following template to perform a full check task:

$ vbr -t full-check -c configfile.ini --report-file=path/filename

For example:

$ vbr -t full-check -c backupconfig.ini --report-file=logging/fullintegritycheck.json

11.3 - Repairing backups

Vertica can reconstruct backup manifests and remove unneeded backup objects.

Quick repair

The quick-repair task rebuilds the backup manifest, based on the manifests contained in the backup location.

Use the following template to perform a quick repair task:

$ vbr -t quick-repair -c configfile.ini

Garbage collection

The collect-garbage task rebuilds your backup manifest and deletes any backup objects that do not appear in the manifest. You can include the optional --report-file parameter to output results to a delimited JSON file.

Use the following template to perform a garbage collection task:

$ vbr -t collect-garbage -c configfile.ini --report-file=path/filename

11.4 - Removing backups

You can remove existing backups and restore points using vbr. When you use the remove task, vbr updates the manifests affected by the removal and maintains their integrity. If the backup archive contains multiple restore points, removing one does not affect the others. When you remove the last restore point, vbr removes the backup entirely.

Note

Vertica does not support removing backups through the file system.

Use the following template to perform a remove task:

$ vbr -t remove -c configfile.ini --archive timestamp

You can remove multiple restore points using the archive parameter. To obtain the timestamp for a particular restore point, use the listbackup task.

To remove multiple restore points, use a comma separator:
```
--archive="restore-point1,restore-point2"
```
To remove an inclusive range of restore points, use a colon:
```
--archive="oldest-restore-point:newest-restore-point"
```
To remove all restore points, specify an archive value of all:
```
--archive all
```

The following example shows how you can remove a restore point from an existing backup:

$ vbr -t remove -c backup.ini --archive 20160414_134452
Removing restore points: 20160414_134452
Remove complete!
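
The list and range forms of the archive value can be assembled with a small shell helper. This is a minimal sketch with hypothetical function names and timestamps; obtain real timestamps from the listbackup task.

```shell
join_restore_points() {
  # Join all arguments with commas, the separator for a list of restore points.
  ( IFS=,; printf '%s\n' "$*" )
}

restore_point_range() {
  # $1 = oldest restore point, $2 = newest restore point (inclusive range).
  printf '%s:%s\n' "$1" "$2"
}

# Hypothetical timestamps used only for illustration.
archive_list="$(join_restore_points 20160414_134452 20160415_134452)"
archive_range="$(restore_point_range 20160414_134452 20160420_134452)"

echo "--archive=\"$archive_list\""
echo "--archive=\"$archive_range\""
```

The printed values can then be appended to a `vbr -t remove -c configfile.ini` command.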

11.5 - Estimating log file disk requirements

One of the vbr configuration parameters is tempDir. This parameter specifies the database host location where vbr writes its log files and some other temporary files of negligible size. The default location is the /tmp/vbr directory on each database host. You can change the default location by specifying a different path in the configuration file.

The temporary storage directory also contains local log files describing the progress, throughput, and any errors encountered for each node. Each time you run vbr, the script creates a separate log file, each named with a timestamp. When using default settings, the log file typically uses about 4KB of space per node per backup.
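
As a rough worked estimate based on the 4KB figure above, the following sketch multiplies nodes by backups by log size. The node count and backup frequency are example values, not recommendations, and higher debug levels produce larger logs.

```shell
nodes=3                    # nodes in the cluster (example value)
backups_per_year=365       # one backup per night (example value)
kb_per_node_per_backup=4   # typical log size with default settings

estimate_kb=$((nodes * backups_per_year * kb_per_node_per_backup))
echo "Estimated vbr log growth: ${estimate_kb} KB per year"
```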

The vbr log files are not removed automatically, so you must delete older log files manually, as necessary.
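
One way to find candidates for manual cleanup is to list files in the temporary directory that have not been modified for some time. The sketch below assumes the default tempDir location and an arbitrary 30-day retention period. It only lists files and does not delete them, because the directory also holds other temporary files. Review the list while no vbr task is running.

```shell
log_dir=/tmp/vbr      # default tempDir; change if your configuration sets another path
max_age_days=30       # arbitrary retention period for this sketch

list_old_vbr_files() {
  # $1 = directory, $2 = age in days
  # Print regular files last modified more than $2 days ago.
  [ -d "$1" ] || return 0
  find "$1" -type f -mtime +"$2" -print
}

list_old_vbr_files "$log_dir" "$max_age_days"
```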

11.6 - Allocating resources

By default, vbr allows a single rsync connection (for Linux file systems), 10 concurrent threads (for cloud storage connections), and unlimited bandwidth for any backup or restore operation. You can change these values in your configuration file. See vbr configuration file reference for details about these parameters.

Connections

You might want to increase the number of concurrent connections. If you have many Vertica files, more connections can provide a significant performance boost as each connection increases the number of concurrent file transfers.

For more information, refer to the following parameters in [transmission]:

total_bwlimit_backup
total_bwlimit_restore
concurrency_backup
concurrency_restore

and the following parameters in [CloudStorage]:

cloud_storage_concurrency_backup
cloud_storage_concurrency_restore

Bandwidth limits

You can limit network bandwidth use through the total_bwlimit_backup and total_bwlimit_restore data transmission parameters. For more information, refer to [transmission].

12 - Troubleshooting backup and restore

These tips can help you avoid issues related to backup and restore with Vertica and to troubleshoot any problems that occur.

Check vbr log

The vbr log is separate from the Vertica log. Its location is set by the vbr configuration parameter tempDir, by default /tmp/vbr.

If the log has no explanation for an error or unexpected results, try increasing the logging level with the vbr option --debug:

vbr -t backup -c config-file --debug debug-level

where debug-level is an integer between 0 (default) and 3 (verbose), inclusive. As you increase the logging level, the file size of the log increases. For example:

$ vbr -t backup -c full_backup.ini --debug 3

Note

Scrutinize reports do not include vbr logs.

Check status of backup nodes

Backups fail if you run out of disk space on the backup hosts or if vbr cannot reach them all. Check that you have sufficient space on each backup host and that you can reach each host via ssh.

Sometimes vbr leaves rsync processes running on the database or backup nodes. These processes can interfere with new ones. If you get an rsync error in the console, look for runaway processes and kill them.
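
The checks described above can be sketched in shell as follows. The host names and backup directory are placeholders taken from the mapping examples in this guide, and the helper names are hypothetical. The sketch assumes passwordless ssh from the node where you run it.

```shell
# Placeholder hosts and directory; substitute the values from your [Mapping] section.
backup_hosts="pri_bsrv01 pri_bsrv02 pri_bsrv03"
backup_dir=/archive/backup

check_backup_host() {
  # $1 = host, $2 = backup directory
  # BatchMode makes ssh fail instead of prompting for a password, and
  # df reports the free space on the file system that holds the directory.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" df -h "$2"
}

filter_rsync() {
  # Read a process listing on stdin and print the lines that run rsync.
  grep -E '(^|[[:space:]/])rsync([[:space:]]|$)' || true
}

# Usage sketch, run from a database node:
#   for h in $backup_hosts; do
#     check_backup_host "$h" "$backup_dir" || echo "cannot reach $h"
#   done
#   ps -eo pid,etime,args | filter_rsync
```

The elapsed time column from ps helps distinguish a long-running leftover process from one that belongs to a task still in progress.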

Common errors

Object replication fails

If you do not exclude the DOWN node, replication fails with the following error:

Error connecting to a destination database node on the host <hostname> : <error>  ...

Confirm that you excluded all DOWN nodes from the object replication operation.

Error restoring an archive

You might see an error like the following when restoring an archive:

$ vbr --task restore --archive prd_db_20190131_183111 --config-file /home/dbadmin/backup.ini
IOError: [Errno 2] No such file or directory: '/tmp/vbr/vbr_20190131_183111_s0rpYR/prd_db.info'

The problem is that the archive name is not in the correct format. Specify only the date/timestamp suffix of the directory name that identifies the archive to restore, as described in Restoring an Archive. For example:

$ vbr --task restore --archive 20190131_183111 --config-file /home/dbadmin/backup.ini

Backup or restore fails when using an HDFS storage location

When performing a backup of a cluster that includes HDFS storage locations, you might see an error like the following:

ERROR 5127:  Unable to create snapshot No such file /usr/bin/hadoop:
check the HadoopHome configuration parameter

This error is caused by the backup script not being able to back up the HDFS storage locations. You must configure Vertica and Hadoop to enable the backup script to back up these locations. See Requirements for backing up and restoring HDFS storage locations.

Object-level backup and restore are not supported with HDFS storage locations. You must use full backup and restore.

Could not connect to endpoint URL

(Eon Mode) When performing a cross-endpoint operation, you can see a connection error if you failed to specify the endpoint URL for your communal storage (VBR_COMMUNAL_STORAGE_ENDPOINT_URL). When the endpoint is missing but you specify credentials for communal storage, vbr tries to use those credentials to access AWS. This access fails, because those credentials are for your on-premises storage, not AWS. When performing cross-endpoint operations, check that all environment variables described in Cross-Endpoint Backups in Eon Mode are set correctly.

13 - vbr reference

vbr can back up and restore the full database, or specific schemas and tables. It also supports a number of other backup-related tasks, such as listing the history of all backups.

vbr is located in the Vertica binary directory—typically, /opt/vertica/bin/vbr.

Syntax

vbr { --help | -h }
  | { --task | -t } task  { --config-file | -c } configfile [ option[...] ]

Global options

The following options apply to all vbr tasks. For additional options, see Task-Specific Options.

Option	Description
`--help \| -h`	Display a brief `vbr` usage guide.
`{--task \| -t}` `task`	The `vbr` task to execute, one of the following: `backup`: create a full or object-level backup; `collect-garbage`: rebuild the backup manifest and delete any unreferenced objects in the backup location; `copycluster`: copy the database to another cluster (Enterprise Mode only, invalid for HDFS); `full-check`: verify all objects in the backup manifest and report missing or unreferenced objects; `init`: prepare a new backup location; `listbackup`: show available backups; `quick-check`: confirm that all backed-up objects are in the backup manifest and report discrepancies between objects in the backup location and objects listed in the backup manifest; `quick-repair`: build a replacement backup manifest based on storage locations and objects; `remove`: remove specified restore points; `replicate`: copy objects from one cluster to another; `restore`: restore a full or object-level backup. Note: In general, tasks cannot run concurrently, with one exception: multiple `replicate` tasks can run concurrently with each other, and with `backup`.
`{--config-file \| -c}` `path`	File path of the configuration file to use for the given task.
`--debug` `level`	Level of debug messaging to the `vbr` log, an integer from 0 to 3 inclusive, where 0 (default) turns off debug messaging, and 3 is the most verbose level of messaging.
`--nodes` `nodeslist`	(Enterprise Mode only) Comma-delimited list of nodes on which to perform a `vbr` task. Listed nodes must match names in the Mapping section of the configuration file. Use this option to exclude DOWN nodes from a task, so `vbr` does not return with an error. Caution: If you use `--nodes` with a `backup` task, be sure that the nodes list includes all UP nodes; omitting any UP node can cause data loss in that backup.
`--showconfig`	Displays the configuration values used to perform a specific task, in raw JSON format, before `vbr` starts task execution: `vbr -t task -c configfile --showconfig`. `--showconfig` can also show settings for a given configuration file: `vbr -c configfile --showconfig`
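
Because the `--nodes` list for a backup must include every UP node, it can help to build the list from the database rather than by hand. The following is a minimal sketch, not a supported utility: `join_nodes` is a hypothetical helper, and the commented usage assumes that vsql is on the PATH and that the NODES system table reports UP nodes in its node_state column.

```shell
join_nodes() {
  # Read one node name per line on stdin and join them with commas.
  paste -sd, -
}

# Usage sketch (requires a running database):
#   up_nodes="$(vsql -At -c "SELECT node_name FROM nodes WHERE node_state = 'UP' ORDER BY node_name;" | join_nodes)"
#   vbr -t backup -c full_backup.ini --nodes "$up_nodes"

# Offline demonstration with sample node names:
printf '%s\n' v_vmart_node0001 v_vmart_node0003 | join_nodes
```

Confirm that the resulting names match the entries in the Mapping section of the configuration file before running the task.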

Task-specific options

Some vbr tasks support additional options, described in the sections that follow.

The following vbr tasks have no task-specific options:

copycluster
quick-check
quick-repair

Backup

Create a full database or object-level backup, depending on configuration file settings.

Option	Description
`--dry-run`	Perform a test run to evaluate impact of the backup operation—for example, its size and potential overhead.

Collect-garbage

Rebuild the backup manifest and delete any unreferenced objects in the backup location.

Option	Description
`--report-file`	Output results to a delimited JSON file.

Full-check

Produce a full backup integrity check that verifies all objects in the backup manifest against file system metadata, and then outputs missing and unreferenced objects.

Option	Description
`--report-file`	Output results to a delimited JSON file.

Init

Create a backup directory or prepare an existing one for use, and create backup manifests. This task must precede the first vbr backup operation.

Option	Description
`--cloud-force-init`	Qualifies the `--task init` command to force the `init` task to succeed on S3 or GS storage targets when an identity/lock file mismatch occurs.
`--report-file`	Output results to a delimited JSON file.

Listbackup

Displays backups associated with the specified configuration file. Use this task to get archive (restore point) identifiers for restore and remove tasks.

Option	Description
`--list-all`	List all backups stored on the hosts and paths in the configuration file.
`--list-output-file` `filename`	Redirect output to the specified file.
`--json`	Use JSON delimited format.

Remove

Remove the backup restore points specified by the --archive option.

Option	Description
`--archive`	Restore points to remove, one of the following: `timestamp`: a single restore point to remove; `timestamp:timestamp`: a range of contiguous restore points to remove; `all`: remove all restore points. You obtain timestamp identifiers for the target restore points with the `listbackup` task. For details, see vbr listbackup.

Replicate

Copy objects from one cluster to an alternate cluster. This task can run concurrently with backup and other replicate tasks.

Option	Description
`--archive`	Timestamp of the backup restore point to replicate, obtained from the `listbackup` task.
`--dry-run`	Perform a test run to evaluate impact of the replicate operation—for example, its size and potential overhead.
`--target-namespace`	Eon Mode only, the namespace in the target database to which objects are replicated. `vbr` behaves differently depending on whether the target namespace exists: Exists: `vbr` attempts to restore or replicate the objects to the existing namespace, which must have the same shard count, shard boundaries, and node subscriptions as the source namespace. If these conditions are not met, the `vbr` task fails. Nonexistent: `vbr` creates a namespace in the target database with the name specified in `--target-namespace` and the shard count of the source namespace, and then replicates or restores the objects to that namespace. If no target namespace is specified, `vbr` attempts to restore or replicate objects to a namespace with the same name as the source namespace.

Restore

Restore a full or object-level database backup.

Option	Description
`--archive`	Timestamp of the backup to restore, obtained from the `listbackup` task. If omitted, `vbr` restores the latest backup of the specified configuration.
`--restore-objects`	Comma-delimited list of objects—tables and schemas—to restore from a given backup.
`--include-objects`	Comma-delimited list of database objects or patterns of objects to include from a full or object-level backup.
`--exclude-objects`	Comma-delimited list of database objects or patterns of objects to exclude from the set specified by `--include-objects`. This option can only be used together with `--include-objects`.
`--dry-run`	Perform a test run to evaluate impact of the restore operation—for example, its size and potential overhead.
`--target-namespace`	Eon Mode only, the namespace in the target database to which objects are restored. `vbr` behaves differently depending on whether the target namespace exists: Exists: `vbr` attempts to restore or replicate the objects to the existing namespace, which must have the same shard count, shard boundaries, and node subscriptions as the source namespace. If these conditions are not met, the `vbr` task fails. Nonexistent: `vbr` creates a namespace in the target database with the name specified in `--target-namespace` and the shard count of the source namespace, and then replicates or restores the objects to that namespace. If no target namespace is specified, `vbr` attempts to restore or replicate objects to a namespace with the same name as the source namespace.

Note

The --restore-objects option and the --include-objects/exclude-objects options are mutually exclusive. You can use --include-objects to specify a set of objects and combine it with --exclude-objects to remove objects from the set.

Interrupting vbr

To cancel a backup, use Ctrl+C or send a SIGINT to the vbr Python process. vbr stops the backup process after it completes copying the data. Canceling a vbr backup with Ctrl+C closes the session immediately.

The files generated by an interrupted backup process remain in the target backup location directory. The next backup process picks up where the interrupted process left off.

Backup operations are atomic, so interrupting a backup operation does not affect the previous backup. The latest backup replaces the previous backup only after all other backup steps are complete.

Caution

The restore and copycluster operations overwrite the database catalog directory. Interrupting either of these processes leaves the database unusable until you restart the process and allow it to finish.

14 - vbr configuration file reference

vbr configuration files divide backup settings into sections, under section-specific headings such as [Database] and [CloudStorage], which contain database access and cloud storage location settings, respectively. Sections can appear in any order and can be repeated—for example, multiple [Database] sections.

Important

Section headings are case-sensitive.

14.1 - [CloudStorage]

Eon Mode only

Sets options for storing backup data in a supported cloud storage location.

The [CloudStorage] and [Mapping] configuration sections are mutually exclusive. If you include both, the backup fails with this error message:

Config has conflicting sections (Mapping, CloudStorage), specify only one of them.

Important

The [CloudStorage] section replaces the now-deprecated [S3] section of earlier releases. Likewise, cloud storage-specific configuration variables replace the equivalent S3 configuration variables.

Do not include [S3] and [CloudStorage] sections in the same configuration file; otherwise, vbr will use [S3] configuration settings and ignore [CloudStorage] settings, which can yield unexpected results.
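
A quick pre-flight check can catch these section conflicts before vbr runs. The sketch below is illustrative only: the function names are hypothetical, and it simply looks for section headings at the start of a line, matching case exactly because section headings are case-sensitive.

```shell
has_section() {
  # $1 = configuration file, $2 = section name (matched case-sensitively).
  grep -q "^\[$2]" "$1"
}

check_cloud_sections() {
  # Report the section combinations that this reference warns against.
  if has_section "$1" CloudStorage && has_section "$1" Mapping; then
    echo "conflict: [Mapping] and [CloudStorage] in $1"
    return 1
  fi
  if has_section "$1" CloudStorage && has_section "$1" S3; then
    echo "conflict: [S3] and [CloudStorage] in $1"
    return 1
  fi
  return 0
}

# Demonstration on a throwaway file with a deliberate conflict.
demo_config="$(mktemp)"
printf '%s\n' '[CloudStorage]' \
  'cloud_storage_backup_path = s3://backup-bucket/database-backup-path/' \
  '[Mapping]' \
  'v_vmart_node0001 = []:/backup/path' > "$demo_config"
check_cloud_sections "$demo_config" || echo "fix the configuration before running vbr"
rm -f "$demo_config"
```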

Options

cloud_storage_backup_file_system_path

Host and path that you are using to handle file locking during the backup process. The format is [host]:path. vbr must be able to create a passwordless ssh connection to the location that you specify here.

To use a local NFS file system, omit the host: []:path.

cloud_storage_backup_path

Backup location. For S3-compatible or cloud locations, provide the bucket name and backup path. For HDFS locations, provide the appropriate protocol and backup path.

When you back up to cloud storage, all nodes back up to the same cloud storage bucket. You must create the backup location in the cloud storage before performing a backup. The following example specifies the backup path for S3 storage:

cloud_storage_backup_path = s3://backup-bucket/database-backup-path/

When you back up to an HDFS location, use the swebhdfs protocol if you use wire encryption. Use the webhdfs protocol if you do not use wire encryption. The following example uses encryption:

cloud_storage_backup_path = swebhdfs://backup-nameservice/database-backup-path/

cloud_storage_ca_bundle

Path to an SSL server certificate bundle.

Note

The key (*.pem) file must be on the same path on all nodes of the database cluster.

For example:

cloud_storage_ca_bundle = /home/user/ssl-folder/ca-bundle

cloud_storage_concurrency_backup

The maximum number of concurrent backup threads for backup to cloud storage. For very large data volumes (greater than 10TB), you might need to reduce this value to avoid vbr failures.

Default: 10

cloud_storage_concurrency_delete

The maximum number of concurrent delete threads for deleting files from cloud storage. If the vbr configuration file contains a [CloudStorage] section, this value is set to 10 by default.

Default: 10

cloud_storage_concurrency_restore

The maximum number of concurrent restore threads for restoring from cloud storage. For very large data volumes (greater than 10TB), you might need to reduce this value to avoid vbr failures.

Default: 10

cloud_storage_encrypt_at_rest

S3 storage only. To enable at-rest encryption of your backups to S3, specify a value of sse. For more information, see Encrypting Backups on Amazon S3.

This value takes the following form:

cloud_storage_encrypt_at_rest = sse

cloud_storage_encrypt_transport

Boolean. If true, uses SSL encryption to encrypt data moving between your Vertica cluster and your cloud storage instance.

You must set this parameter to true if backing up or restoring from:

Amazon EC2 cluster
Google Cloud Storage (GCS)
Eon Mode on-premises database with communal storage on HDFS, to use wire encryption.

Default: true

cloud_storage_sse_kms_key_id

S3 storage only. If you use AWS Key Management Service (KMS), use this parameter to provide your key ID. If you enable encryption and do not include this parameter, vbr uses SSE-S3 encryption.

This value takes the following form:

cloud_storage_sse_kms_key_id = key-id

14.2 - [database]

Sets options for accessing the database and, for replication, the destination.

Database options

dbName

Name of the database to back up. If you do not supply a database name, vbr selects the current database to back up.

OpenText recommends that you provide a database name.

dbPromptForPassword

Boolean, whether vbr prompts for a password. If set to false (no prompt at runtime), then the dbPassword parameter in the password configuration file must provide the password; otherwise, vbr prompts for one at runtime.

As a best practice, set dbPromptForPassword to false if dbUseLocalConnection is set to true.

Default: true

dbUser

Vertica user that performs vbr operations on the database. In the case of replicate tasks, this user is the source database user. You must be logged on as the database administrator to back up the database. The user password can be stored in the dbPassword parameter of the password configuration file; otherwise, vbr prompts for one at runtime.

Default: Current user name

dbUseLocalConnection

Boolean, whether vbr accesses the target database over a local connection with the user's Vertica password. If dbUseLocalConnection is enabled, vbr can operate on a local database without the user password being set in the vbr configuration. vbr ignores the passwordFile parameter and any settings in the password configuration file, including dbPassword.

If dbUseLocalConnection is enabled, then an authentication method must be granted to vbr users—typically a dbadmin—where method type is set to trust, and access is set to local:

=> CREATE AUTHENTICATION h1 method 'trust' local;
=> GRANT AUTHENTICATION h1 to dbadmin;

Default: false

Destination options

Set destination database parameters only if replicating objects on alternate clusters:

dest_dbName: Name of the destination database.
dest_dbPromptForPassword: Boolean, whether vbr prompts for the destination database password. If set to false (no prompt at runtime), then dest_dbPassword parameter in the password configuration file must provide the password; otherwise, vbr prompts for one at runtime.
dest_dbUser: Vertica user name in the destination database to use for loading replicated data. This user must have superuser privileges.

14.3 - [mapping]

Enterprise Mode only

Specifies all database nodes to include in an Enterprise Mode database backup. This section also specifies the backup host and directory of each node. If objects are replicated to an alternative database, the [Mapping] section maps target database nodes to the corresponding source database backup locations.

Note

[CloudStorage] and [Mapping] configuration sections are mutually exclusive. If you include both, the backup fails.

Format

Unlike other configuration file sections, the [Mapping] section does not use named parameters. Instead, it contains entries of the following format:

dbNode = backupHost:backupDir

dbNode

Name of the database node as recognized by Vertica. This value is not the node's host name; rather, it is the name Vertica uses internally to identify the node, typically in this format:

v_dbname_node000int

To find database node names in your cluster, query the node_name column of the NODES system table.

backupHost

The target host name or IP address on which to store this node's backup. backupHost is different from dbNode. The copycluster command uses this value to identify the target database node host name.

IPv6 addresses must be enclosed by square brackets []. For example:

v_backup_restore_node0001 = [fdfb:dbfa:0:2000::112]:/backupdir/backup_restore.2021-06-01T16:17:57
v_backup_restore_node0002 = [fdfb:dbfa:0:2000::113]:/backupdir/backup_restore.2021-06-01T16:17:57
v_backup_restore_node0003 = [fdfb:dbfa:0:2000::114]:/backupdir/backup_restore.2021-06-01T16:17:57

Important

Although supported, backups to an NFS host might perform poorly, particularly on networks shared with rsync operations.

backupDir

The full path to the directory on the backup host or node where the backup will be stored. The following requirements apply to this directory:

Already exists when you run vbr with --task backup.
Writable by the user account used to run vbr.
Unique to the database you are backing up. Multiple databases cannot share the same backup directory.
File system at this location supports fcntl lockf file locking.

For example:

[Mapping]
v_sec_node0001 = pri_bsrv01:/archive/backup
v_sec_node0002 = pri_bsrv02:/archive/backup
v_sec_node0003 = pri_bsrv03:/archive/backup

Mapping to the local host

vbr does not support using localhost to specify a backup host. To back up a database node to its own disk, specify the host name with empty square brackets. For example:

[Mapping]
NodeName = []:/backup/path
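
The entry formats described so far (plain host, bracketed IPv6 address, and empty brackets for the node's own disk) can be sketched as a small parser. This is only an illustration of the syntax, not how vbr reads the file; MappingEntry and parse_mapping_line are hypothetical names.

```python
from typing import NamedTuple


class MappingEntry(NamedTuple):
    db_node: str
    backup_host: str  # empty string stands for the "[]" (local disk) form
    backup_dir: str


def parse_mapping_line(line: str) -> MappingEntry:
    """Split one 'dbNode = backupHost:backupDir' entry into its parts."""
    db_node, _, target = line.partition("=")
    db_node, target = db_node.strip(), target.strip()
    if not db_node or not target:
        raise ValueError(f"not a mapping entry: {line!r}")
    if target.startswith("["):
        # Bracketed form: "[ipv6-address]:/dir" or "[]:/dir".
        host, closing, rest = target[1:].partition("]")
        if not closing or not rest.startswith(":"):
            raise ValueError(f"malformed bracketed host: {line!r}")
        backup_dir = rest[1:]
    else:
        # Plain form: the first colon separates host and directory.
        host, colon, backup_dir = target.partition(":")
        if not colon:
            raise ValueError(f"missing ':' separator: {line!r}")
    if host == "localhost":
        raise ValueError("localhost is not supported; use [] instead")
    return MappingEntry(db_node, host, backup_dir)
```

Only the first colon after the host is treated as the separator, so directory names that contain colons, such as timestamps, are preserved.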

Mapping to the same database

The following example shows a [Mapping] section that specifies a single node to back up: v_vmart_node0001. The node is assigned to backup host srv01 and backup directory /home/dbadmin/backups. Although a single-node cluster is backed up, and the backup host and the database node are the same system, they are specified differently.

Specify the backup host and directory using a colon (:) as a separator:

[Mapping]
v_vmart_node0001 = srv01:/home/dbadmin/backups

Mapping to an alternative database

Note

Replicating objects to an alternative database requires the vbr configuration file to include a [NodeMapping] section. This section points source nodes to their target database nodes.

To restore an alternative database, add mapping information as follows:

[Mapping]
targetNode = backupHost:backupDir

For example:

[Mapping]
v_sec_node0001 = pri_bsrv01:/archive/backup
v_sec_node0002 = pri_bsrv02:/archive/backup
v_sec_node0003 = pri_bsrv03:/archive/backup

14.4 - [misc]

Configures basic backup settings.

Options

passwordFile

Path name of the password configuration file, ignored if dbUseLocalConnection (under [Database]) is set to true.

restorePointLimit

Number of earlier backups to retain with the most recent backup. If set to 1 (the default), Vertica maintains two backups: the latest backup and the one before it.

Note

vbr saves multiple backups to the same location, which are shared through hard links. In such cases, the listbackup task displays the common backup prefix with unique time and date suffixes: my_archive20111111_205841

Default: 1
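
The retention arithmetic can be shown with a short Python sketch: the number of backups kept is restorePointLimit plus one. The function backups_to_keep is hypothetical, and it assumes timestamps in the sortable date_time form shown in the note above; it does not reflect how vbr prunes backups internally.

```python
def backups_to_keep(backup_timestamps: list[str], restore_point_limit: int = 1) -> list[str]:
    """Return the newest restore_point_limit + 1 timestamps, newest first."""
    if restore_point_limit < 0:
        raise ValueError("restore_point_limit must not be negative")
    # Timestamps such as "20111111_205841" sort chronologically as strings.
    newest_first = sorted(backup_timestamps, reverse=True)
    return newest_first[: restore_point_limit + 1]
```

With the default of 1, three existing backups reduce to the latest backup and the one before it.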

snapshotName

Base name of the backup used in the directory tree structure that vbr creates for each node, containing up to 240 characters limited to the following:

a–z
A–Z
0–9
Hyphen (-)
Underscore (_)

Each iteration in this series (up to restorePointLimit) consists of snapshotName and the backup timestamp. Each series of backups should have a unique and descriptive snapshot name. Full and object-level backups cannot share names. For most vbr tasks, snapshotName serves as a useful identifier in diagnostics and system tables. For object restore and replication tasks, snapshotName is used to build schema names in coexist mode operations.

Default: snapshotName
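
The character and length rules for snapshotName can be checked before a configuration file is written. The following Python sketch is one way to do so; is_valid_snapshot_name is a hypothetical helper, and rejecting the empty string is an assumption.

```python
import re

# Letters, digits, hyphen, and underscore; 1 to 240 characters.
# Assumption: an empty name is not allowed.
SNAPSHOT_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,240}")


def is_valid_snapshot_name(name: str) -> bool:
    """Return True if name uses only the permitted characters and length."""
    return SNAPSHOT_NAME_RE.fullmatch(name) is not None
```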

tempDir

Absolute path to a temporary storage area on the cluster nodes. This path must be the same on all database cluster nodes. vbr uses this directory as temporary storage for log files, lock files, and other bookkeeping information while it copies files from the source cluster node to the destination backup location. vbr also writes backup logs to this location.

The file system at this location must support fcntl lockf (POSIX) file locking.

Caution

Do not use the same location as your database's data or catalog directory. Unexpected files and directories in your data or catalog location can cause errors during database startup or restore.

Default: /tmp/vbr
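
To get an early indication of whether a candidate directory meets the file-locking requirement, you could run a probe like the following Python sketch on each node. It is an assumption-laden check, not a vbr feature: supports_lockf is a hypothetical name, and a successful lock on one scratch file does not guarantee correct behavior under concurrent access.

```python
import fcntl
import tempfile


def supports_lockf(directory: str) -> bool:
    """Try to take and release an exclusive lockf lock on a scratch file."""
    try:
        # The scratch file is created in the directory and removed on close.
        with tempfile.NamedTemporaryFile(dir=directory) as scratch:
            fcntl.lockf(scratch.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(scratch.fileno(), fcntl.LOCK_UN)
        return True
    except OSError:
        # Covers a missing directory, missing permissions, or failed locking.
        return False
```

The same probe applies to backup directories, which have the same locking requirement.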

drop_foreign_constraints

If true, all foreign key constraints are unconditionally dropped during object-level restore. You can then restore database objects independent of their foreign key dependencies.

Important

Vertica only uses this option if objectRestoreMode is set to coexist.

Default: false

enableFreeSpaceCheck

If true (default) or omitted, vbr confirms that the specified backup locations contain sufficient free space to allow a successful backup. If a backup location has insufficient resources, vbr displays an error message and cancels the backup. If vbr cannot determine the amount of available space or number of nodes in the backup directory, it displays a warning and continues with the backup.

Default: true

excludeObjects

Database objects and wildcard patterns to exclude from the set specified by includeObjects. Unicode characters are case-sensitive; others are not.

This parameter can be set only if includeObjects is also set.

hadoop_conf_dir

(Eon Mode on HDFS with high availability (HA) nodes only) Directory path containing the XML configuration files copied from Hadoop.

If the vbr operation includes more than one HA HDFS cluster, use a colon-separated list to provide the directory paths to the XML configuration files for each HA HDFS cluster. For example:

hadoop_conf_dir = path/to/xml-config-hahdfs1:path/to/xml-config-hahdfs2

This value must match the HadoopConfDir value set in the bootstrapping file created during installation.

includeObjects

Database objects and wildcard patterns to include with a backup task. You can use this parameter together with excludeObjects. Unicode characters are case-sensitive; others are not.

The includeObjects and objects parameters are mutually exclusive.
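
The combination rules for includeObjects, excludeObjects, and the objects parameter (described later in this section) can be summarized in a short Python sketch. check_object_params is a hypothetical helper that takes the [Misc] settings as a dictionary; the error messages are illustrative and are not vbr output.

```python
def check_object_params(misc: dict[str, str]) -> list[str]:
    """Return a list of rule violations among the object-selection settings."""
    errors = []
    uses_patterns = "includeObjects" in misc or "excludeObjects" in misc
    if "objects" in misc and uses_patterns:
        errors.append("objects cannot be combined with includeObjects/excludeObjects")
    if "excludeObjects" in misc and "includeObjects" not in misc:
        errors.append("excludeObjects requires includeObjects")
    return errors
```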

kerberos_keytab_file

(Eon Mode on HDFS only) Location of the keytab file that contains credentials for the Vertica Kerberos principal.

This value must match the KerberosKeytabFile value set in the bootstrapping file created during installation.

kerberos_realm

(Eon Mode on HDFS only) Realm portion of the Vertica Kerberos principal.

This value must match the KerberosRealm value set in the bootstrapping file created during installation.

kerberos_service_name

(Eon Mode on HDFS only) Service name portion of the Vertica Kerberos principal.

This value must match the KerberosServiceName value set in the bootstrapping file created during installation.

Default: vertica

objectRestoreMode

How vbr handles objects of the same name when restoring schema or table backups, one of the following:

createOrReplace: vbr creates any objects that do not exist. If an object does exist, vbr overwrites it with the version from the archive.
create: vbr creates any objects that do not exist and does not replace existing objects. If an object being restored does exist, the restore fails.
coexist: vbr creates the restored version of each object with a name formatted as follows: backup_timestamp_objectname

This approach allows existing and restored objects to exist simultaneously. If the appended information pushes the schema name past the maximum length of 128 characters, Vertica truncates the name. You can perform a reverse lookup of the original schema name by querying the system table TRUNCATED_SCHEMATA.

Tables named in the COPY clauses of data loaders are not changed. You can use ALTER DATA LOADER to rename target tables.

In all modes, vbr restores data with the current epoch. Object restore mode settings do not apply to backups and full restores.

Default: createOrReplace

objects

For an object-level backup or object replication, object (schema or table) names to include. To specify more than one object, enter multiple names in a comma-delimited list. If you specify no objects, vbr creates a full backup.

Important

This parameter cannot be used together with the parameters includeObjects and excludeObjects.

You specify objects as follows:

Specify table names in the form schema.objectname. For example, to make backups of the table customers from the schema finance, enter: finance.customers

If a public table and a schema have the same name, vbr backs up only the schema. Use the schema.objectname convention to avoid confusion.
Object names can include UTF-8 alphanumeric characters. Object names cannot include escape characters, single- (') or double-quote (") characters.
Specify non-alphanumeric characters with a backslash (\) followed by a hex value. For instance, if the table name is my table (my followed by a space character, then table), enter the object name as follows:

objects=my\20table
If an object name includes a period, enclose the name with double quotes.
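
The escaping rule above can be sketched in Python as follows. escape_object_name is a hypothetical helper for a single name part (no schema prefix). Several details are assumptions rather than documented behavior: underscores pass through unchanged, hex digits are lowercase, and non-ASCII punctuation is rejected because its encoding is not specified here.

```python
def escape_object_name(name: str) -> str:
    """Escape non-alphanumeric ASCII characters as a backslash plus hex value."""
    escaped = []
    for ch in name:
        if ch.isalnum() or ch == "_":
            # Alphanumeric characters (including UTF-8 letters) stay as-is.
            escaped.append(ch)
        elif ch in "'\"\\":
            raise ValueError("quotes and escape characters are not allowed")
        elif ch == ".":
            raise ValueError("names with a period must be double-quoted instead")
        elif ord(ch) < 128:
            escaped.append("\\" + format(ord(ch), "02x"))
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return "".join(escaped)
```

For the table name my table, the helper returns the same value shown in the example above, my\20table.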

Tip

To identify objects with wildcards, use the includeObjects/excludeObjects parameters.

14.5 - [NodeMapping]

vbr uses the node mapping section exclusively to restore objects from a backup of one database to a different database. Be sure to update the [Mapping] section of your configuration file to point your target database nodes to their source backup locations. The target database must have at least as many UP nodes as the source database.

Use the following format to specify node mapping:

source_node = target_node

For example, you can use the following mapping to restore content from one 4-node database to an alternate 4-node database:


[NodeMapping]
v_sourcedb_node0001 = v_targetdb_node0001
v_sourcedb_node0002 = v_targetdb_node0002
v_sourcedb_node0003 = v_targetdb_node0003
v_sourcedb_node0004 = v_targetdb_node0004

See Restoring a database to an alternate cluster for a complete example.
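
For clusters with many nodes, a [NodeMapping] section like the one above can be generated rather than typed by hand. The Python sketch below is illustrative only: build_node_mapping is a hypothetical helper, pairing nodes by list position is an assumption, and the length check only approximates the requirement on UP nodes because it does not query node state.

```python
def build_node_mapping(source_nodes: list[str], target_nodes: list[str]) -> str:
    """Return [NodeMapping] text that pairs each source node with a target node."""
    if len(target_nodes) < len(source_nodes):
        raise ValueError("target database needs at least as many nodes as the source")
    lines = ["[NodeMapping]"]
    # Assumption: nodes are paired in the order given.
    lines += [f"{src} = {dst}" for src, dst in zip(source_nodes, target_nodes)]
    return "\n".join(lines)
```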

14.6 - [transmission]

Sets options for transmitting data when using backup hosts.

Options

concurrency_backup

Maximum number of backup TCP rsync connection threads per node. To improve local and remote backup, replication, and copy cluster performance, you can increase the number of threads available to perform backups.

Increasing the number of threads allocates more CPU resources to the backup task and can, for remote backups, increase the amount of bandwidth used. The optimal value for this setting depends greatly on your specific configuration and requirements. Values higher than 16 produce no additional benefit.

Default: 1

concurrency_delete

Maximum number of delete TCP rsync connections per node. To improve local and remote restore, replication, and copycluster performance, increase the number of threads available to delete files.

Increasing the number of threads allocates more CPU resources to the delete task and can increase the amount of bandwidth used for deletes on remote backups. The optimal value for this setting depends on your specific configuration and requirements.

Default: 16

concurrency_restore

Maximum number of restore TCP rsync connections per node. To improve local and remote restore, replication, and copycluster performance, increase the number of threads available to perform restores.

Increasing the number of threads allocates more CPU resources to the restore task and can increase the amount of bandwidth used for restores of remote backups. The optimal value for this setting depends greatly on your specific configuration and requirements. Values higher than 16 produce no additional benefit.

Default: 1

copyOnHardLinkFailure

If a hard-link local backup cannot create links, copy the data instead. Copying takes longer than linking, so the default behavior is to return an error if links cannot be created on any node.

Default: false

encrypt

Whether transmitted data is encrypted while it is copied to the target backup location. Set this parameter to true only if performing a backup over an untrusted network—for example, backing up to a remote host across the Internet.

Important

Encrypting data transmission causes significant processing overhead and slows transfer. One of the processor cores of each database node is consumed during the encryption process. Use this option only if you are concerned about the security of the network used when transmitting backup data.

Omit this parameter from the configuration file for hard-link local backups. If you set both encrypt and hardLinkLocal to true in the same configuration file, vbr issues a warning and ignores encrypt.

Default: false
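
The interaction between encrypt and hardLinkLocal can be expressed as a small Python sketch. encrypt_in_effect is a hypothetical helper that reads the [Transmission] section and reports whether encryption would apply; the accepted boolean spellings are those of Python's configparser, which may differ from what vbr accepts.

```python
import configparser


def encrypt_in_effect(config_text: str) -> bool:
    """Return True only if encrypt is true and hardLinkLocal is not."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(config_text)
    # Assumption: the section name is matched without regard to case.
    section = next((s for s in parser.sections() if s.lower() == "transmission"), None)
    if section is None:
        return False  # both parameters default to false
    encrypt = parser.getboolean(section, "encrypt", fallback=False)
    hard_link_local = parser.getboolean(section, "hardLinkLocal", fallback=False)
    # When both are true, encrypt is ignored.
    return encrypt and not hard_link_local
```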

hardLinkLocal

Whether to create a full- or object-level backup using hard file links on the local file system, rather than copying database files to a remote backup host. Add this configuration parameter manually to the [Transmission] section of the configuration file.

For details on usage, see Full hard-link backup/restore.

Default: false

port_rsync

Default port number for the rsync protocol. Change this value if the default rsync port is in use on your cluster, or you need rsync to use another port to avoid a firewall restriction.

Default: 50000

serviceAccessUser

User name used for simple authentication of rsync connections. This user is neither a Linux nor a Vertica user name, but rather an arbitrary identifier used by the rsync protocol. If you omit this parameter, rsync runs without authentication, which can create a security risk. If you choose to save the password, store it in the password configuration file.

total_bwlimit_backup

Total bandwidth limit in KBps for backup connections. Vertica distributes this bandwidth evenly among the number of connections set in concurrency_backup. The default value of 0 allows unlimited bandwidth.

The total network load allowed by this value is the number of nodes multiplied by the value of this parameter. For example, a three-node cluster with a total_bwlimit_backup value of 100 allows up to 300 KBps of network traffic.

Default: 0
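
The bandwidth arithmetic described above can be worked through in a few lines of Python. bandwidth_plan is a hypothetical helper; it takes the per-node limit, the connection count from concurrency_backup, and the number of nodes, and returns the per-connection share and the cluster-wide total in KBps.

```python
from typing import Optional


def bandwidth_plan(total_bwlimit: int, concurrency: int, node_count: int) -> Optional[dict]:
    """Return per-connection and cluster-wide limits in KBps, or None if unlimited."""
    if total_bwlimit == 0:
        return None  # 0 means unlimited bandwidth
    return {
        # The per-node limit is split evenly across that node's connections.
        "per_connection_kbps": total_bwlimit / concurrency,
        # Each node may use the full limit, so the cluster total scales with nodes.
        "cluster_total_kbps": total_bwlimit * node_count,
    }
```

For a three-node cluster with a limit of 100 and four connections per node, each connection gets 25 KBps and the cluster as a whole can generate up to 300 KBps. The same calculation applies to total_bwlimit_restore and concurrency_restore.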

total_bwlimit_restore

Total bandwidth limit in KBps for restore connections. Vertica distributes this bandwidth evenly among the number of connections set in concurrency_restore. The default value of 0 allows unlimited bandwidth.

The total network load allowed by this value is the number of nodes multiplied by the value of this parameter. For example, a three-node cluster with a total_bwlimit_restore value of 100 allows up to 300 KBps of network traffic.

Default: 0

14.7 - Password configuration file

For improved security, store passwords in a password configuration file and then restrict read access to that file. Set the passwordFile parameter in your vbr configuration file to this file.

[passwords] password settings

All password configuration parameters are inside the file's [Passwords] section.

dbPassword: Database administrator's Vertica password, used if the dbPromptForPassword parameter is false. This parameter is ignored if dbUseLocalConnection is set to true.
dest_dbPassword: Password for the dest_dbuser Vertica account, for replication tasks only.
serviceAccessPass: Password for the rsync user account.
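
One way to create such a file with restricted read access is sketched below in Python. write_password_file is a hypothetical helper, and the choice of mode 0600 (owner read and write only) is an assumption about what "restrict read access" should mean in your environment.

```python
import os


def write_password_file(path: str, db_password: str, service_access_pass: str) -> None:
    """Write a [Passwords] section readable and writable only by the file owner."""
    content = (
        "[Passwords]\n"
        f"dbPassword = {db_password}\n"
        f"serviceAccessPass = {service_access_pass}\n"
    )
    # Create the file with mode 0600 so it is never world-readable on disk.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # Enforce the mode even if the file already existed with looser permissions.
    os.chmod(path, 0o600)
```

The path passed to this helper is the value you would then set for passwordFile in the vbr configuration file.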

Examples

See Password file.
