Backing up and restoring the database
Important
Inadequate security on backups can compromise overall database security. Be sure to secure backup locations and strictly limit access to backups only to users who already have permissions to access all database data.
Creating regular database backups is an important part of basic maintenance tasks. Vertica supplies a comprehensive utility, vbr, for this purpose. vbr lets you perform the following operations. Unless otherwise noted, operations are supported in both Enterprise Mode and Eon Mode:
- Back up a database.
- Back up specific objects (schemas or tables) in a database.
- Restore a database or individual objects from backup.
- Copy a database to another cluster, for example to promote a test cluster to production (Enterprise Mode only).
- Replicate individual objects (schemas or tables) to another cluster.
- List available backups.
When you run vbr, you specify a configuration (.ini) file. In this file you specify all of the configuration parameters for the operation: what to back up, where to back it up, how many backups to keep, whether to encrypt transmissions, and much more. Vertica provides several Sample vbr configuration files that you can use as templates.
You can use vbr to restore a backup created by vbr. Typically, you use the same configuration file for both operations. Common use cases introduces the most common vbr operations.
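As a rough illustration of how a task and a configuration file come together on the command line, the following Python sketch assembles a vbr invocation as an argument list. It uses vbr's --task and --config-file options. The helper name vbr_command and the example path are hypothetical, and the sketch only builds the command; it does not run it.

```python
import shlex


def vbr_command(task, config_file):
    """Build the argument list for one vbr run (hypothetical helper)."""
    if not task or not config_file:
        raise ValueError("both a task and a configuration file are required")
    return ["vbr", "--task", task, "--config-file", config_file]


def as_shell_text(argv):
    """Render the argument list as a copy-and-paste shell command."""
    return shlex.join(argv)
```

For example, as_shell_text(vbr_command("backup", "/home/dbadmin/full_backup.ini")) produces a single command line that you could pass to a shell or hand to subprocess.run as a list. The path shown is an assumption, not a file that Vertica ships.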
When performing a backup, you can save your data to one of the following locations:
You cannot back up an Enterprise Mode database and restore it in Eon Mode, or vice versa.
Supported cloud storage
Vertica supports backup and restore operations in the following cloud storage locations:
- Amazon Web Services (AWS) S3
- S3-compatible private cloud storage, such as Pure Storage or Minio
- Google Cloud Storage (GCS)
- Azure Blob Storage
If you are backing up an Eon Mode database, you must use a supported cloud storage location.
You cannot perform backup or restore operations between different cloud providers. For example, you cannot back up or restore from GCS to an S3 location.
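One way to catch a cross-provider mistake before running vbr is to compare the URL schemes of the two locations involved. The sketch below is only an approximation under that assumption: the function names are hypothetical, the gs:// and s3:// prefixes follow the sample cloud storage configuration, and a scheme comparison cannot tell AWS S3 apart from S3-compatible private storage.

```python
from urllib.parse import urlparse


def storage_scheme(location):
    """Return the URL scheme of a storage location, such as 'gs' or 's3'."""
    scheme = urlparse(location).scheme
    if not scheme:
        raise ValueError(f"no scheme found in {location!r}")
    return scheme.lower()


def same_cloud_provider(location_a, location_b):
    """Approximate check: both locations use the same URL scheme."""
    return storage_scheme(location_a) == storage_scheme(location_b)
```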
Additional considerations for HDFS storage locations
If your database has any storage locations on HDFS, additional configuration is required to enable those storage locations for backup operations. See Requirements for backing up and restoring HDFS storage locations.
1 - Common use cases
You can use vbr to perform many tasks related to backup and restore. The vbr reference describes all of the tasks in detail. This section summarizes common use cases. For each of these cases, there are additional requirements not covered here. Be sure to read the linked topics for details.
This is not a complete list of Backup/Restore capabilities.
Routine backups in Enterprise Mode
A full backup stores a copy of your data in another location—ideally a location that is separated from your database location, such as on different hardware or in the cloud. You give the backup a name (the snapshot name), which allows you to have different backups and backup types without interference. In your configuration file, you can map database nodes to backup locations and set some other parameters.
Before your first backup, run the vbr init task.
Use the vbr backup task to perform a full backup. The External full backup/restore example provides a starting point for your configuration. For complete documentation of full backups, see Creating full backups.
Routine backups in Eon Mode
For the most part, backups in Eon Mode work the same way as backups in Enterprise Mode. Eon Mode has some additional requirements described in Eon Mode database requirements, and some configuration parameters are different for backups to cloud storage. You can back up or restore Eon Mode databases that run in the cloud or on-premises using a supported cloud storage location.
Use the vbr backup task to perform a full backup. The Backup/restore to cloud storage example provides a starting point for your configuration. For complete documentation of full backups, see Creating full backups.
Checkpoint backups: backing up before a major operation
It is a good idea to back up your database before performing destructive operations such as dropping tables, or before major operations such as upgrading Vertica to a new version.
You can perform a regular full backup for this purpose, but a faster way is to create a hard-link local backup. This kind of backup copies your catalog and links your data files to another location on the local file system on each node. (You can also do a hard-link backup of specific objects rather than the whole database.) A hard-link local backup does not provide the same protection as a backup stored externally. For example, it does not protect you from local system failures. However, for a backup that you expect to need only temporarily, a hard-link local backup is an expedient option. Do not use hard-link local backups as substitutes for regular backups to other nodes.
Hard-link backups use the same vbr backup task as other backups, but with a different configuration. The Full hard-link backup/restore example provides a starting point for your configuration. See Creating hard-link local backups for more information.
Restoring selected objects
Sometimes you need to restore specific objects, such as a table you dropped, rather than the entire database. You can restore individual tables or schemas from any backup that contains them, whether a full backup or an object backup.
Use the vbr restore task and the --restore-objects parameter to specify what to restore. Usually you use the same configuration file that you used to create the backup. See Restoring individual objects for more information.
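The following Python sketch shows how an object restore command might be assembled. It assumes that --restore-objects takes a comma-separated list of schema and table names. The helper name and the object names are hypothetical, and the command is built but not executed.

```python
def restore_objects_command(config_file, objects):
    """Build a vbr restore command limited to the named objects."""
    names = [name.strip() for name in objects if name.strip()]
    if not names:
        raise ValueError("name at least one schema or table to restore")
    return [
        "vbr",
        "--task", "restore",
        "--config-file", config_file,
        # Assumption: objects are passed as one comma-separated value.
        "--restore-objects", ",".join(names),
    ]
```

Passing the same configuration file that created the backup keeps the backup location and snapshot name consistent between the two operations.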
Restoring an entire database
You can restore both Enterprise Mode and Eon Mode databases from complete backups. You cannot use restore to change the mode of your database. In Eon Mode, you can restore to the primary subcluster without regard to secondary subclusters.
Use the vbr restore task to restore a database. As when restoring selected objects, you usually use the same configuration file that you used to create the backup. See Restoring a database from a full backup and Restoring hard-link local backups for more information.
Copying a cluster
You might need to copy a database to another cluster of computers, such as when you are promoting a database from a staging environment to production. Copying a database to another cluster is essentially a simultaneous backup and restore operation. The data is backed up from the source database cluster and restored to the destination cluster in a single operation.
Use the vbr copycluster task to copy a cluster. The Database copy to an alternate cluster example provides a starting point for your configuration. See Copying the database to another cluster for more information.
Replicating selected objects to another database
You might want to replicate specific tables or schemas from one database to another. For example, you might copy data from a production database to a test database to investigate a problem in isolation. Similarly, after you complete a large data load in one database, replicating the loaded objects to another database might be more efficient than repeating the load there.
Use the vbr replicate task to replicate objects. You specify the objects to replicate in the configuration file. The Object replication to an alternate database example provides a starting point for your configuration. See Replicating objects to another database cluster for more information.
2 - Sample vbr configuration files
The vbr utility uses configuration files to provide the information it needs to back up and restore a full or object-level backup or copy a cluster. No default configuration file exists. You must always specify a configuration file with the vbr command.
Vertica includes sample configuration files that you can copy, edit, and deploy for various vbr tasks. Vertica automatically installs these files at:
/opt/vertica/share/vbr/example_configs
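Because the installed samples are meant to be copied and then edited, a small helper can make the copy step repeatable. This is a minimal sketch: the function name is hypothetical, the destination directory is your choice, and the helper refuses to overwrite an existing file so that earlier edits are not lost.

```python
import shutil
from pathlib import Path

# Location of the installed samples, as stated above.
EXAMPLE_DIR = Path("/opt/vertica/share/vbr/example_configs")


def copy_sample_config(sample_name, dest_dir, example_dir=EXAMPLE_DIR):
    """Copy one sample configuration file into dest_dir and return its path."""
    source = Path(example_dir) / sample_name
    target = Path(dest_dir) / sample_name
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting it")
    shutil.copy(source, target)
    return target
```

For example, copy_sample_config("backup_restore_full_local.ini", "/home/dbadmin") would place an editable copy in the dbadmin home directory, assuming that directory exists on your system.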
2.1 - External full backup/restore
backup_restore_full_external.ini
An external (distributed) backup backs up each database node to a distinct backup host. Nodes are mapped to hosts in the [Mapping] section.
To restore, use the same configuration file that you used to create the backup.
; This sample vbr configuration file shows full or object backup and restore to a separate remote backup-host for each respective database host.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.
; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;
[Mapping]
; !!Mandatory!! This section defines what host and directory will store the backup for each node.
; node_name = backup_host:backup_dir
; In this "parallel backup" configuration, each node backs up to a distinct external host.
; To backup all database nodes to a single external host, use that single hostname/IP address in each entry below.
v_exampledb_node0001 = 10.20.100.156:/home/dbadmin/backups
v_exampledb_node0002 = 10.20.100.157:/home/dbadmin/backups
v_exampledb_node0003 = 10.20.100.158:/home/dbadmin/backups
v_exampledb_node0004 = 10.20.100.159:/home/dbadmin/backups
[Misc]
; !!Recommended!! Snapshot name. Object and full backups should always have different snapshot names.
; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
; Valid characters: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot
[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database
; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True
; If true, vbr attempts to connect to the database using a local connection.
; dbUseLocalConnection = False
; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;
[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr
; Specifies the number of historical backups to retain in addition to the most recent backup.
; 1 current + n historical backups
; restorePointLimit = 1
; Full path to the password configuration file
; Store this file in directory readable only by the dbadmin
; (no default)
; passwordFile = /path/to/vbr/pw.txt
; When enabled, Vertica confirms that the specified backup locations contain
; sufficient free space and inodes to allow a successful backup. If a backup
; location has insufficient resources, Vertica displays an error message explaining the shortage and
; cancels the backup. If Vertica cannot determine the amount of available space
; or number of inodes in the backupDir, it displays a warning and continues
; with the backup.
; enableFreeSpaceCheck = True
[Transmission]
; Specifies the default port number for the rsync protocol.
; port_rsync = 50000
; Total bandwidth limit for all backup connections in KBPS, 0 for unlimited. Vertica distributes
; this bandwidth evenly among the number of connections set in concurrency_backup.
; total_bwlimit_backup = 0
; The maximum number of backup TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_backup = 1
; The total bandwidth limit for all restore connections in KBPS, 0 for unlimited
; total_bwlimit_restore = 0
; The maximum number of restore TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_restore = 1
; The maximum number of delete TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_delete = 16
[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator
; dbUser = current_username
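If you want to inspect a configuration file like this one before handing it to vbr, Python's standard configparser module can read it. The sketch below is not how vbr itself parses the file; it is an independent reading under a few assumptions. Repeated sections such as [Misc] and [Database] are merged (strict=False), option names keep their case, and each [Mapping] value is split at the first colon into host and directory, which would not suit IPv6 addresses. The function names and the shortened sample text are illustrative.

```python
import configparser

SAMPLE = """
[Mapping]
v_exampledb_node0001 = 10.20.100.156:/home/dbadmin/backups
v_exampledb_node0002 = 10.20.100.157:/home/dbadmin/backups
[Misc]
snapshotName = backup_snapshot
[Misc]
; restorePointLimit = 1
tempDir = /tmp/vbr
"""


def load_vbr_config(text):
    """Parse vbr-style INI text, merging sections that appear more than once."""
    parser = configparser.ConfigParser(
        strict=False,                 # the samples repeat [Misc] and [Database]
        interpolation=None,           # treat '%' in values literally
        delimiters=("=",),            # values contain ':' (host:directory)
        comment_prefixes=(";", "#"),
    )
    parser.optionxform = str          # keep names such as snapshotName as written
    parser.read_string(text)
    return parser


def node_mapping(parser):
    """Return {node_name: (backup_host, backup_dir)} from the [Mapping] section."""
    mapping = {}
    for node, target in parser["Mapping"].items():
        host, _, directory = target.partition(":")
        mapping[node] = (host, directory)
    return mapping
```

Commented-out parameters are skipped by the parser, so only the values you have explicitly set appear in the result.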
2.2 - Backup/restore to cloud storage
backup_restore_cloud_storage.ini
You can back up and restore Enterprise Mode and Eon Mode databases to a cloud storage location. You must back up Eon Mode databases to a supported cloud storage location. Configuration settings in the [CloudStorage] section are identical for both Enterprise Mode and Eon Mode.
There are one-time configurations that you must complete before your first backup to a new cloud storage location. See Additional considerations for cloud storage for more information.
Backups to on-premises cloud storage destinations require additional configuration for both Enterprise Mode and Eon databases. For details about the additional requirements, see Configuring cloud storage backups.
To restore, use the same configuration file that you used to create the backup. To restore selected objects rather than the entire database, specify the objects to restore on the vbr command line using --restore-objects.
; This sample vbr configuration file shows backup to Cloud Storage e.g AWS S3, GCS, HDFS or on-premises (e.g. Pure Storage)
; This can be used for Vertica databases in Enterprise or Eon mode.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; Option and values are separated by an equal sign.
; Only arguments marked as '!!Mandatory!!' must be specified explicitly.
; All commented parameters are set to their default value.
; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;
[CloudStorage]
; This section replaces the [Mapping] section and is required to back up to cloud storage.
; !!Mandatory!! Backup location on Cloud or HDFS (no default).
cloud_storage_backup_path = gs://backup_bucket/database_backup_path/
; cloud_storage_backup_path = s3://backup_bucket/database_backup_path/
; cloud_storage_backup_path = webhdfs://backup_nameservice/database_backup_path/
; cloud_storage_backup_path = azb://backup_account/backup_container/
; !!Mandatory!! directory used to manage locking during a backup (no default). If the directory is mounted on the initiator host, you
; should use "[]" instead of the local host name. The file system must support POSIX fcntl flock.
cloud_storage_backup_file_system_path = []:/home/dbadmin/backup_locks_dir/
[Misc]
; !!Recommended!! Snapshot name
; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
; Valid values: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot
; Specifies how Vertica handles objects of the same name when restoring schema or table backups.
; objectRestoreMode = createOrReplace
; Specifies which tables and/or schemas to copy. For tables, the containing schema defaults to public.
; Note: 'objects' is incompatible with 'includeObjects' and 'excludeObjects'.
; (no default)
; objects = mytable, myschema, myothertable
; Specifies the set of objects to backup/restore; wildcards may be used.
; Note: 'includeObjects' is incompatible with 'objects'.
; includeObjects = public.mytable, customer*, s?
; Subtracts from the set of objects to backup/restore; wildcards may be used
; Note: 'excludeObjects' is incompatible with 'objects'.
; excludeObjects = public.*temp, etl.phase?
[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database
; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True
; If true, vbr attempts to connect to the database using a local connection.
; dbUseLocalConnection = False
; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;
[CloudStorage]
; Specifies encryption-at-rest on S3
; cloud_storage_encrypt_at_rest = sse
; cloud_storage_sse_kms_key_id = <key_id>
; Specifies SSL encrypted transfer.
; cloud_storage_encrypt_transport = True
; Specifies the number of threads for upload/download - backup
; cloud_storage_concurrency_backup = 10
; Specifies the number of threads for upload/download - restore
; cloud_storage_concurrency_restore = 10
; Specifies the number of threads for deleting objects from the backup location
; cloud_storage_concurrency_delete = 10
; Specifies the path to a custom SSL server certificate bundle
; cloud_storage_ca_bundle = /home/user/ssl_folder/ca_bundle.pem
[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr
; Specifies the number of historical backups to retain in addition to the most recent backup.
; 1 current + n historical backups
; restorePointLimit = 1
; Full path to the password configuration file
; Store this file in directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt
; Specifies the service name of the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_service_name = vertica
; Specifies the realm (authentication domain) of the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_realm = your_auth_domain
; Specifies the location of the keytab file which contains the credentials for the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_keytab_file = /path/to/keytab_file
; Specifies the location of the Hadoop XML configuration files of the HDFS clusters. Only set this when your cluster is on HA. This only applies to HDFS.
; If you have multiple conf directories, please separate them with ':'.
; hadoop_conf_dir = /path/to/conf or /path/to/conf1:/path/to/conf2
[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator
; dbUser = current_username
2.3 - Full hard-link backup/restore
backup_restore_full_hardlink.ini
The following requirements apply to configuring hard-link local backups:
- Under the [Transmission] section, add the parameter hardLinkLocal:
  hardLinkLocal = True
- The backup directory must be in the same file system as the database data directory.
- Omit the encrypt parameter. If the configuration file sets both parameters encrypt and hardLinkLocal to true, then vbr issues a warning and ignores the encrypt parameter.
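The requirements above can be checked ahead of time with a short script. The following is a minimal sketch, not part of vbr: the function names are hypothetical, the file system test compares device IDs reported by os.stat (which assumes both directories already exist on the node where the script runs), and the option check expects the [Transmission] settings as a plain dictionary of strings.

```python
import os


def same_file_system(backup_dir, data_dir):
    """True when both directories live on the same device, so hard links can work."""
    return os.stat(backup_dir).st_dev == os.stat(data_dir).st_dev


def _is_true(options, name):
    return options.get(name, "False").strip().lower() == "true"


def hardlink_config_warnings(transmission):
    """Return messages about [Transmission] settings that conflict with the rules above."""
    warnings = []
    if not _is_true(transmission, "hardLinkLocal"):
        warnings.append("hardLinkLocal must be True for a hard-link local backup")
    elif _is_true(transmission, "encrypt"):
        warnings.append("encrypt is ignored when hardLinkLocal is True")
    return warnings
```

Run the file system check on every node, because each node links its own data files to its own backup directory.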
; This sample vbr configuration file shows backup and restore using hard-links to data files on each database host for that host's backup.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.
; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;
[Mapping]
; For each database node there must be one [Mapping] entry to indicate the directory to store the backup.
; !!Mandatory!! Backup host name (no default) and Backup directory (no default).
; node_name = backup_host:backup_dir
; Must use [] for hardlink backups
v_exampledb_node0001 = []:/home/dbadmin/backups
v_exampledb_node0002 = []:/home/dbadmin/backups
v_exampledb_node0003 = []:/home/dbadmin/backups
v_exampledb_node0004 = []:/home/dbadmin/backups
[Misc]
; !!Recommended!! Snapshot name. Object and full backups should always have different snapshot names.
; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
; Valid characters: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot
[Transmission]
; !!Mandatory!! Identifies the backup as a hardlink style backup.
hardLinkLocal = True
; If copyOnHardLinkFailure is True, when a hard-link local backup cannot create links the data is copied instead.
copyOnHardLinkFailure = False
; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;
[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database
; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True
[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr
; Full path to the password configuration file
; Store this file in directory readable only by the dbadmin.
; (no default)
; passwordFile =
; Specifies the number of historical backups to retain in addition to the most recent backup.
; 1 current + n historical backups
; restorePointLimit = 1
; When enabled, Vertica confirms that the specified backup locations contain
; sufficient free space and inodes to allow a successful backup. If a backup
; location has insufficient resources, Vertica displays an error message explaining the shortage and
; cancels the backup. If Vertica cannot determine the amount of available space
; or number of inodes in the backupDir, it displays a warning and continues
; with the backup.
; enableFreeSpaceCheck = True
[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator.
; dbUser = current_username
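The restorePointLimit comment in the sample describes a simple counting rule: one current backup plus n historical backups. The sketch below illustrates only that arithmetic. It assumes that backups sharing a snapshot name can be ordered by a sortable timestamp string, and it says nothing about how vbr actually stores or removes archives. The function name and timestamps are hypothetical.

```python
def backups_retained(backup_timestamps, restore_point_limit=1):
    """Return the backups that the '1 current + n historical' rule would keep."""
    if restore_point_limit < 0:
        raise ValueError("restorePointLimit cannot be negative")
    newest_first = sorted(backup_timestamps, reverse=True)
    return newest_first[: 1 + restore_point_limit]
```

With the default restorePointLimit of 1 and three backups taken on consecutive days, the two most recent would be retained under this rule.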
2.4 - Full local backup/restore
backup_restore_full_local.ini
; This is a sample vbr configuration file for backup and restore using a file system on each database host for that host's backup.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.
; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;
[Mapping]
; !!Mandatory!! For each database node there must be one [Mapping] entry to indicate the directory to store the backup.
; node_name = backup_host:backup_dir
; [] indicates backup to localhost
v_exampledb_node0001 = []:/home/dbadmin/backups
v_exampledb_node0002 = []:/home/dbadmin/backups
v_exampledb_node0003 = []:/home/dbadmin/backups
v_exampledb_node0004 = []:/home/dbadmin/backups
[Misc]
; !!Recommended!! Snapshot name
; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
; Valid values: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot
[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database
; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True
; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;
[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr
; Specifies the number of historical backups to retain in addition to the most recent backup.
; 1 current + n historical backups
; restorePointLimit = 1
; Full path to the password configuration file
; Store this file in directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt
; When enabled, Vertica confirms that the specified backup locations contain
; sufficient free space and inodes to allow a successful backup. If a backup
; location has insufficient resources, Vertica displays an error message explaining the shortage and
; cancels the backup. If Vertica cannot determine the amount of available space
; or number of inodes in the backupDir, it displays a warning and continues
; with the backup.
; enableFreeSpaceCheck = True
[Transmission]
; The total bandwidth limit for all restore connections in KBPS, 0 for unlimited
; total_bwlimit_restore = 0
; The maximum number of restore TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_restore = 1
; Total bandwidth limit for all backup connections in KBPS, 0 for unlimited. Vertica distributes
; this bandwidth evenly among the number of connections set in concurrency_backup.
; total_bwlimit_backup = 0
; The maximum number of backup TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_backup = 1
; The maximum number of delete TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_delete = 16
[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator
; dbUser = current_username
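The comment on total_bwlimit_backup says that the total limit is divided evenly among the connections set in concurrency_backup. As a quick way to reason about the effect of these two settings together, the sketch below computes the resulting per-connection share. The function name is hypothetical, and a total of 0 is treated as unlimited, as the sample states.

```python
def per_connection_limit_kbps(total_bwlimit, concurrency):
    """Even share of the total bandwidth limit per connection, or None if unlimited."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if total_bwlimit == 0:
        return None  # 0 means unlimited
    return total_bwlimit / concurrency
```

For instance, a total limit of 8000 KBPS with concurrency_backup set to 4 leaves 2000 KBPS for each connection, so raising concurrency without raising the total limit slows each individual connection.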
2.5 - Object-level local backup/restore in Enterprise Mode
backup_restore_object_local.ini
An object backup backs up only the schemas or tables that are specified in the [Misc] section by the parameter objects, or parameters includeObjects and excludeObjects.
For an object restore, use the same configuration file that you used to create the backup, and specify the objects to restore with the vbr command-line parameter --restore-objects.
; This sample vbr configuration file shows object-level backup and restore
; using a file system on each database host for that host's backup.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; Option and values are separated by an equal sign.
; Only arguments marked as '!!Mandatory!!' must be specified explicitly.
; All commented parameters are set to their default value.
; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;
[Mapping]
; There must be one [Mapping] section for all of the nodes in your database cluster.
; !!Mandatory!! Backup host name (no default) and Backup directory (no default)
; node_name = backup_host:backup_dir
; [] indicates backup to localhost
v_exampledb_node0001 = []:/home/dbadmin/backups
v_exampledb_node0002 = []:/home/dbadmin/backups
v_exampledb_node0003 = []:/home/dbadmin/backups
v_exampledb_node0004 = []:/home/dbadmin/backups
[Misc]
; !!Recommended!! Snapshot name. Object and full backups should always have different snapshot names.
; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
; Valid values: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot
; Specifies how Vertica handles objects of the same name when restoring schema or table backups.
; objectRestoreMode = createOrReplace
; Specifies which tables and/or schemas to copy. For tables, the containing schema defaults to public.
; Note: 'objects' is incompatible with 'includeObjects' and 'excludeObjects'.
; (no default)
objects = mytable, myschema, myothertable
; Specifies the set of objects to backup/restore; wildcards may be used.
; Note: 'includeObjects' is incompatible with 'objects'.
; includeObjects = public.mytable, customer*, s?
; Subtracts from the set of objects to backup/restore; wildcards may be used
; Note: 'excludeObjects' is incompatible with 'objects'.
; excludeObjects = public.*temp, etl.phase?
[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database
; If this parameter is True, vbr will prompt user for database password every time.
; If set to False, specify location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True
; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;
[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr
; Specifies the number of historical backups to retain in addition to the most recent backup.
; 1 current + n historical backups
; restorePointLimit = 1
; Full path to the password configuration file
; Store this file in directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt
; When enabled, Vertica confirms that the specified backup locations contain
; sufficient free space and inodes to allow a successful backup. If a backup
; location has insufficient resources, Vertica displays an error message explaining the shortage and
; cancels the backup. If Vertica cannot determine the amount of available space
; or number of inodes in the backupDir, it displays a warning and continues
; with the backup.
; enableFreeSpaceCheck = True
[Transmission]
; The total bandwidth limit for all restore connections in KBPS, 0 for unlimited
; total_bwlimit_restore = 0
; The maximum number of restore TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_restore = 1
; Total bandwidth limit for all backup connections in KBPS, 0 for unlimited. Vertica distributes
; this bandwidth evenly among the number of connections set in concurrency_backup.
; total_bwlimit_backup = 0
; The maximum number of backup TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_backup = 1
; The maximum number of delete TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_delete = 16
[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator.
; dbUser = current_username
2.6 - Restore object from backup to an alternate cluster
object_restore_to_other_cluster.ini
; This sample vbr configuration file shows object restore to another cluster from an existing full or object backup.
; To restore objects from an existing backup (object or full), you must use the "--restore-objects" vbr command line option.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.
; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;
[Mapping]
; There must be one [Mapping] section for all of the nodes in your database cluster.
; !!Mandatory!! Backup host name (no default) and Backup directory (no default)
; node_name = backup_host:backup_dir
v_exampledb_node0001 = backup_host0001:/home/dbadmin/backups
v_exampledb_node0002 = backup_host0002:/home/dbadmin/backups
v_exampledb_node0003 = backup_host0003:/home/dbadmin/backups
v_exampledb_node0004 = backup_host0004:/home/dbadmin/backups
[NodeMapping]
; !!Recommended!! This section is required when performing an object restore from a full/object backup to a different cluster and node names are different between source (backup) and destination (restoring) databases.
v_sourcedb_node0001 = v_exampledb_node0001
v_sourcedb_node0002 = v_exampledb_node0002
v_sourcedb_node0003 = v_exampledb_node0003
v_sourcedb_node0004 = v_exampledb_node0004
[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to backup/restore.
; dbName = current_database
; If this parameter is True, vbr prompts the user for database password every time.
; If False, specify location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True
; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;
[Misc]
; !!Recommended!! Snapshot name.
; SnapshotName is useful for monitoring and troubleshooting.
; Valid characters: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot
; Specifies how Vertica handles objects of the same name when restoring schema or table backups. Options are coexist, createOrReplace or create.
; objectRestoreMode = createOrReplace
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr
; Full path to the password configuration file.
; Store this file in a directory only readable by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt
; When enabled, Vertica confirms that the specified backup locations contain
; sufficient free space and inodes to allow a successful backup. If a backup
; location has insufficient resources, Vertica displays an error message and
; cancels the backup. If Vertica cannot determine the amount of available space
; or number of inodes in the backupDir, it displays a warning and continues
; with the backup.
; enableFreeSpaceCheck = True
[Transmission]
; Sets options for transmitting the data when using backup hosts.
; Specifies the default port number for the rsync protocol.
; port_rsync = 50000
; The total bandwidth limit for all restore connections in KBPS, 0 for unlimited
; total_bwlimit_restore = 0
; The maximum number of restore TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_restore = 1
[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator.
; dbUser = current_username
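The sample above repeats the [Database] heading, once under the basic parameters and once under the advanced parameters. If you want to inspect such a file with a script, the following is a minimal sketch that assumes Python's standard configparser module; it is not a description of how vbr itself reads the file. Strict mode rejects repeated sections, so the sketch disables it and keeps key case as written. The helper names load_vbr_style_config and unmapped_nodes are hypothetical. The second helper reports [NodeMapping] targets that have no [Mapping] entry.

```python
import configparser

SAMPLE = """
[Mapping]
v_exampledb_node0001 = backup_host0001:/home/dbadmin/backups
v_exampledb_node0002 = backup_host0002:/home/dbadmin/backups
[NodeMapping]
v_sourcedb_node0001 = v_exampledb_node0001
v_sourcedb_node0002 = v_exampledb_node0002
[Database]
; dbName = current_database
[Misc]
snapshotName = backup_snapshot
[Database]
dbUser = dbadmin
"""


def load_vbr_style_config(text):
    """Parse ini text, merging repeated sections and keeping key case."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep names such as dbUser exactly as written
    parser.read_string(text)
    return parser


def unmapped_nodes(parser):
    """Return [NodeMapping] targets that have no [Mapping] entry."""
    if not parser.has_section("NodeMapping"):
        return []
    mapped = set(parser["Mapping"]) if parser.has_section("Mapping") else set()
    return [dst for dst in parser["NodeMapping"].values() if dst not in mapped]


config = load_vbr_style_config(SAMPLE)
print(unmapped_nodes(config))        # destination nodes missing from [Mapping]
print(config["Database"]["dbUser"])  # value from the second [Database] block
```

Running a check like this before a restore only catches naming mismatches in the file; it does not confirm that the hosts or backup directories exist.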
2.7 - Object replication to an alternate database
replicate.ini
; This sample vbr configuration file shows the replicate vbr task.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.
; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;
[Mapping]
; There must be one [Mapping] section for all of the nodes in your database cluster.
; !!Mandatory!! Target host name (no default)
; node_name = new_host
v_exampledb_node0001 = destination_host0001
v_exampledb_node0002 = destination_host0002
v_exampledb_node0003 = destination_host0003
v_exampledb_node0004 = destination_host0004
[Misc]
; !!Recommended!! Snapshot name.
; SnapshotName is useful for monitoring and troubleshooting.
; Valid characters: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot
; Specifies the tables and/or schemas to replicate. For tables, the containing schema defaults to public.
; Specify either objects or includeObjects, but not both.
; Use comma-separated list for multiple objects
; (no default)
objects = mytable, myschema, myothertable
; Specifies the set of objects to replicate; wildcards may be used.
; Note: 'includeObjects' is incompatible with 'objects'.
; includeObjects = public.mytable, customer*, s?
; Subtracts from the set of objects to replicate; wildcards may be used
; Note: 'excludeObjects' is incompatible with 'objects'.
; excludeObjects = public.*temp, etl.phase?
; Specifies how Vertica handles objects of the same name when copying schema or tables.
; objectRestoreMode = createOrReplace
[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to replicate.
; dbName = current_database
; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True
; !!Mandatory!! These settings are all mandatory for replication. None of them have defaults.
dest_dbName = target_db
dest_dbUser = dbadmin
dest_dbPromptForPassword = True
; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;
[Misc]
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr
; Full path to the password configuration file containing database password credentials
; Store this file in a directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt
; Specifies the service name of the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_service_name = vertica
; Specifies the realm (authentication domain) of the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_realm = your_auth_domain
; Specifies the location of the keytab file which contains the credentials for the Vertica Kerberos principal. This only applies to HDFS.
; kerberos_keytab_file = /path/to/keytab_file
; Specifies the location of the Hadoop XML configuration files of the HDFS clusters. Only set this when your cluster is on HA. This only applies to HDFS.
; If you have multiple conf directories, please separate them with ':'.
; hadoop_conf_dir = /path/to/conf or /path/to/conf1:/path/to/conf2
[Transmission]
; Specifies the default port number for the rsync protocol.
; port_rsync = 50000
; Total bandwidth limit for all backup connections in KBPS, 0 for unlimited. Vertica distributes
; this bandwidth evenly among the number of connections set in concurrency_backup.
; total_bwlimit_backup = 0
; The maximum number of replication TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_backup = 1
; The maximum number of restore TCP rsync connection threads per node.
; Results vary depending on environment, but values between 2 and 16 are sometimes quite helpful.
; concurrency_restore = 1
; The maximum number of delete TCP rsync connection threads per node.
; Results vary depending on environment, but values between 2 and 16 are sometimes quite helpful.
; concurrency_delete = 16
[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator.
; dbUser = current_username
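The comments in the replication sample state that objects cannot be combined with includeObjects or excludeObjects. The following minimal sketch shows one way to check that rule before running a task. The function name object_selection_errors is hypothetical, and the function works on a plain dictionary of [Misc] values rather than on anything vbr provides.

```python
def object_selection_errors(misc):
    """Return a list of problems with the object-selection keys in [Misc]."""
    has_objects = bool(misc.get("objects", "").strip())
    has_include = bool(misc.get("includeObjects", "").strip())
    has_exclude = bool(misc.get("excludeObjects", "").strip())

    errors = []
    if has_objects and has_include:
        errors.append("'includeObjects' is incompatible with 'objects'")
    if has_objects and has_exclude:
        errors.append("'excludeObjects' is incompatible with 'objects'")
    if not has_objects and not has_include:
        # Assumption for replication: one of the two selectors must be present.
        errors.append("set either 'objects' or 'includeObjects'")
    return errors


print(object_selection_errors({"objects": "mytable, myschema, myothertable"}))
print(object_selection_errors({"objects": "mytable", "includeObjects": "customer*"}))
```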
2.8 - Database copy to an alternate cluster
copycluster.ini
; This sample vbr configuration file is configured for the copycluster vbr task.
; Copycluster supports full database copies only, not specific objects.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; An equal sign separates options and values.
; Specify arguments marked '!!Mandatory!!' explicitly.
; All commented parameters are set to their default value.
; ------------------------------------------- ;
;;; BASIC PARAMETERS ;;;
; ------------------------------------------- ;
[Mapping]
; For each node of the source database, there must be a [Mapping] entry specifying the corresponding hostname of the destination database node.
; !!Mandatory!! node_name = new_host/ip (no defaults)
v_exampledb_node0001 = destination_host1.example
v_exampledb_node0002 = destination_host2.example
v_exampledb_node0003 = destination_host3.example
v_exampledb_node0004 = destination_host4.example
; v_exampledb_node0001 = 10.0.90.17
; v_exampledb_node0002 = 10.0.90.18
; v_exampledb_node0003 = 10.0.90.19
; v_exampledb_node0004 = 10.0.90.20
[Database]
; !!Recommended!! If you have more than one database defined on this Vertica cluster, use this parameter to specify which database to copy.
; dbName = current_database
; If this parameter is True, vbr prompts the user for the database password every time.
; If False, specify the location of password config file in 'passwordFile' parameter in [Misc] section.
; dbPromptForPassword = True
; ------------------------------------------- ;
;;; ADVANCED PARAMETERS ;;;
; ------------------------------------------- ;
[Misc]
; !!Recommended!! Snapshot name.
; SnapshotName is used for monitoring and troubleshooting.
; Valid characters: a-z A-Z 0-9 - _
; snapshotName = backup_snapshot
; The temp directory location on all database hosts.
; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
; tempDir = /tmp/vbr
; Full path to the password configuration file containing database password credentials
; Store this file in a directory readable only by the dbadmin.
; (no default)
; passwordFile = /path/to/vbr/pw.txt
[Transmission]
; Specifies the default port number for the rsync protocol.
; port_rsync = 50000
; Total bandwidth limit for all copycluster connections in KBPS, 0 for unlimited. Vertica distributes
; this bandwidth evenly among the number of connections set in concurrency_backup.
; total_bwlimit_backup = 0
; The maximum number of backup TCP rsync connection threads per node.
; Optimum settings depend on your particular environment.
; For best performance, experiment with values between 2 and 16.
; concurrency_backup = 1
; The maximum number of restore TCP rsync connection threads per node.
; Results vary depending on environment, but values between 2 and 16 are sometimes quite helpful.
; concurrency_restore = 1
; The maximum number of delete TCP rsync connection threads per node.
; Results vary depending on environment, but values between 2 and 16 are sometimes quite helpful.
; concurrency_delete = 16
[Database]
; Vertica user name for vbr to connect to the database.
; This setting is rarely needed since dbUser is normally identical to the database administrator.
; dbUser = current_username
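For clusters with many nodes, typing one [Mapping] entry per node by hand is error-prone. The following sketch generates the section text from two lists. The function name mapping_section is hypothetical, and the node and host names are the placeholder values used in the sample above.

```python
def mapping_section(node_names, hosts):
    """Build [Mapping] text with one 'node_name = host' line per source node."""
    if len(node_names) != len(hosts):
        raise ValueError("each source node needs exactly one destination host")
    lines = ["[Mapping]"]
    lines += [f"{node} = {host}" for node, host in zip(node_names, hosts)]
    return "\n".join(lines)


nodes = [f"v_exampledb_node{i:04d}" for i in range(1, 5)]
hosts = [f"destination_host{i}.example" for i in range(1, 5)]
print(mapping_section(nodes, hosts))
```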
2.9 - Password file
Unlike other configuration (.ini) files, the password configuration file must be referenced by another configuration file, through its passwordFile parameter.
password.ini
; This is a sample password configuration file.
; Point to this file in the 'passwordFile' parameter of the [Misc] section.
; Section headings are enclosed by square brackets.
; Comments have leading semicolons (;) or pound signs (#).
; Options and values are separated by an equal sign.
[Passwords]
; The database administrator's password. Used if dbPromptForPassword is False.
; dbPassword=myDBsecret
; The password for the rsync user account.
; serviceAccessPass=myrsyncpw
; The password for the dest_dbUser Vertica account, for replication tasks only.
; dest_dbPassword=destDBsecret
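Because the password file holds credentials in plain text, its permissions matter as much as its contents. The following is a minimal sketch, assuming a POSIX host, that writes a [Passwords] section and restricts the file to its owner. The function name write_password_file is hypothetical, the password is the placeholder from the sample, and the temporary directory stands in for a directory that only the dbadmin can read.

```python
import configparser
import os
import stat
import tempfile


def write_password_file(path, db_password):
    """Write a [Passwords] file that only the owning user can read or write."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep 'dbPassword' case instead of lowercasing it
    parser["Passwords"] = {"dbPassword": db_password}

    # Create the file with owner-only permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        parser.write(handle, space_around_delimiters=False)
    os.chmod(path, 0o600)  # also tightens a file that already existed


demo_dir = tempfile.mkdtemp()  # mkdtemp creates an owner-only directory
demo_path = os.path.join(demo_dir, "pw.txt")
write_password_file(demo_path, "myDBsecret")
print(oct(stat.S_IMODE(os.stat(demo_path).st_mode)))
```

Point the passwordFile parameter in the [Misc] section at the resulting path.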
3 - Eon Mode database requirements
Eon Mode databases perform the same backup and restore operations as Enterprise Mode databases. Additional requirements pertain to Eon Mode because it uses a different architecture.
Eon Mode databases also support saving in-database restore points, which are copy-free backups that enable you to roll back a database to a previous state. Unlike vbr-based backups, restore points are stored in-database and do not require additional data copies to be stored externally. However, because restore points are in-database, they are lost if the database's communal storage is compromised. For more information about restore points, see Revive an Eon DB.
Cloud storage requirements
Eon Mode databases must be backed up to supported cloud storage locations. The following [CloudStorage] configuration parameters must be set:
A backup path is valid for one database only. You cannot use the same path to store backups for multiple databases.
Eon Mode databases that use S3-compatible on-premises cloud storage can back up to Amazon Web Services (AWS) S3.
Cloud storage access
In addition to having access to the cloud storage bucket used for the database's communal storage, you must have access to the cloud storage backup location. Verify that the credential you use to access communal storage also has access to the backup location. For more information about configuring cloud storage access for Vertica, see Configuring cloud storage backups.
Note
While an AWS backup location can be in a different region, backup and restore operations across different S3 regions are incompatible with virtual private cloud (VPC) endpoints.
Eon on-premises and private cloud storage
If an Eon database runs on-premises, then communal storage is not on AWS but on another storage platform that uses the S3 or GS protocol. This means there can be two endpoints and two sets of credentials, depending on where you back up. This additional information is stored in environment variables, and not in vbr
configuration parameters.
Backups of Eon Mode on-premises databases do not support AWS IAM profiles.
HDFS on-premises storage
To back up an Eon Mode database that uses HDFS on-premises storage, the communal storage and backup location must use the same HDFS credentials and domain. All vbr
operations are supported, except copycluster
.
Vertica supports Kerberos authentication, High Availability Name Node, and wire encryption for vbr
operations. Vertica does not support at-rest encryption for Hadoop storage.
For details, see Configuring backups to and from HDFS.
Database restore requirements
When restoring a backup of an Eon Mode database, the target database must satisfy the following requirements:
- Share the same name as the source database.
- Have at least as many nodes as the primary subcluster(s) in the source database.
- Have the same node names as the nodes of the source database.
- Use the same catalog directory location as the source database.
- Use the same port numbers as the source database.
- For object-level restore, if you restore to an existing target namespace, the target namespace and the objects' source namespace must have the same shard count, shard boundaries, and node subscriptions. For details, see object-level tasks with multiple namespaces.
You can restore a full or object backup that was taken from a database with primary and secondary subclusters to the primary subclusters in the target database. The target database can have only primary subclusters, or it can also have any number of secondary subclusters. Secondary subclusters do not need to match the backup database. The same is true for replicating a database; only the primary subclusters are required. The requirements are similar to those for Revive with communal storage.
Use the [Mapping]
section in the configuration file to specify the mappings for the primary subcluster.
Object-level tasks with multiple namespaces
Eon Mode databases group schemas and tables into one or more namespaces. By default, Eon databases contain only one namespace, default_namespace
, which is created during database creation. Unless you have created additional namespaces, the default_namespace
contains all schemas and tables. If you do not specify the namespace of an object, vbr
assumes the object belongs to the default_namespace
. Full database vbr
tasks are unaffected by the number of namespaces.
Important
For vbr
tasks, namespaces are prefixed with a period. For example, .n.s.t
refers to table t
in schema s
in namespace n
.
For object-level backups, you can specify the included objects in the objects
parameter of your vbr
configuration file. For example, to create an object-level backup of all objects in the orders
and customers
schemas in the store_1
namespace, add the following line to your configuration file:
objects = .store_1.orders.*, .store_1.customers.*
Alternatively, you can specify the included and excluded objects using the includeObjects
and excludeObjects
parameters. If you set these parameters, the objects
parameter must be empty.
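The leading-period convention can be illustrated with a short sketch. It splits each entry of an objects value into its namespace and the remaining schema or table pattern, falling back to default_namespace when no period prefix is present. The function names split_namespace and parse_objects are hypothetical, and the sketch covers only the naming rule described here, not vbr's full object-matching behavior.

```python
DEFAULT_NAMESPACE = "default_namespace"


def split_namespace(object_name):
    """Split a vbr-style object name into (namespace, remainder)."""
    name = object_name.strip()
    if name.startswith("."):
        # A leading period means the first component is the namespace.
        namespace, _, remainder = name[1:].partition(".")
        return namespace, remainder
    return DEFAULT_NAMESPACE, name


def parse_objects(value):
    """Parse a comma-separated objects value into (namespace, pattern) pairs."""
    return [split_namespace(item) for item in value.split(",") if item.strip()]


print(parse_objects(".store_1.orders.*, .store_1.customers.*"))
print(split_namespace("public.mytable"))
```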
For object-level restore and replicate vbr
tasks, you can use the --target-namespace
argument to specify the namespace to which the objects are restored or replicated.
vbr
behaves differently depending on whether the target namespace exists:
- Exists:
vbr
attempts to restore or replicate the objects to the existing namespace, which must have the same shard count, shard boundaries, and node subscriptions as the source namespace. If these conditions are not met, the vbr
task fails.
- Nonexistent:
vbr
creates a namespace in the target database with the name specified in --target-namespace
and the shard count of the source namespace, and then replicates or restores the objects to that namespace.
If no target namespace is specified, vbr
attempts to restore or replicate objects to a namespace with the same name as the source namespace.
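The decision described above can be sketched as a small function. This is an illustration of the documented rules only, not vbr's implementation: the Namespace class, the plan_namespace function, and the returned tuples are all hypothetical, and the sketch assumes that a missing same-named namespace is created in the same way as a missing --target-namespace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Namespace:
    name: str
    shard_count: int
    shard_boundaries: tuple
    node_subscriptions: frozenset


def plan_namespace(source, target_namespaces, target_name=None):
    """Return (action, namespace_name, shard_count) for a restore or replicate."""
    # With no explicit target, fall back to the source namespace's name.
    name = target_name or source.name
    existing = target_namespaces.get(name)
    if existing is None:
        # Nonexistent: create it with the shard count of the source namespace.
        return ("create", name, source.shard_count)
    same_layout = (
        existing.shard_count == source.shard_count
        and existing.shard_boundaries == source.shard_boundaries
        and existing.node_subscriptions == source.node_subscriptions
    )
    if not same_layout:
        raise ValueError(f"namespace {name!r} does not match the source layout")
    return ("use-existing", name, existing.shard_count)


store_1 = Namespace("store_1", 4, (0, 64, 128, 192), frozenset({"node01", "node02"}))
print(plan_namespace(store_1, {}, target_name="store_2"))
print(plan_namespace(store_1, {"store_1": store_1}))
```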
You can specify how restore operations handle duplicate objects with the objectRestoreMode parameter in the vbr
configuration file.
The following command restores the store_1.orders
schema of the source database to the store_2
namespace in the target database:
$ vbr --task restore --config-file=db.ini --restore-objects=.store_1.orders.* --target-namespace=store_2
If no target namespace is specified, vbr
attempts to restore the objects to a namespace with the same name as the source namespace. For example, you can omit the --target-namespace=store_1
argument when restoring the store_1.orders
schema to the store_1
namespace:
$ vbr --task restore --config-file=db.ini --restore-objects=.store_1.orders.*
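If you launch these commands from a script, building the argument list explicitly avoids quoting mistakes around the * wildcard. The following sketch uses only the options shown in the commands above. The function name restore_objects_command is hypothetical, and whether your shell needs the quoting that shlex.join adds depends on your environment.

```python
import shlex


def restore_objects_command(config_file, objects, target_namespace=None):
    """Build the argument list for an object-level vbr restore."""
    args = [
        "vbr",
        "--task", "restore",
        f"--config-file={config_file}",
        f"--restore-objects={objects}",
    ]
    if target_namespace:
        # Omit the option to restore into the namespace named like the source.
        args.append(f"--target-namespace={target_namespace}")
    return args


command = restore_objects_command("db.ini", ".store_1.orders.*", "store_2")
print(shlex.join(command))  # shell-safe rendering of the same arguments
```

Passing the list to subprocess.run without a shell sends the wildcard to vbr unchanged.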
Restoring a database with multiple communal storage locations
You can back up and restore Eon Mode databases that have multiple communal storage locations. Both object-level and full database restore operations are supported:
-
Full database restore: the result of the restore operation depends on whether you are restoring to the same communal storage locations from which you performed the backup:
-
Same communal storage locations: vbr
attempts to copy all data to the communal storage locations from which they were backed up. If a storage location has been dropped since the backup was taken, the restore operation attempts to reinstate the dropped location before restoring the data. If the dropped storage location cannot be reinstated, its associated data is copied to the main communal storage location.
-
Different communal storage location: all data is copied to the communal storage location specified in the vbr
configuration file. Regardless of how many communal storage locations existed before the restore, there will be only one communal storage location after the full restore.
-
Object restore: the location to which an object is restored depends on whether it has an existing storage policy in the target database:
-
Storage policy: vbr
restores the object to the communal storage location specified by the object's highest priority storage policy, which is determined by the following hierarchy, listed from highest priority to lowest:
- Table-level policy
- Schema-level policy
- Database-level policy
When the communal storage location specified by the highest priority policy does not exist,
vbr
attempts to execute the policy with the next highest priority. If none of the policies are valid, the object is restored to the main communal storage location.
-
No storage policy: the object is copied to the main communal storage location.
For details on creating and configuring storage policies for multiple communal storage locations, see Configuring your Vertica cluster for Eon Mode.
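The storage policy fallback for object restore can be expressed as a short lookup. The following sketch walks the table, schema, and database policies in priority order and falls back to the main communal storage location. The function name restore_location and the location strings are hypothetical; the sketch models the documented hierarchy only.

```python
POLICY_PRIORITY = ("table", "schema", "database")  # highest priority first


def restore_location(policies, existing_locations, main_location):
    """Pick the communal storage location for a restored object."""
    for level in POLICY_PRIORITY:
        location = policies.get(level)
        # Skip levels with no policy or with a location that no longer exists.
        if location is not None and location in existing_locations:
            return location
    return main_location


locations = {"main", "archive"}
print(restore_location({"table": "dropped", "schema": "archive"}, locations, "main"))
print(restore_location({}, locations, "main"))
```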
4 - Requirements for backing up and restoring HDFS storage locations
There are several considerations for backing up and restoring HDFS storage locations:
-
The HDFS directory for the storage location must have snapshotting enabled. You can either directly configure this yourself or enable the database administrator’s Hadoop account to do it for you automatically. See Hadoop configuration for backup and restore for more information.
-
If the Hadoop cluster uses Kerberos, Vertica nodes must have access to certain Hadoop configuration files. See Configuring Kerberos below.
-
To restore an HDFS storage location, your Vertica cluster must be able to run the Hadoop distcp
command. See Configuring distcp on a Vertica Cluster below.
-
HDFS storage locations do not support object-level backups. You must perform a full database backup to back up the data in your HDFS storage locations.
-
Data in an HDFS storage location is backed up to HDFS. This backup guards against accidental deletion or corruption of data. It does not prevent data loss in the case of a catastrophic failure of the entire Hadoop cluster. To prevent data loss, you must have a backup and disaster recovery plan for your Hadoop cluster.
Data stored on the Linux native file system is still backed up to the location you specify in the backup configuration file. It and the data in HDFS storage locations are handled separately by the vbr
backup script.
Configuring Kerberos
If HDFS uses Kerberos, then to back up your HDFS storage locations you must take the following additional steps:
-
Grant Hadoop superuser privileges to the Kerberos principals for each Vertica node.
-
Copy Hadoop configuration files to your database nodes as explained in Accessing Hadoop Configuration Files. Vertica needs access to core-site.xml
, hdfs-site.xml
, and yarn-site.xml
for backup and restore. If your Vertica nodes are co-located on HDFS nodes, these files are already present.
-
Set the HadoopConfDir parameter to the location of the directory containing these files. If the files are in multiple directories, the value can be a colon-separated path. For example:
=> ALTER DATABASE exampledb SET HadoopConfDir = '/etc/hadoop/conf:/etc/hadoop/test';
All three configuration files must be present on this path on every database node.
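To confirm that a node satisfies this requirement, you can check each directory on the path for the three files. The following is a minimal sketch; the function name missing_conf_files is hypothetical, and the temporary directories stand in for real Hadoop configuration directories such as those in the example above.

```python
import os
import tempfile

REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def missing_conf_files(hadoop_conf_dir):
    """Return the required files not found in any directory on the path."""
    directories = [d for d in hadoop_conf_dir.split(":") if d]
    return [
        name
        for name in REQUIRED_FILES
        if not any(os.path.isfile(os.path.join(d, name)) for d in directories)
    ]


# Demonstration: two throwaway directories that together hold all three files.
dir_a = tempfile.mkdtemp()
dir_b = tempfile.mkdtemp()
for directory, name in (
    (dir_a, "core-site.xml"),
    (dir_a, "hdfs-site.xml"),
    (dir_b, "yarn-site.xml"),
):
    open(os.path.join(directory, name), "w").close()

print(missing_conf_files(f"{dir_a}:{dir_b}"))  # files missing from the full path
print(missing_conf_files(dir_a))               # files missing from dir_a alone
```

Run the check on every database node, because the files must be present on each of them.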
If your Vertica nodes are co-located on HDFS nodes and you are using Kerberos, you must also change some Hadoop configuration parameters. These changes are needed in order for restoring from backups to work. In yarn-site.xml
on every Vertica node, set the following parameters:
Parameter | Value
yarn.resourcemanager.proxy-user-privileges.enabled | true
yarn.resourcemanager.proxyusers.*.groups |
yarn.resourcemanager.proxyusers.*.hosts |
yarn.resourcemanager.proxyusers.*.users |
yarn.timeline-service.http-authentication.proxyusers.*.groups |
yarn.timeline-service.http-authentication.proxyusers.*.hosts |
yarn.timeline-service.http-authentication.proxyusers.*.users |
No changes are needed on HDFS nodes that are not also Vertica nodes.
Configuring distcp on a Vertica cluster
Your Vertica cluster must be able to run the Hadoop distcp
command to restore a backup of an HDFS storage location. The easiest way to enable your cluster to run this command is to install several Hadoop packages on each node. These packages must be from the same distribution and version of Hadoop that is running on your Hadoop cluster.
The steps you need to take depend on:
Note
Installing the Hadoop packages necessary to run distcp
does not turn your Vertica database into a Hadoop cluster. This process installs just enough of the Hadoop support files on your cluster to run the distcp
command. There is no additional overhead placed on the Vertica cluster, aside from a small amount of additional disk space consumed by the Hadoop support files.
Configuration overview
The steps for configuring your Vertica cluster to restore backups for HDFS storage location are:
-
If necessary, install and configure a Java runtime on the hosts in the Vertica cluster.
-
Find the location of your Hadoop distribution's package repository.
-
Add the Hadoop distribution's package repository to the Linux package manager on all hosts in your cluster.
-
Install the necessary Hadoop packages on your Vertica hosts.
-
Set two configuration parameters in your Vertica database related to Java and Hadoop.
-
Confirm that the Hadoop distcp
command runs on your Vertica hosts.
The following sections describe these steps in greater detail.
Installing a Java runtime
Your Vertica cluster must have a Java Virtual Machine (JVM) installed to run the Hadoop distcp
command. It already has a JVM installed if you have configured it to:
If your Vertica database has a JVM installed, verify that your Hadoop distribution supports it. See your Hadoop distribution's documentation to determine which JVMs it supports.
If the JVM installed on your Vertica cluster is not supported by your Hadoop distribution you must uninstall it. Then you must install a JVM that is supported by both Vertica and your Hadoop distribution. See Vertica SDKs for a list of the JVMs compatible with Vertica.
If your Vertica cluster does not have a JVM (or its existing JVM is incompatible with your Hadoop distribution), follow the instructions in Installing the Java runtime on your Vertica cluster.
Finding your Hadoop distribution's package repository
Many Hadoop distributions have their own installation system, such as Cloudera Manager or Ambari. However, they also support manual installation using native Linux packages such as RPM and .deb
files. These package files are maintained in a repository. You can configure your Vertica hosts to access this repository to download and install Hadoop packages.
Consult your Hadoop distribution's documentation to find the location of its Linux package repository. This information is often located in the portion of the documentation covering manual installation techniques.
Each Hadoop distribution maintains separate repositories for each of the major Linux package management systems. Find the specific repository for the Linux distribution running your Vertica cluster. Be sure that the package repository that you select matches the version used by your Hadoop cluster.
Configuring Vertica nodes to access the Hadoop distribution's package repository
Configure the nodes in your Vertica cluster so they can access your Hadoop distribution's package repository. Your Hadoop distribution's documentation should explain how to add the repositories to your Linux platform. If the documentation does not explain how to add the repository to your packaging system, refer to your Linux distribution's documentation.
The steps you need to take depend on the package management system your Linux platform uses. Usually, the process involves:
-
Downloading a configuration file.
-
Adding the configuration file to the package management system's configuration directory.
-
For Debian-based Linux distributions, adding the Hadoop repository encryption key to the root account keyring.
-
Updating the package management system's index to have it discover new packages.
You must add the Hadoop repository to all hosts in your Vertica cluster.
Installing the required Hadoop packages
After configuring the repository, you are ready to install the Hadoop packages. The packages you need to install are:
-
hadoop
-
hadoop-hdfs
-
hadoop-client
The names of the packages are usually the same across all Hadoop and Linux distributions. These packages often have additional dependencies. Always accept any additional packages that the Linux package manager asks to install.
To install these packages, use the package manager command for your Linux distribution. The package manager command you need to use depends on your Linux distribution:
-
On Red Hat and CentOS, the package manager command is yum
.
-
On Debian and Ubuntu, the package manager command is apt-get
.
-
On SUSE the package manager command is zypper
.
Consult your Linux distribution's documentation for instructions on installing packages.
Setting configuration parameters
You must set two Hadoop configuration parameters to enable Vertica to restore HDFS data:
-
JavaBinaryForUDx is the path to the Java executable. You may have already set this value to use Java UDxs or the HCatalog Connector. You can find the path for the default Java executable from the Bash command shell using the command:
$ which java
-
HadoopHome is the directory that contains bin/hadoop
(the bin directory containing the Hadoop executable file). The default value for this parameter is /usr
. The default value is correct if your Hadoop executable is located at /usr/bin/hadoop
.
The following example shows how to set and then review the values of these parameters:
=> ALTER DATABASE DEFAULT SET PARAMETER JavaBinaryForUDx = '/usr/bin/java';
=> SELECT current_value FROM configuration_parameters WHERE parameter_name = 'JavaBinaryForUDx';
current_value
---------------
/usr/bin/java
(1 row)
=> ALTER DATABASE DEFAULT SET HadoopHome = '/usr';
=> SELECT current_value FROM configuration_parameters WHERE parameter_name = 'HadoopHome';
current_value
---------------
/usr
(1 row)
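Before setting these parameters, it can help to confirm the paths on each host. The following sketch looks up the Java executable in the same way as which java and checks that a candidate HadoopHome contains an executable bin/hadoop. The function name hadoop_executable is hypothetical, and the temporary directory only stands in for a real Hadoop installation.

```python
import os
import shutil
import tempfile


def hadoop_executable(hadoop_home="/usr"):
    """Return HadoopHome/bin/hadoop if it is an executable file, else None."""
    candidate = os.path.join(hadoop_home, "bin", "hadoop")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


# Rough equivalent of `which java`; None if java is not on the search path.
java_path = shutil.which("java")
print("JavaBinaryForUDx candidate:", java_path)

# Demonstration with a throwaway directory standing in for a Hadoop install.
demo_home = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_home, "bin"))
demo_binary = os.path.join(demo_home, "bin", "hadoop")
with open(demo_binary, "w") as handle:
    handle.write("#!/bin/sh\n")
os.chmod(demo_binary, 0o755)
print(hadoop_executable(demo_home) == demo_binary)
```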
You can also set the following parameters:
-
HadoopFSReadRetryTimeout and HadoopFSWriteRetryTimeout specify how long to wait before failing. The default value for each is 180 seconds. If you are confident that your file system will fail more quickly, you can improve performance by lowering these values.
-
HadoopFSReplication specifies the number of replicas HDFS makes. By default, the Hadoop client chooses this; Vertica uses the same value for all nodes.
Caution
Do not change this setting unless directed otherwise by Vertica support.
-
HadoopFSBlockSizeBytes is the block size to write to HDFS; larger files are divided into blocks of this size. The default is 64MB.
Confirming that distcp runs
After the packages are installed on all hosts in your cluster, your database should be able to run the Hadoop distcp command. To test it:
-
Log into any host in your cluster as the database superuser.
-
At the Bash shell, enter the command:
$ hadoop distcp
-
The command should print a message similar to the following:
usage: distcp OPTIONS [source_path...] <target_path>
OPTIONS
-async Should distcp execution be blocking
-atomic Commit all changes or none
-bandwidth <arg> Specify bandwidth per map in MB
-delete Delete from target, files missing in source
-f <arg> List of files that need to be copied
-filelimit <arg> (Deprecated!) Limit number of files copied to <= n
-i Ignore failures during copy
-log <arg> Folder on DFS where distcp execution logs are
saved
-m <arg> Max number of concurrent maps to use for copy
-mapredSslConf <arg> Configuration for ssl config file, to use with
hftps://
-overwrite Choose to overwrite target files unconditionally,
even if they exist.
-p <arg> preserve status (rbugpc)(replication, block-size,
user, group, permission, checksum-type)
-sizelimit <arg> (Deprecated!) Limit number of files copied to <= n
bytes
-skipcrccheck Whether to skip CRC checks between source and
target paths.
-strategy <arg> Copy strategy to use. Default is dividing work
based on file sizes
-tmp <arg> Intermediate work path to be used for atomic
commit
-update Update target, copying only missingfiles or
directories
-
Repeat these steps on the other hosts in your database to verify that all of the hosts can run distcp.
Troubleshooting
If you cannot run the distcp command, try the following steps:
-
If Bash cannot find the hadoop command, you may need to manually add Hadoop's bin directory to the system search path. An alternative is to create a symbolic link in an existing directory in the search path (such as /usr/bin) to the hadoop binary.
-
Ensure the version of Java installed on your Vertica cluster is compatible with your Hadoop distribution.
-
Review the Linux package installation tool's logs for errors. In some cases, packages may not be fully installed, or may not have been downloaded due to network issues.
-
Ensure that the database administrator account has permission to execute the hadoop command. You might need to add the account to a specific group to allow it to run the necessary commands.
5 - Setting up backup locations
Important
Inadequate security on backups can compromise overall database security. Be sure to secure backup locations and strictly limit access to backups only to users who already have permissions to access all database data.
Full and object-level backups reside on backup hosts, the computer systems on which backups and archives are stored. On the backup hosts, Vertica saves backups in a specific backup location (directory).
You must set up your backup hosts before you can create backups.
The storage format type at your backup locations must support fcntl lockf (POSIX) file locking.
5.1 - Configuring backup hosts and connections
You use vbr to back up your database to one or more hosts (known as backup hosts) that can be outside of your database cluster.
You can use one or more backup hosts or a single cloud storage bucket to back up your database. Use the vbr configuration file to specify which backup host each node in your cluster should use.
Before you back up to hosts outside of the local cluster, configure the target backup locations to work with vbr. The backup hosts you use must:
-
Have sufficient backup disk space.
-
Be accessible from your database cluster through SSH.
-
Have passwordless SSH access for the Database Administrator account.
-
Have either the Vertica rpm installed, or both Python 3.7 and rsync 3.0.5 or later installed.
-
If you are using a stateful firewall, configure your tcp_keepalive_time and tcp_keepalive_intvl sysctl settings to use values less than your firewall timeout value.
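The keepalive requirement in the last item is a simple comparison. The following Python sketch illustrates it; the function name and the sample values are hypothetical, and on most Linux systems the current settings can be read from /proc/sys/net/ipv4/tcp_keepalive_time and /proc/sys/net/ipv4/tcp_keepalive_intvl.

```python
def keepalive_below_firewall_timeout(keepalive_time, keepalive_intvl, firewall_timeout):
    """Return True if both keepalive settings (in seconds) are below the firewall timeout."""
    return keepalive_time < firewall_timeout and keepalive_intvl < firewall_timeout


# Hypothetical values: a firewall that drops idle connections after 3600 seconds.
print(keepalive_below_firewall_timeout(600, 60, 3600))   # True
print(keepalive_below_firewall_timeout(7200, 75, 3600))  # False
```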
Configuring TCP forwarding on database hosts
vbr depends on TCP forwarding to forward connections from database hosts to backup hosts. For copycluster and replication tasks, you must enable TCP forwarding on both sets of hosts. SSH connections to backup hosts do not require SSH forwarding.
If it is not already set by default, set AllowTcpForwarding = Yes in /etc/ssh/sshd_config and then send a SIGHUP signal to sshd on each host. See the Linux sshd documentation for more information.
If TCP forwarding is not enabled, tasks requiring it fail with the following message: "Errors connecting to remote hosts: Check SSH settings, and that the same Vertica version is installed on all nodes."
On a single-node cluster, vbr uses a random high-number port to create a local SSH tunnel. This fails if PermitOpen is set to restrict the port. In that case, comment out the PermitOpen line in sshd_config.
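As a rough aid for reviewing these settings, the following Python sketch scans the text of an sshd_config file for the two keywords discussed above. The function name is hypothetical. The sketch ignores Match blocks and included files, so treat its result as a hint rather than a definitive check.

```python
def sshd_forwarding_issues(config_text):
    """Return a list of possible TCP forwarding problems found in sshd_config text."""
    issues = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        parts = line.replace("=", " ", 1).split()  # accept "Key Value" and "Key = Value"
        if not parts:
            continue
        keyword = parts[0].lower()
        value = " ".join(parts[1:]).lower()
        if keyword == "allowtcpforwarding" and value not in ("yes", "all"):
            issues.append("AllowTcpForwarding is set to '%s'" % value)
        if keyword == "permitopen" and value != "any":
            issues.append("PermitOpen restricts forwarding to '%s'" % value)
    return issues
```

You could pass it the contents of /etc/ssh/sshd_config, for example `sshd_forwarding_issues(open("/etc/ssh/sshd_config").read())`, on each host that needs TCP forwarding.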
Creating configuration files for backup hosts
Create separate configuration files for full and object-level backups, and give each configuration file a distinct name. Use the same node, backup host, and directory location pairs in each file. Specify a different backup directory location for each database.
Note
For optimal network performance when creating a backup, Vertica recommends that you give each node in the cluster its own dedicated backup host.
Preparing backup host directories
Before vbr can back up a database, you must prepare the target backup directory. Run vbr with a task type of init to create the necessary manifests for the backup process. You need to perform the init process only once. After that, Vertica maintains the manifests automatically.
Estimating backup host disk requirements
Wherever you plan to save data backups, consider the disk requirements for historical backups at your site. Also, if you use more than one archive, multiple archives potentially require more disk space. Vertica recommends that each backup host have space for at least twice the database node footprint size. Follow this recommendation regardless of the specifics of your site's backup schedule and retention requirements.
To estimate the database size, use the used_bytes column of the storage_containers system table as in the following example:
=> SELECT SUM(used_bytes) AS total_size FROM storage_containers WHERE node_name='v_mydb_node0001';
total_size
------------
302135743
(1 row)
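The recommendation above (space for at least twice the node footprint) can be turned into a quick calculation. This Python sketch uses the value returned by the example query; the function name is hypothetical, and the factor of 2 comes from the recommendation in this section.

```python
def recommended_backup_bytes(node_footprint_bytes, factor=2):
    """Minimum backup host space: the node footprint multiplied by the recommended factor."""
    return node_footprint_bytes * factor


# Node footprint taken from the example query output above.
print(recommended_backup_bytes(302135743))  # 604271486
```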
Making backup hosts accessible
You must verify that any firewalls between the source database nodes and the target backup hosts allow connections for SSH and rsync on port 50000.
The backup hosts must run the same versions of rsync and Python as those supplied in the Vertica installation package.
Setting up passwordless SSH access
For vbr to access a backup host, the database superuser must meet two requirements:
-
Have an account on each backup host, with write permissions to the backup directory.
-
Have passwordless SSH access from each database cluster host to the corresponding backup host.
How you fulfill these requirements depends on your platform and infrastructure.
SSH access among the backup hosts and access from the backup host to the database node are not necessary.
If your site does not use a centralized login system (such as LDAP), you can usually add a user with the useradd command or through a GUI administration tool. See the documentation for your Linux distribution for details.
If your platform supports it, you can enable passwordless SSH logins using the ssh-copy-id command to copy a database administrator's SSH identity file to the backup location from one of your database nodes. For example, to copy the SSH identity file from a node to a backup host named backup01:
$ ssh-copy-id dbadmin@backup01
Password:
Try logging into the machine with "ssh dbadmin@backup01". Then, check the contents of the ~/.ssh/authorized_keys file to verify that you have not added extra keys that you did not intend to include.
$ ssh backup01
Last login: Mon May 23 11:44:23 2011 from host01
Repeat the steps to copy a database administrator's SSH identity to all backup hosts you use to back up your database.
After copying a database administrator's SSH identity, you should be able to log in to the backup host from any of the nodes in the cluster without being prompted for a password.
Increasing the SSH maximum connection settings for a backup host
If your configuration requires backing up multiple nodes to one backup host (n:1), increase the number of concurrent SSH connections to the SSH daemon (sshd). By default, the number of concurrent SSH connections on each host is 10, as set in the sshd_config file with the MaxStartups keyword. The MaxStartups value for each backup host should be greater than the total number of hosts being backed up to this backup host. For more information on configuring MaxStartups, refer to the man page for that parameter.
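The sizing rule above can be checked with a few lines of Python. This is a minimal sketch with a hypothetical function name. It assumes the MaxStartups value is either a single integer or the OpenSSH start:rate:full form, in which case only the first field is compared.

```python
def max_startups_sufficient(max_startups, node_count):
    """Return True if MaxStartups exceeds the number of nodes backed up to this host."""
    start = int(str(max_startups).split(":", 1)[0])  # first field of start:rate:full
    return start > node_count


print(max_startups_sufficient("10", 4))          # True
print(max_startups_sufficient("10:30:100", 12))  # False
```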
5.2 - Configuring hard-link local backup hosts
When specifying the backupHost parameter for your hard-link local configuration files, use the database host names (or IP addresses) as known to admintools. Do not use the node names. Host names (or IP addresses) are what you used when setting up the cluster. Do not use localhost for the backupHost parameter.
Listing host names
To query node names and host names:
=> SELECT node_name, host_name FROM node_resources;
node_name | host_name
------------------+----------------
v_vmart_node0001 | 192.168.223.11
v_vmart_node0002 | 192.168.223.22
v_vmart_node0003 | 192.168.223.33
(3 rows)
Because you are creating a local backup, use empty square brackets [ ] to map each node to its local host. For more information, refer to [Mapping].
[Mapping]
v_vmart_node0001 = []:/home/dbadmin/data/backups
v_vmart_node0002 = []:/home/dbadmin/data/backups
v_vmart_node0003 = []:/home/dbadmin/data/backups
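For clusters with many nodes, a [Mapping] section like the one above can be generated rather than typed. The following Python sketch is one way to do that. The function name is hypothetical, and the node names would come from a query such as the node_resources example in this section.

```python
def local_mapping_section(node_names, backup_dir):
    """Build a [Mapping] section that maps each node to a directory on its own host."""
    lines = ["[Mapping]"]
    lines += ["%s = []:%s" % (node, backup_dir) for node in node_names]
    return "\n".join(lines)


nodes = ["v_vmart_node0001", "v_vmart_node0002", "v_vmart_node0003"]
print(local_mapping_section(nodes, "/home/dbadmin/data/backups"))
```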
5.3 - Configuring cloud storage backups
Backing up an Enterprise Mode or Eon Mode database to a supported cloud storage location requires that you add parameters to the backup configuration file. You can create these backups from the local cluster or from your cloud provider's virtual servers. Additional cloud storage configuration is required to configure authentication and encryption.
Configuration file requirements
To back up any Eon Mode or Enterprise Mode cluster to a cloud storage destination, the backup configuration file must include a [CloudStorage] section. Vertica provides a sample cloud storage configuration file that you can copy and edit.
Environment variable requirements
Environment variables securely pass credentials for backup locations. Eon and Enterprise Mode databases require environment variables in the following backup scenarios:
-
Vertica on Google Cloud Platform (GCP) to Google Cloud Storage (GCS).
For backups to GCS, you must have a hash-based message authentication code (HMAC) key that contains an access ID and a secret. See Eon Mode on GCP prerequisites for instructions on how to create your HMAC key.
-
On-premises databases to any of the following storage locations:
-
Amazon Web Services (AWS)
-
Any S3-compatible storage
-
Azure Blob Storage (Enterprise Mode only)
On-premises database backups require you to pass your credentials with environment variables. You cannot use other methods of credentialing with cross-endpoint backups.
-
Any Azure user environment that does not manage resources with Azure managed identities.
The vbr log captures when you send an environment variable. For security purposes, the value that the environment variable represents is not logged. For details about checking vbr logs, see Troubleshooting backup and restore.
Enterprise Mode and Eon Mode
All Enterprise Mode and Eon Mode databases require the following environment variables:
Environment Variable | Description
VBR_BACKUP_STORAGE_ACCESS_KEY_ID | Credentials for the backup location.
VBR_BACKUP_STORAGE_SECRET_ACCESS_KEY | Credentials for the backup location.
VBR_BACKUP_STORAGE_ENDPOINT_URL | The endpoint for the on-premises S3 backup location, including the scheme (HTTP or HTTPS). Important: Do not set this variable for backup locations on AWS or GCS.
Eon Mode only
Eon Mode databases require the following environment variables:
Environment Variable | Description
VBR_COMMUNAL_STORAGE_ACCESS_KEY_ID | Credentials for the communal storage location.
VBR_COMMUNAL_STORAGE_SECRET_ACCESS_KEY | Credentials for the communal storage location.
VBR_COMMUNAL_STORAGE_ENDPOINT_URL | The endpoint for the communal storage, including the scheme (HTTP or HTTPS). Important: Do not set this variable for backup locations on GCS.
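Before starting a backup, it can help to confirm that the credential variables listed above are present. The following Python sketch shows one way to do that. The function and constant names are hypothetical, and the endpoint variables are deliberately left out because they must not be set for some locations.

```python
BACKUP_VARS = (
    "VBR_BACKUP_STORAGE_ACCESS_KEY_ID",
    "VBR_BACKUP_STORAGE_SECRET_ACCESS_KEY",
)
COMMUNAL_VARS = (
    "VBR_COMMUNAL_STORAGE_ACCESS_KEY_ID",
    "VBR_COMMUNAL_STORAGE_SECRET_ACCESS_KEY",
)


def missing_vbr_variables(env, eon_mode):
    """Return the names of required credential variables that are unset or empty."""
    required = BACKUP_VARS + (COMMUNAL_VARS if eon_mode else ())
    return [name for name in required if not env.get(name)]
```

A typical call passes the current environment, for example `missing_vbr_variables(os.environ, eon_mode=True)` after `import os`. An empty list means every credential variable checked by the sketch is set.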
Azure Blob Storage only
If the user environment does not manage resources with Azure-managed identities, you must provide credentials with environment variables. If you set environment variables in an environment that uses Azure-managed identities, credentials set with environment variables take precedence over Azure-managed identity credentials.
You can back up and restore between two separate Azure accounts. Cross-account operations require a credential configuration JSON object and an endpoint configuration JSON object for each account. Each environment variable accepts a collection of one or more comma-separated JSON objects.
Cross-account and cross-region backup and restore operations might result in decreased performance. For details about performance and cost, see the Azure documentation.
The Azure Blob Storage environment variables are described in the following table:
Environment Variable |
Description |
VbrCredentialConfig |
Credentials for the backup location. Each JSON object requires values for the following keys:
-
accountName : Name of the storage account.
-
blobEndpoint : Host address and optional port for the endpoint to use as the backup location.
-
accountKey : Access key for the account.
-
sharedAccessSignature : A token that provides access to the backup endpoint.
|
VbrEndpointConfig |
The endpoint for the backup location. To back up and restore between two separate Azure accounts, provide each set of endpoint information as a JSON object.
Each JSON object requires values for the following keys:
-
accountName : Name of the storage account.
-
blobEndpoint : Host address and optional port for the endpoint to use as the backup location.
-
protocol : HTTPS (default) or HTTP.
-
isMultiAccountEndpoint : Boolean (false by default) that indicates whether blobEndpoint supports multiple accounts.
|
The following commands export the Azure Blob Storage environment variables to the current shell session:
$ export VbrCredentialConfig='[{"accountName": "account1","blobEndpoint": "host[:port]","accountKey": "account-key1","sharedAccessSignature": "sas-token1"}]'
$ export VbrEndpointConfig='[{"accountName": "account1", "blobEndpoint": "host[:port]", "protocol": "http"}]'
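Because these values are JSON, the shell must receive them as a single quoted string; otherwise the spaces, braces, and double quotes are interpreted by the shell. The following Python sketch generates a correctly quoted export line from ordinary Python data. The function name and the sample values are hypothetical.

```python
import json
import shlex


def export_line(name, objects):
    """Return a shell 'export' line whose value is the JSON encoding of objects."""
    return "export %s=%s" % (name, shlex.quote(json.dumps(objects)))


endpoint = [{"accountName": "account1", "blobEndpoint": "host:port", "protocol": "https"}]
print(export_line("VbrEndpointConfig", endpoint))
```

Generating the line this way also guarantees that the value is valid JSON, which is easy to get wrong when editing a long one-line string by hand.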
5.4 - Additional considerations for cloud storage
If you are backing up to a supported cloud storage location, you need to do some additional one-time configuration. You must also take additional steps if the cluster you are backing up is running on instances in the cloud. For Amazon Web Services (AWS), you might choose to encrypt your backups, which requires additional steps.
By default, bucket access is restricted to the communal storage bucket. For one-time operations with other buckets, such as backing up and restoring the database, use the appropriate credentials. See Google Cloud Storage parameters and S3 parameters for additional information.
Configuring cloud storage for backups
As with any storage location, you must initialize a cloud storage location with the vbr task init.
Because cloud storage does not support file locking, Vertica uses either your local file system or the cloud storage file system to handle file locks during a backup. You identify this location by setting the cloud_storage_backup_file_system_path parameter in your vbr configuration file. During a backup, Vertica creates a locked identity file on your local or cloud instance, and a duplicate file in your cloud storage backup location. If the files match, Vertica proceeds with the backup, releasing the lock when the backup is complete. As long as the files remain identical, you can use the cloud storage location for backup and restore tasks.
Reinitializing cloud backup storage
If the files in your locking location become out of sync with the files in your backup location, backup and restore tasks fail with an error message. You can resolve locking inconsistencies by rerunning the init task qualified by --cloud-force-init:
$ /opt/vertica/bin/vbr --task init --cloud-force-init -c filename.ini
Note
If a backup fails, confirm that your Vertica cluster has permission to access your cloud storage location.
Configuring authentication for Google Cloud Storage
If you are backing up to Google Cloud Storage (GCS) from a Google Cloud Platform-based cluster, you must provide authentication to the GCS communal storage location. Set the environment variables as detailed in Configuring cloud storage backups to authenticate to GCS storage.
See Eon Mode on GCP prerequisites for additional authentication information, including how to create your hash-based message authentication code (HMAC) key.
Configuring EC2 authentication for Amazon S3
If you are backing up to S3 from an EC2-based cluster, you must provide authentication to your S3 host. Regardless of the authentication type you choose, your credentials do not leave your EC2 cluster. Vertica supports the following authentication types:
-
AWS credential file
-
Environment variables
-
IAM role
AWS credential file - You can manually create a configuration file on your EC2 initiator host at ~/.aws/credentials.
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
For more information on credential files, refer to Amazon Web Services documentation.
Environment variables - Amazon Web Services provides the following environment variables:
-
AWS_ACCESS_KEY_ID
-
AWS_SECRET_ACCESS_KEY
Use these variables on your initiator to provide authentication to your S3 host. When your session ends, AWS deletes these variables. For more information, refer to the AWS documentation.
IAM role - Create an AWS IAM role and grant that role permission to access your EC2 cluster and S3 resources. This method is recommended for managing long-term access. For more information, refer to Amazon Web Services documentation.
Encrypting backups on Amazon S3
Backups made to Amazon S3 can be encrypted using native server-side S3 encryption capability. For more information on Amazon S3 encryption, refer to Amazon documentation.
Note
Vertica supports server-side encryption only. Client-side encryption is not supported.
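Vertica supports the following forms of S3 encryption:
-
Server-side encryption with Amazon S3-managed keys (SSE-S3)
-
Server-side encryption with AWS KMS-managed keys (SSE-KMS)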
When you enable encryption of your backups, Vertica encrypts backups as it creates them. If you enable encryption after creating an initial backup, only increments added after you enabled encryption are encrypted. To ensure that your backup is entirely encrypted, create new backups after enabling encryption.
To enable encryption, add the following settings to your configuration file:
-
cloud_storage_encrypt_transport: Encrypts your backups during transmission. You must enable this parameter if you are using SSE-KMS encryption.
-
cloud_storage_encrypt_at_rest: Enables encryption of your backups. If you enable encryption and do not provide a KMS key, Vertica uses SSE-S3 encryption.
-
cloud_storage_sse_kms_key_id: If you are using KMS encryption, use this parameter to provide your key ID.
See [CloudStorage] for more information on these settings.
The following example shows a typical configuration for KMS encryption of backups.
[CloudStorage]
cloud_storage_encrypt_transport = True
cloud_storage_encrypt_at_rest = sse
cloud_storage_sse_kms_key_id = 6785f412-1234-4321-8888-6a774ba2aaaa
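The rule that SSE-KMS requires transport encryption can be checked before you run a backup. The following Python sketch reads a configuration file's text with the standard configparser module and reports a problem when a KMS key ID is present but cloud_storage_encrypt_transport is not true. The function name is hypothetical, and the sketch assumes the value is written as True or true, as in the example above.

```python
import configparser


def kms_transport_problems(ini_text):
    """Return a list of problems with the SSE-KMS settings in a vbr configuration."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    cloud = parser["CloudStorage"] if parser.has_section("CloudStorage") else {}
    problems = []
    if cloud.get("cloud_storage_sse_kms_key_id"):
        transport = (cloud.get("cloud_storage_encrypt_transport") or "").strip().lower()
        if transport != "true":
            problems.append("SSE-KMS requires cloud_storage_encrypt_transport = True")
    return problems
```

An empty list means the sketch found no conflict; it does not validate the key ID itself.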
5.5 - Configuring backups to and from HDFS
Eon Mode only
To back up an Eon Mode database that uses HDFS on-premises storage, the communal storage and backup location must use the same HDFS credentials and domain. All vbr operations are supported, except copycluster.
Vertica supports Kerberos authentication, High Availability NameNode, and TLS (wire encryption) for vbr operations.
Creating a cloud storage configuration file
To back up Eon Mode on-premises with communal storage on HDFS, you must provide a backup configuration file. In the [CloudStorage] section, provide the cloud_storage_backup_path and cloud_storage_backup_file_system_path values.
If you use Kerberos authentication or High Availability NameNode with your Hadoop cluster, the vbr utility requires access to the same values set in the bootstrapping file that you created during the database install. Include these values in the [Misc] section of the backup configuration file.
The following table maps the vbr configuration option to its associated bootstrap file parameter:
vbr Configuration Option | Bootstrap File Parameter
kerberos_service_name | KerberosServiceName
kerberos_realm | KerberosRealm
kerberos_keytab_file | KerberosKeytabFile
hadoop_conf_dir | HadoopConfDir
For example, if KerberosServiceName is set to principal-name in the bootstrap file, set kerberos_service_name to principal-name in the [Misc] section of your configuration file.
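The mapping in the table above lends itself to a small lookup. The following Python sketch converts bootstrap file parameters into lines for the [Misc] section. The function and dictionary names are hypothetical; only the four parameter pairs from the table are covered, and any other key is skipped.

```python
BOOTSTRAP_TO_VBR = {
    "KerberosServiceName": "kerberos_service_name",
    "KerberosRealm": "kerberos_realm",
    "KerberosKeytabFile": "kerberos_keytab_file",
    "HadoopConfDir": "hadoop_conf_dir",
}


def misc_section(bootstrap_values):
    """Build [Misc] lines from a dict of bootstrap parameter names and values."""
    lines = ["[Misc]"]
    for param, value in bootstrap_values.items():
        if param in BOOTSTRAP_TO_VBR:  # ignore parameters outside the table
            lines.append("%s = %s" % (BOOTSTRAP_TO_VBR[param], value))
    return "\n".join(lines)


print(misc_section({"KerberosServiceName": "principal-name"}))
```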
Encryption between communal storage and backup locations
Vertica supports vbr operations using wire encryption between your communal storage and backup locations. Use the cloud_storage_encrypt_transport parameter in the [CloudStorage] section of your backup configuration file to configure encryption.
To enable encryption:
If you do not use encryption:
Vertica does not support at-rest encryption for Hadoop storage.
6 - Creating backups
Important
Inadequate security on backups can compromise overall database security. Be sure to secure backup locations and strictly limit access to backups only to users who already have permissions to access all database data.
You should perform full backups of your database regularly. You should also perform a full backup under the following circumstances:
Before…
-
Upgrading Vertica to another release.
-
Dropping a partition.
-
Adding, removing, or replacing nodes in the database cluster.
After…
-
Loading a large volume of data.
-
Adding, removing, or replacing nodes in the database cluster.
-
Recovering a cluster from a crash.
If…
-
The epoch of the latest backup predates the current ancient history mark.
Ideally, schedule backups to run on a regular basis. You can run vbr from a cron job or other task scheduler.
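A scheduler usually calls a small wrapper rather than vbr directly. The following Python sketch builds the command from the --task and --config-file options described in this guide. The function names and the default vbr path are assumptions, and the wrapper only returns the exit code of the process without interpreting it.

```python
import subprocess


def vbr_backup_command(config_file, vbr_path="vbr"):
    """Build the argument list for a vbr backup using the documented long options."""
    return [vbr_path, "--task", "backup", "--config-file", config_file]


def run_backup(config_file):
    """Run the backup and return the process exit code (hypothetical wrapper)."""
    return subprocess.run(vbr_backup_command(config_file)).returncode
```

Because vbr prompts for the database administrator password when the configuration file does not supply it, an unattended job needs a configuration that does not require the prompt.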
You can also back up selected objects. Use object backups to supplement full backups, not to replace them. Backup types are described in Types of backups.
Running vbr does not affect active database applications. vbr supports creating backups while concurrently running applications that execute DML statements, including COPY, INSERT, UPDATE, DELETE, and SELECT.
Backup locations and contents
Full and object-level backups reside on backup hosts, the computer systems on which backups and archives are stored.
Vertica saves backups in a specific backup location, the directory on a backup host. This location can contain multiple backups, both full and object-level, including associated archives. The backups are also compatible, allowing you to restore any objects from a full database backup. Backup locations for Eon Mode databases must be on S3.
Note
Vertica does not recommend concurrent backups. If you must run multiple backups concurrently, use separate backup and temp directories for each. Having separate backup directories detracts from the advantage of sharing data among historical backups.
Before beginning a backup, you must prepare your backup locations using the vbr init task, as in the following example:
$ vbr -t init -c full_backup.ini
For more information about backup locations, see Setting up backup locations.
Backups contain all committed data for the backed-up objects as of the start time of the backup. Backups do not contain uncommitted data or data committed during the backup. Backups do not delay mergeout or load activity.
Backing up HDFS storage locations
If your Vertica cluster uses HDFS storage locations, you must do some additional configuration before you can perform backups. See Requirements for backing up and restoring HDFS storage locations.
HDFS storage locations support only full backup and restore. You cannot perform object backup or restore on a cluster that uses HDFS storage locations.
Impact of backups on Vertica nodes
While a backup is taking place, the backup process can consume additional storage. The amount of space consumed depends on the size of your catalog and any objects that you drop during the backup. The backup process releases this storage when the backup is complete.
Best practices for creating backups
When creating backup configuration files:
-
Create separate configuration files to create full and object-level backups.
-
Use a unique snapshot name in each configuration file.
-
Use the same backup host directory location for both kinds of backups:
-
Because the backups share disk space, they are compatible when performing a restore.
-
Each cluster node must also use the same directory location on its designated backup host.
-
For best network performance, use one backup host per cluster node.
-
Use one directory on each backup node to store successive backups.
-
For future reference, append the major Vertica version number to the configuration file name (mybackup9x).
The selected objects of a backup can include one or more schemas or tables, or a combination of both. For example, you can include schema S1 and tables T1 and T2 in an object-level backup. Multiple backups can be combined into a single backup. A schema-level backup can be integrated with a database backup (and a table backup integrated with a schema-level backup, and so on).
6.1 - Types of backups
vbr supports the following kinds of backups:
-
Full backups
-
Object-level backups
-
Hard-link local backups
The vbr configuration file includes the snapshotName parameter. Use different snapshot names for different types of backups, including different combinations of objects in object-level backups. Backups with the same snapshot name form a time sequence limited by restorePointLimit. Avoid giving all backups the same snapshot name; otherwise, they eventually interfere with each other.
Full backups
A full backup is a complete copy of the database catalog, its schemas, tables, and other objects. This type of backup provides a consistent image of the database at the time the backup occurred. You can use a full backup for disaster recovery to restore a damaged or incomplete database. You can also restore individual objects from a full backup.
When a full backup already exists, vbr performs incremental backups, whose scope is confined to data that is new or changed since the last full backup occurred. You can specify the number of historical backups to keep.
Archives contain a collection of same-name backups. Each archive can have a different retention policy. For example, TBak might be the name of an object-level backup of table T. If you create a backup every day, the seven backups of a given week become part of the TBak archive. Keeping a backup archive lets you revert to any one of the saved backups.
Object-level backups
An object-level backup consists of one or more schemas or tables or a group of such objects. The conglomerate parts of the object-level backup do not contain the entire database. When an object-level backup exists, you can restore all of its contents or individual objects.
Note
Object-level backups are not supported for Enterprise Mode databases that use a Hadoop File System (HDFS) storage location.
Object-level backups contain the following object types:
Object Type | Description
Selected objects | Objects you choose to be part of an object-level backup. For example, if you specify tables T1 and T2 to include in an object-level backup, they are the selected objects.
Dependent objects | Objects that must be included as part of an object-level backup, due to dependencies. Suppose you want to create an object-level backup that includes a table with a foreign key. To do so, table constraints require that you include the primary key table, and vbr enforces this requirement. Projections anchored on a table in the selected objects are also dependent objects.
Principal objects | The objects on which both selected and dependent objects depend are called principal objects. For example, each table and projection has an owner, and each is a principal object.
Hard-link local backups
Valid only for Enterprise Mode, hard-link local backups are saved directly on the database nodes, and can be performed on the entire database or specific objects. Typically you use this kind of backup temporarily before performing a disruptive operation. Do not rely on this kind of backup for long-term use; it cannot protect you from node failures because data and backups are on the same nodes.
A checkpoint backup is a hard-link local backup that comprises a complete copy of the database catalog, and a set of hard file links to corresponding data files. You must save a hard-link local backup on the same file system that is used by the catalog and database files.
6.2 - Creating full backups
Before you create a database backup, verify the following:
-
You have prepared your backup directory with the vbr init task:
$ vbr -t init -c full_backup.ini
-
Your database is running. It is unnecessary for all nodes to be up in a K-safe database. However, any nodes that are DOWN are not backed up.
-
All of the backup hosts are up and available.
-
The backup host (either on the database cluster or elsewhere) has sufficient disk space to store the backups.
-
The user account that starts vbr has write access to the target directories on the host backup location. This user can be dbadmin or another assigned role. However, you cannot run vbr as root.
-
Each backup has a unique file name.
-
If you want to keep earlier backups, restorePointLimit is set to a number greater than 1 in the configuration file.
-
If you are backing up an Eon Mode database, you have met the Eon Mode database requirements.
Run vbr from a terminal. Use the database administrator account from an initiator node in your database cluster. The command requires only the --task backup and --config-file arguments (or their short forms, -t and -c).
If your configuration file does not contain the database administrator password, vbr prompts you to enter the password. It does not display what you type.
vbr requires no further interaction after you invoke it.
The following example shows a full backup:
$ vbr -t backup -c full_backup.ini
Starting backup of database VTDB.
Participating nodes: v_vmart_node0001, v_vmart_node0002, v_vmart_node0003, v_vmart_node0004.
Snapshotting database.
Snapshot complete.
Approximate bytes to copy: 2315056043 of 2356089422 total.
[==================================================] 100%
Copying backup metadata.
Finalizing backup.
Backup complete!
By default, no output is displayed, other than the progress bar. To include additional progress information, use the --debug option, with a value of 1, 2, or 3.
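For scripted runs, the invocation described above can be assembled programmatically. The following Python sketch is illustrative only: the helper name build_vbr_command is hypothetical, and it uses only the --task, --config-file, and --debug options named in this section. It does not run vbr; it returns an argument list that a caller could pass to subprocess.run.

```python
def build_vbr_command(task, config_file, debug=None):
    """Return the argument list for a vbr invocation (hypothetical helper)."""
    cmd = ["vbr", "--task", task, "--config-file", config_file]
    if debug is not None:
        # This section lists 1, 2, and 3 as the valid --debug values.
        if debug not in (1, 2, 3):
            raise ValueError("debug must be 1, 2, or 3")
        cmd += ["--debug", str(debug)]
    return cmd


print(build_vbr_command("backup", "full_backup.ini", debug=2))
```

Keeping the command as a list, rather than a single string, avoids shell quoting problems when configuration file names contain spaces.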
6.3 - Creating object-level backups
Use object-level backups to back up individual schemas or tables. Object-level backups are especially useful for multi-tenanted database sites. For example, an international airport could use a multi-tenanted database to represent different airlines in its schemas. Then, tables could maintain various types of information for the airline, including ARRIVALS, DEPARTURES, and PASSENGER information. With such an organization, creating object-level backups of the specific schemas would let you restore by airline tenant, or any other important data segment.
To create one or more object-level backups, create a configuration file specifying the backup location, the object-level backup name, and a list of objects to include. You can use the includeObjects and excludeObjects parameters together with wildcards to specify the objects of interest. For more information about specifying the objects to include, see Including and excluding objects.
Important
If your Eon Mode database has multiple namespaces, you must specify the namespace to which the objects belong. For vbr tasks, namespace names are prefixed with a period. For example, .n.s.t refers to table t in schema s in namespace n. See Eon Mode database requirements for more information.
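The naming rule in the note above can be illustrated with a short Python sketch. The function name parse_vbr_object is hypothetical, and the parsing assumes only what the note states: a leading period marks a namespace. Quoted identifiers and names that contain periods are not handled.

```python
def parse_vbr_object(name):
    """Split a vbr object name into (namespace, schema, table).

    Assumption: a leading period marks a namespace, as in '.n.s.t'.
    Parts that are not present are returned as None.
    """
    namespace = None
    if name.startswith("."):
        # Strip the leading period, then take the first component.
        namespace, _, name = name[1:].partition(".")
    schema, _, table = name.partition(".")
    return namespace, schema or None, table or None


print(parse_vbr_object(".n.s.t"))  # ('n', 's', 't')
print(parse_vbr_object("s.t"))     # (None, 's', 't')
```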
For more information about configuration files for full or object-level backups, see Sample vbr configuration files and vbr configuration file reference.
While not required, Vertica recommends that you first create a full backup before creating any object-level backups.
Note
Apache Kafka uses internal configuration settings to maintain the integrity of your data. When backing up your Kafka data, Vertica recommends that you perform a full database backup rather than an object-level backup.
Before you can create a backup, you must prepare your backup directory with the vbr init task. You must also create a configuration file specifying which objects to back up.
Run vbr from a terminal using the database administrator account from a node in your database cluster. You cannot run vbr as root.
You can create an object-level backup as in the following example.
$ vbr --task backup --config-file objectbak.ini
Preparing...
Found Database port: 5433
Copying...
[==================================================] 100%
All child processes terminated successfully.
Committing changes on all backup sites...
backup done!
Naming conventions
Give each object-level backup configuration file a distinct and descriptive name. For instance, at an airport terminal, schema-based backup configuration files use a naming convention with an airline prefix, followed by further description, such as:
AIR1_daily_arrivals_backup
AIR2_hourly_arrivals_backup
AIR2_hourly_departures_backup
AIR3_daily_departures_backup
When database and object-level backups exist, you can recover the backup of your choice.
Caution
Do not change object names in an object-level configuration file if a backup already exists. Doing so overwrites the original configuration file, and you cannot restore it from the earlier backup. Instead, create a different configuration file.
Understanding object-level backup contents
Object-level backups comprise only the elements necessary to restore the schema or table, including the selected, dependent, and principal objects. An object-level backup includes the following contents:
-
Storage: Data files belonging to any specified objects
-
Metadata: Including the cluster topology, timestamp, epoch, AHM, and so on
-
Catalog snippet: Persistent catalog objects serialized into the principal and dependent objects
Some of the elements that AIR2 comprises, for instance, are its parent schema, tables, named sequences, primary key and foreign key constraints, and so on. To create such a backup, vbr saves the objects directly associated with the table. It also saves any dependencies, such as foreign key (FK) tables, and creates an object map from which to restore the backup.
Note
Because the data in local temp tables persists only within a session, local temporary tables are excluded when you create an object-level backup. For global temporary tables, vbr stores the table's definition.
Making changes after an object-level backup
Be aware of how changes made after an object-level backup affect subsequent backups. Suppose you create an object-level backup and later drop schemas and tables from the database. In this case, the objects you dropped are also dropped from subsequent backups. If you do not save an archive of the object backup, such objects could be lost permanently.
A table name change made after creating a table backup does not persist after you restore the backup. Suppose that, after creating a backup, you drop a user who owns any selected or dependent objects in that backup. In this case, restoring the backup re-creates the object and assigns ownership to the user performing the restore. If the owner of a restored object still exists, that user retains ownership of the restored object.
To restore a dropped table from a backup:
-
Rename the newly created table from t1 to t2.
-
Restore the backup containing t1.
-
Restore t1. Tables t1 and t2 now coexist.
For information on how Vertica handles object overwrites, refer to the objectRestoreMode parameter in [misc].
K-safety can increase after an object backup. Restoration of a backup fails if both of the following conditions occur:
Changing principal and dependent objects
If you create a backup and then drop a principal object, restoring the backup restores that principal object. If the owner of the restored object has also been dropped, Vertica assigns the restored object to the current dbadmin.
You can specify how Vertica handles object overwrites in the vbr configuration file. For more information, refer to the objectRestoreMode parameter in [misc].
IDENTITY sequences are dependent objects because they cannot exist without their tables. An object-level backup includes such objects, along with the tables on which they depend.
Named sequences are not dependent objects because they exist autonomously. A named sequence remains after you drop the table in which the sequence is used. In this case, the named sequence is a principal object. Thus, you must back up the named sequence with the table. Then you can regenerate it, if it does not already exist when you restore the table. If the sequence does exist, vbr uses it, unmodified. Sequence values could repeat, if you restore the full database and then restore a table backup to a newer epoch.
Considering constraint references
When database objects are related through constraints, you must back them up together. For example, a schema with tables whose constraints reference only tables in the same schema can be backed up. However, a schema containing a table with an FK/PK constraint on a table in another schema cannot. To back up the second table, you must include the other schema in the list of selected objects.
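The rule above amounts to a closure check over foreign-key references. The following Python sketch is an illustration under stated assumptions, not part of vbr: the function name, the 'schema.table' input format, and the sample schema names are all hypothetical. It reports which additional schemas would need to be added to the selected objects.

```python
def schemas_to_add(selected_schemas, fk_references):
    """Return schemas referenced by FK constraints but not yet selected.

    fk_references: iterable of (referencing_table, referenced_table) pairs,
    each written as 'schema.table' (hypothetical input format).
    """
    selected = set(selected_schemas)
    missing = set()
    for source, target in fk_references:
        source_schema = source.split(".", 1)[0]
        target_schema = target.split(".", 1)[0]
        # Only references that leave the selected set are a problem.
        if source_schema in selected and target_schema not in selected:
            missing.add(target_schema)
    return sorted(missing)


references = [
    ("air1.arrivals", "air1.flights"),        # stays inside the schema
    ("air1.passengers", "shared.countries"),  # crosses into another schema
]
print(schemas_to_add(["air1"], references))  # ['shared']
```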
Configuration files for object-level backups
vbr automatically associates configurations that have different backup names but use the same backup location.
Always create a cluster-wide configuration file and one or more object-level configuration files pointing to the same backup location. Storage between backups is shared, preventing multiple copies of the same data. For object-level backups, using the same backup location means that vbr needs to apply fewer OID conflict prevention techniques, which results in fewer problems when restoring the backup.
When using cluster and object configuration files with the same backup location, vbr includes additional provisions to ensure that the object-level backups can be used following a full cluster restore. One approach to restoring a full cluster is to use a full database backup to bootstrap the cluster. After the cluster is operational again, you can restore the most recent object-level backups for schemas and tables.
Attempting to restore a full database using an object-level configuration file fails, resulting in this error:
VMart=> /tmp/vbr --config-file=Table2.ini -t restore
Preparing...
Invalid metadata file. Cannot restore.
restore failed!
See Restoring all objects from an object-level backup for more information.
Backup epochs
Each backup includes the epoch to which its contents can be restored. When vbr restores data, Vertica updates to the current epoch.
vbr attempts to create an object-level backup five times before an error occurs and the backup fails.
6.4 - Creating hard-link local backups
You can use the hardLinkLocal option to create a full or object-level backup with hard file links on a local database host.
Creating hard-link local backups can provide the following advantages over a remote host backup:
-
Speed: A hard-link local backup is significantly faster than a remote host backup. When backing up, vbr does not copy files if the backup directory exists on the same file system as the database directory.
-
Reduced network activities: The hard-link local backup minimizes network load because it does not require rsync to copy files to a remote backup host.
-
Less disk space: The backup includes a copy of the catalog and hard file links. Therefore, the local backup uses significantly less disk space than a backup with copies of database data files. However, a hard-link local backup saves a full copy of the catalog each time you run vbr. Thus, the disk size increases with the catalog size over time.
Hard-link local backups can help you during experimental designs and development cycles. Database designers and developers can create hard-link local object backups of schemas and tables on a regular schedule during design and development phases. If any new developments are unsuccessful, developers can restore one or more objects from the backup.
Planning hard-link local backups
If you plan to use hard-link local backups as a standard site procedure, design your database and hardware configuration appropriately. Consider storing all of the data files on one file system per node. Such a configuration has the advantage of being set up automatically for hard-link local backups.
Specifying backup directory locations
The backupDir parameter of the configuration file specifies the location of the top-level backup directory. Hard-link local backups require that the backup directory be located on the same Linux file system as the database data. The Linux operating system cannot create hard file links to another file system.
Do not create the hard-link local backup directory in a database data storage location. For example, as a best practice, the database data directory should not be at the top level of the file system, as it is in the following example:
/home/dbadmin/data/VMart/v_vmart_node0001
Instead, Vertica recommends adding another subdirectory for data above the database level, such as in this example:
/home/dbadmin/data/dbdata/VMart/v_vmart_node0001
You can then create the hard-link local backups subdirectory as a peer of the data directory you just created, such as in this example:
/home/dbadmin/data/backups
/home/dbadmin/data/dbdata
When you specify the hard-link backup location, be sure to avoid these common errors when adding the hardLinkLocal=True parameter to the configuration file:
If ... | Then ... | Solution |
You specify a backup directory on a different node | vbr issues an error message and aborts the backup. | Change the configuration file to include a backup directory on the same host and file system as the database files. Then, run vbr again. |
You specify a backup location on the same node, but a backup destination directory on a different file system from the database and catalog files. | vbr issues a warning message and performs the backup by copying (not linking) the files from one file system to the other. | No action required, but copying consumes more disk space and takes longer than linking. |
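Both placement rules (same file system, but not inside a data storage location) can be checked before running vbr. The following Python sketch is a minimal illustration that assumes a Linux host; the function names are hypothetical, and the example paths are the ones used earlier in this section.

```python
import os


def same_file_system(path_a, path_b):
    """True if both existing paths are on the same file system (same device ID)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def is_inside(path, parent):
    """True if path is parent itself or lies beneath it (purely lexical check)."""
    path, parent = os.path.abspath(path), os.path.abspath(parent)
    return os.path.commonpath([path, parent]) == parent


backup_dir = "/home/dbadmin/data/backups"
data_dir = "/home/dbadmin/data/dbdata"
print(is_inside(backup_dir, data_dir))  # False: the directories are peers
```

A suitable backup directory would satisfy same_file_system(backup_dir, data_dir) while is_inside(backup_dir, data_dir) is false. Note that same_file_system requires both paths to exist, whereas is_inside compares path strings only.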
Creating the backup
Before creating a full hard-link local database backup of an Enterprise Mode database, verify the following:
-
Your database is running. All nodes need not be up in a K-safe database for vbr to run. However, be aware that any nodes that are DOWN are not backed up.
-
The user account that starts vbr (dbadmin or other) has write access to the target backup directories.
Hard-link backups are not supported in Eon Mode.
When you create a full or object-level hard-link local backup, that backup contains the following:
Backup | Catalog | Database files |
Full backup | Full copy | Hard file links to all database files |
Object-level backup | Full copy | Hard file links for all objects listed in the configuration file, and any of their dependent objects |
Run the vbr script from a terminal using the database administrator account from a node in your database cluster. You cannot run vbr as root.
Hard-link backups use the same vbr arguments as other backups. Configuring a backup as a hard-link backup is done entirely in the configuration file. The following example shows the syntax:
$ vbr --task backup --config-file fullbak.ini
You can use hard-link local backups as a staging mechanism to back up to tape or other forms of storage media. The following steps present a simplified approach to saving, and then restoring, hard-link local backups from tape storage:
-
Create a configuration file by copying an existing one or one of the samples described in Sample vbr configuration files.
-
Edit the configuration file (localbak.ini in this example) to include the hardLinkLocal=True parameter in the [Transmission] section.
-
Run vbr with the configuration file:
$ vbr --task backup --config-file localbak.ini
-
Copy the hard-link local backup directory with a separate process (not vbr) to tape or other external media.
-
If the database becomes corrupted, transfer the backup files from tape to their original backup directory and restore as explained in Restoring hard-link local backups.
Note
Vertica recommends that you preserve the directory containing the hard-link backup after copying it to other media. If you delete the directory and later copy the files back from external media, the copied files will no longer be links. Instead, they will use as much disk space as if you had done a full (not hard-link) backup.
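The difference between a hard link and a copied file, which underlies the note above, can be observed directly. The following Python sketch is a generic file-system illustration and does not involve vbr; the file names are arbitrary, and it assumes a file system that supports hard links.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    data_file = os.path.join(root, "data_file")
    with open(data_file, "wb") as f:
        f.write(b"x" * 1024)

    linked = os.path.join(root, "linked")
    copied = os.path.join(root, "copied")
    os.link(data_file, linked)      # hard link: a second name for the same inode
    shutil.copy(data_file, copied)  # copy: a new inode with its own data blocks

    link_shares_inode = os.stat(linked).st_ino == os.stat(data_file).st_ino
    copy_shares_inode = os.stat(copied).st_ino == os.stat(data_file).st_ino
    link_count = os.stat(data_file).st_nlink

print(link_shares_inode, copy_shares_inode, link_count)
```

The linked name shares the inode of the original file and therefore adds no data blocks, while the copy has its own inode and consumes its own space. Files copied back from external media behave like the copy.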
Restoring hard-link local backups requires some additional (manual) steps. Do not use them as a substitute for regular full backups (Creating full backups).
Hard-link local backups and disaster recovery
Hard-link local backups are only as reliable as the disk on which they are stored. If the local disk becomes corrupt, so does the hard-link local backup. In this case, you are unable to restore the database from the hard-link local backup because it is also corrupt.
All sites should maintain full backups externally for disaster recovery because hard-link local backups do not actually copy any database files.
6.5 - Incremental or repeated backups
As a best practice, Vertica recommends that you take frequent backups if database contents diverge in significant ways. Always take backups after any event that significantly modifies the database, such as performing a rebalance. Mixing many backups with significant differences can weaken data K-safety. For example, taking backups both before and after a rebalance is not a recommended practice in cases where the backups are all part of one archive.
Each time you back up your database with the same configuration file, vbr creates an additional backup and might remove the oldest backup. The backup operation copies new storage containers, which can include:
Use the restorePointLimit parameter in the configuration file to increase the number of stored backups. If a backup task would cause this limit to be exceeded, vbr deletes the oldest backup after a successful backup.
When you run a backup task, vbr first creates the new backup in the specified location, which might temporarily exceed the limit. It then checks whether the number of backups exceeds the value of restorePointLimit, and, if necessary, deletes the oldest backups until only restorePointLimit remain. If the requested backup fails or is interrupted, vbr does not delete any backups.
When you restore a database, you can choose to restore from any retained backup rather than the most recent, so raise the limit if you expect to need access to older backups.
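The retention behavior described above can be modeled in a few lines. The following Python sketch is a simplified model of the documented behavior after a successful backup, not vbr's implementation; the function name and the backup identifiers are hypothetical.

```python
def apply_restore_point_limit(existing, new_backup, restore_point_limit):
    """Return (kept, deleted) after a successful backup.

    existing: backup IDs ordered oldest first (hypothetical representation).
    """
    backups = list(existing) + [new_backup]  # the new backup is written first
    excess = max(0, len(backups) - restore_point_limit)
    # The oldest backups are removed until only restore_point_limit remain.
    return backups[excess:], backups[:excess]


kept, deleted = apply_restore_point_limit(
    ["20121111_205841", "20121112_205841"], "20121113_205841", 2
)
print(kept)     # ['20121112_205841', '20121113_205841']
print(deleted)  # ['20121111_205841']
```

Because the new backup is added before pruning, the location briefly holds one more backup than the limit, so plan disk space accordingly.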
7 - Restoring backups
You can use the vbr restore task to restore your full database or selected objects from backups created by vbr. Typically you use the same configuration file for both operations. The minimal restore command is:
$ vbr --task restore --config-file config-file
You must log in using the database administrator's account (not root).
For full restores, the database must be DOWN. For object restores, the database must be UP.
Usually you restore to the cluster that you backed up, but you can also restore to an alternate cluster if the original one is no longer available.
Restoring must be done on the same architecture as the backup from which you are restoring. You cannot back up an Enterprise Mode database and restore it in Eon Mode or vice versa.
You can perform restore tasks on Permanent node types. You cannot restore data on Ephemeral, Execute, or Standby nodes. To restore or replicate to these nodes, you must first change the destination node type to PERMANENT. For more information, refer to Setting node type.
Restoring objects to a higher Vertica version
Vertica supports restoration to a database that is no more than one minor version higher than the current database version. For example, you can restore objects from a 12.0.x database to a 12.1.x database.
If restored objects require a UDx library that is not present in the later-version database, Vertica displays the following error:
ERROR 2858: Could not find function definition
You can resolve this issue by installing compatible libraries in the target database.
Restoring HDFS storage locations
If your Vertica cluster uses HDFS storage locations, you must do some additional configuration before you can restore. See Requirements for backing up and restoring HDFS storage locations.
HDFS storage locations support only full backup and restore. You cannot perform object backup or restore on a cluster that uses HDFS storage locations.
7.1 - Restoring a database from a full backup
You can restore a full database backup to the database that was backed up, or to an alternate cluster with the same architecture. One reason to restore to an alternate cluster is to set up a test cluster to investigate a problem in your production cluster.
To restore a full database backup, you must verify that:
-
Database is DOWN. You cannot restore a full backup when the database is running.
-
All backup hosts are available.
-
Backup directory exists and contains backups of the data to restore.
-
Cluster to which you are restoring the backup has:
-
Same number of nodes as used to create the backup (Enterprise Mode), or at least as many nodes as the primary subclusters (Eon Mode)
-
Same architecture as the one used to create the backup
-
Identical node names
-
Target database already exists on the cluster where you are restoring data.
-
Database can be completely empty, without any data or schema.
-
Database name must match the name in the backup
-
All node names in the database must match the names of the nodes in the configuration file.
-
The user performing the restore is the database administrator.
-
If you are restoring an Eon Mode database, you have met the Eon Mode database requirements.
You can use only a full database backup to restore a complete database. If you have saved multiple backup archives, you can restore from either the last backup or a specific archive.
When your Eon Mode database has multiple communal storage locations, vbr attempts to copy each database object to its associated storage location. If a storage location has been dropped since the backup was taken, the restore operation attempts to reinstate the dropped location before restoring the data. If the dropped storage location cannot be reinstated, its associated data is copied to the main communal storage location.
Restoring from a full database backup injects the OIDs from each backup into the restored catalog of the full database backup. The catalog also receives all archives. Additionally, the OID generator and current epoch are set to the current epoch.
You can also restore a full backup to a different database than the one you backed up. See Restoring a database to an alternate cluster.
Important
When you restore an Eon Mode database to another database, the restore operation copies the source database's communal storage. The original communal storage is unaffected.
Restoring the most recent backup
Usually, when a node or cluster is DOWN, you want to return the cluster to its most-recent state. Doing so requires restoring a full database backup. You can restore any full database backup from the archive by identifying the name in the configuration file.
To restore from the most recent backup, use the vbr restore task with the configuration file. If your password configuration file does not contain the database superuser password, vbr prompts you to enter it.
The following example shows how you can use the db.ini configuration file for restoration:
> vbr --task restore --config-file db.ini
Copying...
1871652633 out of 1871652633, 100%
All child processes terminated successfully.
restore done!
Restoring an archive
If you saved multiple backups, you can specify an archive to restore. To list the available archives so that you can choose one to restore, use the vbr listbackup task with a specific configuration file. See Viewing backups.
To restore from an archive, add the --archive parameter to the command line. The value is the date_timestamp suffix of the directory name that identifies the archive to restore. For example:
$ vbr --task restore --config-file fullbak.ini --archive=20121111_205841
The --archive parameter identifies the archive created on 11-11-2012 (_archive20121111), at time 205841 (20:58:41). You need to specify only the _archive suffix, because the configuration file identifies the backup name of the subdirectory, and the OID identifier indicates the backup is an archive.
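The date_timestamp suffix can be decoded mechanically, which is useful when choosing among several archives. The following Python sketch assumes the YYYYMMDD_HHMMSS layout shown in the example above; the function name is hypothetical.

```python
from datetime import datetime


def parse_archive_id(archive_id):
    """Parse an archive suffix such as '20121111_205841' into a datetime."""
    return datetime.strptime(archive_id, "%Y%m%d_%H%M%S")


print(parse_archive_id("20121111_205841"))  # 2012-11-11 20:58:41
```

Because the fields run from most significant to least significant, sorting the suffixes as plain strings also orders the archives chronologically.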
Restore failures in Eon Mode
When a restore operation fails, vbr can leave extra files in the communal storage location. If you use communal storage in the cloud, those extra files cost you money. To remove them, restart the database and call CLEAN_COMMUNAL_STORAGE with an argument of true.
7.2 - Restoring a database to an alternate cluster
Vertica supports restoring a full backup to an alternate cluster.
Requirements
The process is similar to the process for Restoring a database from a full backup, with the following additional requirements.
The destination database must:
-
Be DOWN.
-
Share the same name as the source database.
-
Have the same number of nodes as the source database.
-
Have the same node names as the source database.
-
Use the same catalog directory location as the source database.
-
Use the same port numbers as the source database.
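The requirements above lend themselves to a pre-flight comparison of the two databases. The following Python sketch is an illustration only: the function name and the dictionary keys are hypothetical, the values would have to be collected by the administrator, and a single port stands in for the full set of port numbers.

```python
def alternate_cluster_problems(source, destination):
    """Return a list of unmet requirements for an alternate-cluster restore.

    Both arguments are dicts with hypothetical keys: 'db_name', 'node_names',
    'catalog_dir', 'port', and (destination only) 'is_up'.
    """
    problems = []
    if destination.get("is_up"):
        problems.append("destination database must be DOWN")
    if source["db_name"] != destination["db_name"]:
        problems.append("database names differ")
    if len(source["node_names"]) != len(destination["node_names"]):
        problems.append("node counts differ")
    elif set(source["node_names"]) != set(destination["node_names"]):
        problems.append("node names differ")
    if source["catalog_dir"] != destination["catalog_dir"]:
        problems.append("catalog directory locations differ")
    if source["port"] != destination["port"]:
        problems.append("port numbers differ")
    return problems


source = {"db_name": "VMart", "node_names": ["v_vmart_node0001"],
          "catalog_dir": "/home/dbadmin", "port": 5433}
destination = dict(source, is_up=True)
print(alternate_cluster_problems(source, destination))
```

An empty list means that none of the listed requirements is violated; it does not guarantee that the restore succeeds.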
Procedure
-
Copy the vbr configuration file that you used to create the backup to any node on the destination cluster.
-
If you are using a stored password, copy the password configuration file to the same location as the vbr configuration file.
-
From the destination node, issue a vbr restore command, such as:
$ vbr -t restore -c full.ini
-
After the restore has completed, start the restored database.
7.3 - Restoring all objects from an object-level backup
To restore everything in an object-level backup to the database from which it was taken, use the vbr restore task with the configuration file you used to create the backup, as in the following example:
$ vbr --task restore --config-file MySchema.ini
Copying...
1871652633 out of 1871652633, 100%
All child processes terminated successfully.
restore done!
The database must be UP.
You can specify how Vertica reacts to duplicate objects by setting the objectRestoreMode parameter in the configuration file.
Object-level backup and restore are not supported for HDFS storage locations.
Restoring objects to a changed cluster
Unlike restoring from a full database backup, vbr supports restoring object-level backups after adding nodes to the cluster. Any nodes that were not in the cluster when you created the object-level backup do not participate in the restore. You can rebalance your cluster after the restore to distribute data among the new nodes.
You cannot restore an object-level backup after removing nodes, altering node names, or changing IP addresses. Trying to restore an object-level backup after such changes causes vbr to fail and display this message:
Preparing...
Topology changed after backup; cannot restore.
restore failed!
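The topology rule can be summarized as follows: every node recorded in the backup must still exist with the same name and IP address, while additional nodes are allowed. The following Python sketch models that rule; the function name, the dictionary representation, and the sample addresses are hypothetical, and vbr's actual check may consider more than these two attributes.

```python
def topology_allows_object_restore(backup_nodes, current_nodes):
    """Both arguments map node name -> IP address (hypothetical representation)."""
    return all(
        name in current_nodes and current_nodes[name] == address
        for name, address in backup_nodes.items()
    )


backup = {"v_vmart_node0001": "10.0.0.1", "v_vmart_node0002": "10.0.0.2"}
grown = dict(backup, v_vmart_node0003="10.0.0.3")  # a node was added
print(topology_allows_object_restore(backup, grown))  # True
```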
Projection epoch after restore
All object-level backup and restore events are treated as DDL events. If a table does not participate in an object-level backup, possibly because a node is down, restoring the backup affects the projection in the following ways:
Catalog locks during restore
As with other databases, Vertica transactions follow strict locking protocols to maintain data integrity.
When restoring an object-level backup into a cluster that is UP, vbr begins by copying data and managing storage containers. If necessary, vbr splits the containers. This process does not require any database locks.
After completing data-copying tasks, vbr first requires a table object lock (O-lock) and then a global catalog lock (GCLX).
In some circumstances, other database operations, such as DML statements, are in progress when the process attempts to get an O-lock on the table. In such cases, vbr is blocked from progress until the DML statement completes and releases the lock. After securing an O-lock first, and then a GCLX lock, vbr blocks other operations that require a lock on the same table.
While vbr holds its locks, concurrent table modifications are blocked. Database system operations, such as the Tuple Mover (TM) transferring data from memory to disk, are canceled to permit the object-level restore to complete.
Catalog restore events
Each object-level backup includes a section of the database catalog, or a snippet. A snippet contains the selected objects, their dependent objects, and principal objects. A catalog snippet is similar in structure to the database catalog but consists of a subset representing the object information. Objects being restored can be read from the catalog snippet and used to update both global and local catalogs.
Each object from a restored backup is updated in the catalog. If the object no longer exists, vbr drops the object from the catalog. Any dependent objects that are not in the backup are also dropped from the catalog.
vbr uses existing dependency verification methods to check the catalog and adds a restore event to the catalog for each restored table. That event also includes the epoch at which the event occurred. If a node misses the restore table event, it recovers projections anchored on the given table.
Reverting object DDL changes
If you restore the database to an epoch that precedes changes to an object's DDL, the restore operation reverts the object to its earlier definition. For example, if you change a table column's data type from CHAR(8) to CHAR(16) in epoch 10, and then restore the database from epoch 5, the column reverts to CHAR(8) data type.
Restoring objects to a higher Vertica version
Vertica supports restoration to a database that is no more than one minor version higher than the current database version. For example, you can restore objects from a 12.0.x database to a 12.1.x database.
If restored objects require a UDx library that is not present in the later-version database, Vertica displays the following error:
ERROR 2858: Could not find function definition
You can resolve this issue by installing compatible libraries in the target database.
Catalog size limitations
Object-level restores can fail if your catalog size is greater than five percent of the total memory available in the node performing the restore. In this situation, Vertica recommends restoring individual objects from the backup. For more information, refer to Restoring individual objects.
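The five percent threshold is simple arithmetic that can be checked before attempting a restore. In the following Python sketch, the function name is hypothetical and the caller is assumed to supply the catalog size and the node's total memory in bytes.

```python
def catalog_exceeds_restore_limit(catalog_bytes, node_memory_bytes, limit=0.05):
    """True if the catalog is larger than the given fraction of node memory."""
    return catalog_bytes > limit * node_memory_bytes


GIB = 1024 ** 3
# On a node with 64 GiB of memory, five percent is 3.2 GiB.
print(catalog_exceeds_restore_limit(4 * GIB, 64 * GIB))  # True
print(catalog_exceeds_restore_limit(3 * GIB, 64 * GIB))  # False
```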
7.4 - Restoring individual objects
You can use vbr to restore individual tables and schemas from a full or object-level backup: qualify the restore task with --restore-objects, and specify the objects to restore as a comma-delimited list:
Important
If your Eon Mode database has multiple namespaces, you must specify the namespace to which the objects belong. For vbr tasks, namespace names are prefixed with a period. For example, .n.s.t refers to table t in schema s in namespace n. See Eon Mode database requirements for more information.
$ vbr --task restore --config-file=filename --restore-objects='objectname[,...]' [--archive=archive-id] [--target-namespace=namespace-name]
The following requirements and restrictions apply:
-
The database must be running, and nodes must be UP.
-
Tables must include their schema names.
-
Do not embed spaces before or after comma delimiters of the --restore-objects list; otherwise, vbr interprets the space as part of the object name.
-
Object-level restore is not supported for HDFS storage locations. To restore an HDFS storage location you must do a full restore.
If the schema has a disk quota and restoring the table would exceed the quota, the operation fails.
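Because stray spaces around the comma delimiters become part of the object names, it can help to build the --restore-objects value programmatically. The following Python sketch uses a hypothetical helper name and assumes that object names themselves never begin or end with a space.

```python
def restore_objects_arg(objects):
    """Build the --restore-objects argument from a list of object names."""
    cleaned = [name.strip() for name in objects]
    if any(not name for name in cleaned):
        raise ValueError("empty object name")
    # Join with bare commas: no spaces before or after the delimiters.
    return "--restore-objects=" + ",".join(cleaned)


print(restore_objects_arg(["salesschema", " public.sales_table", "public.customer_info "]))
# --restore-objects=salesschema,public.sales_table,public.customer_info
```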
By default, --restore-objects restores the specified objects from the most recent backup. You can restore from an earlier backup with the --archive parameter.
The --target-namespace parameter is only valid for Eon Mode databases with multiple namespaces. The parameter specifies the namespace in the target cluster to which objects are restored. For more information, see Eon Mode database requirements.
The following example uses the db.ini configuration file, which includes the database administrator's password:
> vbr --task restore --config-file=db.ini --restore-objects=salesschema,public.sales_table,public.customer_info
Preparing...
Found Database port: 5433
Copying...
[==================================================] 100%
All child processes terminated successfully.
All extract object child processes terminated successfully.
Copying...
[==================================================] 100%
All child processes terminated successfully.
restore done!
Object dependencies
When you restore an object, Vertica does not always restore dependent objects. For example, if you restore a schema containing views, Vertica does not automatically restore the tables of those views. One exception applies: if database tables are linked through foreign keys, you must restore them together, unless drop_foreign_constraints is set to true in the vbr configuration file.
Note
You must also set
objectRestoreMode
to
coexist
, otherwise Vertica ignores
drop_foreign_constraints
.
Duplicate objects
You can specify how restore operations handle duplicate objects by configuring
objectRestoreMode
. By default, it is set to createOrReplace
, so if a duplicate object exists, the restore operation overwrites it with the archived version.
Interactions with data loaders
When doing a restore with objectRestoreMode
set to coexist
, vbr
creates new data loaders and their corresponding state tables, but does not change the table names in the loader COPY clauses. After the restore, you can use ALTER DATA LOADER to update the COPY statement in the restored data loader to use the new table name.
Eon Mode considerations
Restoring objects to an Eon Mode database can leave unneeded files in cloud storage. These files have no effect on database performance or data integrity. However, they can incur extra cloud storage expenses. To remove these files, restart the database and call CLEAN_COMMUNAL_STORAGE with an argument of true.
7.5 - Restoring objects to an alternate cluster
You can use the restore task to copy objects from one database to another. You might do this to "promote" tables from a development environment to a production environment, for example. All restrictions described in Restoring individual objects apply when restoring to an alternate cluster.
To restore to an alternate database, you must make changes to a copy of the configuration file that was used to create the backup. The changes are in the [Mapping] and [NodeMapping] sections. Essentially, you create a configuration file for the restore operation that looks to vbr
like a backup of the target database, but it actually describes the backup from the source database. See Restore object from backup to an alternate cluster for an example configuration file.
The following example uses two databases, named source and target. The source database contains a table named sales. The following source_snapshot.ini configuration file is used to back up the source database:
[Misc]
snapshotName = source_snapshot
restorePointLimit = 2
objectRestoreMode = createOrReplace
[Database]
dbName = source
dbUser = dbadmin
dbPromptForPassword = True
[Transmission]
[Mapping]
v_source_node0001 = 192.168.50.168:/home/dbadmin/backups/
The target_snapshot.ini file starts as a copy of source_snapshot.ini. Because the [Mapping] section describes the database that vbr
operates on, we must change the node names to point to the target nodes. We must also add the [NodeMapping] section and change the database name:
[Misc]
snapshotName = source_snapshot
restorePointLimit = 2
objectRestoreMode = createOrReplace
[Database]
dbName = target
dbUser = dbadmin
dbPromptForPassword = True
[Transmission]
[Mapping]
v_target_node0001 = 192.168.50.151:/home/dbadmin/backups/
[NodeMapping]
v_source_node0001 = v_target_node0001
As far as vbr
is concerned, we are restoring objects from a backup of the target database. In reality, we are restoring from the source database.
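The edits described above can also be scripted. The following is a minimal Python sketch that derives the target configuration from the source configuration with the standard configparser module. The function name derive_target_config and its arguments are hypothetical; the section and parameter names come from the example files above.

```python
import configparser
import io

SOURCE_INI = """\
[Misc]
snapshotName = source_snapshot
restorePointLimit = 2
objectRestoreMode = createOrReplace
[Database]
dbName = source
dbUser = dbadmin
dbPromptForPassword = True
[Transmission]
[Mapping]
v_source_node0001 = 192.168.50.168:/home/dbadmin/backups/
"""


def derive_target_config(source_text, target_db, node_map, target_locations):
    """Return target .ini text derived from a source .ini.

    node_map: source node name -> target node name.
    target_locations: target node name -> "host:/backup/dir" value.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep parameter and node names case-sensitive
    parser.read_string(source_text)

    parser["Database"]["dbName"] = target_db
    # [Mapping] must name the nodes of the database that vbr operates on.
    parser.remove_section("Mapping")
    parser["Mapping"] = dict(target_locations)
    parser["NodeMapping"] = dict(node_map)

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


target_text = derive_target_config(
    SOURCE_INI,
    target_db="target",
    node_map={"v_source_node0001": "v_target_node0001"},
    target_locations={
        "v_target_node0001": "192.168.50.151:/home/dbadmin/backups/"
    },
)
print(target_text)
```

The snapshotName value is deliberately left unchanged, because the restore must still locate the backup that was created from the source database.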
The following command restores the sales table from the source backup into the target database:
$ vbr --task restore --config-file target_snapshot.ini --restore-objects sales
Starting object restore of database target.
Participating nodes: v_target_node0001.
Objects to restore: sales.
Enter vertica password:
Restoring from restore point: source_snapshot_20160204_191920
Loading snapshot catalog from backup.
Extracting objects from catalog.
Syncing data from backup to cluster nodes.
[==================================================] 100%
Finalizing restore.
Restore complete!
7.6 - Restoring hard-link local backups
You restore from hard-link local backups the same way that you restore from full backups, using the restore task. If you used hard-link local backups to back up to external media, you need to take some additional steps.
Transferring backups to and from remote storage
When a full hard-link local backup exists, you can transfer the backup to other storage media, such as tape or a locally-mounted NFS directory. Transferring hard-link local backups to other storage media may copy the data files associated with the hard file links.
You can use a different directory when you return the backup files to the hard-link local backup host. However, you must also change the backupDir
parameter value in the configuration file before restoring the backup.
Complete the following steps to restore hard-link local backups from external media:
-
If the original backup directory no longer exists on one or more local backup host nodes, re-create the directory.
The directory structure into which you restore hard-link backup files must be identical to what existed when the backup was created. For example, if you created hard-link local backups in the following backup directory, re-create that same directory structure:
/home/dbadmin/backups/localbak
-
Copy the backup files to their original backup directory, as specified for each node in the configuration file. For more information, refer to [Mapping].
-
Restore the backup, using one of three options:
-
To restore the latest version of the backup, move the backup files to the following directory:
/home/dbadmin/backups/localbak/node_name/snapshotname
-
To restore a different backup version, move the backup files to this directory:
/home/dbadmin/backups/localbak/node_name/snapshotname_archivedate_timestamp
-
When the backup files are returned to their original backup directory, use the original configuration file to invoke vbr
. Verify that the configuration file specifies hardLinkLocal = true
. Then restore the backup as follows:
$ vbr --task restore --config-file localbak.ini
7.7 - Ownership of restored objects
For a full restore, objects have the owners that they had in the backed-up database.
When performing an object restore, Vertica inserts data into existing database objects. By default, the restore does not affect the ownership, storage policies, or permissions of the restored objects. However, if the restored object does not already exist, Vertica re-creates it. In this situation, the restored object is owned by the user performing the restore. Vertica does not restore dependent grants, roles, or client authentications with restored objects.
If the storage policies of a restored object are not valid, vbr
applies the default storage policy. Restored storage policies can become invalid due to HDFS storage locations, table incompatibility, and unavailable min-max values at restore time.
Sometimes, Vertica encounters a catalog object that it does not need to restore. When this situation occurs, Vertica generates a warning message for that object and the restore continues.
Examples
Suppose you have a full backup, including Schema1, owned by the user Alice. Schema1 contains Table1, owned by Bob, who eventually passes ownership to Chris. The user dbadmin performs the restore. The following scenarios might occur that affect ownership of these objects.
Scenario 1:
Schema1.Table1 has been dropped at some point since the backup was created. When dbadmin performs the restore, Vertica re-creates Schema1.Table1. As the user performing the restore, dbadmin takes ownership of Schema1.Table1. Because Schema1 still exists, Alice retains ownership of the schema.
Scenario 2:
Schema1 is dropped, along with all contained objects. When dbadmin performs the restore, Vertica re-creates the schema and all contained objects. dbadmin takes ownership of Schema1 and Schema1.Table1.
Scenario 3:
Schema1 and Schema1.Table1 both exist in the current database. When dbadmin rolls back to an earlier backup, the ownership of the objects remains unchanged. Alice owns Schema1, and Bob owns Schema1.Table1.
Scenario 4:
Schema1.Table1 exists and dbadmin wants to roll back to an earlier version. In the time since the backup was made, ownership of Schema1.Table1 has changed to Chris. When dbadmin restores Schema1.Table1, Alice remains owner of Schema1 and Chris remains owner of Schema1.Table1. The restore does not revert ownership of Schema1.Table1 from Chris to Bob.
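The four scenarios follow from a single rule: an object that already exists keeps its current owner, and an object that must be re-created is owned by the user who runs the restore. The following Python sketch models only that rule. The function name and the dictionary representation are illustrative assumptions, not a Vertica interface.

```python
def owners_after_object_restore(current_owners, backup_objects, restoring_user):
    """Model the ownership rule for an object restore.

    current_owners: object name -> owner in the running database.
    backup_objects: names of the objects being restored from the backup.
    """
    result = dict(current_owners)
    for name in backup_objects:
        # Existing objects keep their current owner; missing objects are
        # re-created and owned by the user who runs the restore.
        result.setdefault(name, restoring_user)
    return result


restored = ["Schema1", "Schema1.Table1"]

# Scenario 1: only the table was dropped.
print(owners_after_object_restore({"Schema1": "Alice"}, restored, "dbadmin"))
# Scenario 2: the schema and its contents were dropped.
print(owners_after_object_restore({}, restored, "dbadmin"))
# Scenario 4: the table exists and is now owned by Chris.
print(owners_after_object_restore(
    {"Schema1": "Alice", "Schema1.Table1": "Chris"}, restored, "dbadmin"))
```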
8 - Copying the database to another cluster
Important
Inadequate security on backups can compromise overall database security. Be sure to secure backup locations and strictly limit access to backups only to users who already have permissions to access all database data.
The vbr task copycluster combines two other vbr tasks, backup and restore, into a single operation, enabling you to back up an entire database from one Enterprise Mode database cluster and then restore it on another. This can facilitate routine operations, such as copying a database between development and production environments.
Caution
copycluster
overwrites all existing data in the destination database. To preserve that data, back up the destination database before launching the copycluster
task.
Restrictions
copycluster
is invalid with Eon Mode databases. It is also incompatible with HDFS storage locations; Vertica does not transfer data to a remote HDFS cluster as it does for a Linux cluster.
Prerequisites
copycluster
requires that the target and source database clusters be identical in the following respects:
-
Vertica hotfix version—for example, 12.0.1-1
-
Number of nodes and node names, as shown in the system table NODES:
=> SELECT node_name FROM nodes;
node_name
------------------
v_vmart_node0001
v_vmart_node0002
v_vmart_node0003
(3 rows)
-
Database name
-
Vertica catalog, data, and temp directory paths as shown in the system table DISK_STORAGE:
=> SELECT node_name,storage_path,storage_usage FROM disk_storage;
node_name | storage_path | storage_usage
------------------+------------------------------------------------------+---------------
v_vmart_node0001 | /home/dbadmin/VMart/v_vmart_node0001_catalog/Catalog | CATALOG
v_vmart_node0001 | /home/dbadmin/VMart/v_vmart_node0001_data | DATA,TEMP
v_vmart_node0001 | /home/dbadmin/verticadb | DEPOT
v_vmart_node0002 | /home/dbadmin/VMart/v_vmart_node0002_catalog/Catalog | CATALOG
...
Note
Directory paths for the catalog, data, and temp storage are the same on all nodes.
-
Database administrator accounts
The following requirements also apply:
-
The target cluster has adequate disk space for copycluster
to complete.
-
The source cluster's database administrator must be able to log in to all target cluster nodes through SSH without a password.
Note
Passwordless access within the cluster is not the same as passwordless access between clusters. The SSH IDs of the administrator accounts on the source and target clusters are likely not the same. You must configure each host in the target cluster to accept the SSH authentication of the source cluster.
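Before launching the task, it can help to compare the properties that must be identical on both clusters. The following Python sketch is one possible shape for such a check. The ClusterInfo class and copycluster_mismatches function are hypothetical, and gathering the values (for example, from the NODES and DISK_STORAGE queries above) is left to the caller. The sketch does not check disk space or SSH access.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ClusterInfo:
    """Hypothetical summary of the properties that must match."""
    version: str              # Vertica hotfix version, such as 12.0.1-1
    db_name: str
    node_names: frozenset     # node_name values from NODES
    storage_paths: frozenset  # (node_name, storage_path, storage_usage)
    dbadmin_user: str


def copycluster_mismatches(source, target):
    """Return the names of the properties that differ between clusters."""
    return [
        f.name for f in fields(ClusterInfo)
        if getattr(source, f.name) != getattr(target, f.name)
    ]


source = ClusterInfo(
    version="12.0.1-1",
    db_name="vmart",
    node_names=frozenset(
        {"v_vmart_node0001", "v_vmart_node0002", "v_vmart_node0003"}
    ),
    storage_paths=frozenset({
        ("v_vmart_node0001",
         "/home/dbadmin/VMart/v_vmart_node0001_data", "DATA,TEMP"),
    }),
    dbadmin_user="dbadmin",
)
target = replace(source, version="12.0.1-0")
print(copycluster_mismatches(source, target))  # ['version']
```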
Copycluster procedure
-
Create a configuration file for the copycluster
operation. The Vertica installation includes a sample configuration file:
/opt/vertica/share/vbr/example_configs/copycluster.ini
For each node in the source database, create a [Mapping]
entry that specifies the host name of each destination database node. Unlike other vbr
tasks such as restore
and backup
, mappings for copycluster
only require the destination host name. copycluster
always stores backup data in the catalog and data directories of the destination database.
The following example configures vbr
to copy the vmart
database from its three-node v_vmart
cluster to the test-host
cluster:
[Misc]
snapshotName = CopyVmart
tempDir = /tmp/vbr
[Database]
dbName = vmart
dbUser = dbadmin
dbPassword = password
dbPromptForPassword = False
[Transmission]
encrypt = False
port_rsync = 50000
[Mapping]
; backupDir is not used for cluster copy
v_vmart_node0001= test-host01
v_vmart_node0002= test-host02
v_vmart_node0003= test-host03
-
Stop the target cluster.
-
As database administrator, invoke the vbr
task copycluster
from a source database node:
$ vbr -t copycluster -c copycluster.ini
Starting copy of database VMART.
Participating nodes: vmart_node0001, vmart_node0002, vmart_node0003, vmart_node0004.
Enter vertica password:
Snapshotting database.
Snapshot complete.
Determining what data to copy.
[==================================================] 100%
Approximate bytes to copy: 987394852 of 987394852 total.
Syncing data to destination cluster.
[==================================================] 100%
Reinitializing destination catalog.
Copycluster complete!
Important
If the copycluster
task is interrupted, the destination cluster retains data files that already transferred. If you retry the operation, Vertica does not resend these files.
9 - Replicating objects to another database cluster
The vbr
task replicate
supports replication of tables and schemas from one database cluster to another. You might consider replication for the following reasons:
- Copy tables and schemas between test, staging, and production clusters.
- Replicate certain objects immediately after an important change, such as a large table data load, instead of waiting until the next scheduled backup.
In both cases, replicating objects is generally more efficient than exporting and importing them. The first replication of an object replicates the entire object. Subsequent replications copy only data that has changed since the last replication. Vertica replicates data as of the current epoch on the target database. By scheduling the replicate task with a cron job, you can replicate key objects on a regular basis to maintain a backup database.
Replicate versus copycluster
replicate
only supports tables, schemas, and—in Eon Mode databases—namespaces. In situations where the target database is down, or you plan to replicate the entire database, Vertica recommends that you use the copycluster task to copy the database to another cluster. Thereafter, you can use replicate
to update individual objects.
Replication procedure
To replicate objects to another database, perform these actions from the source database:
-
Verify replication requirements.
-
Identify the objects to replicate and target database in the vbr
configuration file.
-
Replicate objects.
Verify replication requirements
The following requirements apply to the source and target databases and their respective clusters:
-
All nodes in both databases are UP. If some nodes are DOWN, they are handled as described in Handling DOWN nodes.
-
Versions of the two databases must be compatible. Vertica supports object replication to a target database up to one minor version higher than the current database version. For example, you can replicate objects from a 12.0.x database to a 12.1.x database.
-
The same Linux user is associated with the dbadmin account of both databases.
-
The source cluster database administrator can log on to all target nodes through SSH without a password.
Note
The SSH IDs of the administrator accounts on the source and target clusters are likely not the same. You must configure each host in the target cluster to accept the SSH authentication of the source cluster.
-
Enterprise Mode: The following requirements apply:
-
Both databases have the same number of nodes.
-
Clusters of both databases have the same number of fault groups, where corresponding fault groups in each cluster have the same number of nodes.
-
Eon Mode: The following requirements apply:
- The primary subclusters of both databases have the same node subscriptions.
- Primary subclusters of the target database have as many or more nodes as primary subclusters of the source database.
- For databases with multiple namespaces, the target and source namespaces must satisfy the requirements described in Eon Mode database requirements.
Edit vbr configuration file
Tip
As a best practice, create a separate configuration file for each replication task.
Edit the vbr
configuration file to use for the replicate
task as follows:
-
In the [Misc] section, set the objects
parameter to the objects to be replicated:
; Identify the objects that you want to replicate
objects = schema.objectName
Important
If your Eon Mode database has multiple
namespaces, you must specify the namespace to which the objects belong. For
vbr
tasks, namespace names are prefixed with a period. For example,
.n.s.t
refers to table
t
in schema
s
in namespace
n
. See
Eon Mode database requirements for more information.
-
In the [Misc] section, set the snapshotName
parameter to a unique snapshot identifier. Multiple replicate
tasks can run concurrently with each other and with backup
tasks, but only if their snapshot names are different.
snapshotName = name
-
In the [Database] section, set the following parameters:
; parameters used to replicate objects between databases
dest_dbName =
dest_dbUser =
dest_dbPromptForPassword =
If you use a stored password, be sure to configure the dest_dbPassword
parameter in your password configuration file.
-
In the [Mapping] section, map source nodes to target hosts:
[Mapping]
v_source_node0001 = targethost01
v_source_node0002 = targethost02
v_source_node0003 = targethost03
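Because a separate configuration file is recommended for each replication task, generating the files from a script can be convenient. The following Python sketch writes only the parameters named in the steps above. The function name build_replicate_config is hypothetical, the sketch assumes that multiple objects are written as a comma-separated list, and a working file also needs the usual settings for the source database.

```python
import configparser
import io


def build_replicate_config(objects, snapshot_name, dest_db, dest_user, mapping):
    """Return .ini text with the replication parameters described above.

    mapping: source node name -> target host name.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep parameter names case-sensitive
    parser["Misc"] = {
        "snapshotName": snapshot_name,
        # Assumption: multiple objects are comma-separated, without spaces.
        "objects": ",".join(objects),
    }
    parser["Database"] = {
        "dest_dbName": dest_db,
        "dest_dbUser": dest_user,
        "dest_dbPromptForPassword": "True",
    }
    parser["Mapping"] = dict(mapping)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


text = build_replicate_config(
    objects=["store.orders", "store.customers"],
    snapshot_name="replicate_store",
    dest_db="target",
    dest_user="dbadmin",
    mapping={
        "v_source_node0001": "targethost01",
        "v_source_node0002": "targethost02",
        "v_source_node0003": "targethost03",
    },
)
print(text)
```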
Replicate objects
Run vbr
with the replicate
task:
vbr -t replicate -c configfile.ini
The replicate
task can run concurrently with backup
and other replicate
tasks in either direction, provided all tasks have unique snapshot names. replicate
cannot run concurrently with other vbr
tasks.
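A scheduler that launches several vbr tasks can apply this rule before starting them. The following Python sketch models only the rule stated above; the function name and the (task, snapshot name) representation are illustrative assumptions.

```python
def can_run_concurrently(tasks):
    """Check a group of tasks that includes at least one replicate task.

    tasks: iterable of (task_name, snapshot_name) pairs.
    """
    tasks = list(tasks)
    # replicate can overlap only with backup and other replicate tasks.
    if any(task not in {"backup", "replicate"} for task, _ in tasks):
        return False
    # Every concurrent task needs its own snapshot name.
    names = [snapshot for _, snapshot in tasks]
    return len(names) == len(set(names))


print(can_run_concurrently([("replicate", "rep_sales"), ("backup", "nightly")]))
print(can_run_concurrently([("replicate", "rep_sales"), ("replicate", "rep_sales")]))
print(can_run_concurrently([("replicate", "rep_sales"), ("restore", "nightly")]))
```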
Handling DOWN nodes
You can replicate objects if some nodes are down in either the source or target database, provided the nodes are visible on the network.
The effect of DOWN nodes on a replication task depends on whether they are present in the source or target database.
Location | Effect on replication
DOWN source nodes | Vertica can replicate objects from a source database containing DOWN nodes. If nodes in the source database are DOWN, set the corresponding nodes in the target database to DOWN as well.
DOWN target nodes | Vertica can replicate objects when the target database has DOWN nodes. If nodes in the target database are DOWN, exclude the corresponding source database nodes using the --nodes parameter on the vbr command line.
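For the DOWN target nodes case, the list of source nodes to keep can be computed from the [Mapping] entries. The following Python sketch assumes that --nodes accepts a comma-separated list of the source nodes that take part in the task; confirm this against the vbr reference before relying on it. The function name and the host-state dictionary are hypothetical.

```python
def nodes_argument(mapping, target_host_state):
    """Return a --nodes value that skips source nodes mapped to DOWN targets.

    mapping: source node name -> target host (as in [Mapping]).
    target_host_state: target host -> "UP" or "DOWN".
    """
    included = [
        source for source, target in mapping.items()
        if target_host_state.get(target) == "UP"
    ]
    if not included:
        raise ValueError("no source node maps to an UP target node")
    return ",".join(included)


mapping = {
    "v_source_node0001": "targethost01",
    "v_source_node0002": "targethost02",
    "v_source_node0003": "targethost03",
}
state = {"targethost01": "UP", "targethost02": "DOWN", "targethost03": "UP"}
print(nodes_argument(mapping, state))  # v_source_node0001,v_source_node0003
```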
Monitoring object replication
You can monitor object replication in the following ways:
-
View vbr
logs on the source database
-
Check database logs on the source and target databases
-
Query REMOTE_REPLICATION_STATUS on the source database
10 - Including and excluding objects
You specify objects to include in backup, restore, and replicate operations with the vbr
configuration and command-line parameters includeObjects
and --include-objects
, respectively. You can optionally modify the set of included objects with the vbr
configuration and command line parameters excludeObjects
and --exclude-objects
, respectively. Both parameters support wildcard expressions to include and exclude groups of objects.
Important
If your Eon Mode database has multiple
namespaces, you must specify the namespace to which the objects belong. For
vbr
tasks, namespace names are prefixed with a period. For example,
.n.s.t
refers to table
t
in schema
s
in namespace
n
. See
Eon Mode database requirements for more information.
For example, you might back up all tables in the schema store
, and then exclude from the backup the table store.orders
and all tables in the same schema whose name includes the string account
:
vbr --task=backup --config-file=db.ini --include-objects 'store.*' --exclude-objects 'store.orders,store.*account*'
Wildcard characters
Character | Description
? | Matches any single character. Case-insensitive.
* | Matches 0 or more characters. Case-insensitive.
\ | Escapes the next character. To include a literal ? or * in your table or schema name, use the \ character immediately before the escaped character. To escape the \ character itself, use a double \.
" | Escapes the . character. To include a literal . in your table or schema name, wrap the character in double quotation marks.
Matching schemas
Any string pattern without a period (.
) character represents a schema. For example, the following includeObjects
list can match any schema name that starts with the string customer
, and any two-character schema name that starts with the letter s
:
includeObjects = customer*,s?
When a vbr
operation specifies a schema that is unqualified by table references, the operation includes all tables of that schema. In this case, you cannot exclude individual tables from the same schema. For example, the following vbr.ini
entries are invalid:
; invalid:
includeObjects = VMart
excludeObjects = VMart.?table?
You can exclude tables from an included schema by identifying the schema with the pattern schemaname
.*. In this case, the wildcard * explicitly includes all tables in that schema. In the following example, the --include-objects
parameter includes all tables in the VMart schema, and the --exclude-objects parameter then removes the table VMart.sales
and all VMart tables whose names include the string account
:
--include-objects 'VMart.*'
--exclude-objects 'VMart.sales,VMart.*account*'
Matching tables
Any pattern that includes a period (.
) represents a table. For example, in a configuration file, the following includeObjects
list matches the table name sales.newclients
, and any two-character table name in the same schema:
includeObjects = sales.newclients,sales.??
You can also match all schemas and tables in a database or backup by using the pattern *.*. For example, you can restore all tables and schemas in a backup using this command:
--include-objects '*.*'
Because a vbr
parameter is evaluated on the command line, you must enclose the wildcards in single quotation marks to prevent the Linux shell from expanding them.
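If a script composes the command line, the quoting can be applied programmatically. The following Python sketch uses the standard shlex module to quote the wildcard pattern; the configuration file name db.ini is only an example.

```python
import shlex

# Each argument is a separate list item; the pattern contains wildcards.
argv = ["vbr", "--task", "restore", "--config-file", "db.ini",
        "--include-objects", "*.*"]

# shlex.join quotes only the arguments that the shell could misinterpret.
command = shlex.join(argv)
print(command)
# vbr --task restore --config-file db.ini --include-objects '*.*'
```

If the command is started directly from the argument list, without a shell, no quoting is needed because nothing expands the wildcards.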
Testing wildcard patterns
You can test the results of any pattern by using the --dry-run
parameter with a backup or restore command. Commands that include --dry-run
do not affect your database. Instead, vbr
displays the result of the command without executing it. For more information on --dry-run
, refer to the vbr reference.
Using wildcards with backups
You can identify objects to include in your object backup tasks using the includeObjects
and excludeObjects
parameters in your configuration file. A typical configuration file might include the following content:
[Misc]
snapshotName = dbobjects
restorePointLimit = 1
enableFreeSpaceCheck = True
includeObjects = VMart.*,online_sales.*
excludeObjects = *.*temp*
In this example, the backup would include all tables from the VMart and online_sales
schemas, but would exclude every table whose name contains the string temp, regardless of schema.
After it evaluates included objects, vbr
evaluates excluded objects and removes excluded objects from the included set. For example, if you included schema1.table1 and then excluded schema1.table1, that object would be excluded. If no other objects were included in the task, the task would fail. The same is true for wildcards. If an exclusion pattern removes all included objects, the task fails.
Using wildcards with restore
You can identify objects to include in your restore tasks using the --include-objects
and --exclude-objects
parameters.
Note
Take extra care when using wildcard patterns to restore database objects. Depending on your object restore mode settings, restored objects can overwrite existing objects. Test the impact of a wildcard restore with the --dry-run
vbr
parameter before performing the actual task.
As with backups, vbr
evaluates excluded objects after it evaluates included objects and removes excluded objects from the included set. If no objects remain, the task fails.
A typical restore command might include this content. (Line wrapped in the documentation for readability, but this is one command.)
$ vbr -t restore -c verticaconfig --include-objects 'customers.*,sales??'
--exclude-objects 'customers.199?,customers.200?'
This example includes the schema customers, minus any tables whose names match 199 or 200 followed by one character, as well as any schema whose name matches sales followed by two characters.
Another typical restore command might include this content.
$ vbr -t restore -c replicateconfig --include-objects '*.transactions,flights.*'
--exclude-objects 'flights.DTW*,flights.LAS*,flights.LAX*'
This example includes any table named transactions, regardless of schema, and all tables in the schema flights except those whose names begin with DTW, LAS, or LAX. Although these three-letter airport codes are capitalized in the example, vbr
is case-insensitive.
11 - Managing backups
Important
Inadequate security on backups can compromise overall database security. Be sure to secure backup locations and strictly limit access to backups only to users who already have permissions to access all database data.
vbr
provides several tasks related to managing backups: listing them, checking their integrity, selectively deleting them, and more. In addition, vbr
has parameters to allow you to restrict its use of system resources.
11.1 - Viewing backups
You can view backups in three ways:
- vbr listbackup task: List backups on the local or remote backup host.
- DATABASE_BACKUPS system table: Query for historical information about backups.
- vbr log file: Check the status of a backup. The log file resides on the node where you ran
vbr
, in the directory specified by the vbr
configuration parameter tempDir, by default set to /tmp/vbr
.
vbr listbackup
The vbr
task listbackup
returns a list of all backups on backup hosts, whether local or remote. If unqualified by task options, listbackup
returns the list to standard output in columnar format.
The following example lists two full backups of a three-node cluster, where each node is mapped to the same backup host, bkhost
. Backups are listed in reverse chronological order:
$ vbr -t listbackup -c fullbackup.ini
backup backup_type epoch objects include_patterns exclude_patterns nodes(hosts) version file_system_type
backup_snapshot_20220912_131918 full 3915 v_vmart_node0001(10.20.100.247), v_vmart_node0002(10.20.100.248), v_vmart_node0003(10.20.100.249) v12.0.2-20220911 [Linux]
backup_snapshot_20220909_122300 full 3910 v_vmart_node0001(10.20.100.247), v_vmart_node0002(10.20.100.248), v_vmart_node0003(10.20.100.249) v12.0.2-20220911 [Linux]
The following table contains information about output columns that are returned from a vbr
listbackup
task:
Column | Description
backup | Identifies a backup by concatenating the configured snapshot name with the backup timestamp, in the form snapshot-name_YYYYMMDD_HHMMSS. For example, monthlyBackup_20220414_134452 identifies a backup that was generated on April 14, 2022, at 13:44:52 by a configuration file that sets snapshotName to monthlyBackup. Use the timestamp portion of this identifier, 20220414_134452, to specify the archived backup you wish to restore.
backup_type | Type of backup, full or object.
epoch | Epoch when the backup was created.
objects | Objects that were backed up, blank if a full backup.
include_patterns | Wildcard patterns included in object backup tasks using the includeObjects parameter in your configuration file, blank for full backups.
exclude_patterns | Wildcard patterns excluded from object backup tasks using the excludeObjects parameter in your configuration file, blank for full backups.
nodes(hosts) | (Enterprise Mode only) Names of database nodes and hosts that received the backup.
version | Version of Vertica used to create the backup.
file_system_type | Storage location file system of the Vertica hosts that comprise this backup, for example, Linux or GCS.
communal_storage | (Eon Mode only) Communal storage location for the backup.
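Scripts that choose an archive to restore need to separate the snapshot name from the timestamp in the backup column. The following Python sketch shows one way to do this; the function name parse_backup_id is hypothetical. It splits from the right because snapshot names, such as backup_snapshot in the listing above, can themselves contain underscores.

```python
from datetime import datetime


def parse_backup_id(backup_id):
    """Split 'snapshotName_YYYYMMDD_HHMMSS' into its parts.

    Returns (snapshot_name, archive_id, creation_time).
    """
    # Split from the right: the last two fields are the date and the time.
    snapshot, date_part, time_part = backup_id.rsplit("_", 2)
    archive_id = f"{date_part}_{time_part}"
    created = datetime.strptime(archive_id, "%Y%m%d_%H%M%S")
    return snapshot, archive_id, created


snapshot, archive_id, created = parse_backup_id("monthlyBackup_20220414_134452")
print(snapshot)    # monthlyBackup
print(archive_id)  # 20220414_134452
print(created)     # 2022-04-14 13:44:52
```

The archive_id value is the string to supply when a restore must use an earlier backup rather than the most recent one.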
Important
If you try to list backups on a local cluster with no database, the backup configuration node-host mappings must provide full paths. If the configuration maps to local hosts using the
[] shortcut, the
listbackup
task fails.
Listbackup options
You can qualify the listbackup
task with one or more options:
vbr --task listbackup [--list-all] [--json] [--list-output-file filepath] --config-file filepath
Option | Description
--list-all | Generate a list of all snapshots stored on the hosts and paths listed in the specified configuration file.
--json | Use JSON delimited format.
--list-output-file | Redirect output to the specified file.
The following example qualifies the listbackup
task with the --list-all
option. The output shows three nightly backups from nodes vmart_1
, vmart_2
, and vmart_3
, which the configuration file nightly.ini
maps to their respective hosts doca01
, doca02
, and doca03
. The listbackup
output shows that these locations contain not only object backups that were generated with nightly.ini
, but also full backups created with a second configuration file, weekly.ini
, which maps to the same nodes and hosts:
$ vbr --task listbackup --list-all --config-file /home/dbadmin/nightly.ini
backup backup_type epoch objects include_patterns exclude_patterns nodes(hosts) version file_system_type
weekly_20220508_183249 full 1720 vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1 [Linux]
weekly_20220501_182816 full 1403 vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1 [Linux]
weekly_20220424_192754 full 1109 vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1 [Linux]
nightly_20220507_183034 object 1705 sales_schema vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1 [Linux]
nightly_20220506_181808 object 1692 sales_schema vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1 [Linux]
nightly_20220505_193906 object 1632 sales_schema vmart_1(doca01), vmart_2(doca02), vmart_3(doca03) v11.0.1 [Linux]
Query backup history
You can query the system table DATABASE_BACKUPS to get historical information about backups. The objects
column lists which objects were included in object-level backups.
Important
Do not use the
backup_timestamp
value to
restore an archive. Instead, use the values provided by the vbr
listbackup
task.
=> SELECT * FROM v_monitor.database_backups;
-[ RECORD 1 ]----+------------------------------
backup_timestamp | 2013-05-10 14:41:12.673381-04
node_name | v_vmart_node0003
snapshot_name | schemabak
backup_epoch | 174
node_count | 3
file_system_type | [Linux]
objects | public, store, online_sales
-[ RECORD 2 ]----+------------------------------
backup_timestamp | 2013-05-13 11:17:30.913176-04
node_name | v_vmart_node0003
snapshot_name | kantibak
backup_epoch | 175
node_count | 3
file_system_type | [Linux]
objects |
-[ RECORD 13 ]---+------------------------------
backup_timestamp | 2013-05-16 07:02:23.721657-04
node_name | v_vmart_node0003
snapshot_name | objectbak
backup_epoch | 180
node_count | 3
file_system_type | [Linux]
objects | test, test2
-[ RECORD 14 ]---+------------------------------
backup_timestamp | 2013-05-16 07:19:44.952884-04
node_name | v_vmart_node0003
snapshot_name | table1bak
backup_epoch | 180
node_count | 3
file_system_type | [Linux]
objects | test
-[ RECORD 15 ]---+------------------------------
backup_timestamp | 2013-05-16 07:20:18.585076-04
node_name | v_vmart_node0003
snapshot_name | table2bak
backup_epoch | 180
node_count | 3
file_system_type | [Linux]
objects | test2
11.2 - Checking backup integrity
Vertica can confirm the integrity of your backup files and the manifest that identifies them. By default, backup integrity checks output their results to the command line.
Quick check
The quick-check
task gathers all backup metadata from the backup location specified in the configuration file and compares that metadata to the backup manifest. A quick check does not verify the objects themselves. Instead, this task outputs an exceptions list of any discrepancies between objects in the backup location and objects listed in the backup manifest.
Use the following template to perform a quick check task:
$ vbr -t quick-check -c configfile.ini
For example:
$ vbr -t quick-check -c backupconfig.ini
Full check
The full-check
task verifies all objects listed in the backup manifest against filesystem metadata. A full check includes the same steps as a quick check. You can include the optional --report-file
parameter to output results to a delimited JSON file. This task outputs an exceptions list of inconsistencies, such as missing and unreferenced objects.
Use the following template to perform a full check task:
$ vbr -t full-check -c configfile.ini --report-file=path/filename
For example:
$ vbr -t full-check -c backupconfig.ini --report-file=logging/fullintegritycheck.json
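If you schedule integrity checks from a script, it can help to assemble the command line in one place. The following Python sketch builds the argument list for the two check tasks shown above. The function name `integrity_check_command` is hypothetical and not part of vbr; the sketch only uses the `-t`, `-c`, and `--report-file` options that appear in this section.

```python
import shlex
from typing import List, Optional


def integrity_check_command(config_file: str, full: bool = False,
                            report_file: Optional[str] = None) -> List[str]:
    """Build the argument list for a vbr quick-check or full-check task."""
    task = "full-check" if full else "quick-check"
    cmd = ["vbr", "-t", task, "-c", config_file]
    if report_file is not None:
        # This section documents --report-file for full-check only.
        if not full:
            raise ValueError("--report-file is documented only for full-check")
        cmd.append(f"--report-file={report_file}")
    return cmd


if __name__ == "__main__":
    quick = integrity_check_command("backupconfig.ini")
    full = integrity_check_command(
        "backupconfig.ini", full=True,
        report_file="logging/fullintegritycheck.json")
    # Print shell-quoted commands; on a host where vbr is installed, the
    # lists could instead be passed to subprocess.run(cmd, check=True).
    print(shlex.join(quick))
    print(shlex.join(full))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.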
11.3 - Repairing backups
Vertica can reconstruct backup manifests and remove unneeded backup objects.
Quick repair
The quick-repair
task rebuilds the backup manifest, based on the manifests contained in the backup location.
Use the following template to perform a quick repair task:
$ vbr -t quick-repair -c configfile.ini
Garbage collection
The collect-garbage
task rebuilds your backup manifest and deletes any backup objects that do not appear in the manifest. You can include the optional --report-file
parameter to output results to a delimited JSON file.
Use the following template to perform a garbage collection task:
$ vbr -t collect-garbage -c configfile.ini --report-file=path/filename
11.4 - Removing backups
You can remove existing backups and restore points using vbr
. When you use the remove
task, vbr
updates the manifests affected by the removal and maintains their integrity. If the backup archive contains multiple restore points, removing one does not affect the others. When you remove the last restore point, vbr
removes the backup entirely.
Note
Vertica does not support removing backups through the file system.
Use the following template to perform a remove task:
$ vbr -t remove -c configfile.ini --archive timestamp
You can remove multiple restore points using the archive parameter. To obtain the timestamp for a particular restore point, use the listbackup task.
-
To remove multiple restore points, use a comma separator:
--archive="restore-point1,restore-point2"
-
To remove an inclusive range of restore points, use a colon:
--archive="oldest-restore-point:newest-restore-point"
-
To remove all restore points, specify an archive value of all
:
--archive all
The following example shows how you can remove a restore point from an existing backup:
$ vbr -t remove -c backup.ini --archive 20160414_134452
Removing restore points: 20160414_134452
Remove complete!
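The three --archive forms described above (comma-separated list, colon-separated range, and all) are easy to mistype. The following Python sketch formats and validates the value before it is passed to vbr. The helper names are hypothetical, and the YYYYMMDD_HHMMSS pattern is an assumption based on the timestamps shown in the examples in this document.

```python
import re
from typing import List, Sequence

# Assumed restore-point format, based on examples such as 20160414_134452.
_TIMESTAMP = re.compile(r"\d{8}_\d{6}")


def _check(timestamp: str) -> str:
    """Return the timestamp unchanged, or raise if it is malformed."""
    if not _TIMESTAMP.fullmatch(timestamp):
        raise ValueError(f"not a restore-point timestamp: {timestamp!r}")
    return timestamp


def archive_points(restore_points: Sequence[str]) -> str:
    """One or more restore points, joined with a comma separator."""
    if not restore_points:
        raise ValueError("at least one restore point is required")
    return ",".join(_check(ts) for ts in restore_points)


def archive_range(oldest: str, newest: str) -> str:
    """An inclusive range of restore points, joined with a colon."""
    _check(oldest)
    _check(newest)
    # Timestamps in this format sort chronologically as plain strings.
    if oldest > newest:
        raise ValueError("oldest restore point must not be later than newest")
    return f"{oldest}:{newest}"


def remove_command(config_file: str, archive: str) -> List[str]:
    """Argument list for the remove task; archive may also be 'all'."""
    return ["vbr", "-t", "remove", "-c", config_file, "--archive", archive]


if __name__ == "__main__":
    print(remove_command("backup.ini", archive_points(["20160414_134452"])))
```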
11.5 - Estimating log file disk requirements
One of the vbr configuration parameters is tempDir. This parameter specifies the database host location where vbr writes its log files and some other temporary files of negligible size. The default location is the /tmp/vbr directory on each database host. You can change the default location by specifying a different path in the configuration file.
The temporary storage directory also contains local log files describing the progress, throughput, and any errors encountered for each node. Each time you run vbr, it creates a separate log file named with a timestamp. When using default settings, the log file typically uses about 4KB of space per node per backup.
The vbr
log files are not removed automatically, so you must delete older log files manually, as necessary.
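As a rough aid for planning, the following Python sketch estimates log growth from the 4KB-per-node-per-backup figure above and deletes files older than a chosen age. It is a minimal sketch under stated assumptions: the function names are hypothetical, the `*.log` pattern is an assumption (check the actual file names in your tempDir before using it), and the age threshold is yours to choose.

```python
import time
from pathlib import Path
from typing import List, Optional

# "About 4KB of space per node per backup" with default settings.
BYTES_PER_NODE_PER_RUN = 4 * 1024


def estimate_log_bytes(nodes: int, runs: int) -> int:
    """Approximate disk space used by vbr logs for the given number of runs."""
    return nodes * runs * BYTES_PER_NODE_PER_RUN


def prune_old_files(log_dir: str, max_age_days: float,
                    pattern: str = "*.log",
                    now: Optional[float] = None) -> List[Path]:
    """Delete files matching pattern that are older than max_age_days.

    The default pattern is an assumption, not a documented vbr naming rule.
    Returns the paths that were removed.
    """
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400
    removed = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Three nodes, one backup per day for a year.
    print(estimate_log_bytes(nodes=3, runs=365), "bytes")
```

Run the pruning function on each database host, because each host keeps its own copy of the log directory.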
11.6 - Allocating resources
By default, vbr
allows a single rsync connection (for Linux file systems), 10 concurrent threads (for cloud storage connections), and unlimited bandwidth for any backup or restore operation. You can change these values in your configuration file. See vbr configuration file reference for details about these parameters.
Connections
You might want to increase the number of concurrent connections. If you have many Vertica files, more connections can provide a significant performance boost as each connection increases the number of concurrent file transfers.
For more information, refer to the following parameters in [transmission]:
-
total_bwlimit_backup
-
total_bwlimit_restore
-
concurrency_backup
-
concurrency_restore
and the following parameters in [CloudStorage]:
-
cloud_storage_concurrency_backup
-
cloud_storage_concurrency_restore
Bandwidth limits
You can limit network bandwidth use through the total_bwlimit_backup
and total_bwlimit_restore
data transmission parameters. For more information, refer to [transmission].
12 - Troubleshooting backup and restore
These tips can help you avoid issues related to backup and restore with Vertica and to troubleshoot any problems that occur.
Check vbr log
The vbr
log is separate from the Vertica log. Its location is set by the vbr
configuration parameter tempDir, by default /tmp/vbr
.
If the log has no explanation for an error or unexpected results, try increasing the logging level with the vbr
option --debug
:
vbr -t backup -c config-file --debug debug-level
where debug-level
is an integer between 0 (default) and 3 (verbose), inclusive. As you increase the logging level, the file size of the log increases. For example:
$ vbr -t backup -c full_backup.ini --debug 3
Note
Scrutinize reports do not include vbr
logs.
Check status of backup nodes
Backups fail if you run out of disk space on the backup hosts or if vbr
cannot reach them all. Check that you have sufficient space on each backup host and that you can reach each host via ssh.
Sometimes vbr
leaves rsync processes running on the database or backup nodes. These processes can interfere with new ones. If you get an rsync error in the console, look for runaway processes and kill them.
Common errors
Object replication fails
If you do not exclude the DOWN node, replication fails with the following error:
Error connecting to a destination database node on the host <hostname> : <error> ...
Confirm that you excluded all DOWN nodes from the object replication operation.
Error restoring an archive
You might see an error like the following when restoring an archive:
$ vbr --task restore --archive prd_db_20190131_183111 --config-file /home/dbadmin/backup.ini
IOError: [Errno 2] No such file or directory: '/tmp/vbr/vbr_20190131_183111_s0rpYR/prd_db.info'
The problem is that the archive name is not in the correct format. Specify only the date/timestamp suffix of the directory name that identifies the archive to restore, as described in Restoring an Archive. For example:
$ vbr --task restore --archive 20190131_183111 --config-file /home/dbadmin/backup.ini
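If a script derives the archive identifier from a backup name, it can strip the prefix before calling vbr. The following Python sketch extracts the date/timestamp suffix. The function name `archive_id` is hypothetical, and the YYYYMMDD_HHMMSS pattern is an assumption based on the identifiers shown in this document.

```python
import re

# Assumed suffix format, based on identifiers such as 20190131_183111.
_ARCHIVE_SUFFIX = re.compile(r"(\d{8}_\d{6})$")


def archive_id(backup_name: str) -> str:
    """Return the timestamp suffix to pass to --archive."""
    match = _ARCHIVE_SUFFIX.search(backup_name)
    if match is None:
        raise ValueError(f"no timestamp suffix found in {backup_name!r}")
    return match.group(1)


if __name__ == "__main__":
    print(archive_id("prd_db_20190131_183111"))
```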
Backup or restore fails when using an HDFS storage location
When performing a backup of a cluster that includes HDFS storage locations, you might see an error like the following:
ERROR 5127: Unable to create snapshot No such file /usr/bin/hadoop:
check the HadoopHome configuration parameter
This error is caused by the backup script not being able to back up the HDFS storage locations. You must configure Vertica and Hadoop to enable the backup script to back up these locations. See Requirements for backing up and restoring HDFS storage locations.
Object-level backup and restore are not supported with HDFS storage locations. You must use full backup and restore.
Could not connect to endpoint URL
(Eon Mode) When performing a cross-endpoint operation, you might see a connection error if you did not specify the endpoint URL for your communal storage (VBR_COMMUNAL_STORAGE_ENDPOINT_URL). When the endpoint is missing but you specify credentials for communal storage, vbr tries to use those credentials to access AWS. This access fails because those credentials are for your on-premises storage, not AWS. When performing cross-endpoint operations, check that all environment variables described in Cross-Endpoint Backups in Eon Mode are set correctly.
13 - vbr reference
vbr
can back up and restore the full database, or specific schemas and tables. It also supports a number of other backup-related tasks—for example, list the history of all backups.
vbr
is located in the Vertica binary directory—typically,
/opt/vertica/bin/vbr
.
Syntax
vbr { --help | -h }
| { --task | -t } task { --config-file | -c } configfile [ option[...] ]
Global options
The following options apply to all vbr
tasks. For additional options, see Task-Specific Options.
Option |
Description |
--help | -h |
Display a brief vbr usage guide. |
{--task | -t} task |
The vbr task to execute, one of the following:
-
backup: create a full or object-level backup
-
collect-garbage: rebuild the backup manifest and delete any unreferenced objects in the backup location
-
copycluster: copy the database to another cluster (Enterprise Mode only, invalid for HDFS)
-
full-check: verify all objects in the backup manifest and report missing or unreferenced objects
-
init: prepare a new backup location
-
listbackup: show available backups
-
quick-check: confirm that all backed-up objects are in the backup manifest and report discrepancies between objects in the backup location and objects listed in the backup manifest
-
quick-repair: build a replacement backup manifest based on storage locations and objects
-
remove: remove specified restore points
-
replicate: copy objects from one cluster to another
-
restore: restore a full or object-level backup
Note
In general, tasks cannot run concurrently, with one exception: multiple replicate tasks can run concurrently with each other, and with backup.
|
{--config-file | -c} path |
File path of the configuration file to use for the given task. |
--debug level |
Level of debug messaging to the vbr log, an integer from 0 to 3 inclusive, where 0 (default) turns off debug messaging, and 3 is the most verbose level of messaging. |
--nodes nodeslist |
(Enterprise Mode only) Comma-delimited list of nodes on which to perform a vbr task. Listed nodes must match names in the Mapping section of the configuration file. Use this option to exclude DOWN nodes from a task, so vbr does not return with an error.
Caution
If you use --nodes with a backup task, be sure that the nodes list includes all UP nodes; omitting any UP node can cause data loss in that backup.
|
--showconfig |
Displays the configuration values used to perform a specific task, displayed in raw JSON format before vbr starts task execution:
vbr -t task -c configfile --showconfig
--showconfig can also show settings for a given configuration file:
vbr -c configfile --showconfig
|
Task-specific options
Some vbr
tasks support additional options, described in the sections that follow.
The following vbr
tasks have no task-specific options:
-
copycluster
-
quick-check
-
quick-repair
Backup
Create a full database or object-level backup, depending on configuration file settings.
Option | Description
--dry-run | Perform a test run to evaluate the impact of the backup operation, for example its size and potential overhead.
Collect-garbage
Rebuild the backup manifest and delete any unreferenced objects in the backup location.
Option | Description
--report-file | Output results to a delimited JSON file.
Full-check
Produce a full backup integrity check that verifies all objects in the backup manifest against file system metadata, and then outputs missing and unreferenced objects.
Option | Description
--report-file | Output results to a delimited JSON file.
Init
Create a backup directory or prepare an existing one for use, and create backup manifests. This task must precede the first vbr
backup operation.
Option | Description
--cloud-force-init | Qualifies the --task init command to force the init task to succeed on S3 or GS storage targets when an identity/lock file mismatch occurs.
--report-file | Output results to a delimited JSON file.
Listbackup
Displays backups associated with the specified configuration file. Use this task to get archive (restore point) identifiers for restore
and remove
tasks.
Option | Description
--list-all | List all backups stored on the hosts and paths in the configuration file.
--list-output-file filename | Redirect output to the specified file.
--json | Use JSON delimited format.
Remove
Remove the backup restore points specified by the --archive
option.
Option |
Description |
--archive |
Restore points to remove, one of the following:
-
timestamp : A single restore point to remove.
-
timestamp : timestamp : A range of contiguous restore points to remove.
-
all : Remove all restore points.
You obtain timestamp identifiers for the target restore points with the listbackup task. For details, see vbr listbackup.
|
Replicate
Copy objects from one cluster to an alternate cluster. This task can run concurrently with backup
and other replicate
tasks.
Option |
Description |
--archive |
Timestamp of the backup restore point to replicate, obtained from the listbackup task. |
--dry-run |
Perform a test run to evaluate impact of the replicate operation—for example, its size and potential overhead. |
--target-namespace |
Eon Mode only, the namespace in the target database to which objects are replicated.
vbr behaves differently depending on whether the target namespace exists:
- Exists:
vbr attempts to restore or replicate the objects to the existing namespace, which must have the same shard count, shard boundaries, and node subscriptions as the source namespace. If these conditions are not met, the vbr task fails.
- Nonexistent:
vbr creates a namespace in the target database with the name specified in --target-namespace and the shard count of the source namespace, and then replicates or restores the objects to that namespace.
If no target namespace is specified, vbr attempts to restore or replicate objects to a namespace with the same name as the source namespace.
|
Restore
Restore a full or object-level database backup.
Option |
Description |
--archive |
Timestamp of the backup to restore, obtained from the listbackup task. If omitted, vbr restores the latest backup of the specified configuration. |
--restore-objects |
Comma-delimited list of objects—tables and schemas—to restore from a given backup. |
--include-objects |
Comma-delimited list of database objects or patterns of objects to include from a full or object-level backup. |
--exclude-objects |
Comma-delimited list of database objects or patterns of objects to exclude from the set specified by --include-objects . This option can only be used together with --include-objects . |
--dry-run |
Perform a test run to evaluate impact of the restore operation—for example, its size and potential overhead. |
--target-namespace |
Eon Mode only, the namespace in the target database to which objects are restored.
vbr behaves differently depending on whether the target namespace exists:
- Exists:
vbr attempts to restore or replicate the objects to the existing namespace, which must have the same shard count, shard boundaries, and node subscriptions as the source namespace. If these conditions are not met, the vbr task fails.
- Nonexistent:
vbr creates a namespace in the target database with the name specified in --target-namespace and the shard count of the source namespace, and then replicates or restores the objects to that namespace.
If no target namespace is specified, vbr attempts to restore or replicate objects to a namespace with the same name as the source namespace.
|
Note
The --restore-objects option and the --include-objects/--exclude-objects options are mutually exclusive. You can use --include-objects to specify a set of objects and combine it with --exclude-objects to remove objects from the set.
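The rules for combining the object-selection options of the restore task can be checked before vbr runs. The following Python sketch encodes the two constraints stated above: --restore-objects cannot be combined with --include-objects or --exclude-objects, and --exclude-objects requires --include-objects. The function name `restore_object_args` and the object names in the usage example are hypothetical.

```python
from typing import List, Optional, Sequence


def restore_object_args(restore_objects: Optional[Sequence[str]] = None,
                        include_objects: Optional[Sequence[str]] = None,
                        exclude_objects: Optional[Sequence[str]] = None
                        ) -> List[str]:
    """Validate the option combination and return the extra vbr arguments."""
    if restore_objects and (include_objects or exclude_objects):
        raise ValueError(
            "--restore-objects cannot be combined with "
            "--include-objects or --exclude-objects")
    if exclude_objects and not include_objects:
        raise ValueError("--exclude-objects requires --include-objects")

    args: List[str] = []
    # Each option takes a comma-delimited list of objects.
    if restore_objects:
        args += ["--restore-objects", ",".join(restore_objects)]
    if include_objects:
        args += ["--include-objects", ",".join(include_objects)]
    if exclude_objects:
        args += ["--exclude-objects", ",".join(exclude_objects)]
    return args


if __name__ == "__main__":
    # Hypothetical schema and table names, for illustration only.
    print(restore_object_args(include_objects=["sales_schema.*"],
                              exclude_objects=["sales_schema.tmp_load"]))
```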
Interrupting vbr
To cancel a backup, use Ctrl+C or send a SIGINT to the vbr
Python process. vbr
stops the backup process after it completes copying the data. Canceling a vbr
backup with Ctrl+C closes the session immediately.
The files generated by an interrupted backup process remain in the target backup location directory. The next backup process picks up where the interrupted process left off.
Backup operations are atomic, so interrupting a backup operation does not affect the previous backup. The latest backup replaces the previous backup only after all other backup steps are complete.
Caution
restore
or copycluster
operations overwrite the database catalog directory. Interrupting either of these processes leaves the database unusable until you restart the process and allow it to finish.
14 - vbr configuration file reference
vbr
configuration files divide backup settings into sections, under section-specific headings such as [Database]
and [CloudStorage]
, which contain database access and cloud storage location settings, respectively. Sections can appear in any order and can be repeated—for example, multiple [Database]
sections.
Important
Section headings are case-sensitive.
14.1 - [CloudStorage]
Eon Mode only
Sets options for storing backup data in a supported cloud storage location.
The [CloudStorage] and [Mapping] configuration sections are mutually exclusive. If you include both, the backup fails with this error message:
Config has conflicting sections (Mapping, CloudStorage), specify only one of them.
Important
The [CloudStorage] section replaces the now-deprecated [S3] section of earlier releases. Likewise, cloud storage-specific configuration variables replace the equivalent S3 configuration variables.
Do not include [S3] and [CloudStorage] sections in the same configuration file; otherwise, vbr will use [S3] configuration settings and ignore [CloudStorage] settings, which can yield unexpected results.
Options
cloud_storage_backup_file_system_path
- Host and path that you are using to handle file locking during the backup process. The format is [host]:path. vbr must be able to create a passwordless ssh connection to the location that you specify here.
To use a local NFS file system, omit the host: []:path.
cloud_storage_backup_path
- Backup location. For S3-compatible or cloud locations, provide the bucket name and backup path. For HDFS locations, provide the appropriate protocol and backup path.
When you back up to cloud storage, all nodes back up to the same cloud storage bucket. You must create the backup location in the cloud storage before performing a backup. The following example specifies the backup path for S3 storage:
cloud_storage_backup_path = s3://backup-bucket/database-backup-path/
When you back up to an HDFS location, use the swebhdfs
protocol if you use wire encryption. Use the webhdfs
protocol if you do not use wire encryption. The following example uses encryption:
cloud_storage_backup_path = swebhdfs://backup-nameservice/database-backup-path/
cloud_storage_ca_bundle
-
Path to an SSL server certificate bundle.
Note
The key (*.pem) file must be on the same path on all nodes of the database cluster.
For example:
cloud_storage_ca_bundle = /home/user/ssl-folder/ca-bundle
cloud_storage_concurrency_backup
-
The maximum number of concurrent backup threads for backup to cloud storage. For very large data volumes (greater than 10TB), you might need to reduce this value to avoid vbr failures.
Default: 10
cloud_storage_concurrency_delete
- The maximum number of concurrent delete threads for deleting files from cloud storage. If the vbr configuration file contains a [CloudStorage] section, this value is set to 10 by default.
Default: 10
cloud_storage_concurrency_restore
- The maximum number of concurrent restore threads for restoring from cloud storage. For very large data volumes (greater than 10TB), you might need to reduce this value to avoid vbr failures.
Default: 10
cloud_storage_encrypt_at_rest
- S3 storage only. To enable at-rest encryption of your backups to S3, specify a value of sse. For more information, see Encrypting Backups on Amazon S3.
This value takes the following form:
cloud_storage_encrypt_at_rest = sse
cloud_storage_encrypt_transport
- Boolean. If true, uses SSL encryption to encrypt data moving between your Vertica cluster and your cloud storage instance.
You must set this parameter to true if backing up or restoring from:
-
Amazon EC2 cluster
-
Google Cloud Storage (GCS)
-
Eon Mode on-premises database with communal storage on HDFS, to use wire encryption.
Default: true
cloud_storage_sse_kms_key_id
- S3 storage only. If you use the AWS Key Management Service (KMS), use this parameter to provide your key ID. If you enable encryption and do not include this parameter, vbr uses SSE-S3 encryption.
This value takes the following form:
cloud_storage_sse_kms_key_id = key-id
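To see how these options fit together, the following Python sketch generates a [CloudStorage] section with the standard library configparser module and checks for the section conflicts described above. It is an illustration, not a Vertica-supplied template: the function names are hypothetical, the lock directory path is a placeholder, and a real configuration file needs additional sections that this sketch omits.

```python
import configparser
import io
from typing import Optional


def check_sections(cfg: configparser.ConfigParser) -> None:
    """Reject section combinations that this reference warns against."""
    # configparser section names are case-sensitive, like vbr section headings.
    if cfg.has_section("Mapping") and cfg.has_section("CloudStorage"):
        raise ValueError("Config has conflicting sections "
                         "(Mapping, CloudStorage), specify only one of them.")
    if cfg.has_section("S3") and cfg.has_section("CloudStorage"):
        raise ValueError("Do not combine [S3] and [CloudStorage] sections")


def cloud_storage_section(backup_path: str, lock_path: str,
                          kms_key_id: Optional[str] = None) -> str:
    """Return the text of a [CloudStorage] section."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["CloudStorage"] = {
        "cloud_storage_backup_path": backup_path,
        "cloud_storage_backup_file_system_path": lock_path,
        "cloud_storage_encrypt_transport": "true",
    }
    if kms_key_id is not None:
        # At-rest encryption applies to S3 storage only.
        cfg["CloudStorage"]["cloud_storage_encrypt_at_rest"] = "sse"
        cfg["CloudStorage"]["cloud_storage_sse_kms_key_id"] = kms_key_id
    check_sections(cfg)
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # The lock path below is a placeholder; [] means the local file system.
    print(cloud_storage_section(
        "s3://backup-bucket/database-backup-path/",
        "[]:/home/dbadmin/backup_locks"))
```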
14.2 - [Database]
Sets options for accessing the database and, for replication, the destination.
Database options
dbName
- Name of the database to back up. If you do not supply a database name, vbr selects the current database to back up.
OpenText recommends that you provide a database name.
dbPromptForPassword
- Boolean, whether vbr prompts for a password. If set to false (no prompt at runtime), then the dbPassword parameter in the password configuration file must provide the password; otherwise, vbr prompts for one at runtime.
As a best practice, set dbPromptForPassword
to false if dbUseLocalConnection is set to true.
Default: true
dbUser
- Vertica user that performs vbr operations on the database. In the case of replicate tasks, this user is the source database user. You must be logged on as the database administrator to back up the database. The user password can be stored in the dbPassword parameter of the password configuration file; otherwise, vbr prompts for one at runtime.
Default: Current user name
dbUseLocalConnection
- Boolean, whether vbr accesses the target database over a local connection with the user's Vertica password. If dbUseLocalConnection is enabled, vbr can operate on a local database without the user password being set in the vbr configuration. vbr ignores the passwordFile parameter and any settings in the password configuration file, including dbPassword.
If dbUseLocalConnection is enabled, then an authentication method must be granted to vbr users—typically a dbadmin—where method type is set to trust, and access is set to local:
=> CREATE AUTHENTICATION h1 METHOD 'trust' LOCAL;
=> GRANT AUTHENTICATION h1 TO dbadmin;
Default: false
Destination options
Set destination database parameters only if replicating objects on alternate clusters:
dest_dbName
- Name of the destination database.
dest_dbPromptForPassword
- Boolean, whether vbr prompts for the destination database password. If set to false (no prompt at runtime), then the dest_dbPassword parameter in the password configuration file must provide the password; otherwise, vbr prompts for one at runtime.
dest_dbUser
- Vertica user name in the destination database to use for loading replicated data. This user must have superuser privileges.
14.3 - [Mapping]
Enterprise Mode only
Specifies all database nodes to include in an Enterprise Mode database backup. This section also specifies the backup host and directory of each node. If objects are replicated to an alternative database, the [Mapping] section maps target database nodes to the corresponding source database backup locations.
Note
[CloudStorage] and [Mapping] configuration sections are mutually exclusive. If you include both, the backup fails.
Unlike other configuration file sections, the [Mapping] section does not use named parameters. Instead, it contains entries of the following format:
dbNode = backupHost:backupDir
dbNode
- Name of the database node as recognized by Vertica. This value is not the node's host name; rather, it is the name Vertica uses internally to identify the node, typically in this format:
v_dbname_node000int
For example, v_vmart_node0001.
To find database node names in your cluster, query the node_name
column of the NODES system table.
backupHost
- The target host name or IP address on which to store this node's backup. backupHost is different from dbNode. The copycluster command uses this value to identify the target database node host name.
IPv6 addresses must be enclosed by square brackets []. For example:
v_backup_restore_node0001 = [fdfb:dbfa:0:2000::112]:/backupdir/backup_restore.2021-06-01T16:17:57
v_backup_restore_node0002 = [fdfb:dbfa:0:2000::113]:/backupdir/backup_restore.2021-06-01T16:17:57
v_backup_restore_node0003 = [fdfb:dbfa:0:2000::114]:/backupdir/backup_restore.2021-06-01T16:17:57
Important
Although supported, backups to an NFS host might perform poorly, particularly on networks shared with rsync operations.
backupDir
- The full path to the directory on the backup host or node where the backup will be stored. The following requirements apply to this directory:
-
It already exists when you run vbr with --task backup.
-
It is writable by the user account used to run vbr.
-
It is unique to the database you are backing up. Multiple databases cannot share the same backup directory.
-
The file system at this location supports fcntl lockf file locking.
For example:
[Mapping]
v_sec_node0001 = pri_bsrv01:/archive/backup
v_sec_node0002 = pri_bsrv02:/archive/backup
v_sec_node0003 = pri_bsrv03:/archive/backup
Mapping to the local host
vbr does not support using localhost to specify a backup host. To back up a database node to its own disk, specify the host name with empty square brackets. For example:
[Mapping]
NodeName = []:/backup/path
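The entry formats described above (host name, bracketed IPv6 address, and empty brackets for the node's own disk) can be generated programmatically when a cluster has many nodes. The following Python sketch formats single [Mapping] entries. The function names are hypothetical, and treating an empty host string as "back up to the node's own disk" is a convention of this sketch, not of vbr.

```python
import ipaddress


def format_backup_host(host: str) -> str:
    """Format the backupHost part of a [Mapping] entry."""
    if host == "":
        # Sketch convention: empty host means the node's own disk.
        return "[]"
    if host == "localhost":
        raise ValueError("vbr does not support localhost as a backup host")
    try:
        if ipaddress.ip_address(host).version == 6:
            # IPv6 addresses must be enclosed in square brackets.
            return f"[{host}]"
    except ValueError:
        pass  # Not an IP literal, so treat it as a host name.
    return host


def mapping_entry(db_node: str, backup_host: str, backup_dir: str) -> str:
    """Return one 'dbNode = backupHost:backupDir' line."""
    if not backup_dir.startswith("/"):
        raise ValueError("backupDir must be a full path")
    return f"{db_node} = {format_backup_host(backup_host)}:{backup_dir}"


if __name__ == "__main__":
    print("[Mapping]")
    print(mapping_entry("v_sec_node0001", "pri_bsrv01", "/archive/backup"))
    print(mapping_entry("v_vmart_node0001", "", "/backup/path"))
```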
Mapping to the same database
The following example shows a [Mapping] section that specifies a single node to back up: v_vmart_node0001. The node is assigned to backup host srv01 and backup directory /home/dbadmin/backups. Although a single-node cluster is backed up, and the backup host and the database node are the same system, they are specified differently.
Specify the backup host and directory using a colon (:) as a separator:
[Mapping]
v_vmart_node0001 = srv01:/home/dbadmin/backups
Mapping to an alternative database
Note
Replicating objects to an alternative database requires the vbr configuration file to include a [NodeMapping] section. This section points source nodes to their target database nodes.
To restore an alternative database, add mapping information as follows:
[Mapping]
targetNode = backupHost:backupDir
For example:
[Mapping]
v_sec_node0001 = pri_bsrv01:/archive/backup
v_sec_node0002 = pri_bsrv02:/archive/backup
v_sec_node0003 = pri_bsrv03:/archive/backup
14.4 - [Misc]
Configures basic backup settings.
Options
passwordFile
- Path name of the password configuration file, ignored if dbUseLocalConnection (under [Database]) is set to true.
restorePointLimit
- Number of earlier backups to retain with the most recent backup. If set to 1 (the default), Vertica maintains two backups: the latest backup and the one before it.
Note
vbr
saves multiple backups to the same location, which are shared through hard links. In such cases, the
listbackup task displays the common backup prefix with unique time and date suffixes:
my_archive20111111_205841
Default: 1
snapshotName
- Base name of the backup used in the directory tree structure that
vbr
creates for each node, containing up to 240 characters limited to the following:
-
a–z
-
A–Z
-
0–9
-
Hyphen (-)
-
Underscore (_)
Each iteration in this series (up to restorePointLimit) consists of snapshotName and the backup timestamp. Each series of backups should have a unique and descriptive snapshot name. Full and object-level backups cannot share names. For most vbr
tasks, snapshotName serves as a useful identifier in diagnostics and system tables. For object restore and replication tasks, snapshotName is used to build schema names in coexist mode operations.
Default: snapshotName
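The character and length rules for snapshotName can be checked before you write the value into a configuration file. The following Python sketch does this with a regular expression. The function name is hypothetical, and rejecting an empty name is an assumption of the sketch.

```python
import re

# Up to 240 characters from a-z, A-Z, 0-9, hyphen, and underscore.
# Requiring at least one character is an assumption of this sketch.
_SNAPSHOT_NAME = re.compile(r"[A-Za-z0-9_-]{1,240}")


def is_valid_snapshot_name(name: str) -> bool:
    """Return True if name satisfies the documented snapshotName rules."""
    return _SNAPSHOT_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("nightly_sales-2022", "weekly backup", "a" * 241):
        print(candidate[:20], is_valid_snapshot_name(candidate))
```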
tempDir
- Absolute path to a temporary storage area on the cluster nodes. This path must be the same on all database cluster nodes.
vbr
uses this directory as temporary storage for log files, lock files, and other bookkeeping information while it copies files from the source cluster node to the destination backup location. vbr
also writes backup logs to this location.
The file system at this location must support fcntl lockf
(POSIX) file locking.
Caution
Do not use the same location as your database's data or catalog directory. Unexpected files and directories in your data or catalog location can cause errors during database startup or restore.
Default: /tmp/vbr
drop_foreign_constraints
- If true, all foreign key constraints are unconditionally dropped during object-level restore. You can then restore database objects independent of their foreign key dependencies.
Important
Vertica only uses this option if objectRestoreMode
is set to coexist
.
Default: false
enableFreeSpaceCheck
- If true (default) or omitted,
vbr
confirms that the specified backup locations contain sufficient free space to allow a successful backup. If a backup location has insufficient resources, vbr
displays an error message and cancels the backup. If vbr
cannot determine the amount of available space or number of nodes in the backup directory, it displays a warning and continues with the backup.
Default: true
excludeObjects
- Database objects and wildcard patterns to exclude from the set specified by includeObjects. Unicode characters are case-sensitive; others are not.
This parameter can be set only if includeObjects is also set.
hadoop_conf_dir
- (Eon Mode on HDFS with high availability (HA) nodes only) Directory path containing the XML configuration files copied from Hadoop.
If the vbr
operation includes more than one HA HDFS cluster, use a colon-separated list to provide the directory paths to the XML configuration files for each HA HDFS cluster. For example:
hadoop_conf_dir = path/to/xml-config-hahdfs1:path/to/xml-config-hahdfs2
This value must match the HadoopConfDir value set in the bootstrapping file created during installation.
includeObjects
- Database objects and wildcard patterns to include with a backup task. You can use this parameter together with excludeObjects. Unicode characters are case-sensitive; others are not.
The includeObjects
and objects parameters are mutually exclusive.
kerberos_keytab_file
- (Eon Mode on HDFS only) Location of the keytab file that contains credentials for the Vertica Kerberos principal.
This value must match the KerberosKeytabFile value set in the bootstrapping file created during installation.
kerberos_realm
- (Eon Mode on HDFS only) Realm portion of the Vertica Kerberos principal.
This value must match the KerberosRealm value set in the bootstrapping file created during installation.
kerberos_service_name
- (Eon Mode on HDFS only) Service name portion of the Vertica Kerberos principal.
This value must match the KerberosServiceName value set in the bootstrapping file created during installation.
Default: vertica
objectRestoreMode
- How vbr handles objects of the same name when restoring schema or table backups, one of the following:
-
createOrReplace: vbr creates any objects that do not exist. If an object does exist, vbr overwrites it with the version from the archive.
-
create: vbr creates any objects that do not exist and does not replace existing objects. If an object being restored does exist, the restore fails.
-
coexist: vbr creates the restored version of each object with a name formatted as follows: backup_timestamp_objectname
This approach allows existing and restored objects to exist simultaneously. If the appended information pushes the schema name past the maximum length of 128 characters, Vertica truncates the name. You can perform a reverse lookup of the original schema name by querying the system table TRUNCATED_SCHEMATA.
Tables named in the COPY clauses of data loaders are not changed. You can use ALTER DATA LOADER to rename target tables.
In all modes, vbr restores data with the current epoch. Object restore mode settings do not apply to backups and full restores.
Default: createOrReplace
objects
- For an object-level backup or object replication, object (schema or table) names to include. To specify more than one object, enter multiple names in a comma-delimited list. If you specify no objects, vbr creates a full backup.
Important
If your Eon Mode database has multiple namespaces, you must specify the namespace to which the objects belong. For vbr tasks, namespace names are prefixed with a period. For example, .n.s.t refers to table t in schema s in namespace n. See Eon Mode database requirements for more information.
This parameter cannot be used together with the parameters includeObjects and excludeObjects.
You specify objects as follows:
-
Specify table names in the form schema.objectname. For example, to make backups of the table customers from the schema finance, enter: finance.customers
If a public table and a schema have the same name, vbr backs up only the schema. Use the schema.objectname convention to avoid confusion.
-
Object names can include UTF-8 alphanumeric characters. Object names cannot include escape characters, single-quote (') characters, or double-quote (") characters.
-
Specify non-alphanumeric characters with a backslash (\) followed by a hex value. For instance, if the table name is my table (my followed by a space character, then table), enter the object name as follows:
objects=my\20table
-
If an object name includes a period, enclose the name with double quotes.
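The hex-escape rule for object names can be sketched in Python. This is only an illustration: the helper name escape_vbr_object_name is hypothetical and is not part of vbr. It also assumes that underscores pass through unescaped, that hex digits are written in lowercase, and that only ASCII characters need escaping.

```python
def escape_vbr_object_name(name: str) -> str:
    """Write each non-alphanumeric ASCII character as a backslash plus two hex digits."""
    escaped = []
    for ch in name:
        if ch in "'\"\\":
            # Quote and escape characters are not allowed in object names at all.
            raise ValueError(f"character {ch!r} is not allowed in an object name")
        if ch == ".":
            # Periods fall under the separate double-quote rule, so this sketch refuses them.
            raise ValueError("names containing a period need the double-quote convention")
        if ch.isalnum() or ch == "_":
            # Assumption: underscores are passed through unchanged, like alphanumerics.
            escaped.append(ch)
        elif ord(ch) < 128:
            # Assumption: lowercase, two-digit hex, for example a space becomes \20.
            escaped.append("\\" + format(ord(ch), "02x"))
        else:
            raise ValueError(f"this sketch does not handle the non-ASCII character {ch!r}")
    return "".join(escaped)


print("objects=" + escape_vbr_object_name("my table"))  # prints objects=my\20table
```

The function handles a single table or schema name. Join the schema and table parts with a period yourself after escaping each part.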
14.5 - [NodeMapping]
vbr uses the node mapping section exclusively to restore objects from a backup of one database to a different database. Be sure to update the [Mapping] section of your configuration file to point your target database nodes to their source backup locations. The target database must have at least as many UP nodes as the source database.
Use the following format to specify node mapping:
source_node = target_node
For example, you can use the following mapping to restore content from one 4-node database to an alternate 4-node database.
[NodeMapping]
v_sourcedb_node0001 = v_targetdb_node0001
v_sourcedb_node0002 = v_targetdb_node0002
v_sourcedb_node0003 = v_targetdb_node0003
v_sourcedb_node0004 = v_targetdb_node0004
See Restoring a database to an alternate cluster for a complete example.
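For larger clusters, a section like the four-node example can be generated rather than typed by hand. The following Python sketch uses the standard configparser module. The function name node_mapping_section is hypothetical, and the sketch assumes that nodes follow the v_dbname_nodeNNNN naming pattern used in the example. Check the actual node names of both databases before relying on the output.

```python
import configparser
import io


def node_mapping_section(source_db: str, target_db: str, node_count: int) -> str:
    """Build a [NodeMapping] section that maps source nodes to target nodes one-to-one."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep node names exactly as written
    # Assumption: node names follow the v_<database>_node<NNNN> pattern.
    parser["NodeMapping"] = {
        f"v_{source_db}_node{i:04d}": f"v_{target_db}_node{i:04d}"
        for i in range(1, node_count + 1)
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


print(node_mapping_section("sourcedb", "targetdb", 4))
```

The printed text can be pasted into the restore configuration file in place of a hand-written [NodeMapping] section.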
14.6 - [transmission]
Sets options for transmitting data when using backup hosts.
Options
concurrency_backup
- Maximum number of backup TCP rsync connection threads per node. To improve local and remote backup, replication, and copy cluster performance, you can increase the number of threads available to perform backups.
Increasing the number of threads allocates more CPU resources to the backup task and can, for remote backups, increase the amount of bandwidth used. The optimal value for this setting depends greatly on your specific configuration and requirements. Values higher than 16 produce no additional benefit.
Default: 1
concurrency_delete
- Maximum number of delete TCP rsync connections per node. To improve local and remote restore, replication, and copycluster performance, increase the number of threads available to delete files.
Increasing the number of threads allocates more CPU resources to the delete task and can increase the amount of bandwidth used for deletes on remote backups. The optimal value for this setting depends on your specific configuration and requirements.
Default: 16
concurrency_restore
- Maximum number of restore TCP rsync connections per node. To improve local and remote restore, replication, and copycluster performance, increase the number of threads available to perform restores.
Increasing the number of threads allocates more CPU resources to the restore task and can increase the amount of bandwidth used for restores of remote backups. The optimal value for this setting depends greatly on your specific configuration and requirements. Values higher than 16 produce no additional benefit.
Default: 1
copyOnHardLinkFailure
- If a hard-link local backup cannot create links, copy the data instead. Copying takes longer than linking, so the default behavior is to return an error if links cannot be created on any node.
Default: false
encrypt
- Whether transmitted data is encrypted while it is copied to the target backup location. Set this parameter to true only if performing a backup over an untrusted network—for example, backing up to a remote host across the Internet.
Important
Encrypting data transmission causes significant processing overhead and slows transfer. One of the processor cores of each database node is consumed during the encryption process. Use this option only if you are concerned about the security of the network used when transmitting backup data.
Omit this parameter from the configuration file for hard-link local backups. If you set both encrypt and hardLinkLocal to true in the same configuration file, vbr issues a warning and ignores encrypt.
Default: false
hardLinkLocal
- Whether to create a full- or object-level backup using hard file links on the local file system, rather than copying database files to a remote backup host. Add this configuration parameter manually to the Transmission section of the configuration file.
For details on usage, see Full Hardlink Backup/Restore.
Default: false
port_rsync
- Default port number for the rsync protocol. Change this value if the default rsync port is in use on your cluster, or you need rsync to use another port to avoid a firewall restriction.
Default: 50000
serviceAccessUser
- User name used for simple authentication of rsync connections. This user is neither a Linux nor a Vertica user name, but rather an arbitrary identifier used by the rsync protocol. If you omit this parameter, rsync runs without authentication, which can create a security risk. If you choose to save the password, store it in the password configuration file.
total_bwlimit_backup
- Total bandwidth limit in KBps for backup connections. Vertica distributes this bandwidth evenly among the number of connections set in concurrency_backup. The default value of 0 allows unlimited bandwidth.
The total network load allowed by this value is the number of nodes multiplied by the value of this parameter. For example, a three-node cluster with a total_bwlimit_backup value of 100 allows 300 KBps of network traffic.
Default: 0
total_bwlimit_restore
- Total bandwidth limit in KBps for restore connections. Vertica distributes this bandwidth evenly among the number of connections set in concurrency_restore. The default value of 0 allows unlimited bandwidth.
The total network load allowed by this value is the number of nodes multiplied by the value of this parameter. For example, a three-node cluster with a total_bwlimit_restore value of 100 allows 300 KBps of network traffic.
Default: 0
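The arithmetic behind the two bandwidth limits can be sketched as follows. The function name bandwidth_plan and the shape of its return value are illustrative only and are not part of vbr. The sketch assumes that the limit is split evenly across the connections on each node, and that the cluster-wide load is the per-node limit multiplied by the node count.

```python
def bandwidth_plan(total_bwlimit_kbps: int, concurrency: int, node_count: int) -> dict:
    """Estimate per-connection and cluster-wide bandwidth for a total_bwlimit_* setting."""
    if total_bwlimit_kbps == 0:
        # A value of 0 means unlimited bandwidth, so there is nothing to divide.
        return {"per_connection_kbps": None, "cluster_kbps": None}
    return {
        # The per-node limit is shared evenly among that node's connections.
        "per_connection_kbps": total_bwlimit_kbps / concurrency,
        # Every node may use the full limit, so the cluster total scales with node count.
        "cluster_kbps": total_bwlimit_kbps * node_count,
    }


# Three nodes, a limit of 100 KBps, and four connections per node.
print(bandwidth_plan(100, 4, 3))
```

With these inputs, each connection gets 25 KBps and the cluster as a whole can generate up to 300 KBps, which matches the three-node example.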
14.7 - Password configuration file
For improved security, store passwords in a password configuration file and then restrict read access to that file. Set the passwordFile parameter in your vbr configuration file to this file.
[passwords] password settings
All password configuration parameters are inside the file's [Passwords] section.
dbPassword
- Database administrator's Vertica password, used if the dbPromptForPassword parameter is false. This parameter is ignored if dbUseLocalConnection is set to true.
dest_dbPassword
- Password for the dest_dbuser Vertica account, for replication tasks only.
serviceAccessPass
- Password for the rsync user account.
Examples
See Password file.
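As a rough sketch of the "restrict read access" advice, the following Python snippet writes a [Passwords] section and limits the file to its owner. The function name write_password_file, the file name pwd.ini, and the placeholder password are all hypothetical. The 0o600 mode is one reasonable choice on Linux, not a documented vbr requirement.

```python
import configparser
import os
import stat
import tempfile


def write_password_file(path: str, passwords: dict) -> None:
    """Write a [Passwords] section to `path`, readable and writable by the owner only."""
    # Interpolation is disabled so that passwords containing '%' are stored literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names such as dbPassword in their original case
    parser["Passwords"] = passwords
    # Create the file with owner-only permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        parser.write(handle)
    # Tighten permissions in case the file already existed with a looser mode.
    os.chmod(path, 0o600)


# Demonstration in a temporary directory with a placeholder value.
demo_path = os.path.join(tempfile.mkdtemp(), "pwd.ini")
write_password_file(demo_path, {"dbPassword": "example-only"})
print(oct(stat.S_IMODE(os.stat(demo_path).st_mode)))
```

Set the passwordFile parameter in the vbr configuration file to the path of the resulting file.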