This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Setting up backup locations

Full and object-level backups reside on backup hosts, the computer systems on which backups and archives are stored.

1: Configuring backup hosts and connections
2: Configuring hard-link local backup hosts
3: Configuring backups to and from cloud storage
4: Additional considerations for cloud storage
5: Configuring backups to and from HDFS

Caution

It is important to secure backup locations and strictly limit access to backups to users who are already permitted to access all data in the database. Compromising a backup means compromising the database.

Full and object-level backups reside on backup hosts, the computer systems on which backups and archives are stored. On the backup hosts, Vertica saves backups in a specific backup location (directory).

You must set up your backup hosts before you can create backups.

The storage format type at your backup locations must support fcntl lockf (POSIX) file locking.

1 - Configuring backup hosts and connections

You use vbr to back up your database to one or more hosts (known as backup hosts) that can be outside of your database cluster.

You use vbr to back up your database to one or more hosts (known as backup hosts) that can be outside of your database cluster.

You can use one or more backup hosts or a single cloud storage bucket to back up your database. Use the vbr configuration file to specify which backup host each node in your cluster should use.

Before you back up to hosts outside of the local cluster, configure the target backup locations to work with vbr. The backup hosts you use must:

Have sufficient backup disk space.
Be accessible from your database cluster through SSH.
Have passwordless SSH access for the Database Administrator account.
Have either the Vertica rpm or Python 3.7 and rsync 3.0.5 or later installed.
If you are using a stateful firewall, configure your tcp_keepalive_time and tcp_keepalive_intvl sysctl settings to use values less than your firewall timeout value.

Configuring TCP forwarding on database hosts

vbr depends on TCP forwarding to forward connections from database hosts to backup hosts. For copycluster and replication tasks, you must enable TCP forwarding on both sets of hosts. SSH connections to backup hosts do not require SSH forwarding.

If it is not already set by default, set AllowTcpForwarding = Yes in /etc/ssh/sshd_config and then send a SIGHUP signal to sshd on each host. See the Linux sshd documentation for more information.

If TCP forwarding is not enabled, tasks requiring it fail with the following message: "Errors connecting to remote hosts: Check SSH settings, and that the same Vertica version is installed on all nodes."

On a single-node cluster, vbr uses a random high-number port to create a local ssh tunnel. This fails if PermitOpen is set to restrict the port. Comment out the PermitOpen line in sshd_config.

Creating configuration files for backup hosts

Create separate configuration files for full or object-level backups, using distinct names for each configuration file. Also, use the same node, backup host, and directory location pairs. Specify different backup directory locations for each database.

Note

For optimal network performance when creating a backup, Vertica recommends that you give each node in the cluster its own dedicated backup host.

Preparing backup host directories

Before vbr can back up a database, you must prepare the target backup directory. Run vbr with a task type of init to create the necessary manifests for the backup process. You need to perform the init process only once. After that, Vertica maintains the manifests automatically.

Estimating backup host disk requirements

Wherever you plan to save data backups, consider the disk requirements for historical backups at your site. Also, if you use more than one archive, multiple archives potentially require more disk space. Vertica recommends that each backup host have space for at least twice the database node footprint size. Follow this recommendation regardless of the specifics of your site's backup schedule and retention requirements.

To estimate the database size, use the used_bytes column of the storage_containers system table as in the following example:

=> SELECT SUM(used_bytes) FROM storage_containers WHERE node_name='v_mydb_node0001';
total_size
------------
  302135743
(1 row)

Making backup hosts accessible

You must verify that any firewalls between the source database nodes and the target backup hosts allow connections for SSH and rsync on port 50000.

The backup hosts must be running identical versions of rsync and Python as those supplied in the Vertica installation package.

Setting up passwordless SSH access

For vbr to access a backup host, the database superuser must meet two requirements:

Have an account on each backup host, with write permissions to the backup directory.
Have passwordless SSH access from each database cluster host to the corresponding backup host.

How you fulfill these requirements depends on your platform and infrastructure.

SSH access among the backup hosts and access from the backup host to the database node is not necessary.

If your site does not use a centralized login system (such as LDAP), you can usually add a user with the useradd command or through a GUI administration tool. See the documentation for your Linux distribution for details.

If your platform supports it, you can enable passwordless SSH logins using the ssh-copy-id command to copy a database administrator's SSH identity file to the backup location from one of your database nodes. For example, to copy the SSH identity file from a node to a backup host named backup01:

$ ssh-copy-id -i dbadmin@backup01|
Password:

Try logging into the machine with "ssh dbadmin@backup01". Then, check the contents of the ~/.ssh/authorized_keysfile to verify that you have not added extra keys that you did not intend to include.

$ ssh backup01
Last login: Mon May 23 11:44:23 2011 from host01

Repeat the steps to copy a database administrator's SSH identity to all backup hosts you use to back up your database.

After copying a database administrator's SSH identity, you should be able to log in to the backup host from any of the nodes in the cluster without being prompted for a password.

Increasing the SSH maximum connection settings for a backup host

If your configuration requires backing up multiple nodes to one backup host (n:1), increase the number of concurrent SSH connections to the SSH daemon (sshd). By default, the number of concurrent SSH connections on each host is 10, as set in the sshd_config file with the MaxStartups keyword. The MaxStartups value for each backup host should be greater than the total number of hosts being backed up to this backup host. For more information on configuring MaxStartups, refer to the man page for that parameter.

2 - Configuring hard-link local backup hosts

When specifying the backupHost parameter for your hard-link local configuration files, use the database host names (or IP addresses) as known to admintools.

When specifying the backupHost parameter for your hard-link local configuration files, use the database host names (or IP addresses) as known to admintools. Do not use the node names. Host names (or IP addresses) are what you used when setting up the cluster. Do not use localhost for the backupHost parameter.

Listing host names

To query node names and host names:

=> SELECT node_name, host_name FROM node_resources;
    node_name     |   host_name 
------------------+----------------
 v_vmart_node0001 | 192.168.223.11
 v_vmart_node0002 | 192.168.223.22
 v_vmart_node0003 | 192.168.223.33
(3 rows)

Because you are creating a local backup, use square brackets [ ] to map the host to the local host. For more information, refer to [mapping].

[Mapping]
v_vmart_node0001 = []:/home/dbadmin/data/backups
v_vmart_node0002 = []:/home/dbadmin/data/backups
v_vmart_node0003 = []:/home/dbadmin/data/backups

3 - Configuring backups to and from cloud storage

Backing up an Enterprise Mode or Eon Mode database to a supported cloud storage location requires that you add parameters to the backup configuration file.

Backing up an Enterprise Mode or Eon Mode database to a supported cloud storage location requires that you add parameters to the backup configuration file. You can create these backups from the local cluster or from your cloud provider's virtual servers. Some additional configuration is required.

See Additional considerations for cloud storage for information about configuring authentication and encryption.

Configuration file requirements

To back up any Eon Mode or Enterprise Mode cluster to a cloud storage destination, the backup configuration file must include a [CloudStorage] section. Vertica provides a sample cloud storage configuration file that you can copy and edit.

Environment variable requirements

Environment variables securely pass credentials for backup locations. Eon and Enterprise Mode databases require environment variables in the following backup scenarios:

Vertica on Google Cloud Platform (GCP) to Google Cloud Storage (GCS).

For backups to GCS, you must have a hash-based message authentication code (HMAC) key that contains an access ID and a secret. See Eon Mode on GCP prerequisites for instructions on how to create your HMAC key.
On-premises databases to any of the following storage locations:
- Amazon Web Services (AWS)
- Any S3-compatible storage
- Azure Blob Storage (Enterprise Mode only)
On-premises database backups require you to pass your credentials with environment variables. You cannot use other methods of credentialing with cross-endpoint backups.
Any Azure user environment that does not manage resources with Azure managed identities.

The vbr log captures when you sent an environment variable. For security purposes, the value that the environment variable represents is not logged. For details about checking vbr logs, see Troubleshooting backup and restore.

Enterprise Mode and Eon Mode

All Enterprise Mode and Eon Mode databases require the environment variables described in the following table:

Environment Variable	Description
`VBR_BACKUP_STORAGE_ACCESS_KEY_ID`	Credentials for the backup location.
`VBR_BACKUP_STORAGE_SECRET_ACCESS_KEY`	Credentials for the backup location.
`VBR_BACKUP_STORAGE_ENDPOINT_URL`	The endpoint for the on-premises S3 backup location, includes the scheme HTTP or HTTPS. Important Do not set this variable for backup locations on AWS or GCS.

Eon Mode only

Eon Mode databases require the environment variables described in the following table:

Environment Variable	Description
`VBR_COMMUNAL_STORAGE_ACCESS_KEY_ID`	Credentials for the communal storage location.
`VBR_COMMUNAL_STORAGE_SECRET_ACCESS_KEY`	Credentials for the communal storage location.
`VBR_COMMUNAL_STORAGE_ENDPOINT_URL`	The endpoint for the communal storage, includes the scheme HTTP or HTTPS. Important Do not set this variable for backup locations on GCS.

Azure Blob Storage only

If the user environment does not manage resources with Azure-managed identities, you must provide credentials with environment variables. If you set environment variables in an environment that uses Azure-managed identities, credentials set with environment variables take precedence over Azure-managed identity credentials.

You can back up and restore between two separate Azure accounts. Cross-account operations require a credential configuration JSON object and an endpoint configuration JSON object for each account. Each environment variable accepts a collection of one or more comma-separated JSON objects.

Cross-account and cross-region backup and restore operations might result in decreased performance. For details about performance and cost, see the Azure documentation.

The Azure Blob Storage environment variables are described in the following table:

Environment Variable Description

Environment Variable	Description
`VbrCredentialConfig`	Credentials for the backup location. Each JSON object requires values for the following keys: `accountName`: Name of the storage account. `blobEndpoint`: Host address and optional port for the endpoint to use as the backup location. `accountKey`: Access key for the account. `sharedAccessSignature`: A token that provides access to the backup endpoint.
`VbrEndpointConfig`	The endpoint for the backup location. To backup and restore between two separate Azure accounts, provide each set of endpoint information as a JSON object. Each JSON object requires values for the following keys: `accountName`: Name of the storage account. `blobEndpoint`: Host address and optional port for the endpoint to use as the backup location. `protocol`: HTTPS (default) or HTTP. `isMultiAccountEndpoint`: Boolean (by default false), indicates whether `blobEndpoint` supports multiple accounts

VbrCredentialConfig

Credentials for the backup location. Each JSON object requires values for the following keys:

accountName: Name of the storage account.
blobEndpoint: Host address and optional port for the endpoint to use as the backup location.
accountKey: Access key for the account.
sharedAccessSignature: A token that provides access to the backup endpoint.

VbrEndpointConfig

The endpoint for the backup location. To backup and restore between two separate Azure accounts, provide each set of endpoint information as a JSON object.

Each JSON object requires values for the following keys:

accountName: Name of the storage account.
blobEndpoint: Host address and optional port for the endpoint to use as the backup location.
protocol: HTTPS (default) or HTTP.
isMultiAccountEndpoint: Boolean (by default false), indicates whether blobEndpoint supports multiple accounts

The following commands export the Azure Blob Storage environment variables to the current shell session:

$ export VbrCredentialConfig=[{"accountName": "account1","blobEndpoint": "host[:port]","accountKey": "account-key1","sharedAccessSignature": "sas-token1"}]
$ export VbrEndpointConfig=[{"accountName": "account1", "blobEndpoint": "host[:port]", "protocol": "http"}]

4 - Additional considerations for cloud storage

If you are backing up to a supported cloud storage location, you need to do some additional one-time configuration.

If you are backing up to a supported cloud storage location, you need to do some additional one-time configuration. You must also take additional steps if the cluster you are backing up is running on instances in the cloud. For Amazon Web Services (AWS), you might choose to encrypt your backups, which requires additional steps.

By default, bucket access is restricted to the communal storage bucket. For one-time operations with other buckets like backing up and restoring the database, use the appropriate credentials. See Google Cloud Storage parameters and S3 parameters for additional information.

Configuring cloud storage for backups

As with any storage location, you must initialize a cloud storage location with the vbr task init.

Because cloud storage does not support file locking, Vertica uses either your local file system or the cloud storage file system to handle file locks during a backup. You identify this location using the cloud_storage_backup_file_system_path parameter in your vbr configuration file. During a backup, Vertica creates a locked identity file on your local or cloud instance, and a duplicate file in your cloud storage backup location. As long at the files match, Vertica proceeds with the backup, releasing the lock when the backup is complete. As long as the files remain identical, you can use the cloud storage location for backup and restore tasks.

If the files in your locking location become out of sync with the files in your backup location, backup and restore tasks fail with an error message. You can resolve locking inconsistencies by rerunning the init task with the --cloud-force-init parameter:

$ /opt/vertica/bin/vbr --task init --cloud-force-init -c filename.ini

Note

If a backup fails, confirm that your Vertica cluster has permission to access your cloud storage location.

Configuring authentication for Google Cloud Storage

If you are backing up to Google Cloud Storage (GCS) from a Google Cloud Platform-based cluster, you must provide authentication to the GCS communal storage location. Set the environment variables as detailed in Configuring backups to and from cloud storage to authenticate to your GCS storage.

See Eon Mode on GCP prerequisites for additional authentication information, including how to create your hash-based message authentication code (HMAC) key.

Configuring EC2 authentication for Amazon S3

If you are backing up to S3 from an EC2-based cluster, you must provide authentication to your S3 host. Regardless of the authentication type you choose, your credentials do not leave your EC2 cluster. Vertica supports the following authentication types:

AWS credential file
Environment variables
IAM role

AWS credential file - You can manually create a configuration file on your EC2 initiator host at ~/.aws/credentials.

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

For more information on credential files, refer to Amazon Web Services documentation.

Environment variables - Amazon Web Services provides the following environment variables:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

Use these variables on your initiator to provide authentication to your S3 host. When your session ends, AWS deletes these variables. For more information, refer to the AWS documentation.

IAM role - Create an AWS IAM role and grant that role permission to access your EC2 cluster and S3 resources. This method is recommended for managing long-term access. For more information, refer to Amazon Web Services documentation.

Encrypting backups on Amazon S3

Backups made to Amazon S3 can be encrypted using native server-side S3 encryption capability. For more information on Amazon S3 encryption, refer to Amazon documentation.

Note

Vertica supports server-side encryption only. Client-side encryption is not supported.

Vertica supports the following forms of S3 encryption:

Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
- Encrypts backups with AES-256
- Amazon manages encryption keys
Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)
- Encrypts backups with AES-256
- Requires an encryption key from Amazon Key Management Service
- Your S3 bucket must be from the same region as your encryption key
- Allows auditing of user activity

When you enable encryption of your backups, Vertica encrypts backups as it creates them. If you enable encryption after creating an initial backup, only increments added after you enabled encryption are encrypted. To ensure that your backup is entirely encrypted, create new backups after enabling encryption.

To enable encryption, add the following settings to your configuration file:

cloud_storage_encrypt_transport: Encrypts your backups during transmission. You must enable this parameter if you are using SSE-KMS encryption.
cloud_storage_encrypt_at_rest: Enables encryption of your backups. If you enable encryption and do not provide a KMS key, Vertica uses SSE-S3 encryption.
cloud_storage_sse_kms_key_id: If you are using KMS encryption, use this parameter to provide your key ID.

See [CloudStorage] for more information on these settings.

The following example shows a typical configuration for KMS encryption of backups.


[CloudStorage]
cloud_storage_encrypt_transport = True
cloud_storage_encrypt_at_rest = sse
cloud_storage_sse_kms_key_id = 6785f412-1234-4321-8888-6a774ba2aaaa

5 - Configuring backups to and from HDFS

To back up an Eon Mode database that uses HDFS on-premises storage, the communal storage and backup location must use the same HDFS credentials and domain.

Eon Mode only

To back up an Eon Mode database that uses HDFS on-premises storage, the communal storage and backup location must use the same HDFS credentials and domain. All vbr operations are supported, except copycluster.

Vertica supports Kerberos authentication, High Availability Name Node, and TLS (wire encryption) for vbr operations.

Creating a cloud storage configuration file

To back up Eon Mode on-premises with communal storage on HDFS, you must provide a backup configuration file. In the [CloudStorage] section, provide the cloud_storage_backup_path and cloud_storage_backup_file_system_path values.

If you use Kerberos authentication or High Availability NameNode with your Hadoop cluster, the vbr utility requires access to the same values set in the bootstrapping file that you created during the database install. Include these values in the [misc] section of the backup file.

The following table maps the vbr configuration option to its associated bootstrap file parameter:

vbr Configuration Option	Bootstrap File Parameter
kerberos_service_name	KerberosServiceName
kerberos_realm	KerberosRealm
kerberos_keytab_file	KerberosKeytabFile
hadoop_conf_dir	HadoopConfDir

For example, if KerberosServiceName is set to principal-name in the bootstrap file, set kerberos_service_name to principal-name in the [Misc] section of your configuration file.

Encryption between communal storage and backup locations

Vertica supports vbr operations using wire encryption between your communal storage and backup locations. Use the cloud_storage_encrypt_transport parameter in the [CloudStorage] section of your backup configuration file to configure encryption.

To enable encryption:

Set cloud_storage_encrypt_transport to true.
Use the swebhdfs:// protocol for cloud_storage_backup_path.

If you do not use encryption:

Set cloud_storage_encrypt_transport to false.
Use the webhdfs:// protocol for cloud_storage_backup_path.

Vertica does not support at-rest encryption for Hadoop storage.

Setting up backup locations

Caution

1 - Configuring backup hosts and connections

Configuring TCP forwarding on database hosts

Creating configuration files for backup hosts

Note

Preparing backup host directories

Estimating backup host disk requirements

Making backup hosts accessible

Setting up passwordless SSH access

Increasing the SSH maximum connection settings for a backup host

See also

2 - Configuring hard-link local backup hosts

Listing host names

3 - Configuring backups to and from cloud storage

Configuration file requirements

Environment variable requirements

Enterprise Mode and Eon Mode

Important

Eon Mode only

Important

Azure Blob Storage only

4 - Additional considerations for cloud storage

Configuring cloud storage for backups

Note

Configuring authentication for Google Cloud Storage

Configuring EC2 authentication for Amazon S3

Encrypting backups on Amazon S3

Note

5 - Configuring backups to and from HDFS

Creating a cloud storage configuration file

Encryption between communal storage and backup locations