Server-based replication

Server-based replication copies shard data from one Eon Mode database to another.

Eon Mode Only

Server-based replication lets you copy data from one Eon Mode database to another. This replication method copies shard data directly from the source communal storage location to the target communal storage location. Copying native data means there is less overhead than other data copying methods.

You can replicate data between Eon Mode databases in different clouds and on-premises.

Caution

Replication overwrites any data in the target table. You do not receive any notification that the data in the target database will be overwritten, even if the target table's design does not match the source table. Verify that the tables you want to replicate either do not exist in the target database or do not contain data that you need. Be especially cautious when replicating entire schemas to prevent overwriting data in the target database.

Similarly, changes to a replicated table in the target database are overwritten each time you replicate the table. When performing scheduled replications, only grant read access to replicated tables in the target database. Limiting these tables to read-only access will help prevent confusion if a user makes changes to a replicated table.

Requirements

The Eon Mode databases you want to replicate data between must meet the following requirements:

The two databases must have the same shard segmentation. If you need to replicate data between databases with different shard counts, consider changing the shard count in one of the databases. See Change the number of shards in the database.
The replication commands must be run on the target database.
The user running the commands must have superuser privileges in the source and target databases.
The target database must be able to access the source database and its communal storage. If there are firewalls between the source and target database, configure them to give the target access to the source. Additionaly, the target database must have credentials to access the source database's communal storage. See Configuration.
The two databases must be running compatible versions of the Vertica server. The target database's version of the server has to be the same version or up to one major version higher than the source database.
The table or tables you replicate must not have columns with foreign constraints that are not part of the replication. For example, if you try to replicate the store.store_sales_fact table of the VMart sample database without also replicating the public.date_dimension table, Vertica returns this error:
```
ERROR 3437:  [database_name] Foreign constraint fk_store_sales_date 
of Table store. store_sales_fact referencing public.date_dimension 
detected, but only one table is included in the snapshot
```

Configuration

As mentioned above, your target database must be able to access the source database and its communal storage. The necessary configuration steps depend on the type of communal storage used by the source database:

For S3-based storage (including S3-compatible systems such as FlashBlade), create a separate configuration for the source communal storage. See Per-bucket S3 configurations for more information.
For other systems, such as Google Cloud Storage (GCS), that use identities to control access, give the target database's identity access to the source database's communal storage. For example, if both databases use GCS storage, grant the identity the target database uses to access its own communal storage access to the source communal storage. If the databases do not use the same type of storage, configure the target database with the credentials necessary to access the source storage. See Azure Blob Storage object store and Google Cloud Storage (GCS) object store for instructions on how to set up authentication for these object stores.

If you are attempting to replicate data from an on-premises database to a cloud-based database, you must create a connection that grants your cloud access to your on-premises data store.

To secure the connections between your databases, you must secure two separate connections:

The target database connects to the source database to issue commands before transferring data using both client-server and command channels. See Configuring client-server TLS and Control channel Spread TLS for instructions on encrypting these connections.
You can encrypt the connections that transfer the data between the communal data stores by setting parameters in the target database for the object store used by the source database. These parameters are AWSEnableHttps for S3 (and compatible), AzureStorageEndpointConfig for Azure, and GCSEnableHttps for GCS.

Replication steps

Follow these steps to replicate data between two Eon Mode databases:

Connect to the target database and log in as a user with superuser privileges.
Connect to the source database using the CONNECT TO VERTICA statement.
Use the REPLICATE statement, passing it the name of the table to replicate, or use the INCLUDE and EXCLUDE options to select several tables or schemas to replicate.
Optionally, use the DISCONNECT statement to disconnect.

This example replicates a table named customers from the database named verticadb:

=> CONNECT TO VERTICA verticadb USER dbadmin 
           PASSWORD 'mypassword' ON 'vertica_node01', 5433;
 CONNECT
=> REPLICATE customers FROM verticadb;
REPLICATE

To replicate more than one table at a time, use the INCLUDE option to include tables or schemas using wildcards. This example replicates all tables in the public schema starting with the letter t:

=> REPLICATE INCLUDE "public.t*" FROM verticadb;

Use the EXCLUDE option with INCLUDE to limit the tables that Vertica replicates. This example replicates all tables in the public schema except those that start with the string customer_.

=> REPLICATE INCLUDE "public.*" EXCLUDE '*.customer_*' FROM verticadb;

Scheduled replication

You can use replication to keep one or more tables in the target database up-to-date with tables in the source database by scheduling automatic replications. When you call the REPLICATE statement repeatedly, it automatically determines what has changed in the source table since the last replication and only transfers the data that has changed in the source database.

Note

Frequent calls to REPLICATE can increase the size of the database's catalog. A larger catalog can increase the amount of metadata your database must process for each query. This added overhead can affect performance.

When determining the interval between scheduled replications, you should consider the amount of data being transferred, the resources assigned to the Tuple Mover, and data transfer concurrency. If replicating across platforms, you should also take into account your network bandwidth. Based on these factors, you can tune your resources and environment to find an optimal interval between scheduled replications.

In many cases, the best option is to balance the time sensitivity of the queries in the target database that use the replicated data with the amount of data that each replication must process. After you decide how often to run replications, create a script to perform the replications. The script simply needs to follow the steps in Replication steps to replicate the table a single time. Then set up a scheduled task to run the script.

Monitoring replication

The V_MONITOR.REPLICATION_STATUS system table in the target database shows information about database replications. You can use it to monitor ongoing replications and the results of prior replications:

=> SELECT node_name, status, transaction_id FROM REPLICATION_STATUS 
   ORDER BY start_time ASC;
       node_name       |      status       |  transaction_id   
-----------------------+-------------------+-------------------
 v_verticadb2_node0001 | completed         | 45035996273706044
 v_verticadb2_node0002 | completed         | 45035996273706044
 v_verticadb2_node0003 | completed         | 45035996273706044
 v_verticadb2_node0001 | completed         | 45035996273706070
 v_verticadb2_node0002 | completed         | 45035996273706070
 v_verticadb2_node0003 | completed         | 45035996273706070
 v_verticadb2_node0002 | data transferring | 45035996273706136
 v_verticadb2_node0003 | data transferring | 45035996273706136
 v_verticadb2_node0001 | data transferring | 45035996273706136
(9 rows)