Server-based replication

Server-based replication copies shard data from one Eon Mode database to another.

Eon Mode Only

Server-based replication lets you copy data from one Eon Mode database to another. This replication method copies shard data directly from the source communal storage location to the target communal storage location. Copying native data means there is less overhead than other data copying methods.

You can replicate data between Eon Mode databases in different clouds and on-premises.

Caution

Replication overwrites any data in the target table. You do not receive any notification that the data in the target database will be overwritten, even if the target table's design does not match the source table. Verify that the tables you want to replicate either do not exist in the target database or do not contain data that you need. Be especially cautious when replicating entire schemas to prevent overwriting data in the target database.

Similarly, changes to a replicated table in the target database are overwritten each time you replicate the table. When performing scheduled replications, only grant read access to replicated tables in the target database. Limiting these tables to read-only access will help prevent confusion if a user makes changes to a replicated table.

Requirements

The Eon Mode databases you want to replicate data between must meet the following requirements:

The two databases must have the same shard segmentation. If you need to replicate data between databases with different shard counts, consider changing the shard count in one of the databases. See Change the number of shards in the database.
The user running the commands must have superuser privileges in the source and target databases.
The target database must be able to access the source database and its communal storage. If there are firewalls between the source and target database, configure them to give the target access to the source. Additionaly, the target database must have credentials to access the source database's communal storage. See Configuration.
The two databases must be running compatible versions of the Vertica server. The target database's version of the server has to be the same version or up to one major version higher than the source database.
The table or tables you replicate must not have columns with foreign constraints that are not part of the replication. For example, if you try to replicate the store.store_sales_fact table of the VMart sample database without also replicating the public.date_dimension table, Vertica returns this error:
```
ERROR 3437:  [database_name] Foreign constraint fk_store_sales_date 
of Table store. store_sales_fact referencing public.date_dimension 
detected, but only one table is included in the snapshot
```

Configuration

As mentioned above, your target database must be able to access the source database's communal storage. The necessary configuration steps depend on the type of communal storage used by the source database:

For S3-based storage (including S3-compatible systems such as FlashBlade), create a separate configuration for the source communal storage. See Per-bucket S3 configurations for more information.
For other systems, such as Google Cloud Storage (GCS), that use identities to control access, give the target database's identity access to the source database's communal storage. For example, if both databases use GCS storage, grant the identity the target database uses to access its own communal storage access to the source communal storage. If the databases do not use the same type of storage, configure the target database with the credentials necessary to access the source storage. See Azure Blob Storage object store and Google Cloud Storage (GCS) object store for instructions on how to set up authentication for these object stores.

If you are attempting to replicate data from an on-premises database to a cloud-based database, you must create a connection that grants your cloud access to your on-premises data store.

To secure the connections between your databases, you must secure two separate connections:

Before transferring data using client-server and command channels, you must either connect the target database to the source database or the source database to the target database. See Configuring client-server TLS and Control channel Spread TLS for instructions on encrypting these connections.
You can encrypt the connections that transfer the data between the communal data stores by setting parameters in the target database for the object store used by the source database. These parameters are AWSEnableHttps for S3 (and compatible), AzureStorageEndpointConfig for Azure, and GCSEnableHttps for GCS.

Replication steps

Follow these steps to replicate data between two Eon Mode databases:

Connect to the target or source database and log in as a user with superuser privileges.
Connect to the other database using the CONNECT TO VERTICA statement.
Use the REPLICATE statement, passing it the name of the table to replicate, or use the INCLUDE and EXCLUDE options to select several tables or schemas. The format of the replication call depends on whether you connected to the target or source database in step 1:
- Connected to target: specify the source database in the REPLICATE statement using a FROM source_db clause.
- Connected to source: specify the target database in the REPLICATE statement using a TO target_db clause.
Both of these methods copy the shard data directly from the source communal storage location to the target communal storage location.
Optionally, use the DISCONNECT statement to disconnect.

This example, initiated on the target database, replicates a table named customers from the source database named verticadb:

=> CONNECT TO VERTICA source_db USER dbadmin 
           PASSWORD 'mypassword' ON 'vertica_node01', 5433;
 CONNECT

=> REPLICATE customers FROM source_db;
REPLICATE

The following statements execute the same data replication as the previous example, but this replication is initiated from the source database instead of the target database:

=> CONNECT TO VERTICA target_db USER dbadmin 
           PASSWORD 'mypassword' ON 'vertica_node01', 5433;
 CONNECT

=> REPLICATE customers TO target_db;
REPLICATE

To replicate more than one table at a time, use the INCLUDE option to include tables or schemas using wildcards. This example replicates all tables in the public schema starting with the letter t:

=> REPLICATE INCLUDE "public.t*" FROM source_db;

Use the EXCLUDE option with INCLUDE to limit the tables that Vertica replicates. This example replicates all tables in the public schema except those that start with the string customer_.

=> REPLICATE INCLUDE "public.*" EXCLUDE '*.customer_*' FROM source_db;

Scheduled replication

You can use replication to keep one or more tables in the target database up-to-date with tables in the source database by scheduling automatic replications. When you call the REPLICATE statement repeatedly, it automatically determines what has changed in the source table since the last replication and only transfers the data that has changed in the source database.

Important

Scheduled replication between the main cluster and a sandboxed cluster is non-incremental. Each REPLICATE statement between these clusters runs the full replication, regardless of previous REPLICATE calls.

When determining the interval between scheduled replications, you should consider the amount of data being transferred, the resources assigned to the Tuple Mover, and data transfer concurrency. If replicating across platforms, you should also take into account your network bandwidth. Based on these factors, you can tune your resources and environment to find an optimal interval between scheduled replications.

Note

Frequent calls to REPLICATE can increase the size of the database's catalog. A larger catalog can increase the amount of metadata your database must process for each query. This added overhead can affect performance.

In many cases, the best option is to balance the time sensitivity of the queries in the target database that use the replicated data with the amount of data that each replication must process. After you decide how often to run replications, create a script to perform the replications. The script simply needs to follow the steps in Replication steps to replicate the table a single time. Then set up a scheduled task to run the script.

Monitoring replication

The V_MONITOR.REPLICATION_STATUS system table in the target database shows information about database replications. You can use it to monitor ongoing replications and the results of prior replications:

=> SELECT node_name, status, transaction_id FROM REPLICATION_STATUS 
   ORDER BY start_time ASC;
       node_name       |      status       |  transaction_id   
-----------------------+-------------------+-------------------
 v_target_db_node0001  | completed         | 45035996273706044
 v_target_db_node0002  | completed         | 45035996273706044
 v_target_db_node0003  | completed         | 45035996273706044
 v_target_db_node0001  | completed         | 45035996273706070
 v_target_db_node0002  | completed         | 45035996273706070
 v_target_db_node0003  | completed         | 45035996273706070
 v_target_db_node0002  | data transferring | 45035996273706136
 v_target_db_node0003  | data transferring | 45035996273706136
 v_target_db_node0001  | data transferring | 45035996273706136
(9 rows)