This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Replication

Replication copies data in native format between Vertica databases.

Replication copies data from one Vertica database to another in native format. Usually, you use replication to keep a table or schema in the target database up to date with corresponding table or schema in the source database. Replication automatically determines what has changed in the source database and only transfers those changes to the target database.

Replication is also useful for disaster recovery (DR) and isolating workloads. For example:

  • Keep a "hot spare" Vertica database that can fill in if the primary database goes down.
  • Give geographically separated teams their own local databases.
  • Fully isolate ETL workloads from analytics workloads by giving each their own Vertica database.

Vertica supports two forms of replication:

  • vbr-based replication that uses the backup script to copy ROS containers from one database to another.
  • Server-based replication that copies shards directly from one database's communal storage to another.

vbr-based replication

The vbr-replication method works between either Eon Mode or Enterprise Mode databases. You cannot replicate data between databases running in different modes or cloud platforms. The databases must have the same number of nodes (when in Enterprise Mode) or primary nodes (in Eon Mode). If replicating between Eon Mode databases, the databases must have the same shard segmentation and primary subcluster node subscriptions.

For more information about vbr-based replication, see Replicating objects to another database cluster.

Server-based replication

Server-based replication transfers data between Eon Mode databases by copying shard files one communal storage location to another. This method supports Eon Mode databases in different clouds or on-premises. The target database must have access to the source database's communal storage, and the shard count and shard segmentation of the source and target namespaces must be the same.

Choosing between vbr-based and server-based replication

As server-based replication only works between Eon Mode databases, you only have to choose between the two types of replication if both the source and target database run in Eon Mode.

When choosing between the two modes, consider:

  • Server-based replication is usually more efficient because it works within Vertica. Also, unlike vbr-based replication, server-based replication does not use SSH tunneling for encryption, avoiding potential security risks.
  • For vbr-based replication, your databases must have more in common than databases using server-based replication. For example, in vbr-based replication, the source and target databases must have the same shard segmentation, primary subcluster node subscriptions, communal storage type, and number of nodes in their primary subcluster. For server-based replication, the only requirement is that the two databases have the same shard segmentation.

1 - Server-based replication

Server-based replication copies shard data from one Eon Mode database to another.

Eon Mode Only

Server-based replication lets you copy data from one Eon Mode database to another. This replication method copies shard data directly from the source communal storage location to the target communal storage location. Copying native data means there is less overhead than other data copying methods.

You can replicate data between Eon Mode databases in different clouds and on-premises.

Requirements

The Eon Mode databases you want to replicate data between must meet the following requirements:

  • The source and target namespaces of the replicated tables must have the same shard count and segmentation. The REPLICATE command accepts an optional target namespace parameter that specifies the namespace to which the tables are replicated in the target cluster. If you do not specify a target namespace, objects are replicated to a namespace with the same name as the source namespace. If no such namespace exists in the target cluster, it is created with the same name and shard count as the source namespace.

  • The user running the commands must have superuser privileges in the source and target databases.

  • The target database must be able to access the source database and its communal storage. If there are firewalls between the source and target database, configure them to give the target access to the source. Additionaly, the target database must have credentials to access the source database's communal storage. See Configuration.

  • The two databases must be running compatible versions of the Vertica server. The target database's version of the server has to be the same version or up to one major version higher than the source database.

  • The table or tables you replicate must not have columns with foreign constraints that are not part of the replication. For example, if you try to replicate the store.store_sales_fact table of the VMart sample database without also replicating the public.date_dimension table, Vertica returns this error:

    ERROR 3437:  [database_name] Foreign constraint fk_store_sales_date 
    of Table store. store_sales_fact referencing public.date_dimension 
    detected, but only one table is included in the snapshot
    

Configuration

As mentioned above, your target database must be able to access the source database's communal storage. The necessary configuration steps depend on the type of communal storage used by the source database:

  • For S3-based storage (including S3-compatible systems such as FlashBlade), create a separate configuration for the source communal storage. See Per-bucket S3 configurations for more information.
  • For other systems, such as Google Cloud Storage (GCS), that use identities to control access, give the target database's identity access to the source database's communal storage. For example, if both databases use GCS storage, grant the identity the target database uses to access its own communal storage access to the source communal storage. If the databases do not use the same type of storage, configure the target database with the credentials necessary to access the source storage. See Azure Blob Storage object store and Google Cloud Storage (GCS) object store for instructions on how to set up authentication for these object stores.

If you are attempting to replicate data from an on-premises database to a cloud-based database, you must create a connection that grants your cloud access to your on-premises data store.

To secure the connections between your databases, you must secure two separate connections:

  • Before transferring data using client-server and command channels, you must either connect the target database to the source database or the source database to the target database. See Configuring client-server TLS and Control channel Spread TLS for instructions on encrypting these connections.
  • You can encrypt the connections that transfer the data between the communal data stores by setting parameters in the target database for the object store used by the source database. These parameters are AWSEnableHttps for S3 (and compatible), AzureStorageEndpointConfig for Azure, and GCSEnableHttps for GCS.

Replication steps

Follow these steps to replicate data between two Eon Mode databases:

  1. Connect to the target or source database and log in as a user with superuser privileges.
  2. Connect to the other database using the CONNECT TO VERTICA statement.
  3. Use the REPLICATE statement, passing it the name of the table to replicate, or use the INCLUDE and EXCLUDE options to select several tables or schemas. The format of the replication call depends on whether you connected to the target or source database in step 1:
* Connected to target: specify the source database in the REPLICATE statement using a `FROM` *`source_db`* clause.
* Connected to source: specify the target database in the REPLICATE statement using a `TO` *`target_db`* clause.

Both of these methods copy the shard data directly from the source communal storage location to the target communal storage location.
  1. Optionally, use the DISCONNECT statement to disconnect.

This example, initiated on the target database, replicates a table named customers from the source database named verticadb:

=> CONNECT TO VERTICA source_db USER dbadmin 
           PASSWORD 'mypassword' ON 'vertica_node01', 5433;
 CONNECT

=> REPLICATE customers FROM source_db;
REPLICATE

The following statements execute the same data replication as the previous example, but this replication is initiated from the source database instead of the target database:

=> CONNECT TO VERTICA target_db USER dbadmin 
           PASSWORD 'mypassword' ON 'vertica_node01', 5433;
 CONNECT

=> REPLICATE customers TO target_db;
REPLICATE

To replicate more than one table at a time, use the INCLUDE option to include tables or schemas using wildcards. This example replicates all tables in the public schema starting with the letter t:

=> REPLICATE INCLUDE "public.t*" FROM source_db;

Use the EXCLUDE option with INCLUDE to limit the tables that Vertica replicates. This example replicates all tables in the public schema except those that start with the string customer_.

=> REPLICATE INCLUDE "public.*" EXCLUDE '*.customer_*' FROM source_db;

Scheduled replication

You can use replication to keep one or more tables in the target database up-to-date with tables in the source database by scheduling automatic replications. When you call the REPLICATE statement repeatedly, it automatically determines what has changed in the source table since the last replication and only transfers the data that has changed in the source database.

When determining the interval between scheduled replications, you should consider the amount of data being transferred, the resources assigned to the Tuple Mover, and data transfer concurrency. If replicating across platforms, you should also take into account your network bandwidth. Based on these factors, you can tune your resources and environment to find an optimal interval between scheduled replications.

In many cases, the best option is to balance the time sensitivity of the queries in the target database that use the replicated data with the amount of data that each replication must process. After you decide how often to run replications, create a script to perform the replications. The script simply needs to follow the steps in Replication steps to replicate the table a single time. Then set up a scheduled task to run the script.

Monitoring replication

The V_MONITOR.REPLICATION_STATUS system table in the target database shows information about database replications. You can use it to monitor ongoing replications and the results of prior replications:

=> SELECT node_name, status, transaction_id FROM REPLICATION_STATUS 
   ORDER BY start_time ASC;
       node_name       |      status       |  transaction_id   
-----------------------+-------------------+-------------------
 v_target_db_node0001  | completed         | 45035996273706044
 v_target_db_node0002  | completed         | 45035996273706044
 v_target_db_node0003  | completed         | 45035996273706044
 v_target_db_node0001  | completed         | 45035996273706070
 v_target_db_node0002  | completed         | 45035996273706070
 v_target_db_node0003  | completed         | 45035996273706070
 v_target_db_node0002  | data transferring | 45035996273706136
 v_target_db_node0003  | data transferring | 45035996273706136
 v_target_db_node0001  | data transferring | 45035996273706136
(9 rows)