This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Data export and replication

Vertica supports transferring data by exporting or replicating it.

Vertica supports transferring data by exporting or replicating it. These two methods differ slightly:

Exporting data
Transforms the data into a new format. You can export data to files to use with other tools or as Vertica external tables. You can also export data to another Vertica database using the COPY FROM VERTICA or EXPORT TO VERTICA statements.
Replicating data
Copies data in the Vertica native format. Replicating data is more efficient than exporting because Vertica does not need to decompress, sort, and recompress the data. However, unlike exporting, replication only copies data between Vertica databases running in the same mode (Eon or Enterprise).

Data export

Reasons you might want to export data from Vertica include:

  • To use the data in external tables; see Working with external data.

  • To share data with other clients or consumers in an ecosystem.

  • To copy data to another Vertica cluster.

Vertica provides two ways to export data:

  • Export data to files using data exporters. You can export the results of a SELECT query to files in Parquet, ORC, or delimited format. You can use partitioning in the export to reduce the file size and improve performance when reading the data in external tables. For more information, see File export.

  • Export data directly between Vertica clusters. Direct export lets you copy data between databases without having to save it to an intermediate file. You can perform the operation as either an import from another database or an export to another database; aside from the direction of data travel, these two operations are equal. Direct export lets you copy data between Vertica databases which have different configurations, such as clusters running in different modes or have a different number of nodes or shards. For more information, see Database export and import.

Data replication

Replication transfers data between Vertica databases in native format. Vertica supports two types of replication:

  • vbr-based replication uses the Vertica backup script to copy ROS containers from one database to another. This type of replication supports both Eon Mode and Enterprise Mode databases. However, the two databases must be running in the same mode and have the same number of nodes (for Enterprise Mode databases) or primary nodes (for Eon Mode databases). See Replicating objects to another database cluster for more information about this replication method.

  • Server-based replication is an Eon Mode-only method that copies data directly from the communal store of one database to another. Both databases must have the same shard segmentation layout and run compatible versions of the Vertica server. The replication feature is backwards compatible on the target side of the replication. The target database can run a later version of the database than the source database.

1 - File export

You might want to export a table or other query results from Vertica, either to share it with other applications or to move lower-priority data from ROS to less-expensive storage.

You might want to export a table or other query results from Vertica, either to share it with other applications or to move lower-priority data from ROS to less-expensive storage. Vertica supports several EXPORT TO statements for different file formats: EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED.

You can export data to HDFS, S3, Google Cloud Storage (GCS), Azure Blob Storage, or the Linux file system. Be careful to avoid concurrent exports to the same output destination. Doing so is an error on any file system and can produce incorrect results.

Not all exporters support all complex types. See the Data Types section on individual reference pages.

You can export ROS data or data that is readable through external tables. After exporting ROS data, you can drop affected ROS partitions to reclaim storage space.

After exporting data, you can define external tables to read that data in Vertica. Parquet and ORC are columnar formats that Vertica can take advantage of in query optimization, so consider using one of those formats for external tables. See Working with external data.

1.1 - Syntax

EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED have the same general syntax.

Use the EXPORT statements to export data specified by a SELECT statement, as in the following example:

=> EXPORT TO PARQUET(directory='webhdfs:///data/sales_data')
   AS SELECT * FROM public.sales;
 Rows Exported
---------------
         14336
(1 row)

The directory parameter specifies where to write the files and is required. You must have permission to write to the output directory. You can use the ifDirExists parameter to specify what to do if the directory already exists:

  • fail (default)

  • overwrite: replace the entire directory

  • append: export new files into the existing directory

You can export to HDFS, S3, GCS, Azure, or the Linux file system. For additional considerations specific to object stores, see Exporting to object stores. If you export to the local Linux file system (not a shared NFS mount) you must take some additional steps; see Exporting to the Linux file system.

You can use EXPORT statements to write queries across multiple tables in Vertica and export the results. With this approach you can take advantage of powerful, fast query execution in Vertica while making the results available to other clients:

=> EXPORT TO ORC(directory='webhdfs:///data/sales_by_region')
   AS SELECT sale.price, sale.date, store.region
   FROM public.sales sale
   JOIN public.vendor store ON sale.distribID = store.ID;
 Rows Exported
---------------
         23301
(1 row)

Restrictions

  • If you perform more than one concurrent export to the same directory, only one succeeds. If you perform concurrent exports with ifDirExists=overwrite, the results will be incorrect.

  • Exports cannot be run in fenced mode. If the ForceUDxFencedMode configuration parameter is set to true to support other UDxs, you can set it to false at the session level before performing the export.

Data types

All exporters can export scalar types. All except EXPORT TO DELIMITED also support primitive arrays. The JSON and Parquet exporters also support the ARRAY and ROW types in any combination:

=> SELECT * FROM restaurants;
       name        | cuisine |       location_city        |                                     menu
-------------------+---------+----------------------------+------------------------------------------------------------------------------
 Bob's pizzeria    | Italian | ["Cambridge","Pittsburgh"] | [{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]
 Bakersfield Tacos | Mexican | ["Pittsburgh"]             | [{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]
(2 rows)

=> EXPORT TO JSON (directory='/output/json') AS SELECT * FROM restaurants;
 Rows Exported
---------------
             2
(1 row)

=> \! cat /output/json/*.json
{"name":"Bob's pizzeria","cuisine":"Italian","location_city":["Cambridge","Pittsburgh"],"menu":[{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]}
{"name":"Bakersfield Tacos","cuisine":"Mexican","location_city":["Pittsburgh"],"menu":[{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]}

Query

EXPORT statements rewrite the query you specify, because the export is done by a user-defined transform function (UDTF). Because of this rewrite, there are some restrictions on the query you supply.

The query can contain only a single outer SELECT statement. For example, you cannot use UNION:

=> EXPORT TO PARQUET(directory = '/mnt/shared_nfs/accounts/rm')
   OVER(PARTITION BY hash)
   AS
   SELECT 1 as account_id, '{}' as json, 0 hash
   UNION ALL
   SELECT 2 as account_id, '{}' as json, 1 hash;
ERROR 8975:  Only a single outer SELECT statement is supported
HINT:  Please use a subquery for multiple outer SELECT statements

Instead, rewrite the query to use a subquery:

=> EXPORT TO PARQUET(directory = '/mnt/shared_nfs/accounts/rm')
   OVER(PARTITION BY hash)
   AS
   SELECT
    account_id,
    json
   FROM
   (
     SELECT 1 as account_id, '{}' as json, 0 hash
     UNION ALL
     SELECT 2 as account_id, '{}' as json, 1 hash
   ) a;
 Rows Exported
---------------
             2
(1 row)

To use composite statements such as UNION, INTERSECT, and EXCEPT, rewrite them as subqueries.

When exporting data you can use an OVER() clause to partition and sort the data as described in Partitioning and sorting data. Partitioning and sorting can improve query performance. If you partition the data, you cannot specify schema and table names in the SELECT statement. In this case, specify only the column name.

All exporters have a required directory parameter. All allow you to specify a compression type, and each exporter has format-specific parameters. See the parameter descriptions on the individual reference pages: EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED.

1.2 - Partitioning and sorting data

When exporting, you can use the optional OVER clause in the SELECT statement to specify how to partition and/or sort the exported data.

When exporting, you can use the optional OVER clause in the SELECT statement to specify how to partition and/or sort the exported data. Partitioning reduces the sizes of the output data files and canimprove performance when Vertica queries external tables containing thisdata. (See Partitioned data.) If you do not specify how to partition the data, Vertica optimizes the export for maximum parallelism.

To specify partition columns, use PARTITION BY in theOVER clause as in the following example:

=> EXPORT TO PARQUET(directory = 'webhdfs:///data/export')
  OVER(PARTITION BY date) AS SELECT transactionID, price FROM public.sales;
 Rows Exported
---------------
         28337
(1 row)

You can both partition by a column and include that column in the SELECT clause. Including the column allows you to sort it. Vertica still takes advantage of the partitioning during query execution.

You can sort values within a partition for a further performance improvement. Sort table columns based on the likelihood of their occurrence in query predicates; columns that most frequently occur in comparison or range predicates should be sorted first. You can sort values within each partition using ORDER BY in the OVER clause:

=> EXPORT TO PARQUET(directory = 'webhdfs:///data/export')
  OVER(PARTITION BY date ORDER BY price) AS SELECT date, price FROM public.sales;
 Rows Exported
---------------
         28337
(1 row)

You can use ORDER BY even without partitioning. Storing data in sorted order can improve data access and predicate evaluation performance.

Targets in the OVER clause must be column references; they cannot beexpressions.For more information about OVER, see SQL analytics.

If you are exporting data to a local file system, you might want to force a single node to write all of the files. To do so, use an empty OVER clause.

1.3 - Exporting to object stores

Object-store file systems (S3, Google Cloud Storage, and Azure Blob Storage) have some differences from other file systems that affect data export.

Object-store file systems (S3, Google Cloud Storage, and Azure Blob Storage) have some differences from other file systems that affect data export. You must set some additional configuration parameters for authentication and region, and there are some restrictions on the output.

URI formats and configuration parameters are described on the following reference pages:

Configuration parameters affect all access to the corresponding object stores, both reads and writes. Instead of setting them globally, you can limit the effects of your settings by setting them at the session level before exporting data.

Overwrites

By default, exports fail if the target directory already exists. You can use the ifDirExists parameter to instead append or overwrite. If you specify overwrite for an export to an object store, the existing directory is deleted recursively at the beginning of the operation and is not restored if the operation fails. Be careful not to export to a directory containing data you want to keep.

Output restrictions

Object-store file systems do not support renaming files in place; they implement a rename as a copy followed by a delete. On other file systems, the exporters support atomicity by writing output into a temporary directory and renaming it when complete. Such an approach is impractical for object stores, so the exporters write directly to the destination path. It is therefore possible to begin reading the exported data before the export has finished, which could lead to errors. Be careful to wait for the export to finish before using the data.

Vertica does not support simultaneous exports to the same path in object stores. The results are undefined.

S3 limits buckets to 5TB. You might need to divide very large exports.

1.4 - Exporting to the Linux file system

Exports to the local file system can be to an NFS mount (shared) or to the Linux file system on each node (non-shared).

Exports to the local file system can be to an NFS mount (shared) or to the Linux file system on each node (non-shared). If you export to an NFS mount on the Linux file system, the exporters behave the same way as for any other shared location: all of the exported files are written to the same (shared) destination.

If you export to the local Linux file system, each Vertica node writes its portion of the export to its local (non-shared) file system. Exports to non-shared local file systems have the following restrictions:

  • The output directory must not exist on any node.

  • You must have a USER storage location or superuser privileges.

  • You cannot override the permissions mode of 700 for directories and 600 for files.

To define an external table to read data exported to a non-shared local file system, use COPY...ON EACH NODE, as in the following example:

=> CREATE EXTERNAL TABLE sales (...)
   AS COPY FROM '/data/sales/*.parquet' ON EACH NODE PARQUET;

The path is the same path that you specified in the export statement and is the same on all nodes.

If you omit the ON clause and just specify a path, COPY only loads the data it finds on the initiator node.

1.5 - Monitoring exports

You can review information about exports, including numbers of row groups, file sizes, and file names.

You can review information about exports, including numbers of row groups, file sizes, and file names.

The export statements are UDxs. The UDX_EVENTS system table records events logged during UDx execution, including timestamps, node names, and session IDs. This table contains a column (__RAW__), which holds a VMap of whatever additional data an individual UDx logged. The export statements log details about the exported files in this table. While you can work with this table directly and materialize values from the VMap column, you might prefer to define a view to simplify your access.

The following statement defines a view showing only the events from EXPORT TO PARQUET, materializing the VMap values.

=> CREATE VIEW parquet_export_events AS
   SELECT
    report_time,
    node_name,
    session_id,
    user_id,
    user_name,
    transaction_id,
    statement_id,
    request_id,
    udx_name,
    event_type,
    file,
    created,
    closed,
    rows,
    row_groups,
    size_mb,
    row_group_file,
    row_group_n,
    row_group_rows,
    row_group_bytes,
    row_group_expected_size_mb, 
    row_groups_exceeding_target
   FROM
    v_monitor.udx_events
   WHERE
    udx_name ilike 'ParquetExport%';

The exporters report the following UDx-specific columns:

Column Name Data Type Description
EVENT_TYPE VARCHAR What was written. All exporters report FILE events. In addition, the Parquet exporter reports ROW_GROUP events. Row groups are only logged if debugging is enabled.
FILE VARCHAR For EVENT_TYPE of FILE, name of the output file.
CREATED TIMESTAMPTZ For EVENT_TYPE of FILE, when the file was created.
CLOSED TIMESTAMPTZ For EVENT_TYPE of FILE, when the file was closed after writing.
ROWS INTEGER For EVENT_TYPE of FILE, the total number of rows in the file.
ROW_GROUPS INTEGER For EVENT_TYPE of FILE, the number of row groups, for formats that use them (Parquet and ORC).
SIZE_MB FLOAT For EVENT_TYPE of FILE, file size.
ROW_GROUP_FILE VARCHAR For EVENT_TYPE of ROW_GROUP, name of the file containing the row group.
ROW_GROUP_N INTEGER For EVENT_TYPE of ROW_GROUP, row group number.
ROW_GROUP_ROWS INTEGER For EVENT_TYPE of ROW_GROUP, number of rows written in this row group.
ROW_GROUP_BYTES INTEGER For EVENT_TYPE of ROW_GROUP, number of bytes written in this row group.
ROW_GROUP_EXPECTED_SIZE_MB INTEGER For Parquet exports and an EVENT_TYPE of FILE, target size of each row group.
ROW_GROUPS_EXCEEDING_TARGET INTEGER For Parquet exports and an EVENT_TYPE of FILE, number of row groups exceeding the target size.

The following example shows the results of a single export.

=> SELECT file,rows,row_groups,size_mb FROM PARQUET_EXPORT_EVENTS;
                                  file                                 | rows  | row_groups | size_mb
-----------------------------------------------------------------------+-------+------------+----------
 /data/outgZxN3irt/450c4213-v_vmart_node0001-139770732459776-0.parquet | 29696 | 1          | 0.667203
 /data/outgZxN3irt/9df1c797-v_vmart_node0001-139770860660480-0.parquet | 29364 | 1          | 0.660922
(2 rows)

In this table, the output directory name (/data/out) is appended with a generated string (gZxN3irt). For exports to HDFS or to local file systems (including NFS), EXPORT TO PARQUET first writes data into a scratch directory and then renames it at the end of the operation. The events are logged during export and so show the temporary name. Some output destinations, such as AWS S3, do not support rename operations, so in those cases this table does not show generated names.

ROW_GROUP events are only recorded when debugging is enabled:

=> SELECT SET_DEBUG_LOG_ALLNODES('UDX', 'ALL');

2 - Database export and import

Vertica can easily import data from and export data to other Vertica databases.

Vertica can easily import data from and export data to other Vertica databases. Importing and exporting data is useful for common tasks such as moving data back and forth between a development or test database and a production database, or between databases that have different purposes but need to share data on a regular basis.

Moving data directly between databases

To move data between databases you first establish a connection using CONNECT TO VERTICA and then use one of the following statements to move data:

These statements are symmetric; copying from cluster A to cluster B is the same as exporting from cluster B to cluster A. The difference is only in which cluster drives the operation.

To configure TLS settings for the connection, see Configuring connection security between clusters.

Creating SQL scripts to export data

Three functions return a SQL script you can use to export database objects to recreate elsewhere:

While copying and exporting data is similar to Backing up and restoring the database, you should use them for different purposes, outlined below:

Task Backup and Restore COPY and EXPORT Statements
Back up or restore an entire database, or incremental changes YES NO
Manage database objects (a single table or selected table rows) YES YES
Use external locations to back up and restore your database YES NO
Use direct connections between two databases OBJECT RESTORE ONLY YES
Use external shell scripts to back up and restore your database YES NO
Use SQL commands to incorporate copy and export tasks into DB operations NO YES

The following sections explain how you import and export data between Vertica databases.

When importing from or exporting to a Vertica database, you can connect only to a database that uses trusted (username only) or password-based authentication, as described in Security and authentication. OAuth and Kerberos authentication methods are not supported.

Other exports

This section is about exporting data to another Vertica database. For information about exporting data to files, which can then be used in external tables or COPY statements, see File export.

2.1 - Configuring connection security between clusters

When copying data between clusters, Vertica can encrypt both data and plan metadata.

When copying data between clusters, Vertica can encrypt both data and plan metadata.

Data is encrypted if you configure internode encryption (see Internode TLS).

For metadata, by default Vertica tries TLS first and falls back to plaintext. You can configure Vertica to require TLS and to fail if the connection cannot be made. You can also have Vertica verify the certificate and hostname before connecting.

Enabling TLS between clusters

To use TLS between clusters, you must first configure TLS between nodes:

  1. Set the EncryptSpreadComms parameter.

  2. Configure the data_channel TLS Configuration.

  3. Set the ImportExportTLSMode parameter.

To specify the level of strictness when connecting to another cluster, set the ImportExportTLSMode configuration parameter. This parameter applies for both importing and exporting data. The possible values are:

  • PREFER: Try TLS but fall back to plaintext if TLS fails.

  • REQUIRE: Use TLS and fail if the server does not support TLS.

  • VERIFY_CA: Require TLS (as with REQUIRE), and also validate the other server's certificate using the CA specified by the "server" TLS Configuration's CA certificates (in this case, "ca_cert" and "ica_cert"):

    => SELECT name, certificate, ca_certificate, mode FROM tls_configurations WHERE name = 'server';
      name  |   certificate    |   ca_certificate   |   mode
    --------+------------------+--------------------+-----------
     server | server_cert      | ca_cert,ica_cert   | VERIFY_CA
    (1 row)
    
  • VERIFY_FULL: Require TLS and validate the certificate (as with VERIFY_CA), and also validate the server certificate's hostname.

  • REQUIRE_FORCE, VERIFY_CA_FORCE, and VERIFY_FULL_FORCE: Same behavior as REQUIRE, VERIFY_CA, and VERIFY_FULL, respectively, and cannot be overridden by CONNECT TO VERTICA.

ImportExportTLSMode is a global parameter that applies to all import and export connections you make using CONNECT TO VERTICA. You can override it for an individual connection.

For more information about these and other configuration parameters, see Security parameters.

2.2 - Exporting data to another database

EXPORT TO VERTICA exports table data from one Vertica database to another.

EXPORT TO VERTICA exports table data from one Vertica database to another. The following requirements apply:

  • You already opened a connection to the target database with CONNECT TO VERTICA.

  • The source database is no more than one major release behind the target database.

  • The table in the target database must exist.

  • Source and target table columns must have the same or compatible data types.

Each EXPORT TO VERTICA statement exports data from only one table at a time. You can use the same database connection for multiple export operations.

Export process

Exporting is a three-step process:

  1. Connect to the target database with CONNECT TO VERTICA.

    For example:

    => CONNECT TO VERTICA testdb USER dbadmin PASSWORD '' ON 'VertTest01', 5433;
    CONNECT
    
  2. Export the desired data with EXPORT TO VERTICA. For example, the following statement exports all table data in customer_dimension to a table of the same name in target database testdb:

    
    => EXPORT TO VERTICA testdb.customer_dimension FROM customer_dimension;
    Rows Exported
    ---------------
             23416
    (1 row)
    
  3. DISCONNECT disconnects from the target database when all export and import operations are complete:

    => DISCONNECT testdb;
    DISCONNECT
    

Mapping between source and target columns

If you export all table data from one database to another as in the previous example, EXPORT TO VERTICA can omit specifying column lists. This is possible only if column definitions in both tables comply with the following conditions:

  • Same number of columns

  • Identical column names

  • Same sequence of columns

  • Matching or compatible column data types

If any of these conditions is not true, the EXPORT TO VERTICA statement must include column lists that explicitly map source and target columns to each other, as follows:

  • Contain the same number of columns.

  • List source and target columns in the same order.

  • Pair columns with the same (or compatible) data types.

For example:

=> EXPORT TO VERTICA testdb.people (name, gender, age)
   FROM customer_dimension (customer_name, customer_gender, customer_age);

Exporting subsets of table data

In general, you can export a subset of table data in two ways:

  • Export data of specific source table columns.

  • Export the result set of a query (including historical queries) on the source table.

In both cases, the EXPORT TO VERTICA statement typically must specify column lists for the source and target tables.

The following example exports data from three columns in the source table to three columns in the target table. Accordingly, the EXPORT TO VERTICA statement specifies a column list for each table. The order of columns in each list determines how Vertica maps target columns to source columns. In this case, target columns name, gender, and age map to source columns customer_name, customer_gender, and customer_age, respectively:

=> EXPORT TO VERTICA testdb.people (name, gender, age) FROM customer_dimension
(customer_name, customer_gender, customer_age);
Rows Exported
---------------
         23416
(1 row)

The next example queries source table customer_dimension, and exports the result set to table ma_customers in target database testdb:

=> EXPORT TO VERTICA testdb.ma_customers(customer_key, customer_name, annual_income)
   AS SELECT customer_key, customer_name, annual_income FROM customer_dimension WHERE customer_state = 'MA';
Rows Exported
---------------
          3429
(1 row)

Exporting IDENTITY columns

You can export tables (or columns) that contain IDENTITY values, but the sequence values are not incremented automatically at the target table. You must use ALTER SEQUENCE to make updates.

Export IDENTITY columns as follows:

  • If both source and destination tables have an IDENTITY column and configuration parameter CopyFromVerticaWithIdentity is set to true (1), you do not need to list them.

  • If source table has an IDENTITY column, but target table does not, you must explicitly list the source and target columns.

By default, EXPORT TO VERTICA exports all IDENTITY columns . To disable this behavior globally, set the CopyFromVerticaWithIdentity configuration parameter.

2.3 - Copying data from another Vertica database

COPY FROM VERTICA imports table data from one Vertica database to another.

COPY FROM VERTICA imports table data from one Vertica database to another. The following requirements apply:

  • You already opened a connection to the target database with CONNECT TO VERTICA.

  • The source database is no more than one major release behind the target database.

  • The table in the target database must exist.

  • Source and target table columns must have the same or compatible data types.

Import process

Importing is a three-step process:

  1. Connect to the source database with CONNECT TO VERTICA. For example:

    => CONNECT TO VERTICA vmart USER dbadmin PASSWORD '' ON 'VertTest01',5433;
    CONNECT
    
  2. Import the desired data with COPY FROM VERTICA. For example, the following statement imports all table data in customer_dimension to a table of the same name:

    
    => COPY customer_dimension FROM  VERTICA vmart.customer_dimension;
     Rows Loaded
    -------------
          500000
    (1 row)
    => DISCONNECT vmart;
    DISCONNECT
    
  3. DISCONNECT disconnects from the source database when all import and export operations are complete:

    => DISCONNECT vmart;
    DISCONNECT
    

Importing IDENTITY columns

You can import IDENTITY columns as follows:

  • If both source and destination tables have an IDENTITY column and configuration parameter CopyFromVerticaWithIdentity is set to true (1), you do not need to list them.

  • If source table has an IDENTITY column, but target table does not, you must explicitly list the source and target columns.

After importing the columns, the IDENTITY column values do not increment automatically. Use ALTER SEQUENCE to make updates.

The default behavior for this statement is to import IDENTITY columns by specifying them directly in the source table. To disable this behavior globally, set the CopyFromVerticaWithIdentity configuration parameter.

2.4 - Copy and export data on AWS

There are common issues that occur when exporting or copying on AWS clusters, as described below.

There are common issues that occur when exporting or copying on AWS clusters, as described below. Except for these specific issues as they relate to AWS, copying and exporting data works as documented in Database export and import.

To copy or export data on AWS:

  1. Verify that all nodes in source and destination clusters have their own elastic IPs (or public IPs) assigned.

    If your destination cluster is located within the same VPC as your source cluster, proceed to step 3. Each node in one cluster must be able to communicate with each node in the other cluster. Thus, each source and destination node needs an elastic IP (or public IP) assigned.

  2. (For non-CloudFormation Template installs) Create an S3 gateway endpoint.

    If you aren't using a CloudFormation Template (CFT) to install Vertica, you must create an S3 gateway endpoint in your VPC. For more information, see the AWS documentation.

    For example, the Vertica CFT has the following VPC endpoint:

    "S3Enpoint" : {
        "Type" : "AWS::EC2::VPCEndpoint",
        "Properties" : {
        "PolicyDocument" : {
            "Version":"2012-10-17",
            "Statement":[{
            "Effect":"Allow",
            "Principal": "*",
            "Action":["*"],
            "Resource":["*"]
            }]
        },
        "RouteTableIds" : [ {"Ref" : "RouteTable"} ],
        "ServiceName" : { "Fn::Join": [ "", [ "com.amazonaws.", { "Ref": "AWS::Region" }, ".s3" ] ] },
        "VpcId" : {"Ref" : "VPC"}
    }
    

  3. Verify that your security group allows the AWS clusters to communicate.

    Check your security groups for both your source and destination AWS clusters. Verify that ports 5433 and 5434 are open. If one of your AWS clusters is on a separate VPC, verify that your network access control list (ACL) allows communication on port 5434.

  4. If there are one or more elastic load balancers (ELBs) between the clusters, verify that port 5433 is open between the ELBs and clusters.

  5. If you use the Vertica client to connect to one or more ELBs, the ELBs only distribute incoming connections. The data transmission path occurs between clusters.

2.5 - Changing node export addresses

You can change the export address for your Vertica cluster.

You can change the export address for your Vertica cluster. You might need to do so to export data between clusters in different network subnets.

  1. Create a subnet for importing and exporting data between Vertica clusters. The CREATE SUBNET statement identifies the public network IP addresses residing on the same subnet.

    => CREATE SUBNET kv_subnet with '10.10.10.0';
    
  2. Alter the database to specify the subnet name of a public network for import/export.

    => ALTER DATABASE DEFAULT EXPORT ON kv_subnet;
    
  3. Create network interfaces for importing and exporting data from individual nodes to other Vertica clusters. The CREATE NETWORK INTERFACE statement identifies the public network IP addresses residing on multiple subnets.

    => CREATE NETWORK INTERFACE kv_node1 on v_VMartDB_node0001 with '10.10.10.1';
    => CREATE NETWORK INTERFACE kv_node2 on v_VMartDB_node0002 with '10.10.10.2';
    => CREATE NETWORK INTERFACE kv_node3 on v_VMartDB_node0003 with '10.10.10.3';
    => CREATE NETWORK INTERFACE kv_node4 on v_VMartDB_node0004 with '10.10.10.4';
    

    For users on Amazon Web Services (AWS) or using Network Address Translation (NAT), refer to Vertica on Amazon Web Services.

  4. Alter the node settings to change the export address. When used with the EXPORT ON clause, the ALTER NODE specifies the network interface of the public network on individual nodes for importing and exporting data.

    => ALTER NODE v_VMartDB_node0001 export on kv_node1;
    => ALTER NODE v_VMartDB_node0002 export on kv_node2;
    => ALTER NODE v_VMartDB_node0003 export on kv_node3;
    => ALTER NODE v_VMartDB_node0004 export on kv_node4;
    
  5. Verify if the node address and the export address are different on different network subnets of the Vertica cluster.

    => SELECT node_name, node_address, export_address FROM nodes;
         node_name     | node_address    | export_address
    -------------------+-----------------+----------------
    v_VMartDB_node0001 | 192.168.100.101 | 10.10.10.1
    v_VMartDB_node0002 | 192.168.100.102 | 10.10.10.2
    v_VMartDB_node0003 | 192.168.100.103 | 10.10.10.3
    v_VMartDB_node0004 | 192.168.100.104 | 10.10.10.4
    

Creating a network interface and altering the node settings to change the export address takes precedence over creating a subnet and altering the database for import/export.

2.6 - Using public and private IP networks

In many configurations, Vertica cluster hosts use two network IP addresses as follows:.

In many configurations, Vertica cluster hosts use two network IP addresses as follows:

  • A private address for communication between the cluster hosts.

  • A public IP address for communication with client connections.

By default, importing from and exporting to another Vertica database uses the private network.

To use the public network address for copy and export activities, as well as moving large amounts of data, configure the system to use the public network to support exporting to or importing from another Vertica cluster:

Vertica encrypts data during transmission (if you have configured a certificate). Vertica attempts to also encrypt plan metadata but, by default, falls back to plaintext if needed. You can configure Vertica to require encryption for metadata too; see Configuring connection security between clusters.

In certain instances, both public and private addresses exceed the demand capacity of a single Local Area Network (LAN). If you encounter this type of scenario, then configure your Vertica cluster to use two LANs: one for public network traffic and one for private network traffic.

2.6.1 - Identify the public network to Vertica

To be able to import to or export from a public network, Vertica needs to be aware of the IP addresses of the nodes or clusters on the public network that will be used for import/export activities.

To be able to import to or export from a public network, Vertica needs to be aware of the IP addresses of the nodes or clusters on the public network that will be used for import/export activities. Your public network might be configured in either of these ways:

  • Public network IP addresses reside on the same subnet (create a subnet)

  • Public network IP addresses are on multiple subnets (create a network interface)

To identify public network IP addresses residing on the same subnet:

  • Use the CREATE SUBNET statement provide your subnet with a name and to identify the subnet routing prefix.

To identify public network IP addresses residing on multiple subnets:

After you've identified the subnet or network interface to be used for import/export, you must Identify the Database Or Nodes Used For Import/Export.

See also

2.6.2 - Identify the database or nodes used for import/export

After you identify the public network to Vertica, you can configure a database and its nodes to use it for import and export operations:.

After you identify the public network to Vertica, you can configure a database and its nodes to use it for import and export operations:

  • Use ALTER DATABASE to specify a subnet on the public network for the database. After doing so, all nodes in the database automatically use the network interface on the subnet for import/export operations.

  • On each database node, use ALTER NODE to specify a network interface of the public network.

See also

2.7 - Handling node failure during copy/export

When an export (EXPORT TO VERTICA) or import from Vertica (COPY FROM VERTICA) task is in progress, and a non-initiator node fails, Vertica does not complete the task automatically.

When an export (EXPORT TO VERTICA) or import from Vertica (COPY FROM VERTICA) task is in progress, and a non-initiator node fails, Vertica does not complete the task automatically. A non-initiator node is any node that is not the source or target node in your export or import statement. To complete the task, you must run the statement again.

You address the problem of a non-initiator node failing during an import or export as follows:

  1. You export or import from one cluster to another using the EXPORT TO VERTICA or COPY FROM VERTICA statement.

    During the export or import, a non-initiating node on the target or source cluster fails. Vertica issues an error message that indicates possible node failure, one of the following:

    • ERROR 4534: Receive on v_tpchdb1_node0002: Message receipt from v_tpchdb2_node0005 failed

    • WARNING 4539: Received no response from v_tpchdb1_node0004 in abandon plan

    • ERROR 3322: [tpchdb2] Execution canceled by operator

  2. Complete your import or export by running the statement again. The failed node does not need to be up for Vertica to successfully complete the export or import.

2.8 - Using EXPORT functions

Vertica provides several EXPORT_ functions that let you recreate a database, or specific schemas and tables, in a target database.

Vertica provides several EXPORT_ functions that let you recreate a database, or specific schemas and tables, in a target database. For example, you can use the EXPORT_ functions to transfer some or all of the designs and objects you create in a development or test environment to a production database.

The EXPORT_ functions create SQL scripts that you can run to generate the exported database designs or objects. These functions serve different purposes to the export statements, COPY FROM VERTICA (pull data) and EXPORT TO VERTICA (push data). These statements transfer data directly from source to target database across a network connection between both. They are dynamic actions and do not generate SQL scripts.

The EXPORT_ functions appear in the following table. Depending on what you need to export, you can use one or more of the functions. EXPORT_CATALOG creates the most comprehensive SQL script, while EXPORT_TABLES and EXPORT_OBJECTS are subsets of that function to narrow the export scope.

Use this function... To recreate...
EXPORT_CATALOG

These catalog items:

  • An existing schema design, tables, projections, constraints, views, and stored procedures.

  • The Database Designer-created schema design, tables, projections, constraints, and views

  • A design on a different cluster.

EXPORT_TABLES Non-virtual objects up to, and including, the schema of one or more tables.
EXPORT_OBJECTS Catalog objects in order dependency for replication.

The designs and object definitions that the script creates depend on the EXPORT_ function scope you specify. The following sections give examples of the commands and output for each function and the scopes it supports.

Saving scripts for export functions

All of the examples in this section were generated using the standard Vertica VMART database, with some additional test objects and tables. One output directory was created for all SQL scripts that the functions created:

/home/dbadmin/xtest

If you specify the destination argument as an empty string (''), the function writes the export results to STDOUT.

2.8.1 - Exporting the catalog

EXPORT_CATALOG generates a SQL script for copying a database design to another cluster.

The EXPORT_CATALOG function generates a SQL script that copies a database design to another cluster. This script replicates the physical schema design of the source database. For example:

=> SELECT EXPORT_CATALOG (
     '/home/dbadmin/xtest/sql_cat_tables.sql',
     'TABLES');
           EXPORT_CATALOG
-------------------------------------
 Catalog data exported successfully
(1 row)

The last argument specifies what to export:

  • TABLES: Tables, schemas, and table-dependent objects: constraints, and access policies.

  • DESIGN (default): All catalog objects.

  • DIRECTED_QUERIES: All directed queries that are stored in the catalog. For details, see Managing directed queries.

Use the DESIGN scope to export all design elements of a source database in order of their dependencies. This scope exports all catalog objects, including schemas, tables, constraints, projections, views, access policies, stored procedures, and data loaders:

=> SELECT EXPORT_CATALOG(
     '/home/dbadmin/xtest/sql_cat_design.sql',
     'DESIGN' );
           EXPORT_CATALOG
-------------------------------------
 Catalog data exported successfully
(1 row)

This function can generate the following types of statements:

The SQL script:

  • Only includes objects to which the user has access.

  • Orders CREATE statements based on object dependencies so they can be recreated in the correct sequence. For example, if a table is in a non-public schema, the required CREATE SCHEMA statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.

  • If possible, creates projections with a KSAFE clause, if any, otherwise with an OFFSET clause.

Projection considerations

If a projection to export was created with no ORDER BY clause, the SQL script reflects the default behavior for projections. Vertica implicitly creates projections using a sort order based on the SELECT columns in the projection definition.

The EXPORT_CATALOG script is portable if all projections are created using UNSEGMENTED ALL NODES or SEGMENTED ALL NODES.

See also

2.8.2 - Exporting tables

Vertica function EXPORT_TABLES exports DDL for tables and related objects in the current database.

Vertica function EXPORT_TABLES exports DDL for tables and related objects in the current database. The generated SQL includes all non-virtual table objects to which you have access. You can use this SQL to recreate tables and related non-virtual objects on a different cluster.

You execute EXPORT_TABLES as follows:

EXPORT_TABLES( ['[destination]' [, '[scope]']] )

The SQL script conforms to the following requirements:

  • Only includes objects to which the user has access.

  • Orders CREATE statements according to object dependencies so they can be recreated in the correct sequence. For example, if a table references a named sequence, a CREATE SEQUENCE statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.

Setting export scope

The EXPORT_TABLES scope argument specifies the scope of the export operation:

To export... Set scope to...
All tables to which you have access, including constraints Empty string (' ')
One or more named tables

Comma-delimited list of table objects. For example:

myschema.newtable, yourschema.oldtable

You can optionally qualify the schema with the name of the current database:

mydb.myschema.newtable

A named table object in the current search path: a schema, table, or sequence. If you specify a schema, EXPORT_TABLES exports all table objects in that schema to which you have access.

Table object's name and, optionally, its path:

VMart.myschema

Exporting all table objects

If you set the scope parameter to an empty string (''), EXPORT_TABLES exports all tables and their related objects. For example, the following call to EXPORT_TABLES exports all table objects in the VMart database to the specified output file.

 => SELECT EXPORT_TABLES(
      '/home/dbadmin/xtest/sql_tables_empty.sql', '');
            EXPORT_TABLES
-------------------------------------
 Catalog data exported successfully
(1 row)

The exported SQL includes the following types of statements, depending on what is required to recreate the tables and related objects, such as schemas, sequences and access policies:

  • CREATE SCHEMA

  • CREATE TABLE

  • CREATE SEQUENCE

  • CREATE ACCESS POLICY

  • ALTER TABLE (to add foreign key constraints)

Exporting individual table objects

EXPORT_TABLES can specify a comma-separated list of tables and table-related objects such as sequences or schemas to export. The generated SQL script includes the CREATE statements for the specified objects and their dependent objects:

  • CREATE SCHEMA

  • CREATE TABLE

  • CREATE SEQUENCE

  • CREATE ACCESS POLICY

  • ALTER TABLE (to add foreign keys)

For example, the following call to EXPORT_TABLES exports two VMart tables: store.store_sales_fact and store.store_dimension:

=> SELECT export_tables('','store.store_sales_fact, store.store_dimension');
                              export_tables
-------------------------------------------------------------------------

CREATE TABLE store.store_dimension
(
    store_key int NOT NULL,
    store_name varchar(64),
    store_number int,
    store_address varchar(256),
    store_city varchar(64),
    store_state char(2),
    store_region varchar(64),
    floor_plan_type varchar(32),
    photo_processing_type varchar(32),
    financial_service_type varchar(32),
    selling_square_footage int,
    total_square_footage int,
    first_open_date date,
    last_remodel_date date,
    number_of_employees int,
    annual_shrinkage int,
    foot_traffic int,
    monthly_rent_cost int,
    CONSTRAINT C_PRIMARY PRIMARY KEY (store_key) DISABLED
);

CREATE TABLE store.store_sales_fact
(
    date_key int NOT NULL,
    product_key int NOT NULL,
    product_version int NOT NULL,
    store_key int NOT NULL,
    promotion_key int NOT NULL,
    customer_key int NOT NULL,
    employee_key int NOT NULL,
    pos_transaction_number int NOT NULL,
    sales_quantity int,
    sales_dollar_amount int,
    cost_dollar_amount int,
    gross_profit_dollar_amount int,
    transaction_type varchar(16),
    transaction_time time,
    tender_type varchar(8)
);

ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_date FOREIGN KEY (date_key) references public.date_dimension (date_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_promotion FOREIGN KEY (promotion_key) references public.promotion_dimension (promotion_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_customer FOREIGN KEY (customer_key) references public.customer_dimension (customer_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);

The following call to EXPORT_TABLES specifies to export all tables in the VMart schema store :

=> select export_tables('','store');
                          export_tables
-----------------------------------------------------------------
CREATE SCHEMA store;

CREATE TABLE store.store_dimension
(
 ...
);

CREATE TABLE store.store_sales_fact
(
 ...
);

CREATE TABLE store.store_orders_fact
(
  ...
);

ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_date FOREIGN KEY (date_key) references public.date_dimension (date_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_promotion FOREIGN KEY (promotion_key) references public.promotion_dimension (promotion_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_customer FOREIGN KEY (customer_key) references public.customer_dimension (customer_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_vendor FOREIGN KEY (vendor_key) references public.vendor_dimension (vendor_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);

(1 row)

See also

2.8.3 - Exporting objects

EXPORT_OBJECTS generates a SQL script that recreates non-virtual catalog objects on a different cluster.

The EXPORT_OBJECTS function generates a SQL script that recreates non-virtual catalog objects on a different cluster. It takes arguments for the output file, scope, and K-safety enforcement. For example:

=> SELECT EXPORT_OBJECTS(
     '/home/dbadmin/xtest/sql_objects_all.sql',
     '',       --- what to export (all, in this case)
     'true');  --- enforce K-safety
           EXPORT_OBJECTS
-------------------------------------
 Catalog data exported successfully
(1 row)

If you set the scope to an empty string ('') as in this example, Vertica exports all non-virtual objects to which you have access in order of dependencies. To specify the objects to export, use a comma-delimited list. The function automatically exports objects on which the target objects depend. For example, if you export myschema.mytable, you do not also need to specify myschema.

If you specify a schema, EXPORT_OBJECTS exports all objects in that schema to which you have access. If you name a table that references a sequence, the generated script exports the sequence, then the table that references the sequence, and finally any projections of that table.

Running the generated SQL script on another cluster creates all referenced objects and their dependent objects.

The SQL script:

  • Only includes objects to which the user has access.

  • Orders CREATE statements based on object dependencies so they can be recreated in the correct sequence. For example, if a table is in a non-public schema, the required CREATE SCHEMA statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.

  • If possible, creates projections with a KSAFE clause, if any, otherwise with an OFFSET clause.

This function can generate the following types of statements:

K-safety

By default, the function's KSAFE argument is set to true. In this case, the generated script calls MARK_DESIGN_KSAFE, which replicates K-safety of the original database.

The following example output includes the start and end of an output SQL file:

CREATE SCHEMA store;
CREATE SCHEMA online_sales;
CREATE SEQUENCE public.my_seq ;
CREATE TABLE public.customer_dimension
(
    customer_key int NOT NULL,
    customer_type varchar(16),
    customer_name varchar(256),
    customer_gender varchar(8),
    title varchar(8),
    household_id int,
...

);
...
SELECT MARK_DESIGN_KSAFE(1);

Exporting individual objects

You can specify one or more objects to export in a comma-delimited list. The names of objects in any non-public schema must include their respective schemas.

If listed objects have overlapping scopes—for example, the list includes a table and one of its projections—EXPORT_OBJECTS exports the projection only once:

=> SELECT EXPORT_OBJECTS ('','customer_dimension, customer_dimension_super');

                              export_objects
--------------------------------------------------------------------------
CREATE TABLE public.customer_dimension
(
    customer_key int NOT NULL,
    customer_type varchar(16),
    customer_name varchar(256),
    ...
    CONSTRAINT C_PRIMARY PRIMARY KEY (customer_key) DISABLED
);

CREATE ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address CASE WHEN enabled_role('administrator') THEN customer_address ELSE '**************' END ENABLE;

CREATE PROJECTION public.customer_dimension_super /*+basename(customer_dimension),createtype(L)*/
(
 customer_key,
 customer_type,
 customer_name,
  ...
)
AS
 SELECT customer_dimension.customer_key,
        customer_dimension.customer_type,
        customer_dimension.customer_name,
        ...
 FROM public.customer_dimension
 ORDER BY customer_dimension.customer_key
SEGMENTED BY hash(customer_dimension.customer_key) ALL NODES OFFSET 0;

SELECT MARK_DESIGN_KSAFE(0);

You can export stored procedures by specifying their names and the types of their formal parameters. For stored procedures with the same name but different formal parameters, you can export all implementations by exporting the parent schema.

To export a particular implementation of a stored procedure, specify either the types or both the names and types of the procedure's formal parameters. The following example specifies the types:

=> SELECT EXPORT_OBJECTS('','raiseXY(int, int)');
    EXPORT_OBJECTS
----------------------

CREATE PROCEDURE public.raiseXY(x int, y int)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';

SELECT MARK_DESIGN_KSAFE(0);

(1 row)

To export all implementations of an overloaded stored procedure, export its parent schema:

=> SELECT EXPORT_OBJECTS('','public');
    EXPORT_OBJECTS
----------------------

...

CREATE PROCEDURE public.raiseXY(x int, y varchar)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';

CREATE PROCEDURE public.raiseXY(x int, y int)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';

SELECT MARK_DESIGN_KSAFE(0);

(1 row)

See also

Exporting tables

3 - Replication

Replication copies data in native format between Vertica databases.

Replication copies data from one Vertica database to another in native format. Usually, you use replication to keep a table or schema in the target database up to date with corresponding table or schema in the source database. Replication automatically determines what has changed in the source database and only transfers those changes to the target database.

Replication is also useful for disaster recovery (DR) and isolating workloads. For example:

  • Keep a "hot spare" Vertica database that can fill in if the primary database goes down.
  • Give geographically separated teams their own local databases.
  • Fully isolate ETL workloads from analytics workloads by giving each their own Vertica database.

Vertica supports two forms of replication:

  • vbr-based replication that uses the backup script to copy ROS containers from one database to another.
  • Server-based replication that copies shards directly from one database's communal storage to another.

vbr-based replication

The vbr-replication method works between either Eon Mode or Enterprise Mode databases. You cannot replicate data between databases running in different modes or cloud platforms. The databases must have the same number of nodes (when in Enterprise Mode) or primary nodes (in Eon Mode). If replicating between Eon Mode databases, the databases must have the same shard segmentation and primary subcluster node subscriptions.

For more information about vbr-based replication, see Replicating objects to another database cluster.

Server-based replication

Server-based replication transfers data between Eon Mode databases by copying shard files one communal storage location to another. This method supports Eon Mode databases in different clouds or on-premises. The target database must have access to the source database's communal storage, and the shard count and shard segmentation of the source and target namespaces must be the same.

Choosing between vbr-based and server-based replication

As server-based replication only works between Eon Mode databases, you only have to choose between the two types of replication if both the source and target database run in Eon Mode.

When choosing between the two modes, consider:

  • Server-based replication is usually more efficient because it works within Vertica. Also, unlike vbr-based replication, server-based replication does not use SSH tunneling for encryption, avoiding potential security risks.
  • For vbr-based replication, your databases must have more in common than databases using server-based replication. For example, in vbr-based replication, the source and target databases must have the same shard segmentation, primary subcluster node subscriptions, communal storage type, and number of nodes in their primary subcluster. For server-based replication, the only requirement is that the two databases have the same shard segmentation.

3.1 - Server-based replication

Server-based replication copies shard data from one Eon Mode database to another.

Eon Mode Only

Server-based replication lets you copy data from one Eon Mode database to another. This replication method copies shard data directly from the source communal storage location to the target communal storage location. Copying native data means there is less overhead than other data copying methods.

You can replicate data between Eon Mode databases in different clouds and on-premises.

Requirements

The Eon Mode databases you want to replicate data between must meet the following requirements:

  • The source and target namespaces of the replicated tables must have the same shard count and segmentation. The REPLICATE command accepts an optional target namespace parameter that specifies the namespace to which the tables are replicated in the target cluster. If you do not specify a target namespace, objects are replicated to a namespace with the same name as the source namespace. If no such namespace exists in the target cluster, it is created with the same name and shard count as the source namespace.

  • The user running the commands must have superuser privileges in the source and target databases.

  • The target database must be able to access the source database and its communal storage. If there are firewalls between the source and target database, configure them to give the target access to the source. Additionaly, the target database must have credentials to access the source database's communal storage. See Configuration.

  • The two databases must be running compatible versions of the Vertica server. The target database's version of the server has to be the same version or up to one major version higher than the source database.

  • The table or tables you replicate must not have columns with foreign constraints that are not part of the replication. For example, if you try to replicate the store.store_sales_fact table of the VMart sample database without also replicating the public.date_dimension table, Vertica returns this error:

    ERROR 3437:  [database_name] Foreign constraint fk_store_sales_date 
    of Table store. store_sales_fact referencing public.date_dimension 
    detected, but only one table is included in the snapshot
    

Configuration

As mentioned above, your target database must be able to access the source database's communal storage. The necessary configuration steps depend on the type of communal storage used by the source database:

  • For S3-based storage (including S3-compatible systems such as FlashBlade), create a separate configuration for the source communal storage. See Per-bucket S3 configurations for more information.
  • For other systems, such as Google Cloud Storage (GCS), that use identities to control access, give the target database's identity access to the source database's communal storage. For example, if both databases use GCS storage, grant the identity the target database uses to access its own communal storage access to the source communal storage. If the databases do not use the same type of storage, configure the target database with the credentials necessary to access the source storage. See Azure Blob Storage object store and Google Cloud Storage (GCS) object store for instructions on how to set up authentication for these object stores.

If you are attempting to replicate data from an on-premises database to a cloud-based database, you must create a connection that grants your cloud access to your on-premises data store.

To secure the connections between your databases, you must secure two separate connections:

  • Before transferring data using client-server and command channels, you must either connect the target database to the source database or the source database to the target database. See Configuring client-server TLS and Control channel Spread TLS for instructions on encrypting these connections.
  • You can encrypt the connections that transfer the data between the communal data stores by setting parameters in the target database for the object store used by the source database. These parameters are AWSEnableHttps for S3 (and compatible), AzureStorageEndpointConfig for Azure, and GCSEnableHttps for GCS.

Replication steps

Follow these steps to replicate data between two Eon Mode databases:

  1. Connect to the target or source database and log in as a user with superuser privileges.
  2. Connect to the other database using the CONNECT TO VERTICA statement.
  3. Use the REPLICATE statement, passing it the name of the table to replicate, or use the INCLUDE and EXCLUDE options to select several tables or schemas. The format of the replication call depends on whether you connected to the target or source database in step 1:
* Connected to target: specify the source database in the REPLICATE statement using a `FROM` *`source_db`* clause.
* Connected to source: specify the target database in the REPLICATE statement using a `TO` *`target_db`* clause.

Both of these methods copy the shard data directly from the source communal storage location to the target communal storage location.
  1. Optionally, use the DISCONNECT statement to disconnect.

This example, initiated on the target database, replicates a table named customers from the source database named verticadb:

=> CONNECT TO VERTICA source_db USER dbadmin 
           PASSWORD 'mypassword' ON 'vertica_node01', 5433;
 CONNECT

=> REPLICATE customers FROM source_db;
REPLICATE

The following statements execute the same data replication as the previous example, but this replication is initiated from the source database instead of the target database:

=> CONNECT TO VERTICA target_db USER dbadmin 
           PASSWORD 'mypassword' ON 'vertica_node01', 5433;
 CONNECT

=> REPLICATE customers TO target_db;
REPLICATE

To replicate more than one table at a time, use the INCLUDE option to include tables or schemas using wildcards. This example replicates all tables in the public schema starting with the letter t:

=> REPLICATE INCLUDE "public.t*" FROM source_db;

Use the EXCLUDE option with INCLUDE to limit the tables that Vertica replicates. This example replicates all tables in the public schema except those that start with the string customer_.

=> REPLICATE INCLUDE "public.*" EXCLUDE '*.customer_*' FROM source_db;

Scheduled replication

You can use replication to keep one or more tables in the target database up-to-date with tables in the source database by scheduling automatic replications. When you call the REPLICATE statement repeatedly, it automatically determines what has changed in the source table since the last replication and only transfers the data that has changed in the source database.

When determining the interval between scheduled replications, you should consider the amount of data being transferred, the resources assigned to the Tuple Mover, and data transfer concurrency. If replicating across platforms, you should also take into account your network bandwidth. Based on these factors, you can tune your resources and environment to find an optimal interval between scheduled replications.

In many cases, the best option is to balance the time sensitivity of the queries in the target database that use the replicated data with the amount of data that each replication must process. After you decide how often to run replications, create a script to perform the replications. The script simply needs to follow the steps in Replication steps to replicate the table a single time. Then set up a scheduled task to run the script.

Monitoring replication

The V_MONITOR.REPLICATION_STATUS system table in the target database shows information about database replications. You can use it to monitor ongoing replications and the results of prior replications:

=> SELECT node_name, status, transaction_id FROM REPLICATION_STATUS 
   ORDER BY start_time ASC;
       node_name       |      status       |  transaction_id   
-----------------------+-------------------+-------------------
 v_target_db_node0001  | completed         | 45035996273706044
 v_target_db_node0002  | completed         | 45035996273706044
 v_target_db_node0003  | completed         | 45035996273706044
 v_target_db_node0001  | completed         | 45035996273706070
 v_target_db_node0002  | completed         | 45035996273706070
 v_target_db_node0003  | completed         | 45035996273706070
 v_target_db_node0002  | data transferring | 45035996273706136
 v_target_db_node0003  | data transferring | 45035996273706136
 v_target_db_node0001  | data transferring | 45035996273706136
(9 rows)