Data export and replication
Vertica supports transferring data by exporting or replicating it. These two methods differ slightly:
- Exporting data: Transforms the data into a new format. You can export data to files to use with other tools or as Vertica external tables. You can also export data to another Vertica database using the COPY FROM VERTICA or EXPORT TO VERTICA statements.
- Replicating data: Copies data in the Vertica native format. Replicating data is more efficient than exporting because Vertica does not need to decompress, sort, and recompress the data. However, unlike exporting, replication only copies data between Vertica databases running in the same mode (Eon or Enterprise).
Data export
Reasons you might want to export data from Vertica include:
- To use the data in external tables; see Working with external data.
- To share data with other clients or consumers in an ecosystem.
- To copy data to another Vertica cluster.
Vertica provides two ways to export data:
- Export data to files using data exporters. You can export the results of a SELECT query to files in Parquet, ORC, or delimited format. You can use partitioning in the export to reduce the file size and improve performance when reading the data in external tables. For more information, see File export.
- Export data directly between Vertica clusters. Direct export lets you copy data between databases without having to save it to an intermediate file. You can perform the operation as either an import from another database or an export to another database; aside from the direction of data travel, these two operations are equivalent. Direct export lets you copy data between Vertica databases that have different configurations, such as clusters running in different modes or with different numbers of nodes or shards. For more information, see Database export and import.
Data replication
Replication transfers data between Vertica databases in native format. Vertica supports two types of replication:
- vbr-based replication uses the Vertica backup script to copy ROS containers from one database to another. This type of replication supports both Eon Mode and Enterprise Mode databases. However, the two databases must be running in the same mode and have the same number of nodes (for Enterprise Mode databases) or primary nodes (for Eon Mode databases). See Replicating objects to another database cluster for more information about this replication method.
- Server-based replication is an Eon Mode-only method that copies data directly from the communal store of one database to another. Both databases must have the same shard segmentation layout and run compatible versions of the Vertica server. The replication feature is backward compatible on the target side: the target database can run a later version of the server than the source database.
1 - File export
You might want to export a table or other query results from Vertica, either to share it with other applications or to move lower-priority data from ROS to less-expensive storage. Vertica supports several EXPORT TO statements for different file formats: EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED.
You can export data to HDFS, S3, Google Cloud Storage (GCS), Azure Blob Storage, or the Linux file system. Be careful to avoid concurrent exports to the same output destination. Doing so is an error on any file system and can produce incorrect results.
Not all exporters support all complex types. See the Data Types section on individual reference pages.
You can export ROS data or data that is readable through external tables. After exporting ROS data, you can drop affected ROS partitions to reclaim storage space.
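For example, after exporting a range of older data, you might reclaim its storage with DROP_PARTITIONS; this is a minimal sketch that assumes public.sales is partitioned by a date column, and the range values are hypothetical:
=> SELECT DROP_PARTITIONS('public.sales', '2016-01-01', '2016-12-31');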
After exporting data, you can define external tables to read that data in Vertica. Parquet and ORC are columnar formats that Vertica can take advantage of in query optimization, so consider using one of those formats for external tables. See Working with external data.
1.1 - Syntax
EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED have the same general syntax.
Use the EXPORT statements to export data specified by a SELECT statement, as in the following example:
=> EXPORT TO PARQUET(directory='webhdfs:///data/sales_data')
AS SELECT * FROM public.sales;
Rows Exported
---------------
14336
(1 row)
The directory parameter specifies where to write the files and is required. You must have permission to write to the output directory. You can use the ifDirExists parameter to specify what to do if the directory already exists.
You can export to HDFS, S3, GCS, Azure, or the Linux file system. For additional considerations specific to object stores, see Exporting to object stores. If you export to the local Linux file system (not a shared NFS mount) you must take some additional steps; see Exporting to the Linux file system.
Note
If you perform more than one concurrent export to the same directory, only one succeeds. If you perform concurrent exports with ifDirExists=overwrite, the results will be incorrect.
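For example, the following sketch reuses the output path from the earlier example and replaces an existing export directory rather than failing; as the note above explains, do not combine this with concurrent exports to the same directory:
=> EXPORT TO PARQUET(directory='webhdfs:///data/sales_data', ifDirExists='overwrite')
   AS SELECT * FROM public.sales;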
You can use EXPORT statements to write queries across multiple tables in Vertica and export the results. With this approach you can take advantage of powerful, fast query execution in Vertica while making the results available to other clients:
=> EXPORT TO ORC(directory='webhdfs:///data/sales_by_region')
AS SELECT sale.price, sale.date, store.region
FROM public.sales sale
JOIN public.vendor store ON sale.distribID = store.ID;
Rows Exported
---------------
23301
(1 row)
Data types
All exporters can export scalar types. All except EXPORT TO DELIMITED also support primitive arrays. The JSON and Parquet exporters also support the ARRAY and ROW types in any combination:
=> SELECT * FROM restaurants;
name | cuisine | location_city | menu
-------------------+---------+----------------------------+------------------------------------------------------------------------------
Bob's pizzeria | Italian | ["Cambridge","Pittsburgh"] | [{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]
Bakersfield Tacos | Mexican | ["Pittsburgh"] | [{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]
(2 rows)
=> EXPORT TO JSON (directory='/output/json') AS SELECT * FROM restaurants;
Rows Exported
---------------
2
(1 row)
=> \! cat /output/json/*.json
{"name":"Bob's pizzeria","cuisine":"Italian","location_city":["Cambridge","Pittsburgh"],"menu":[{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]}
{"name":"Bakersfield Tacos","cuisine":"Mexican","location_city":["Pittsburgh"],"menu":[{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]}
Query
EXPORT statements rewrite the query you specify, because the export is done by a user-defined transform function (UDTF). Because of this rewrite, there are some restrictions on the query you supply.
The query can contain only a single outer SELECT statement. For example, you cannot use UNION:
=> EXPORT TO PARQUET(directory = '/mnt/shared_nfs/accounts/rm')
OVER(PARTITION BY hash)
AS
SELECT 1 as account_id, '{}' as json, 0 hash
UNION ALL
SELECT 2 as account_id, '{}' as json, 1 hash;
ERROR 8975: Only a single outer SELECT statement is supported
HINT: Please use a subquery for multiple outer SELECT statements
Instead, rewrite the query to use a subquery:
=> EXPORT TO PARQUET(directory = '/mnt/shared_nfs/accounts/rm')
OVER(PARTITION BY hash)
AS
SELECT
account_id,
json
FROM
(
SELECT 1 as account_id, '{}' as json, 0 hash
UNION ALL
SELECT 2 as account_id, '{}' as json, 1 hash
) a;
Rows Exported
---------------
2
(1 row)
To use composite statements such as UNION, INTERSECT, and EXCEPT, rewrite them as subqueries.
When exporting data you can use an OVER() clause to partition and sort the data as described in Partitioning and sorting data. Partitioning and sorting can improve query performance. If you partition the data, you cannot specify schema and table names in the SELECT statement. In this case, specify only the column name.
All exporters have a required directory parameter. All allow you to specify a compression type, and each exporter has format-specific parameters. See the parameter descriptions on the individual reference pages: EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED.
1.2 - Partitioning and sorting data
When exporting, you can use the optional OVER clause in the SELECT statement to specify how to partition and/or sort the exported data. Partitioning reduces the sizes of the output data files and can improve performance when Vertica queries external tables containing this data. (See Partitioned data.) If you do not specify how to partition the data, Vertica optimizes the export for maximum parallelism.
To specify partition columns, use PARTITION BY in the OVER clause, as in the following example:
=> EXPORT TO PARQUET(directory = 'webhdfs:///data/export')
OVER(PARTITION BY date) AS SELECT transactionID, price FROM public.sales;
Rows Exported
---------------
28337
(1 row)
You can both partition by a column and include that column in the SELECT clause. Including the column allows you to sort it. Vertica still takes advantage of the partitioning during query execution.
You can sort values within a partition for a further performance improvement. Sort table columns based on the likelihood of their occurrence in query predicates; columns that most frequently occur in comparison or range predicates should be sorted first. You can sort values within each partition using ORDER BY in the OVER clause:
=> EXPORT TO PARQUET(directory = 'webhdfs:///data/export')
OVER(PARTITION BY date ORDER BY price) AS SELECT date, price FROM public.sales;
Rows Exported
---------------
28337
(1 row)
You can use ORDER BY even without partitioning. Storing data in sorted order can improve data access and predicate evaluation performance.
Targets in the OVER clause must be column references; they cannot be expressions. For more information about OVER, see SQL analytics.
If you are exporting data to a local file system, you might want to force a single node to write all of the files. To do so, use an empty OVER clause.
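For example, a minimal sketch (the local path is illustrative) that forces a single node to write all of the output files:
=> EXPORT TO PARQUET(directory='/data/local_export')
   OVER() AS SELECT * FROM public.sales;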
1.3 - Exporting to object stores
Object-store file systems (S3, Google Cloud Storage, and Azure Blob Storage) have some differences from other file systems that affect data export. You must set some additional configuration parameters for authentication and region, and there are some restrictions on the output.
URI formats and configuration parameters are described on the reference pages for each object-store file system.
Configuration parameters affect all access to the corresponding object stores, both reads and writes. Instead of setting them globally, you can limit the effects of your settings by setting them at the session level before exporting data.
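For example, before an export to S3 you might scope region and credential settings to your session; the values shown here are placeholders:
=> ALTER SESSION SET AWSRegion = 'us-west-1';
=> ALTER SESSION SET AWSAuth = 'access_key_id:secret_access_key';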
Overwrites
By default, exports fail if the target directory already exists. You can use the ifDirExists parameter to instead append or overwrite. If you specify overwrite for an export to an object store, the existing directory is deleted recursively at the beginning of the operation and is not restored if the operation fails. Be careful not to export to a directory containing data you want to keep.
Output restrictions
Object-store file systems do not support renaming files in place; they implement a rename as a copy followed by a delete. On other file systems, the exporters support atomicity by writing output into a temporary directory and renaming it when complete. Such an approach is impractical for object stores, so the exporters write directly to the destination path. It is therefore possible to begin reading the exported data before the export has finished, which could lead to errors. Be careful to wait for the export to finish before using the data.
Vertica does not support simultaneous exports to the same path in object stores. The results are undefined.
S3 limits individual objects to 5TB. You might need to divide very large exports.
1.4 - Exporting to the Linux file system
Exports to the local file system can be to an NFS mount (shared) or to the Linux file system on each node (non-shared). If you export to an NFS mount on the Linux file system, the exporters behave the same way as for any other shared location: all of the exported files are written to the same (shared) destination.
If you export to the local Linux file system, each Vertica node writes its portion of the export to its local (non-shared) file system. Exports to non-shared local file systems have the following restrictions:
- The output directory must not exist on any node.
- You must have a USER storage location or superuser privileges.
- You cannot override the permissions mode of 700 for directories and 600 for files.
To define an external table to read data exported to a non-shared local file system, use COPY...ON EACH NODE, as in the following example:
=> CREATE EXTERNAL TABLE sales (...)
AS COPY FROM '/data/sales/*.parquet' ON EACH NODE PARQUET;
The path is the same path that you specified in the export statement and is the same on all nodes.
If you omit the ON clause and just specify a path, COPY only loads the data it finds on the initiator node.
1.5 - Monitoring exports
You can review information about exports, including numbers of row groups, file sizes, and file names.
The export statements are UDxs. The UDX_EVENTS system table records events logged during UDx execution, including timestamps, node names, and session IDs. This table contains a column (RAW), which holds a VMap of whatever additional data an individual UDx logged. The export statements log details about the exported files in this table. While you can work with this table directly and materialize values from the VMap column, you might prefer to define a view to simplify your access.
The following statement defines a view showing only the events from EXPORT TO PARQUET, materializing the VMap values.
=> CREATE VIEW parquet_export_events AS
SELECT
report_time,
node_name,
session_id,
user_id,
user_name,
transaction_id,
statement_id,
request_id,
udx_name,
file,
created,
closed,
rows,
row_groups,
size_mb
FROM
v_monitor.udx_events
WHERE
udx_name ilike 'ParquetExport%';
The exporters report the following UDx-specific columns:
Column Name | Data Type | Description
----------- | --------- | -----------
FILE | VARCHAR | Name of the output file.
CREATED | TIMESTAMPTZ | When the file was created.
CLOSED | TIMESTAMPTZ | When the file was closed after writing.
ROWS | INTEGER | The total number of rows in the file.
ROW_GROUPS | INTEGER | The number of row groups, for formats that use them (Parquet and ORC).
SIZE_MB | FLOAT | File size.
The following example shows the results of a single export.
=> SELECT file,rows,row_groups,size_mb FROM PARQUET_EXPORT_EVENTS;
file | rows | row_groups | size_mb
-----------------------------------------------------------------------+-------+------------+----------
/data/outgZxN3irt/450c4213-v_vmart_node0001-139770732459776-0.parquet | 29696 | 1 | 0.667203
/data/outgZxN3irt/9df1c797-v_vmart_node0001-139770860660480-0.parquet | 29364 | 1 | 0.660922
(2 rows)
In this table, the output directory name (/data/out) is appended with a generated string (gZxN3irt). For exports to HDFS or to local file systems (including NFS), EXPORT TO PARQUET first writes data into a scratch directory and then renames it at the end of the operation. The events are logged during export and so show the temporary name. Some output destinations, such as AWS S3, do not support rename operations, so in those cases this table does not show generated names.
2 - Database export and import
Vertica can easily import data from and export data to other Vertica databases. Importing and exporting data is useful for common tasks such as moving data back and forth between a development or test database and a production database, or between databases that have different purposes but need to share data on a regular basis.
Moving data directly between databases
To move data between databases, you first establish a connection using CONNECT TO VERTICA and then use one of the following statements to move data:
- COPY FROM VERTICA
- EXPORT TO VERTICA
These statements are symmetric; copying from cluster A to cluster B is the same as exporting from cluster B to cluster A. The difference is only in which cluster drives the operation.
To configure TLS settings for the connection, see Configuring connection security between clusters.
Creating SQL scripts to export data
Three functions return a SQL script that you can use to export database objects and recreate them elsewhere: EXPORT_CATALOG, EXPORT_TABLES, and EXPORT_OBJECTS.
While copying and exporting data is similar to Backing up and restoring the database, you should use them for different purposes, outlined below:
Task | Backup and Restore | COPY and EXPORT Statements
---- | ------------------ | --------------------------
Back up or restore an entire database, or incremental changes | YES | NO
Manage database objects (a single table or selected table rows) | YES | YES
Use external locations to back up and restore your database | YES | NO
Use direct connections between two databases | OBJECT RESTORE ONLY | YES
Use external shell scripts to back up and restore your database | YES | NO
Use SQL commands to incorporate copy and export tasks into DB operations | NO | YES
The following sections explain how you import and export data between Vertica databases.
When importing from or exporting to a Vertica database, you can connect only to a database that uses trusted (username only) or password-based authentication, as described in Security and authentication. OAuth and Kerberos authentication methods are not supported.
Other exports
This section is about exporting data to another Vertica database. For information about exporting data to files, which can then be used in external tables or COPY statements, see File export.
2.1 - Configuring connection security between clusters
When copying data between clusters, Vertica can encrypt both data and plan metadata.
Data is encrypted if you configure internode encryption (see Internode TLS).
For metadata, by default Vertica tries TLS first and falls back to plaintext. You can configure Vertica to require TLS and to fail if the connection cannot be made. You can also have Vertica verify the certificate and hostname before connecting.
Enabling TLS between clusters
To use TLS between clusters, you must first configure TLS between nodes:
- Set the EncryptSpreadComms parameter.
- Configure the data_channel TLS Configuration.
- Set the ImportExportTLSMode parameter.
To specify the level of strictness when connecting to another cluster, set the ImportExportTLSMode configuration parameter. This parameter applies for both importing and exporting data. The possible values are:
- PREFER: Try TLS but fall back to plaintext if TLS fails.
- REQUIRE: Use TLS and fail if the server does not support TLS.
- VERIFY_CA: Require TLS (as with REQUIRE), and also validate the other server's certificate using the CA specified by the "server" TLS Configuration's CA certificates (in this case, "ca_cert" and "ica_cert"):
=> SELECT name, certificate, ca_certificate, mode FROM tls_configurations WHERE name = 'server';
name | certificate | ca_certificate | mode
--------+------------------+--------------------+-----------
server | server_cert | ca_cert,ica_cert | VERIFY_CA
(1 row)
- VERIFY_FULL: Require TLS and validate the certificate (as with VERIFY_CA), and also validate the server certificate's hostname.
- REQUIRE_FORCE, VERIFY_CA_FORCE, and VERIFY_FULL_FORCE: Same behavior as REQUIRE, VERIFY_CA, and VERIFY_FULL, respectively, but cannot be overridden by CONNECT TO VERTICA.
ImportExportTLSMode is a global parameter that applies to all import and export connections you make using CONNECT TO VERTICA. You can override it for an individual connection.
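For example, a minimal sketch of requiring TLS for all cross-cluster connections, assuming the usual ALTER DATABASE syntax for setting configuration parameters:
=> ALTER DATABASE DEFAULT SET ImportExportTLSMode = 'REQUIRE';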
For more information about these and other configuration parameters, see Security parameters.
2.2 - Exporting data to another database
EXPORT TO VERTICA exports table data from one Vertica database to another. The following requirements apply:
- You already opened a connection to the target database with CONNECT TO VERTICA.
- The source database is no more than one major release behind the target database.
- The table in the target database must exist.
- Source and target table columns must have the same or compatible data types.
Each EXPORT TO VERTICA statement exports data from only one table at a time. You can use the same database connection for multiple export operations.
Export process
Exporting is a three-step process:
1. Connect to the target database with CONNECT TO VERTICA. For example:
=> CONNECT TO VERTICA testdb USER dbadmin PASSWORD '' ON 'VertTest01', 5433;
CONNECT
2. Export the desired data with EXPORT TO VERTICA. For example, the following statement exports all table data in customer_dimension to a table of the same name in target database testdb:
=> EXPORT TO VERTICA testdb.customer_dimension FROM customer_dimension;
Rows Exported
---------------
23416
(1 row)
3. DISCONNECT disconnects from the target database when all export and import operations are complete:
=> DISCONNECT testdb;
DISCONNECT
Note
Closing your session also closes the database connection. However, it is a good practice to explicitly close the connection to the other database, both to free up resources and to prevent issues with other SQL scripts that might be running in your session. Always closing the connection prevents potential errors if you run a script in the same session that attempts to open a connection to the same database, since each session can only have one connection to a given database at a time.
Mapping between source and target columns
If you export all table data from one database to another as in the previous example, EXPORT TO VERTICA can omit specifying column lists. This is possible only if column definitions in both tables comply with the following conditions:
- Contain the same number of columns.
- List source and target columns in the same order.
- Pair columns with the same (or compatible) data types.
If any of these conditions is not true, the EXPORT TO VERTICA statement must include column lists that explicitly map source and target columns to each other.
For example:
=> EXPORT TO VERTICA testdb.people (name, gender, age)
FROM customer_dimension (customer_name, customer_gender, customer_age);
Exporting subsets of table data
In general, you can export a subset of table data in two ways:
- Export data from only specific columns in the source table.
- Export the result set of a query on the source table.
In both cases, the EXPORT TO VERTICA statement typically must specify column lists for the source and target tables.
The following example exports data from three columns in the source table to three columns in the target table. Accordingly, the EXPORT TO VERTICA statement specifies a column list for each table. The order of columns in each list determines how Vertica maps target columns to source columns. In this case, target columns name, gender, and age map to source columns customer_name, customer_gender, and customer_age, respectively:
=> EXPORT TO VERTICA testdb.people (name, gender, age) FROM customer_dimension
(customer_name, customer_gender, customer_age);
Rows Exported
---------------
23416
(1 row)
The next example queries source table customer_dimension and exports the result set to table ma_customers in target database testdb:
=> EXPORT TO VERTICA testdb.ma_customers(customer_key, customer_name, annual_income)
AS SELECT customer_key, customer_name, annual_income FROM customer_dimension WHERE customer_state = 'MA';
Rows Exported
---------------
3429
(1 row)
Note
In this example, the source and target column names are identical, so specifying a column list for target table ma_customers is optional. If one or more of the queried source columns did not have a match in the target table, the statement would be required to include a column list for the target table.
Exporting IDENTITY columns
You can export tables (or columns) that contain IDENTITY values, but the sequence values are not incremented automatically at the target table. You must use ALTER SEQUENCE to make updates.
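For example, a hedged sketch of advancing the sequence that backs an IDENTITY column in the target database after an export; the sequence name and restart value are hypothetical:
=> ALTER SEQUENCE public.customer_dimension_customer_key_seq RESTART WITH 23417;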
Export IDENTITY columns as follows:
- If both source and destination tables have an IDENTITY column and the configuration parameter CopyFromVerticaWithIdentity is set to true (1), you do not need to list them.
- If the source table has an IDENTITY column but the target table does not, you must explicitly list the source and target columns.
Caution
Failure to list which IDENTITY columns to export can cause an error, because the IDENTITY column will be interpreted as missing in the destination table.
By default, EXPORT TO VERTICA exports all IDENTITY columns. To disable this behavior globally, set the CopyFromVerticaWithIdentity configuration parameter.
2.3 - Copying data from another Vertica database
COPY FROM VERTICA imports table data from one Vertica database to another. The following requirements apply:
- You already opened a connection to the source database with CONNECT TO VERTICA.
- The source database is no more than one major release behind the target database.
- The table in the target database must exist.
- Source and target table columns must have the same or compatible data types.
Import process
Importing is a three-step process:
1. Connect to the source database with CONNECT TO VERTICA. For example:
=> CONNECT TO VERTICA vmart USER dbadmin PASSWORD '' ON 'VertTest01',5433;
CONNECT
2. Import the desired data with COPY FROM VERTICA. For example, the following statement imports all table data in customer_dimension to a table of the same name:
=> COPY customer_dimension FROM VERTICA vmart.customer_dimension;
Rows Loaded
-------------
500000
(1 row)
Note
Successive COPY FROM VERTICA statements in the same session can import data from multiple tables over the same connection.
3. DISCONNECT disconnects from the source database when all import and export operations are complete:
=> DISCONNECT vmart;
DISCONNECT
Note
Closing your session also closes the database connection. However, it is a good practice to explicitly close the connection to the other database, both to free up resources and to prevent issues with other SQL scripts that might be running in your session. Always closing the connection prevents potential errors if you run a script in the same session that attempts to open a connection to the same database, since each session can only have one connection to a given database at a time.
Importing IDENTITY columns
You can import IDENTITY columns as follows:
- If both source and destination tables have an IDENTITY column and the configuration parameter CopyFromVerticaWithIdentity is set to true (1), you do not need to list them.
- If the source table has an IDENTITY column but the target table does not, you must explicitly list the source and target columns.
Caution
Failure to list which IDENTITY columns to import can cause an error, because the IDENTITY column will be interpreted as missing in the destination table.
After importing the columns, the IDENTITY column values do not increment automatically. Use ALTER SEQUENCE to make updates.
The default behavior for this statement is to import IDENTITY columns by specifying them directly in the source table. To disable this behavior globally, set the CopyFromVerticaWithIdentity configuration parameter.
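For example, an illustrative sketch of disabling the behavior at the database level, where 0 corresponds to false:
=> ALTER DATABASE DEFAULT SET CopyFromVerticaWithIdentity = 0;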
2.4 - Copy and export data on AWS
There are common issues that occur when exporting or copying data on AWS clusters, as described below. Except for these specific issues as they relate to AWS, copying and exporting data works as documented in Database export and import.
To copy or export data on AWS:
1. Verify that all nodes in source and destination clusters have their own elastic IPs (or public IPs) assigned.
If your destination cluster is located within the same VPC as your source cluster, proceed to step 3. Each node in one cluster must be able to communicate with each node in the other cluster, so each source and destination node needs an elastic IP (or public IP) assigned.
2. (For non-CloudFormation Template installs) Create an S3 gateway endpoint.
If you aren't using a CloudFormation Template (CFT) to install Vertica, you must create an S3 gateway endpoint in your VPC. For more information, see the AWS documentation.
For example, the Vertica CFT has the following VPC endpoint:
"S3Enpoint" : {
"Type" : "AWS::EC2::VPCEndpoint",
"Properties" : {
"PolicyDocument" : {
"Version":"2012-10-17",
"Statement":[{
"Effect":"Allow",
"Principal": "*",
"Action":["*"],
"Resource":["*"]
}]
},
"RouteTableIds" : [ {"Ref" : "RouteTable"} ],
"ServiceName" : { "Fn::Join": [ "", [ "com.amazonaws.", { "Ref": "AWS::Region" }, ".s3" ] ] },
"VpcId" : {"Ref" : "VPC"}
    }
}
3. Verify that your security group allows the AWS clusters to communicate.
Check your security groups for both your source and destination AWS clusters. Verify that ports 5433 and 5434 are open. If one of your AWS clusters is on a separate VPC, verify that your network access control list (ACL) allows communication on port 5434.
Note
This communication method exports and copies (imports) data across the Internet. You can alternatively use non-public IPs and gateways, or a VPN, to connect the source and destination clusters.
4. If there are one or more elastic load balancers (ELBs) between the clusters, verify that port 5433 is open between the ELBs and clusters.
5. If you use the Vertica client to connect to one or more ELBs, the ELBs only distribute incoming connections. The data transmission path occurs between clusters.
2.5 - Changing node export addresses
You can change the export address for your Vertica cluster. You might need to do so to export data between clusters in different network subnets.
1. Create a subnet for importing and exporting data between Vertica clusters. The CREATE SUBNET statement identifies the public network IP addresses residing on the same subnet.
=> CREATE SUBNET kv_subnet with '10.10.10.0';
2. Alter the database to specify the subnet name of a public network for import/export.
=> ALTER DATABASE DEFAULT EXPORT ON kv_subnet;
3. Create network interfaces for importing and exporting data from individual nodes to other Vertica clusters. The CREATE NETWORK INTERFACE statement identifies the public network IP addresses residing on multiple subnets.
=> CREATE NETWORK INTERFACE kv_node1 on v_VMartDB_node0001 with '10.10.10.1';
=> CREATE NETWORK INTERFACE kv_node2 on v_VMartDB_node0002 with '10.10.10.2';
=> CREATE NETWORK INTERFACE kv_node3 on v_VMartDB_node0003 with '10.10.10.3';
=> CREATE NETWORK INTERFACE kv_node4 on v_VMartDB_node0004 with '10.10.10.4';
For users on Amazon Web Services (AWS) or using Network Address Translation (NAT), refer to Vertica on Amazon Web Services.
4. Alter the node settings to change the export address. When used with the EXPORT ON clause, ALTER NODE specifies the network interface of the public network on individual nodes for importing and exporting data.
=> ALTER NODE v_VMartDB_node0001 export on kv_node1;
=> ALTER NODE v_VMartDB_node0002 export on kv_node2;
=> ALTER NODE v_VMartDB_node0003 export on kv_node3;
=> ALTER NODE v_VMartDB_node0004 export on kv_node4;
5. Verify that the node address and the export address are different on different network subnets of the Vertica cluster.
=> SELECT node_name, node_address, export_address FROM nodes;
node_name | node_address | export_address
-------------------+-----------------+----------------
v_VMartDB_node0001 | 192.168.100.101 | 10.10.10.1
v_VMartDB_node0002 | 192.168.100.102 | 10.10.10.2
v_VMartDB_node0003 | 192.168.100.103 | 10.10.10.3
v_VMartDB_node0004 | 192.168.100.104 | 10.10.10.4
Creating a network interface and altering the node settings to change the export address takes precedence over creating a subnet and altering the database for import/export.
2.6 - Using public and private IP networks
In many configurations, Vertica cluster hosts use two network IP addresses as follows:
- A private address for communication between the cluster hosts.
- A public address for communication with systems outside the cluster.
By default, importing from and exporting to another Vertica database uses the private network.
Note
Ensure port 5433 or the port the Vertica database is using is not blocked.
To use the public network address for copy and export activities, as well as for moving large amounts of data, configure the system to use the public network to support exporting to or importing from another Vertica cluster:
- Identify the public network to Vertica
- Identify the database or nodes used for import/export
Vertica encrypts data during transmission (if you have configured a certificate). Vertica attempts to also encrypt plan metadata but, by default, falls back to plaintext if needed. You can configure Vertica to require encryption for metadata too; see Configuring connection security between clusters.
In certain instances, both public and private addresses exceed the demand capacity of a single Local Area Network (LAN). If you encounter this type of scenario, then configure your Vertica cluster to use two LANs: one for public network traffic and one for private network traffic.
2.6.1 - Identify the public network to Vertica
To be able to import to or export from a public network, Vertica needs to be aware of the IP addresses of the nodes or clusters on the public network that will be used for import/export activities. Your public network might be configured in either of these ways:
To identify public network IP addresses residing on the same subnet:
- Use the CREATE SUBNET statement to provide your subnet with a name and to identify the subnet routing prefix.
To identify public network IP addresses residing on multiple subnets:
- Use the CREATE NETWORK INTERFACE statement to identify the IP address on each node to use for import/export activities.
After you've identified the subnet or network interface to be used for import/export, you must identify the database or nodes used for import/export, as described in the next section.
2.6.2 - Identify the database or nodes used for import/export
After you identify the public network to Vertica, you can configure a database and its nodes to use it for import and export operations:
- Use ALTER DATABASE to specify a subnet on the public network for the database. After doing so, all nodes in the database automatically use the network interface on the subnet for import/export operations.
- On each database node, use ALTER NODE to specify a network interface of the public network, as shown in the sketch below.
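For example, a minimal sketch that reuses the subnet and network interface names from Changing node export addresses:
=> ALTER DATABASE DEFAULT EXPORT ON kv_subnet;
=> ALTER NODE v_VMartDB_node0001 EXPORT ON kv_node1;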
2.7 - Handling node failure during copy/export
When an export (EXPORT TO VERTICA) or import from Vertica (COPY FROM VERTICA) task is in progress and a non-initiator node fails, Vertica does not complete the task automatically. A non-initiator node is any node that is not the source or target node in your export or import statement. To complete the task, you must run the statement again.
You address the problem of a non-initiator node failing during an import or export as follows:
Note
Both Vertica databases must be running in a safe state.
1. You export or import from one cluster to another using the EXPORT TO VERTICA or COPY FROM VERTICA statement.
2. During the export or import, a non-initiating node on the target or source cluster fails. Vertica issues an error message that indicates possible node failure, one of the following:
- ERROR 4534: Receive on v_tpchdb1_node0002: Message receipt from v_tpchdb2_node0005 failed
- WARNING 4539: Received no response from v_tpchdb1_node0004 in abandon plan
- ERROR 3322: [tpchdb2] Execution canceled by operator
3. Complete your import or export by running the statement again. The failed node does not need to be up for Vertica to successfully complete the export or import.
2.8 - Using EXPORT functions
Vertica provides several EXPORT_ functions that let you recreate a database, or specific schemas and tables, in a target database. For example, you can use the EXPORT_ functions to transfer some or all of the designs and objects you create in a development or test environment to a production database.
The EXPORT_ functions create SQL scripts that you can run to generate the exported database designs or objects. These functions serve different purposes from the export statements COPY FROM VERTICA (pull data) and EXPORT TO VERTICA (push data), which transfer data directly from the source to the target database across a network connection between the two. Those statements are dynamic actions and do not generate SQL scripts.
The EXPORT_ functions appear in the following table. Depending on what you need to export, you can use one or more of the functions. EXPORT_CATALOG creates the most comprehensive SQL script, while EXPORT_TABLES and EXPORT_OBJECTS are subsets of that function to narrow the export scope.
Use this function... | To recreate...
-------------------- | --------------
EXPORT_CATALOG | These catalog items: an existing schema design with its tables, projections, constraints, views, and stored procedures; the Database Designer-created schema design, tables, projections, constraints, and views; or a design on a different cluster.
EXPORT_TABLES | Non-virtual objects up to, and including, the schema of one or more tables.
EXPORT_OBJECTS | Catalog objects, in dependency order, for replication.
The designs and object definitions that the script creates depend on the EXPORT_ function scope you specify. The following sections give examples of the commands and output for each function and the scopes it supports.
Saving scripts for export functions
All of the examples in this section were generated using the standard Vertica VMART database, with some additional test objects and tables. One output directory was created for all SQL scripts that the functions created:
/home/dbadmin/xtest
If you specify the destination argument as an empty string (''), the function writes the export results to STDOUT.
Note
A superuser can export all available database output to a file with the EXPORT_ functions. For a non-superuser, the EXPORT_ functions generate a script containing only the objects to which the user has access.
2.8.1 - Exporting the catalog
Vertica function EXPORT_CATALOG generates a SQL script for copying a database design to another cluster. This script replicates the physical schema design of the source database. You call this function as follows:
EXPORT_CATALOG ( ['[destination]' [, '[scope]']] )
The SQL script conforms to the following requirements:
- Only includes objects to which the user has access.
- Orders CREATE statements according to object dependencies so they can be recreated in the correct sequence. For example, if a table is in a non-PUBLIC schema, the required CREATE SCHEMA statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.
- If possible, creates projections with their KSAFE clause, if any, otherwise with their OFFSET clause.
Setting export scope
If you omit specifying a scope, EXPORT_CATALOG exports all objects. You can set the scope of the export operation to one of the following levels:
Scope | Exports...
----- | ----------
TABLES | Tables, schemas, and table-dependent objects: constraints and access policies.
DESIGN | All catalog objects: schemas, tables, constraints, views, access policies, projections, SQL macros, and stored procedures.
DIRECTED_QUERIES | All directed queries that are stored in the catalog. For details, see Managing directed queries.
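For example, an illustrative call that writes only directed queries to a file; the output path is hypothetical:
=> SELECT EXPORT_CATALOG('/home/dbadmin/xtest/sql_directed_queries.sql', 'DIRECTED_QUERIES');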
Exporting table objects
Use the TABLES scope to generate a script that recreates all tables and the objects that they depend on: schemas, sequences, constraints, and access policies:
=> SELECT EXPORT_CATALOG (
'/home/dbadmin/xtest/sql_cat_tables.sql',
'TABLES');
EXPORT_CATALOG
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script can include CREATE statements for schemas, tables, sequences, and access policies, and ALTER TABLE statements to add constraints.
Exporting all catalog objects
Use the DESIGN scope to export all design elements of a source database in order of their dependencies. This scope exports all catalog objects, including schemas, tables, constraints, projections, views, and access policies.
=> SELECT EXPORT_CATALOG(
'/home/dbadmin/xtest/sql_cat_design.sql',
'DESIGN' );
EXPORT_CATALOG
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script includes the statements that are required to recreate the database.
Projection considerations
If a projection to export was created with no ORDER BY clause, the SQL script reflects the default behavior for projections. Vertica implicitly creates projections using a sort order based on the SELECT columns in the projection definition.
The EXPORT_CATALOG script is portable if all projections are created using UNSEGMENTED ALL NODES or SEGMENTED ALL NODES.
2.8.2 - Exporting tables
Vertica function EXPORT_TABLES exports DDL for tables and related objects in the current database. The generated SQL includes all non-virtual table objects to which you have access. You can use this SQL to recreate tables and related non-virtual objects on a different cluster.
You execute EXPORT_TABLES as follows:
EXPORT_TABLES( ['[destination]' [, '[scope]']] )
The SQL script conforms to the following requirements:
- Only includes objects to which the user has access.
- Orders CREATE statements according to object dependencies so they can be recreated in the correct sequence. For example, if a table references a named sequence, a CREATE SEQUENCE statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.
Setting export scope
The EXPORT_TABLES scope argument specifies the scope of the export operation:
To export... | Set scope to...
------------ | ---------------
All tables to which you have access, including constraints | Empty string ('')
One or more named tables | A comma-delimited list of table objects, for example: myschema.newtable, yourschema.oldtable. You can optionally qualify the schema with the name of the current database: mydb.myschema.newtable
A named table object in the current search path: a schema, table, or sequence. If you specify a schema, EXPORT_TABLES exports all table objects in that schema to which you have access. | The table object's name and, optionally, its path, for example: VMart.myschema
Note
EXPORT_TABLES does not export views. If you specify a view name, Vertica silently ignores it and the view is omitted from the generated script. To export views, use EXPORT_OBJECTS.
Exporting all table objects
If you set the scope parameter to an empty string (''), EXPORT_TABLES exports all tables and their related objects. For example, the following call to EXPORT_TABLES exports all table objects in the VMart database to the specified output file.
=> SELECT EXPORT_TABLES(
'/home/dbadmin/xtest/sql_tables_empty.sql', '');
EXPORT_TABLES
-------------------------------------
Catalog data exported successfully
(1 row)
The exported SQL includes the statements required to recreate the tables and related objects, such as schemas, sequences, and access policies.
Exporting individual table objects
EXPORT_TABLES can specify a comma-separated list of tables and table-related objects such as sequences or schemas to export. The generated SQL script includes the CREATE statements for the specified objects and their dependent objects.
For example, the following call to EXPORT_TABLES exports two VMart tables, store.store_sales_fact and store.store_dimension:
=> SELECT export_tables('','store.store_sales_fact, store.store_dimension');
export_tables
-------------------------------------------------------------------------
CREATE TABLE store.store_dimension
(
store_key int NOT NULL,
store_name varchar(64),
store_number int,
store_address varchar(256),
store_city varchar(64),
store_state char(2),
store_region varchar(64),
floor_plan_type varchar(32),
photo_processing_type varchar(32),
financial_service_type varchar(32),
selling_square_footage int,
total_square_footage int,
first_open_date date,
last_remodel_date date,
number_of_employees int,
annual_shrinkage int,
foot_traffic int,
monthly_rent_cost int,
CONSTRAINT C_PRIMARY PRIMARY KEY (store_key) DISABLED
);
CREATE TABLE store.store_sales_fact
(
date_key int NOT NULL,
product_key int NOT NULL,
product_version int NOT NULL,
store_key int NOT NULL,
promotion_key int NOT NULL,
customer_key int NOT NULL,
employee_key int NOT NULL,
pos_transaction_number int NOT NULL,
sales_quantity int,
sales_dollar_amount int,
cost_dollar_amount int,
gross_profit_dollar_amount int,
transaction_type varchar(16),
transaction_time time,
tender_type varchar(8)
);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_date FOREIGN KEY (date_key) references public.date_dimension (date_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_promotion FOREIGN KEY (promotion_key) references public.promotion_dimension (promotion_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_customer FOREIGN KEY (customer_key) references public.customer_dimension (customer_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);
The following call to EXPORT_TABLES specifies to export all tables in the VMart schema store:
=> select export_tables('','store');
export_tables
-----------------------------------------------------------------
CREATE SCHEMA store;
CREATE TABLE store.store_dimension
(
...
);
CREATE TABLE store.store_sales_fact
(
...
);
CREATE TABLE store.store_orders_fact
(
...
);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_date FOREIGN KEY (date_key) references public.date_dimension (date_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_promotion FOREIGN KEY (promotion_key) references public.promotion_dimension (promotion_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_customer FOREIGN KEY (customer_key) references public.customer_dimension (customer_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_vendor FOREIGN KEY (vendor_key) references public.vendor_dimension (vendor_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);
(1 row)
2.8.3 - Exporting objects
The Vertica function EXPORT_OBJECTS generates a SQL script that you can use to recreate non-virtual catalog objects on a different cluster, as follows:
EXPORT_OBJECTS( ['[destination]' [, '[scope]'] [, 'mark-ksafe']] )
The SQL script conforms to the following requirements:
- Only includes objects to which the user has access.
- Orders CREATE statements according to object dependencies so they can be recreated in the correct sequence. For example, if a table is in a non-PUBLIC schema, the required CREATE SCHEMA statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.
- If possible, creates projections with their KSAFE clause, if any, otherwise with their OFFSET clause.
Setting export scope
The EXPORT_OBJECTS scope argument specifies the scope of the export operation:
To export... | Set scope to...
------------ | ---------------
All objects to which you have access | Empty string ('')
One or more named database objects and related objects | A comma-delimited list of objects, for example: myschema.newtable, yourschema.my-sequence. You can optionally qualify the schema with the name of the current database: mydb.myschema.newtable. If you specify a schema, EXPORT_OBJECTS exports all objects in that schema to which you have access. If you name a table that references a sequence, the generated script shows the sequence, then the table that references the sequence, and finally any projections of that table.
Note
EXPORT_OBJECTS does not export grants. Preserving grants on libraries can be especially important when you upgrade your database: if the prototypes of UDx libraries change, Vertica drops the grants on them. In order to preserve grants on UDx libraries, back up the grants before upgrading, and then restore them in the upgraded database. For details, see Backing up and restoring grants.
Exporting all objects
If you set the scope parameter to an empty string (''), Vertica exports all non-virtual objects from the source database in order of dependencies. Running the generated SQL script on another cluster creates all referenced objects and their dependent objects.
By default, the function's KSAFE argument is set to true. In this case, the generated script calls MARK_DESIGN_KSAFE, which replicates K-safety of the original database.
=> SELECT EXPORT_OBJECTS(
'/home/dbadmin/xtest/sql_objects_all.sql',
'',
'true');
EXPORT_OBJECTS
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script includes CREATE statements for all referenced objects and their dependent objects. The following output includes the start and end of the output SQL file, including the MARK_DESIGN_KSAFE statement:
CREATE SCHEMA store;
CREATE SCHEMA online_sales;
CREATE SEQUENCE public.my_seq ;
CREATE TABLE public.customer_dimension
(
customer_key int NOT NULL,
customer_type varchar(16),
customer_name varchar(256),
customer_gender varchar(8),
title varchar(8),
household_id int,
...
);
...
SELECT MARK_DESIGN_KSAFE(1);
Exporting individual objects
You can specify one or more objects as the function scope, where multiple objects are specified in a comma-delimited list. The names of objects in any non-PUBLIC schema must include their respective schemas. The objects to export can include schemas, tables, views, and sequences. Accordingly, the SQL script includes the CREATE statements for the objects you list and for their dependencies.
If listed objects have overlapping scopes—for example, the list includes a table and one of its projections—EXPORT_OBJECTS exports the projection only once:
=> select export_objects ('','customer_dimension, customer_dimension_super');
export_objects
--------------------------------------------------------------------------
CREATE TABLE public.customer_dimension
(
customer_key int NOT NULL,
customer_type varchar(16),
customer_name varchar(256),
...
CONSTRAINT C_PRIMARY PRIMARY KEY (customer_key) DISABLED
);
CREATE ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address CASE WHEN enabled_role('administrator') THEN customer_address ELSE '**************' END ENABLE;
CREATE PROJECTION public.customer_dimension_super /*+basename(customer_dimension),createtype(L)*/
(
customer_key,
customer_type,
customer_name,
...
)
AS
SELECT customer_dimension.customer_key,
customer_dimension.customer_type,
customer_dimension.customer_name,
...
FROM public.customer_dimension
ORDER BY customer_dimension.customer_key
SEGMENTED BY hash(customer_dimension.customer_key) ALL NODES OFFSET 0;
SELECT MARK_DESIGN_KSAFE(0);
You can export stored procedures by specifying their name and the types of their formal parameters. For stored procedures with the same name but different formal parameters, you can export all implementations by exporting its parent schema.
To export a particular implementation, specify either the types or both the names and types of its formal parameters. The following example specifies the types:
=> SELECT EXPORT_OBJECTS('','raiseXY(int, int)');
EXPORT_OBJECTS
----------------------
CREATE PROCEDURE public.raiseXY(x int, y int)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';
SELECT MARK_DESIGN_KSAFE(0);
(1 row)
To export all implementations of the overloaded stored procedure raiseXY, export its parent schema:
=> SELECT EXPORT_OBJECTS('','public');
EXPORT_OBJECTS
----------------------
...
CREATE PROCEDURE public.raiseXY(x int, y varchar)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';
CREATE PROCEDURE public.raiseXY(x int, y int)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';
SELECT MARK_DESIGN_KSAFE(0);
(1 row)
See also
Exporting tables
3 - Replication
Replication copies data from one Vertica database to another in native format. Usually, you use replication to keep a table or schema in the target database up to date with the corresponding table or schema in the source database. Replication automatically determines what has changed in the source database and only transfers those changes to the target database.
Replication is also useful for disaster recovery (DR) and isolating workloads. For example:
- Keep a "hot spare" Vertica database that can fill in if the primary database goes down.
- Give geographically separated teams their own local databases.
- Fully isolate ETL workloads from analytics workloads by giving each their own Vertica database.
Vertica supports two forms of replication:
- vbr-based replication that uses the backup script to copy ROS containers from one database to another.
- Server-based replication that copies shards directly from one database's communal storage to another.
vbr-based replication
vbr-based replication works between either Eon Mode or Enterprise Mode databases. You cannot replicate data between databases running in different modes or cloud platforms. The databases must have the same number of nodes (when in Enterprise Mode) or primary nodes (in Eon Mode). If replicating between Eon Mode databases, the databases must have the same shard segmentation and primary subcluster node subscriptions.
For more information about vbr-based replication, see Replicating objects to another database cluster.
Server-based replication
Server-based replication transfers data between Eon Mode databases by copying shard files from one communal storage location to another. This method supports Eon Mode databases in different clouds or on-premises. The target database must have access to the source database's communal storage, and the shard count and shard segmentation in the two databases must be the same.
Choosing between vbr-based and server-based replication
Because server-based replication only works between Eon Mode databases, you only need to choose between the two types of replication if both the source and target databases run in Eon Mode.
When choosing between the two modes, consider:
- Server-based replication is usually more efficient because it works within Vertica. Also, unlike vbr-based replication, server-based replication does not use SSH tunneling for encryption, avoiding potential security risks.
- vbr-based replication requires the source and target databases to have more in common than server-based replication does. For example, with vbr-based replication, the source and target databases must have the same shard segmentation, primary subcluster node subscriptions, communal storage type, and number of nodes in their primary subclusters. For server-based replication, the only requirement is that the two databases have the same shard segmentation.
3.1 - Server-based replication
Server-based replication copies shard data from one Eon Mode database to another.
Eon Mode Only
Server-based replication lets you copy data from one Eon Mode database to another. This replication method copies shard data directly from the source communal storage location to the target communal storage location. Because the data is copied in native format, there is less overhead than with other data copying methods.
You can replicate data between Eon Mode databases in different clouds and on-premises.
Caution
Replication overwrites any data in the target table. You do not receive any notification that the data in the target database will be overwritten, even if the target table's design does not match the source table. Verify that the tables you want to replicate either do not exist in the target database or do not contain data that you need. Be especially cautious when replicating entire schemas to prevent overwriting data in the target database.
Similarly, changes to a replicated table in the target database are overwritten each time you replicate the table. When performing scheduled replications, only grant read access to replicated tables in the target database. Limiting these tables to read-only access will help prevent confusion if a user makes changes to a replicated table.
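For example, the following is a minimal sketch of limiting a replicated table to read-only access in the target database. The role name analyst_role is hypothetical; the customers table matches the examples later in this section:
=> REVOKE ALL ON customers FROM analyst_role;   -- remove any write privileges on the replicated table
=> GRANT SELECT ON customers TO analyst_role;   -- allow read-only access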
Requirements
The Eon Mode databases you want to replicate data between must meet the following requirements:
-
The two databases must have the same shard segmentation. If you need to replicate data between databases with different shard counts, consider changing the shard count in one of the databases; see Change the number of shards in the database and the brief sketch after this list.
-
The user running the commands must have superuser privileges in the source and target databases.
-
The target database must be able to access the source database and its communal storage. If there are firewalls between the source and target databases, configure them to give the target access to the source. Additionally, the target database must have credentials to access the source database's communal storage. See Configuration.
-
The two databases must be running compatible versions of the Vertica server. The target database's server version must be the same as, or up to one major version higher than, the source database's version.
-
The tables you replicate must not have foreign key constraints that reference tables not included in the replication. For example, if you try to replicate the store.store_sales_fact table of the VMart sample database without also replicating the public.date_dimension table, Vertica returns this error:
ERROR 3437: [database_name] Foreign constraint fk_store_sales_date
of Table store.store_sales_fact referencing public.date_dimension
detected, but only one table is included in the snapshot
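If the shard counts differ, one option is to change the shard count of one database before replicating. The following is only a sketch; the target shard count of 12 is a hypothetical value, and resharding has its own prerequisites (see Change the number of shards in the database):
=> SELECT RESHARD_DATABASE(12);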
Configuration
As mentioned above, your target database must be able to access the source database's communal storage. The necessary configuration steps depend on the type of communal storage used by the source database:
- For S3-based storage (including S3-compatible systems such as FlashBlade), create a separate configuration for the source communal storage. See Per-bucket S3 configurations for more information and the sketch that follows this list.
- For other systems, such as Google Cloud Storage (GCS), that use identities to control access, give the target database's identity access to the source database's communal storage. For example, if both databases use GCS, grant the identity that the target database uses for its own communal storage access to the source database's communal storage as well. If the databases do not use the same type of storage, configure the target database with the credentials necessary to access the source storage. See Azure Blob Storage object store and Google Cloud Storage (GCS) object store for instructions on how to set up authentication for these object stores.
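For example, the following is a minimal sketch of a per-bucket configuration on the target database for an S3 source. The bucket name, region, and endpoint are hypothetical placeholders, and the elided credential values must be supplied; see Per-bucket S3 configurations for the full set of fields:
=> ALTER DATABASE DEFAULT SET S3BucketConfig =
   '[{"bucket": "source-communal-bucket", "region": "us-east-1",
      "protocol": "https", "endpoint": "s3.amazonaws.com"}]';
=> ALTER SESSION SET S3BucketCredentials =
   '[{"bucket": "source-communal-bucket", "accessKey": "...", "secretAccessKey": "..."}]';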
If you are attempting to replicate data from an on-premises database to a cloud-based database, you must create a connection that grants your cloud access to your on-premises data store.
To secure replication between your databases, you must secure two separate connections:
- Before transferring data using client-server and command channels, you must either connect the target database to the source database or the source database to the target database. See Configuring client-server TLS and Control channel Spread TLS for instructions on encrypting these connections.
- You can encrypt the connections that transfer the data between the communal data stores by setting parameters in the target database for the object store used by the source database. These parameters are AWSEnableHttps for S3 (and compatible), AzureStorageEndpointConfig for Azure, and GCSEnableHttps for GCS.
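For example, assuming the source database uses S3 or S3-compatible communal storage, the following sketch enables HTTPS for those transfers on the target database:
=> ALTER DATABASE DEFAULT SET AWSEnableHttps = 1;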
Replication steps
Follow these steps to replicate data between two Eon Mode databases:
-
Connect to the target or source database and log in as a user with superuser privileges.
-
Connect to the other database using the CONNECT TO VERTICA statement.
-
Use the REPLICATE statement, passing it the name of the table to replicate, or use the INCLUDE and EXCLUDE options to select several tables or schemas. The format of the replication call depends on whether you connected to the target or source database in step 1:
- Connected to target: specify the source database in the REPLICATE statement using a FROM source_db clause.
- Connected to source: specify the target database in the REPLICATE statement using a TO target_db clause.
Both of these methods copy the shard data directly from the source communal storage location to the target communal storage location.
-
Optionally, use the DISCONNECT statement to disconnect.
This example, initiated on the target database, replicates a table named customers from the source database named source_db:
=> CONNECT TO VERTICA source_db USER dbadmin
PASSWORD 'mypassword' ON 'vertica_node01', 5433;
CONNECT
=> REPLICATE customers FROM source_db;
REPLICATE
The following statements execute the same data replication as the previous example, but this replication is initiated from the source database instead of the target database:
=> CONNECT TO VERTICA target_db USER dbadmin
PASSWORD 'mypassword' ON 'vertica_node01', 5433;
CONNECT
=> REPLICATE customers TO target_db;
REPLICATE
To replicate more than one table at a time, use the INCLUDE option to include tables or schemas using wildcards. This example replicates all tables in the public schema starting with the letter t:
=> REPLICATE INCLUDE "public.t*" FROM source_db;
Use the EXCLUDE option with INCLUDE to limit the tables that Vertica replicates. This example replicates all tables in the public schema except those that start with the string customer_:
=> REPLICATE INCLUDE "public.*" EXCLUDE '*.customer_*' FROM source_db;
Scheduled replication
You can use replication to keep one or more tables in the target database up to date with tables in the source database by scheduling automatic replications. When you call the REPLICATE statement repeatedly, it automatically determines what has changed in the source table since the last replication and transfers only those changes.
Important
Scheduled replication between the main cluster and a sandboxed cluster is non-incremental. Each REPLICATE statement between these clusters runs a full replication, regardless of previous REPLICATE calls.
When determining the interval between scheduled replications, you should consider the amount of data being transferred, the resources assigned to the Tuple Mover, and data transfer concurrency. If replicating across platforms, you should also take into account your network bandwidth. Based on these factors, you can tune your resources and environment to find an optimal interval between scheduled replications.
Note
Frequent calls to REPLICATE can increase the size of the database's catalog. A larger catalog can increase the amount of metadata your database must process for each query. This added overhead can affect performance.
In many cases, the best option is to balance the time sensitivity of the queries in the target database that use the replicated data with the amount of data that each replication must process. After you decide how often to run replications, create a script to perform the replications. The script simply needs to follow the steps in Replication steps to replicate the table a single time. Then set up a scheduled task to run the script.
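The following is a minimal sketch of such a script, reusing the connection details and table name from the earlier examples. The file name and the idea of running it with vsql -f from a scheduler such as cron are assumptions, not requirements:
-- replicate_customers.sql (hypothetical file name; run on a schedule, for example with vsql -f)
-- Performs one incremental replication of the customers table from source_db.
CONNECT TO VERTICA source_db USER dbadmin PASSWORD 'mypassword' ON 'vertica_node01', 5433;
REPLICATE customers FROM source_db;
DISCONNECT source_db;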
Monitoring replication
The V_MONITOR.REPLICATION_STATUS system table in the target database shows information about database replications. You can use it to monitor ongoing replications and the results of prior replications:
=> SELECT node_name, status, transaction_id FROM REPLICATION_STATUS
ORDER BY start_time ASC;
node_name | status | transaction_id
-----------------------+-------------------+-------------------
v_target_db_node0001 | completed | 45035996273706044
v_target_db_node0002 | completed | 45035996273706044
v_target_db_node0003 | completed | 45035996273706044
v_target_db_node0001 | completed | 45035996273706070
v_target_db_node0002 | completed | 45035996273706070
v_target_db_node0003 | completed | 45035996273706070
v_target_db_node0002 | data transferring | 45035996273706136
v_target_db_node0003 | data transferring | 45035996273706136
v_target_db_node0001 | data transferring | 45035996273706136
(9 rows)
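For example, to see only replications that have not yet completed, you can filter on the status column. This sketch uses only the columns shown above; status values other than completed and data transferring may also appear:
=> SELECT node_name, status, transaction_id FROM REPLICATION_STATUS
   WHERE status <> 'completed' ORDER BY start_time ASC;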