This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Data export
You can export data from Vertica, which you might do for the following reasons:.
You can export data from Vertica, which you might do for the following reasons:
-
To instead use the data in external tables; see Working with external data.
-
To share data with other clients or consumers in an ecosystem.
-
To copy data to another Vertica cluster.
Vertica provides two ways to export data. With the data exporters, you can export the results of a SELECT query to files in Parquet, ORC, or delimited format. You can use partitioning in the export, which can decrease output file size and improve performance when reading the data in external tables. For more information, see File export.
Vertica also provides a way to move data directly between Vertica clusters, without having to export from one and load in the other. You can perform the operation as either an import from another database or an export to another database; aside from the direction of data travel, these two operations are quivalent. For more information, see Database export and import.
1 - File export
You might want to export a table or other query results from Vertica, either to share it with other applications or to move lower-priority data from ROS to less-expensive storage.
You might want to export a table or other query results from Vertica, either to share itwith other applications or to move lower-priority data fromROS to less-expensive storage. Vertica supports several EXPORT TO statements for different file formats: EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED.
You can export data to HDFS, S3, Google Cloud Storage (GCS), Azure Blob Storage, or the Linux file system. You can export ROS data or data that is readable through external tables. After exporting ROS data, you can drop affected ROS partitions to reclaim storage space.
Be careful to avoid concurrent exports to the same output destination. Doing so is an error on any file system and can produce incorrect results.
Note
You cannot export data from an external table that contains external, non-native data types. See
Complex types.
After exporting data, you can define external tables to read that data in Vertica. Parquet and ORC are columnar formats that Vertica can take advantage of in query optimization, so consider using one of those formats for external tables. See Working with external data.
1.1 - Syntax
EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED have the same general syntax.
Use the EXPORT statements to export data specified by a SELECT statement, as in the following example:
=> EXPORT TO PARQUET(directory='webhdfs:///data/sales_data')
AS SELECT * FROM public.sales;
Rows Exported
---------------
14336
(1 row)
The directory argument specifies where to write the files and is required. You must have permission to write to the output directory. The directory must not already exist. You can export to HDFS, S3, GCS, Azure, or the Linux file system. For additional considerations specific to object stores, see Exporting to object stores. If you export to the local Linux file system (not a shared NFS mount) you must take some additional steps; see Exporting to the Linux file system.
Note
You can only perform one export per output directory. If you perform more than one concurrent export to the same directory, only one will succeed.
You can use EXPORT statements to write queries across multiple tables in Vertica and export the results. With this approach you can take advantage of powerful, fast query execution in Vertica while making the results available to other clients:
=> EXPORT TO ORC(directory='webhdfs:///data/sales_by_region')
AS SELECT sale.price, sale.date, store.region
FROM public.sales sale
JOIN public.vendor store ON sale.distribID = store.ID;
Rows Exported
---------------
23301
(1 row)
Data types
All exporters can export scalar types and primitive arrays. The JSON exporter also supports the ARRAY and ROW types in any combination:
=> SELECT * FROM restaurants;
name | cuisine | location_city | menu
-------------------+---------+----------------------------+------------------------------------------------------------------------------
Bob's pizzeria | Italian | ["Cambridge","Pittsburgh"] | [{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]
Bakersfield Tacos | Mexican | ["Pittsburgh"] | [{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]
(2 rows)
=> EXPORT TO JSON (directory='/output/json') AS SELECT * FROM restaurants;
Rows Exported
---------------
2
(1 row)
=> \! cat /output/json/*.json
{"name":"Bob's pizzeria","cuisine":"Italian","location_city":["Cambridge","Pittsburgh"],"menu":[{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]}
{"name":"Bakersfield Tacos","cuisine":"Mexican","location_city":["Pittsburgh"],"menu":[{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]}
Query
EXPORT statements rewrite the query you specify, because the export is done by a user-defined transform function (UDTF). Because of this rewrite, there are some restrictions on the query you supply.
The query can contain only a single outer SELECT statement. For example, you cannot use UNION as in the following example.
=> EXPORT TO PARQUET(directory = '/mnt/shared_nfs/accounts/rm')
OVER(PARTITION BY hash)
AS
SELECT 1 as account_id, '{}' as json, 0 hash
UNION ALL
SELECT 2 as account_id, '{}' as json, 1 hash;
ERROR 8975: Only a single outer SELECT statement is supported
HINT: Please use a subquery for multiple outer SELECT statements
Instead, rewrite the query to use a subquery:
=> EXPORT TO PARQUET(directory = '/mnt/shared_nfs/accounts/rm')
OVER(PARTITION BY hash)
AS
SELECT
account_id,
json
FROM
(
SELECT 1 as account_id, '{}' as json, 0 hash
UNION ALL
SELECT 2 as account_id, '{}' as json, 1 hash
) a;
Rows Exported
---------------
2
(1 row)
To use composite statements such as UNION, INTERSECT, and EXCEPT, rewrite them as subqueries.
When exporting data you can use an OVER() clause to partition and sort the data as described in Partitioning and sorting data. Partitioning and sorting can improve query performance.
All exporters have a required directory
parameter. All allow you to specify a compression type, and each exporter has format-specific parameters. See the parameter descriptions on the individual reference pages: EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED.
1.2 - Partitioning and sorting data
When exporting, you can use the optional OVER clause in the SELECT statement to specify how to partition and/or sort the exported data.
When exporting, you can use the optional OVER clause in the SELECT statement to specify how to partition and/or sort the exported data. Partitioning reduces the sizes of the output data files and canimprove performance when Vertica queries external tables containing thisdata. (See Using partition columns.) If you do not specify how to partition the data, Vertica optimizes the export for maximum parallelism.
To specify partition columns, use PARTITION BY in theOVER clause as in the following example:
=> EXPORT TO PARQUET(directory = 'webhdfs:///data/export')
OVER(PARTITION BY date) AS SELECT transactionID, price FROM public.sales;
Rows Exported
---------------
28337
(1 row)
You can both partition by a column and include that column in the SELECT clause. Including the column allows you to sort it. Vertica still takes advantage of the partitioning during query execution.
You can sort values within a partition for a further performance improvement. Sort table columns based on the likelihood of their occurrence in query predicates; columns that most frequently occur in comparison or range predicates should be sorted first. You can sort values within each partition using ORDER BY in the OVER clause:
=> EXPORT TO PARQUET(directory = 'webhdfs:///data/export')
OVER(PARTITION BY date ORDER BY price) AS SELECT date, price FROM public.sales;
Rows Exported
---------------
28337
(1 row)
You can use ORDER BY even without partitioning. Storing data in sorted order can improve data access and predicate evaluation performance.
Targets in the OVER clause must be column references; they cannot beexpressions.For more information about OVER, see SQL analytics.
If you are exporting data to a local file system, you might want to force a single node to write all of the files. To do so, use an empty OVER clause.
1.3 - Exporting to object stores
Object-store file systems (S3, Google Cloud Storage, and Azure Blob Storage) have some differences from other file systems that affect data export.
Object-store file systems (S3, Google Cloud Storage, and Azure Blob Storage) have some differences from other file systems that affect data export. You must set some additional configuration parameters for authentication and region, and there are some restrictions on the output.
URI formats and configuration parameters are described on the following reference pages:
Configuration parameters affect all access to the corresponding object stores, both reads and writes. Instead of setting them globally, you can limit the effects of your settings by setting them at the session level before exporting data.
Output restrictions
Object-store file systems do not support renaming files in place; they implement a rename as a copy followed by a delete. On other file systems, EXPORT TO PARQUET supports atomicity by writing its output into a temporary directory and renaming it when complete. Such an approach is impractical for object stores, so EXPORT TO PARQUET writes directly to the destination path. It is therefore possible to begin reading the exported data before the export has finished, which could lead to errors. Be careful to wait for the export to finish before using the data.
Vertica does not support simultaneous exports to the same path in object stores. The results are undefined.
S3 limits buckets to 5TB. You might need to divide very large exports.
1.4 - Exporting to the Linux file system
If you export to an NFS mount on the Linux file system, the exporters behave the same way as for any other shared location: all of the exported files are written to the same destination.
If you export to an NFS mount on the Linux file system, the exporters behave the same way as for any other shared location: all of the exported files are written to the same destination.
If you export to the local Linux file system, each Vertica node writes its portion of the export to its local file system. To define an external table to read the exported data, you must specify each node and path, as in the following example:
=> CREATE EXTERNAL TABLE sales (...)
AS COPY FROM '/data/sales/*.parquet' ON node01,
'/data/sales/*.parquet' ON node02,
'/data/sales/*.parquet' ON node03 PARQUET;
The path on each node is the same path that you specified in the export statement.
If you omit the ON clauses and just specify a path, COPY only loads the data it finds on the initiator node.
1.5 - Monitoring exports
You can review information about exports, including numbers of row groups, file sizes, and file names.
You can review information about exports, including numbers of row groups, file sizes, and file names.
The export statements are UDxs. The UDX_EVENTS system table records events logged during UDx execution, including timestamps, node names, and session IDs. This table contains a column (RAW), which holds a VMap of whatever additional data an individual UDx logged. The export statements log details about the exported files in this table. While you can work with this table directly and materialize values from the VMap column, you might prefer to define a view to simplify your access.
The following statement defines a view showing only the events from EXPORT TO PARQUET
, materializing the VMap values.
=> CREATE VIEW parquet_export_events AS
SELECT
report_time,
node_name,
session_id,
user_id,
user_name,
transaction_id,
statement_id,
request_id,
udx_name,
file,
created,
closed,
rows,
row_groups,
size_mb
FROM
v_monitor.udx_events
WHERE
udx_name ilike 'ParquetExport%';
The exporters report the following UDx-specific columns:
Column Name |
Data Type |
Description |
FILE |
VARCHAR |
Name of the output file. |
CREATED |
TIMESTAMPTZ |
When the file was created. |
CLOSED |
TIMESTAMPTZ |
When the file was closed after writing. |
ROWS |
INTEGER |
The total number of rows in the file. |
ROW_GROUPS |
INTEGER |
The number of row groups, for formats that use them (Parquet and ORC). |
SIZE_MB |
FLOAT |
File size. |
The following example shows the results of a single export.
=> SELECT file,rows,row_groups,size_mb FROM PARQUET_EXPORT_EVENTS;
file | rows | row_groups | size_mb
-----------------------------------------------------------------------+-------+------------+----------
/data/outgZxN3irt/450c4213-v_vmart_node0001-139770732459776-0.parquet | 29696 | 1 | 0.667203
/data/outgZxN3irt/9df1c797-v_vmart_node0001-139770860660480-0.parquet | 29364 | 1 | 0.660922
(2 rows)
In this table, the output directory name (/data/out) is appended with a generated string (gZxN3irt). For exports to HDFS or to local file systems (including NFS), EXPORT TO PARQUET first writes data into a scratch directory and then renames it at the end of the operation. The events are logged during export and so show the temporary name. Some output destinations, such as AWS S3, do not support rename operations, so in those cases this table does not show generated names.
2 - Database export and import
Vertica can easily import data from and export data to other Vertica databases.
Vertica can easily import data from and export data to other Vertica databases. Importing and exporting data is useful for common tasks such as moving data back and forth between a development or test database and a production database, or between databases that have different purposes but need to share data on a regular basis.
Moving data directly between databases
To move data between databases you first establish a connection using CONNECT TO VERTICA and then use one of the following statements to move data:
These statements are symmetric; copying from cluster A to cluster B is the same as exporting from cluster B to cluster A. The difference is only in which cluster drives the operation.
To configure TLS settings for the connection, see Configuring connection security between clusters.
Creating SQL scripts to export data
Three functions return a SQL script you can use to export database objects to recreate elsewhere:
While copying and exporting data is similar to Backing up and restoring the database, you should use them for different purposes, outlined below:
Task |
Backup and Restore |
COPY and EXPORT Statements |
Back up or restore an entire database, or incremental changes |
YES |
NO |
Manage database objects (a single table or selected table rows) |
YES |
YES |
Use external locations to back up and restore your database |
YES |
NO |
Use direct connections between two databases |
OBJECT RESTORE ONLY |
YES |
Use external shell scripts to back up and restore your database |
YES |
NO |
Use SQL commands to incorporate copy and export tasks into DB operations |
NO |
YES |
The following sections explain how you import and export data between Vertica databases.
When importing from or exporting to a Vertica database, you can connect only to a database that uses trusted (username only) or password-based authentication, as described in Security and authentication. SSL, Kerberos, and OAuth authentication methods are not supported.
Other exports
This section is about exporting data to another Vertica database. For information about exporting data to files, which can then be used in external tables or COPY statements, see File export.
2.1 - Configuring connection security between clusters
When copying data between clusters, Vertica can encrypt both data and plan metadata.
When copying data between clusters, Vertica can encrypt both data and plan metadata.
Data is encrypted if you configure internode encryption (see Internode TLS).
For metadata, by default Vertica tries TLS first and falls back to plaintext. You can configure Vertica to require TLS and to fail if the connection cannot be made. You can also have Vertica verify the certificate and hostname before connecting.
Enabling TLS between clusters
To use TLS between clusters, you must first configure TLS between nodes:
-
Set the EncryptSpreadComms parameter.
-
Configure the data_channel TLS CONFIGURATION.
-
Set the ImportExportTLSMode parameter.
To specify the level of strictness when connecting to another cluster, set the ImportExportTLSMode configuration parameter. This parameter applies for both importing and exporting data. The possible values are:
-
PREFER
: Try TLS but fall back to plaintext if TLS fails.
-
REQUIRE
: Use TLS and fail if the server does not support TLS.
-
VERIFY_CA
: Require TLS (as with REQUIRE), and also validate the other server's certificate using the CA specified by the "server" TLS CONFIGURATION's CA certificates (in this case, "ca_cert" and "ica_cert"):
=> SELECT name, certificate, ca_certificate, mode FROM tls_configurations WHERE name = 'server';
name | certificate | ca_certificate | mode
--------+------------------+--------------------+-----------
server | server_cert | ca_cert,ica_cert | VERIFY_CA
(1 row)
-
VERIFY_FULL
: Require TLS and validate the certificate (as with VERIFY_CA), and also validate the server certificate's hostname.
-
REQUIRE_FORCE
, VERIFY_CA_FORCE
, and VERIFY_FULL_FORCE
: Same behavior as REQUIRE
, VERIFY_CA
, and VERIFY_FULL
, respectively, and cannot be overridden by CONNECT TO VERTICA.
ImportExportTLSMode is a global parameter that applies to all import and export connections you make using CONNECT TO VERTICA. You can override it for an individual connection.
For more information about these and other configuration parameters, see Security parameters.
2.2 - Exporting data to another database
EXPORT TO VERTICA exports table data from one Vertica database to another.
EXPORT TO VERTICA exports table data from one Vertica database to another. The following requirements apply:
-
You already opened a connection to the target database with CONNECT TO VERTICA.
-
The source database is no more than one major release behind the target database.
-
The table in the target database must exist.
-
Source and target table columns must have the same or compatible data types.
Each EXPORT TO VERTICA statement exports data from only one table at a time. You can use the same database connection for multiple export operations.
Export process
Exporting is a three-step process:
-
Connect to the target database with CONNECT TO VERTICA.
For example:
=> CONNECT TO VERTICA testdb USER dbadmin PASSWORD '' ON 'VertTest01', 5433;
CONNECT
-
Export the desired data with EXPORT TO VERTICA. For example, the following statement exports all table data in customer_dimension
to a table of the same name in target database testdb
:
=> EXPORT TO VERTICA testdb.customer_dimension FROM customer_dimension;
Rows Exported
---------------
23416
(1 row)
-
DISCONNECT disconnects from the target database when all export and import operations are complete:
=> DISCONNECT testdb;
DISCONNECT
Note
Closing your session also closes the database connection. However, it is a good practice to explicitly close the connection to the other database, both to free up resources and to prevent issues with other SQL scripts that might be running in your session. Always closing the connection prevents potential errors if you run a script in the same session that attempts to open a connection to the same database, since each session can only have one connection to a given database at a time.
Mapping between source and target columns
If you export all table data from one database to another as in the previous example, EXPORT TO VERTICA can omit specifying column lists. This is possible only if column definitions in both tables comply with the following conditions:
If any of these conditions is not true, the EXPORT TO VERTICA statement must include column lists that explicitly map source and target columns to each other, as follows:
-
Contain the same number of columns.
-
List source and target columns in the same order.
-
Pair columns with the same (or compatible) data types.
For example:
=> EXPORT TO VERTICA testdb.people (name, gender, age)
FROM customer_dimension (customer_name, customer_gender, customer_age);
Exporting subsets of table data
In general, you can export a subset of table data in two ways:
In both cases, the EXPORT TO VERTICA statement typically must specify column lists for the source and target tables.
The following example exports data from three columns in the source table to three columns in the target table. Accordingly, the EXPORT TO VERTICA statement specifies a column list for each table. The order of columns in each list determines how Vertica maps target columns to source columns. In this case, target columns name
, gender
, and age
map to source columns customer_name
, customer_gender
, and customer_age
, respectively:
=> EXPORT TO VERTICA testdb.people (name, gender, age) FROM customer_dimension
(customer_name, customer_gender, customer_age);
Rows Exported
---------------
23416
(1 row)
The next example queries source table customer_dimension
, and exports the result set to table ma_customers
in target database testdb
:
=> EXPORT TO VERTICA testdb.ma_customers(customer_key, customer_name, annual_income)
AS SELECT customer_key, customer_name, annual_income FROM customer_dimension WHERE customer_state = 'MA';
Rows Exported
---------------
3429
(1 row)
Note
In this example, the source and target column names are identical, so specifying a columns list for target table ma_customers
is optional. If one or more of the queried source columns did not have a match in the target table, the statement would be required to include a columns list for the target table.
Exporting identity columns
You can export tables (or columns) that contain identity and auto-increment values, but the sequence values are not incremented automatically at the target table. You must use ALTER SEQUENCE to make updates.
Export identity and auto-increment columns as follows:
-
If both source and destination tables have an identity column and configuration parameter CopyFromVerticaWithIdentity is set to true (1), you do not need to list them.
-
If source table has an identity column, but target table does not, you must explicitly list the source and target columns.
Caution
Failure to list which identity columns to export can cause an error, because the identity column will be interpreted as missing in the destination table.
By default, EXPORT TO VERTICA exports all identity columns . To disable this behavior globally, set the CopyFromVerticaWithIdentity
configuration parameter.
2.3 - Copying data from another Vertica database
COPY FROM VERTICA imports table data from one Vertica database to another.
COPY FROM VERTICA imports table data from one Vertica database to another. The following requirements apply:
-
You already opened a connection to the target database with CONNECT TO VERTICA.
-
The source database is no more than one major release behind the target database.
-
The table in the target database must exist.
-
Source and target table columns must have the same or compatible data types.
Import process
Importing is a three-step process:
-
Connect to the source database with CONNECT TO VERTICA. For example:
=> CONNECT TO VERTICA vmart USER dbadmin PASSWORD '' ON 'VertTest01',5433;
CONNECT
-
Import the desired data with COPY FROM VERTICA. For example, the following statement imports all table data in customer_dimension
to a table of the same name:
=> COPY customer_dimension FROM VERTICA vmart.customer_dimension;
Rows Loaded
-------------
500000
(1 row)
=> DISCONNECT vmart;
DISCONNECT
Note
Successive COPY FROM VERTICA statements in the same session can import data from multiple tables over the same connection.
-
DISCONNECT disconnects from the source database when all import and export operations are complete:
=> DISCONNECT vmart;
DISCONNECT
Note
Closing your session also closes the database connection. However, it is a good practice to explicitly close the connection to the other database, both to free up resources and to prevent issues with other SQL scripts that might be running in your session. Always closing the connection prevents potential errors if you run a script in the same session that attempts to open a connection to the same database, since each session can only have one connection to a given database at a time.
Importing identity columns
You can import identity (and auto-increment) columns as follows:
-
If both source and destination tables have an identity column and configuration parameter CopyFromVerticaWithIdentity is set to true (1), you do not need to list them.
-
If source table has an identity column, but target table does not, you must explicitly list the source and target columns.
Caution
Failure to list which identity columns to export can cause an error, because the identity column will be interpreted as missing in the destination table.
After importing the columns, the identity column values do not increment automatically. Use ALTER SEQUENCE to make updates.
The default behavior for this statement is to import Identity (and Auto-increment) columns by specifying them directly in the source table. To disable this behavior globally, set the CopyFromVerticaWithIdentity configuration parameter.
2.4 - Changing node export addresses
You can change the export address for your Vertica cluster.
You can change the export address for your Vertica cluster. You might need to do so to export data between clusters in different network subnets.
-
Create a subnet for importing and exporting data between Vertica clusters. The CREATE SUBNET statement identifies the public network IP addresses residing on the same subnet.
=> CREATE SUBNET kv_subnet with '10.10.10.0';
-
Alter the database to specify the subnet name of a public network for import/export.
=> ALTER DATABASE DEFAULT EXPORT ON kv_subnet;
-
Create network interfaces for importing and exporting data from individual nodes to other Vertica clusters. The CREATE NETWORK INTERFACE statement identifies the public network IP addresses residing on multiple subnets.
=> CREATE NETWORK INTERFACE kv_node1 on v_VMartDB_node0001 with '10.10.10.1';
=> CREATE NETWORK INTERFACE kv_node2 on v_VMartDB_node0002 with '10.10.10.2';
=> CREATE NETWORK INTERFACE kv_node3 on v_VMartDB_node0003 with '10.10.10.3';
=> CREATE NETWORK INTERFACE kv_node4 on v_VMartDB_node0004 with '10.10.10.4';
For users on Amazon Web Services (AWS) or using Network Address Translation (NAT), refer to Vertica on Amazon Web Services.
-
Alter the node settings to change the export address. When used with the EXPORT ON clause, the ALTER NODE specifies the network interface of the public network on individual nodes for importing and exporting data.
=> ALTER NODE v_VMartDB_node0001 export on kv_node1;
=> ALTER NODE v_VMartDB_node0002 export on kv_node2;
=> ALTER NODE v_VMartDB_node0003 export on kv_node3;
=> ALTER NODE v_VMartDB_node0004 export on kv_node4;
-
Verify if the node address and the export address are different on different network subnets of the Vertica cluster.
=> SELECT node_name, node_address, export_address FROM nodes;
node_name | node_address | export_address
-------------------+-----------------+----------------
v_VMartDB_node0001 | 192.168.100.101 | 10.10.10.1
v_VMartDB_node0002 | 192.168.100.102 | 10.10.10.2
v_VMartDB_node0003 | 192.168.100.103 | 10.10.10.3
v_VMartDB_node0004 | 192.168.100.104 | 10.10.10.4
Creating a network interface and altering the node settings to change the export address takes precedence over creating a subnet and altering the database for import/export.
2.5 - Using public and private IP networks
In many configurations, Vertica cluster hosts use two network IP addresses as follows:.
In many configurations, Vertica cluster hosts use two network IP addresses as follows:
By default, importing from and exporting to another Vertica database uses the private network.
Note
Ensure port 5433 or the port the Vertica database is using is not blocked.
To use the public network address for copy and export activities, as well as moving large amounts of data, configure the system to use the public network to support exporting to or importing from another Vertica cluster:
Vertica encrypts data during transmission (if you have configured a certificate). Vertica attempts to also encrypt plan metadata but, by default, falls back to plaintext if needed. You can configure Vertica to require encryption for metadata too; see Configuring connection security between clusters.
In certain instances, both public and private addresses exceed the demand capacity of a single Local Area Network (LAN). If you encounter this type of scenario, then configure your Vertica cluster to use two LANs: one for public network traffic and one for private network traffic.
2.5.1 - Identify the public network to Vertica
To be able to import to or export from a public network, Vertica needs to be aware of the IP addresses of the nodes or clusters on the public network that will be used for import/export activities.
To be able to import to or export from a public network, Vertica needs to be aware of the IP addresses of the nodes or clusters on the public network that will be used for import/export activities. Your public network might be configured in either of these ways:
To identify public network IP addresses residing on the same subnet:
- Use the CREATE SUBNET statement provide your subnet with a name and to identify the subnet routing prefix.
To identify public network IP addresses residing on multiple subnets:
After you've identified the subnet or network interface to be used for import/export, you must Identify the Database Or Nodes Used For Import/Export.
See also
2.5.2 - Identify the database or nodes used for import/export
After you identify the public network to Vertica, you can configure a database and its nodes to use it for import and export operations:.
After you identify the public network to Vertica, you can configure a database and its nodes to use it for import and export operations:
-
Use
ALTER DATABASE
to specify a subnet on the public network for the database. After doing so, all nodes in the database automatically use the network interface on the subnet for import/export operations.
-
On each database node, use
ALTER NODE
to specify a network interface of the public network.
See also
2.6 - Handling node failure during copy/export
When an export (EXPORT TO VERTICA) or import from Vertica (COPY FROM VERTICA) task is in progress, and a non-initiator node fails, Vertica does not complete the task automatically.
When an export (EXPORT TO VERTICA
) or import from Vertica (COPY FROM VERTICA
) task is in progress, and a non-initiator node fails, Vertica does not complete the task automatically. A non-initiator node is any node that is not the source or target node in your export or import statement. To complete the task, you must run the statement again.
You address the problem of a non-initiator node failing during an import or export as follows:
Note
Both Vertica databases must be running in a safe state.
-
You export or import from one cluster to another using the EXPORT TO VERTICA
or COPY FROM VERTICA
statement.
During the export or import, a non-initiating node on the target or source cluster fails. Vertica issues an error message that indicates possible node failure, one of the following:
-
ERROR 4534: Receive on v_tpchdb1_node0002: Message receipt from v_tpchdb2_node0005 failed
-
WARNING 4539: Received no response from v_tpchdb1_node0004 in abandon plan
-
ERROR 3322: [tpchdb2] Execution canceled by operator
-
Complete your import or export by running the statement again. The failed node does not need to be up for Vertica to successfully complete the export or import.
2.7 - Using EXPORT functions
Vertica provides several EXPORT_ functions that let you recreate a database, or specific schemas and tables, in a target database.
Vertica provides several EXPORT_ functions that let you recreate a database, or specific schemas and tables, in a target database. For example, you can use the EXPORT_ functions to transfer some or all of the designs and objects you create in a development or test environment to a production database.
The EXPORT_ functions create SQL scripts that you can run to generate the exported database designs or objects. These functions serve different purposes to the export statements, COPY FROM VERTICA (pull data) and EXPORT TO VERTICA (push data). These statements transfer data directly from source to target database across a network connection between both. They are dynamic actions and do not generate SQL scripts.
The EXPORT_ functions appear in the following table. Depending on what you need to export, you can use one or more of the functions. EXPORT_CATALOG creates the most comprehensive SQL script, while EXPORT_TABLES and EXPORT_OBJECTS are subsets of that function to narrow the export scope.
Use this function... |
To recreate... |
EXPORT_CATALOG |
These catalog items:
-
An existing schema design, tables, projections, constraints, views, and stored procedures.
-
The Database Designer-created schema design, tables, projections, constraints, and views
-
A design on a different cluster.
|
EXPORT_TABLES |
Non-virtual objects up to, and including, the schema of one or more tables. |
EXPORT_OBJECTS |
Catalog objects in order dependency for replication. |
The designs and object definitions that the script creates depend on the EXPORT_ function scope you specify. The following sections give examples of the commands and output for each function and the scopes it supports.
Saving scripts for export functions
All of the examples in this section were generated using the standard Vertica VMART database, with some additional test objects and tables. One output directory was created for all SQL scripts that the functions created:
/home/dbadmin/xtest
If you specify the destination argument as an empty string (''
), the function writes the export results to STDOUT.
Note
A superuser can export all available database output to a file with the EXPORT_ functions. For a non-superuser, the EXPORT_ functions generate a script containing only the objects to which the user has access.
2.7.1 - Exporting the catalog
Vertica function EXPORT_CATALOG generates a SQL script for copying a database design to another cluster.
Vertica function EXPORT_CATALOG generates a SQL script for copying a database design to another cluster. This script replicates the physical schema design of the source database. You call this function as follows:
EXPORT_CATALOG ( ['[destination]' [, '[scope]']] )
The SQL script conforms to the following requirements:
-
Only includes objects to which the user has access.
-
Orders CREATE statements according to object dependencies so they can be recreated in the correct sequence. For example, if a table is in a non-PUBLIC schema, the required CREATE SCHEMA statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.
-
If possible, creates projections with their KSAFE clause, if any, otherwise with their OFFSET clause.
Setting export scope
If you omit specifying a scope, EXPORT_CATALOG exports all objects. You can set the scope of the export operation to one of the following levels:
Scope |
Exports... |
TABLES |
Tables, schemas, and table-dependent objects: constraints, and access policies |
DESIGN |
All catalog objects: schemas, tables, constraints, views, access policies, projections, SQL macros, and stored procedures. |
DIRECTED_QUERIES |
All directed queries that are stored in the catalog. For details, see Managing directed queries. |
Exporting table objects
Use the TABLES scope to generate a script that recreates all tables and the objects that they depend on: schemas, sequences, constraints, and access policies:
=> SELECT EXPORT_CATALOG (
'/home/dbadmin/xtest/sql_cat_tables.sql',
'TABLES');
EXPORT_CATALOG
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script can include the following statements:
Exporting all catalog objects
Use the DESIGN scope to export all design elements of a source database in order of their dependencies. This scope exports all catalog objects, including schemas, tables, constraints, projections, views, and access policies.
=> SELECT EXPORT_CATALOG(
'/home/dbadmin/xtest/sql_cat_design.sql',
'DESIGN' );
EXPORT_CATALOG
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script include statements that are required to recreate the database:
Projection considerations
If a projection to export was created with no ORDER BY clause, the SQL script reflects the default behavior for projections. Vertica implicitly creates projections using a sort order based on the SELECT columns in the projection definition.
The EXPORT_CATALOG script is portable if all projections are created using UNSEGMENTED ALL NODES or SEGMENTED ALL NODES.
See also
2.7.2 - Exporting tables
Vertica function EXPORT_TABLES exports DDL for tables and related objects in the current database.
Vertica function EXPORT_TABLES exports DDL for tables and related objects in the current database. The generated SQL includes all non-virtual table objects to which you have access. You can use this SQL to recreate tables and related non-virtual objects on a different cluster.
You execute EXPORT_TABLES as follows:
EXPORT_TABLES( ['[destination]' [, '[scope]']] )
The SQL script conforms to the following requirements:
-
Only includes objects to which the user has access.
-
Orders CREATE statements according to object dependencies so they can be recreated in the correct sequence. For example, if a table references a named sequence, a CREATE SEQUENCE statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.
Setting export scope
The EXPORT_TABLES scope
argument specifies the scope of the export operation:
To export... |
Set scope to... |
All tables to which you have access, including constraints |
Empty string (' ') |
One or more named tables |
Comma-delimited list of table objects. For example:
myschema.newtable, yourschema.oldtable
You can optionally qualify the schema with the name of the current database:
mydb.myschema.newtable
|
A named table object in the current search path: a schema, table, or sequence. If you specify a schema, EXPORT_TABLES exports all table objects in that schema to which you have access. |
Table object's name and, optionally, its path:
VMart.myschema
|
Note
EXPORT_TABLES does not export views. If you specify a view name, Vertica silently ignores it and the view is omitted from the generated script. To export views, use
EXPORT_OBJECTS.
Exporting all table objects
If you set the scope parameter to an empty string (''
), EXPORT_TABLES exports all tables and their related objects. For example, the following call to EXPORT_TABLES exports all table objects in the VMart database to the specified output file.
=> SELECT EXPORT_TABLES(
'/home/dbadmin/xtest/sql_tables_empty.sql', '');
EXPORT_TABLES
-------------------------------------
Catalog data exported successfully
(1 row)
The exported SQL includes the following types of statements, depending on what is required to recreate the tables and related objects, such as schemas, sequences and access policies:
Exporting individual table objects
EXPORT_TABLES can specify a comma-separated list of tables and table-related objects such as sequences or schemas to export. The generated SQL script includes the CREATE statements for the specified objects and their dependent objects:
For example, the following call to EXPORT_TABLES exports two VMart tables: store.store_sales_fact
and store.store_dimension
:
=> SELECT export_tables('','store.store_sales_fact, store.store_dimension');
export_tables
-------------------------------------------------------------------------
CREATE TABLE store.store_dimension
(
store_key int NOT NULL,
store_name varchar(64),
store_number int,
store_address varchar(256),
store_city varchar(64),
store_state char(2),
store_region varchar(64),
floor_plan_type varchar(32),
photo_processing_type varchar(32),
financial_service_type varchar(32),
selling_square_footage int,
total_square_footage int,
first_open_date date,
last_remodel_date date,
number_of_employees int,
annual_shrinkage int,
foot_traffic int,
monthly_rent_cost int,
CONSTRAINT C_PRIMARY PRIMARY KEY (store_key) DISABLED
);
CREATE TABLE store.store_sales_fact
(
date_key int NOT NULL,
product_key int NOT NULL,
product_version int NOT NULL,
store_key int NOT NULL,
promotion_key int NOT NULL,
customer_key int NOT NULL,
employee_key int NOT NULL,
pos_transaction_number int NOT NULL,
sales_quantity int,
sales_dollar_amount int,
cost_dollar_amount int,
gross_profit_dollar_amount int,
transaction_type varchar(16),
transaction_time time,
tender_type varchar(8)
);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_date FOREIGN KEY (date_key) references public.date_dimension (date_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_promotion FOREIGN KEY (promotion_key) references public.promotion_dimension (promotion_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_customer FOREIGN KEY (customer_key) references public.customer_dimension (customer_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);
The following call to EXPORT_TABLES specifies to export all tables in the VMart schema store
:
=> select export_tables('','store');
export_tables
-----------------------------------------------------------------
CREATE SCHEMA store;
CREATE TABLE store.store_dimension
(
...
);
CREATE TABLE store.store_sales_fact
(
...
);
CREATE TABLE store.store_orders_fact
(
...
);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_date FOREIGN KEY (date_key) references public.date_dimension (date_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_promotion FOREIGN KEY (promotion_key) references public.promotion_dimension (promotion_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_customer FOREIGN KEY (customer_key) references public.customer_dimension (customer_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_vendor FOREIGN KEY (vendor_key) references public.vendor_dimension (vendor_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);
(1 row)
See also
2.7.3 - Exporting objects
The Vertica function EXPORT_OBJECTS generates a SQL script that you can use to recreate non-virtual catalog objects on a different cluster, as follows:.
The Vertica function EXPORT_OBJECTS generates a SQL script that you can use to recreate non-virtual catalog objects on a different cluster, as follows:
EXPORT_OBJECTS( ['[destination]' [, '[scope]'] [, 'mark-ksafe']] )
The SQL script conforms to the following requirements:
-
Only includes objects to which the user has access.
-
Orders CREATE statements according to object dependencies so they can be recreated in the correct sequence. For example, if a table is in a non-PUBLIC schema, the required CREATE SCHEMA statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.
-
If possible, creates projections with their KSAFE clause, if any, otherwise with their OFFSET clause.
Setting export scope
The EXPORT_OBJECTS scope
argument specifies the scope of the export operation:
To export... |
Set scope to... |
All objects to which you have access |
Empty string ('') |
One or more named database objects and related objects |
Comma-delimited list of objects. For example:
myschema.newtable, yourschema.my-sequence
You can optionally qualify the schema with the name of the current database:
mydb.myschema.newtable
If you specify a schema, EXPORT_TABLES exports all objects in that schema to which you have access. If you name a table that references a sequence, the generated script shows the sequence, then the table that references the sequence, and finally any projections of that table.
|
Note
EXPORT_OBJECTS does not export grants. Preserving grants on libraries can be especially important when you upgrade your database: if the prototypes of UDx libraries change, Vertica drops the grants on them. In order to preserve grants on UDx libraries, back up the grants before upgrading, and then restore them in the upgraded database. For details, see
Backing up and restoring grants.
Exporting all objects
If you set the scope parameter to an empty string (''
), Vertica exports all non-virtual objects from the source database in order of dependencies. Running the generated SQL script on another cluster creates all referenced objects and their dependent objects.
By default, the function's KSAFE argument is set to true. In this case, the generated script calls MARK_DESIGN_KSAFE, which replicates K-safety of the original database.
=> SELECT EXPORT_OBJECTS(
'/home/dbadmin/xtest/sql_objects_all.sql',
'',
'true');
EXPORT_OBJECTS
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script includes the following types of statements:
The following output includes the start and end of the output SQL file, including the MARK_DESIGN_KSAFE statement:
CREATE SCHEMA store;
CREATE SCHEMA online_sales;
CREATE SEQUENCE public.my_seq ;
CREATE TABLE public.customer_dimension
(
customer_key int NOT NULL,
customer_type varchar(16),
customer_name varchar(256),
customer_gender varchar(8),
title varchar(8),
household_id int,
...
);
...
SELECT MARK_DESIGN_KSAFE(1);
Exporting individual objects
You can specify one or more objects as the function scope, where multiple objects are specified in a comma-delimited list. The names of objects in any non-PUBLIC schema must include their respective schemas. The objects to export can include schemas, tables, views, and sequences. Accordingly, the SQL script includes the following statements, depending on what objects you list and their dependencies:
If listed objects have overlapping scopes—for example, the list includes a table and one of its projections—EXPORT_OBJECTS exports the projection only once:
=> select export_objects ('','customer_dimension, customer_dimension_super');
export_objects
--------------------------------------------------------------------------
CREATE TABLE public.customer_dimension
(
customer_key int NOT NULL,
customer_type varchar(16),
customer_name varchar(256),
...
CONSTRAINT C_PRIMARY PRIMARY KEY (customer_key) DISABLED
);
CREATE ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address CASE WHEN enabled_role('administrator') THEN customer_address ELSE '**************' END ENABLE;
CREATE PROJECTION public.customer_dimension_super /*+basename(customer_dimension),createtype(L)*/
(
customer_key,
customer_type,
customer_name,
...
)
AS
SELECT customer_dimension.customer_key,
customer_dimension.customer_type,
customer_dimension.customer_name,
...
FROM public.customer_dimension
ORDER BY customer_dimension.customer_key
SEGMENTED BY hash(customer_dimension.customer_key) ALL NODES OFFSET 0;
SELECT MARK_DESIGN_KSAFE(0);
See also
Exporting tables