This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Data export

You can export data from Vertica, which you might do for the following reasons:.

You can export data from Vertica, which you might do for the following reasons:

  • To instead use the data in external tables; see Working with external data.

  • To share data with other clients or consumers in an ecosystem.

  • To copy data to another Vertica cluster.

Vertica provides two ways to export data. With the data exporters, you can export the results of a SELECT query to files in Parquet, ORC, or delimited format. You can use partitioning in the export, which can decrease output file size and improve performance when reading the data in external tables. For more information, see File export.

Vertica also provides a way to move data directly between Vertica clusters, without having to export from one and load in the other. You can perform the operation as either an import from another database or an export to another database; aside from the direction of data travel, these two operations are quivalent. For more information, see Database export and import.

1 - File export

You might want to export a table or other query results from Vertica, either to share it with other applications or to move lower-priority data from ROS to less-expensive storage.

You might want to export a table or other query results from Vertica, either to share itwith other applications or to move lower-priority data fromROS to less-expensive storage. Vertica supports several EXPORT TO statements for different file formats: EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED.

You can export data to HDFS, S3, Google Cloud Storage (GCS), Azure Blob Storage, or the Linux file system. Be careful to avoid concurrent exports to the same output destination. Doing so is an error on any file system and can produce incorrect results.

Not all exporters support all complex types. See the Data Types section on individual reference pages.

You can export ROS data or data that is readable through external tables. After exporting ROS data, you can drop affected ROS partitions to reclaim storage space.

After exporting data, you can define external tables to read that data in Vertica. Parquet and ORC are columnar formats that Vertica can take advantage of in query optimization, so consider using one of those formats for external tables. See Working with external data.

1.1 - Syntax

EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED have the same general syntax.

Use the EXPORT statements to export data specified by a SELECT statement, as in the following example:

=> EXPORT TO PARQUET(directory='webhdfs:///data/sales_data')
   AS SELECT * FROM public.sales;
 Rows Exported
---------------
         14336
(1 row)

The directory argument specifies where to write the files and is required. You must have permission to write to the output directory. The directory must not already exist. You can export to HDFS, S3, GCS, Azure, or the Linux file system. For additional considerations specific to object stores, see Exporting to object stores. If you export to the local Linux file system (not a shared NFS mount) you must take some additional steps; see Exporting to the Linux file system.

You can use EXPORT statements to write queries across multiple tables in Vertica and export the results. With this approach you can take advantage of powerful, fast query execution in Vertica while making the results available to other clients:

=> EXPORT TO ORC(directory='webhdfs:///data/sales_by_region')
   AS SELECT sale.price, sale.date, store.region
   FROM public.sales sale
   JOIN public.vendor store ON sale.distribID = store.ID;
 Rows Exported
---------------
         23301
(1 row)

Data types

All exporters can export scalar types. All except EXPORT TO DELIMITED also support primitive arrays. The JSON and Parquet exporters also support the ARRAY and ROW types in any combination:

=> SELECT * FROM restaurants;
       name        | cuisine |       location_city        |                                     menu
-------------------+---------+----------------------------+------------------------------------------------------------------------------
 Bob's pizzeria    | Italian | ["Cambridge","Pittsburgh"] | [{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]
 Bakersfield Tacos | Mexican | ["Pittsburgh"]             | [{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]
(2 rows)

=> EXPORT TO JSON (directory='/output/json') AS SELECT * FROM restaurants;
 Rows Exported
---------------
             2
(1 row)

=> \! cat /output/json/*.json
{"name":"Bob's pizzeria","cuisine":"Italian","location_city":["Cambridge","Pittsburgh"],"menu":[{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]}
{"name":"Bakersfield Tacos","cuisine":"Mexican","location_city":["Pittsburgh"],"menu":[{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]}

Query

EXPORT statements rewrite the query you specify, because the export is done by a user-defined transform function (UDTF). Because of this rewrite, there are some restrictions on the query you supply.

The query can contain only a single outer SELECT statement. For example, you cannot use UNION:

=> EXPORT TO PARQUET(directory = '/mnt/shared_nfs/accounts/rm')
   OVER(PARTITION BY hash)
   AS
   SELECT 1 as account_id, '{}' as json, 0 hash
   UNION ALL
   SELECT 2 as account_id, '{}' as json, 1 hash;
ERROR 8975:  Only a single outer SELECT statement is supported
HINT:  Please use a subquery for multiple outer SELECT statements

Instead, rewrite the query to use a subquery:

=> EXPORT TO PARQUET(directory = '/mnt/shared_nfs/accounts/rm')
   OVER(PARTITION BY hash)
   AS
   SELECT
    account_id,
    json
   FROM
   (
     SELECT 1 as account_id, '{}' as json, 0 hash
     UNION ALL
     SELECT 2 as account_id, '{}' as json, 1 hash
   ) a;
 Rows Exported
---------------
             2
(1 row)

To use composite statements such as UNION, INTERSECT, and EXCEPT, rewrite them as subqueries.

When exporting data you can use an OVER() clause to partition and sort the data as described in Partitioning and sorting data. Partitioning and sorting can improve query performance. If you partition the data, you cannot specify schema and table names in the SELECT statement. In this case, specify only the column name.

All exporters have a required directory parameter. All allow you to specify a compression type, and each exporter has format-specific parameters. See the parameter descriptions on the individual reference pages: EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED.

1.2 - Partitioning and sorting data

When exporting, you can use the optional OVER clause in the SELECT statement to specify how to partition and/or sort the exported data.

When exporting, you can use the optional OVER clause in the SELECT statement to specify how to partition and/or sort the exported data. Partitioning reduces the sizes of the output data files and canimprove performance when Vertica queries external tables containing thisdata. (See Partitioned file paths.) If you do not specify how to partition the data, Vertica optimizes the export for maximum parallelism.

To specify partition columns, use PARTITION BY in theOVER clause as in the following example:

=> EXPORT TO PARQUET(directory = 'webhdfs:///data/export')
  OVER(PARTITION BY date) AS SELECT transactionID, price FROM public.sales;
 Rows Exported
---------------
         28337
(1 row)

You can both partition by a column and include that column in the SELECT clause. Including the column allows you to sort it. Vertica still takes advantage of the partitioning during query execution.

You can sort values within a partition for a further performance improvement. Sort table columns based on the likelihood of their occurrence in query predicates; columns that most frequently occur in comparison or range predicates should be sorted first. You can sort values within each partition using ORDER BY in the OVER clause:

=> EXPORT TO PARQUET(directory = 'webhdfs:///data/export')
  OVER(PARTITION BY date ORDER BY price) AS SELECT date, price FROM public.sales;
 Rows Exported
---------------
         28337
(1 row)

You can use ORDER BY even without partitioning. Storing data in sorted order can improve data access and predicate evaluation performance.

Targets in the OVER clause must be column references; they cannot beexpressions.For more information about OVER, see SQL analytics.

If you are exporting data to a local file system, you might want to force a single node to write all of the files. To do so, use an empty OVER clause.

1.3 - Exporting to object stores

Object-store file systems (S3, Google Cloud Storage, and Azure Blob Storage) have some differences from other file systems that affect data export.

Object-store file systems (S3, Google Cloud Storage, and Azure Blob Storage) have some differences from other file systems that affect data export. You must set some additional configuration parameters for authentication and region, and there are some restrictions on the output.

URI formats and configuration parameters are described on the following reference pages:

Configuration parameters affect all access to the corresponding object stores, both reads and writes. Instead of setting them globally, you can limit the effects of your settings by setting them at the session level before exporting data.

Output restrictions

Object-store file systems do not support renaming files in place; they implement a rename as a copy followed by a delete. On other file systems, the exporters support atomicity by writing output into a temporary directory and renaming it when complete. Such an approach is impractical for object stores, so the exporters write directly to the destination path. It is therefore possible to begin reading the exported data before the export has finished, which could lead to errors. Be careful to wait for the export to finish before using the data.

Vertica does not support simultaneous exports to the same path in object stores. The results are undefined.

S3 limits buckets to 5TB. You might need to divide very large exports.

1.4 - Exporting to the Linux file system

Exports to the local file system can be to an NFS mount (shared) or to the Linux file system on each node (non-shared).

Exports to the local file system can be to an NFS mount (shared) or to the Linux file system on each node (non-shared). If you export to an NFS mount on the Linux file system, the exporters behave the same way as for any other shared location: all of the exported files are written to the same (shared) destination.

If you export to the local Linux file system, each Vertica node writes its portion of the export to its local (non-shared) file system. Exports to non-shared local file systems have the following restrictions:

  • The output directory must not exist on any node.

  • You must have a USER storage location or superuser privileges.

  • You cannot override the permissions mode of 700 for directories and 600 for files.

To define an external table to read data exported to a non-shared local file system, use COPY...ON EACH NODE, as in the following example:

=> CREATE EXTERNAL TABLE sales (...)
   AS COPY FROM '/data/sales/*.parquet' ON EACH NODE PARQUET;

The path is the same path that you specified in the export statement and is the same on all nodes.

If you omit the ON clause and just specify a path, COPY only loads the data it finds on the initiator node.

1.5 - Monitoring exports

You can review information about exports, including numbers of row groups, file sizes, and file names.

You can review information about exports, including numbers of row groups, file sizes, and file names.

The export statements are UDxs. The UDX_EVENTS system table records events logged during UDx execution, including timestamps, node names, and session IDs. This table contains a column (RAW), which holds a VMap of whatever additional data an individual UDx logged. The export statements log details about the exported files in this table. While you can work with this table directly and materialize values from the VMap column, you might prefer to define a view to simplify your access.

The following statement defines a view showing only the events from EXPORT TO PARQUET, materializing the VMap values.

=> CREATE VIEW parquet_export_events AS
   SELECT
    report_time,
    node_name,
    session_id,
    user_id,
    user_name,
    transaction_id,
    statement_id,
    request_id,
    udx_name,
    file,
    created,
    closed,
    rows,
    row_groups,
    size_mb
   FROM
    v_monitor.udx_events
   WHERE
    udx_name ilike 'ParquetExport%';

The exporters report the following UDx-specific columns:

Column Name Data Type Description
FILE VARCHAR Name of the output file.
CREATED TIMESTAMPTZ When the file was created.
CLOSED TIMESTAMPTZ When the file was closed after writing.
ROWS INTEGER The total number of rows in the file.
ROW_GROUPS INTEGER The number of row groups, for formats that use them (Parquet and ORC).
SIZE_MB FLOAT File size.

The following example shows the results of a single export.

=> SELECT file,rows,row_groups,size_mb FROM PARQUET_EXPORT_EVENTS;
                                  file                                 | rows  | row_groups | size_mb
-----------------------------------------------------------------------+-------+------------+----------
 /data/outgZxN3irt/450c4213-v_vmart_node0001-139770732459776-0.parquet | 29696 | 1          | 0.667203
 /data/outgZxN3irt/9df1c797-v_vmart_node0001-139770860660480-0.parquet | 29364 | 1          | 0.660922
(2 rows)

In this table, the output directory name (/data/out) is appended with a generated string (gZxN3irt). For exports to HDFS or to local file systems (including NFS), EXPORT TO PARQUET first writes data into a scratch directory and then renames it at the end of the operation. The events are logged during export and so show the temporary name. Some output destinations, such as AWS S3, do not support rename operations, so in those cases this table does not show generated names.

2 - Database export and import

Vertica can easily import data from and export data to other Vertica databases.

Vertica can easily import data from and export data to other Vertica databases. Importing and exporting data is useful for common tasks such as moving data back and forth between a development or test database and a production database, or between databases that have different purposes but need to share data on a regular basis.

Moving data directly between databases

To move data between databases you first establish a connection using CONNECT TO VERTICA and then use one of the following statements to move data:

These statements are symmetric; copying from cluster A to cluster B is the same as exporting from cluster B to cluster A. The difference is only in which cluster drives the operation.

To configure TLS settings for the connection, see Configuring connection security between clusters.

Creating SQL scripts to export data

Three functions return a SQL script you can use to export database objects to recreate elsewhere:

While copying and exporting data is similar to Backing up and restoring the database, you should use them for different purposes, outlined below:

Task Backup and Restore COPY and EXPORT Statements
Back up or restore an entire database, or incremental changes YES NO
Manage database objects (a single table or selected table rows) YES YES
Use external locations to back up and restore your database YES NO
Use direct connections between two databases OBJECT RESTORE ONLY YES
Use external shell scripts to back up and restore your database YES NO
Use SQL commands to incorporate copy and export tasks into DB operations NO YES

The following sections explain how you import and export data between Vertica databases.

When importing from or exporting to a Vertica database, you can connect only to a database that uses trusted (username only) or password-based authentication, as described in Security and authentication. SSL, Kerberos, and OAuth authentication methods are not supported.

Other exports

This section is about exporting data to another Vertica database. For information about exporting data to files, which can then be used in external tables or COPY statements, see File export.

2.1 - Configuring connection security between clusters

When copying data between clusters, Vertica can encrypt both data and plan metadata.

When copying data between clusters, Vertica can encrypt both data and plan metadata.

Data is encrypted if you configure internode encryption (see Internode TLS).

For metadata, by default Vertica tries TLS first and falls back to plaintext. You can configure Vertica to require TLS and to fail if the connection cannot be made. You can also have Vertica verify the certificate and hostname before connecting.

Enabling TLS between clusters

To use TLS between clusters, you must first configure TLS between nodes:

  1. Set the EncryptSpreadComms parameter.

  2. Configure the data_channel TLS Configuration.

  3. Set the ImportExportTLSMode parameter.

To specify the level of strictness when connecting to another cluster, set the ImportExportTLSMode configuration parameter. This parameter applies for both importing and exporting data. The possible values are:

  • PREFER: Try TLS but fall back to plaintext if TLS fails.

  • REQUIRE: Use TLS and fail if the server does not support TLS.

  • VERIFY_CA: Require TLS (as with REQUIRE), and also validate the other server's certificate using the CA specified by the "server" TLS Configuration's CA certificates (in this case, "ca_cert" and "ica_cert"):

    => SELECT name, certificate, ca_certificate, mode FROM tls_configurations WHERE name = 'server';
      name  |   certificate    |   ca_certificate   |   mode
    --------+------------------+--------------------+-----------
     server | server_cert      | ca_cert,ica_cert   | VERIFY_CA
    (1 row)
    
  • VERIFY_FULL: Require TLS and validate the certificate (as with VERIFY_CA), and also validate the server certificate's hostname.

  • REQUIRE_FORCE, VERIFY_CA_FORCE, and VERIFY_FULL_FORCE: Same behavior as REQUIRE, VERIFY_CA, and VERIFY_FULL, respectively, and cannot be overridden by CONNECT TO VERTICA.

ImportExportTLSMode is a global parameter that applies to all import and export connections you make using CONNECT TO VERTICA. You can override it for an individual connection.

For more information about these and other configuration parameters, see Security parameters.

2.2 - Exporting data to another database

EXPORT TO VERTICA exports table data from one Vertica database to another.

EXPORT TO VERTICA exports table data from one Vertica database to another. The following requirements apply:

  • You already opened a connection to the target database with CONNECT TO VERTICA.

  • The source database is no more than one major release behind the target database.

  • The table in the target database must exist.

  • Source and target table columns must have the same or compatible data types.

Each EXPORT TO VERTICA statement exports data from only one table at a time. You can use the same database connection for multiple export operations.

Export process

Exporting is a three-step process:

  1. Connect to the target database with CONNECT TO VERTICA.

    For example:

    => CONNECT TO VERTICA testdb USER dbadmin PASSWORD '' ON 'VertTest01', 5433;
    CONNECT
    
  2. Export the desired data with EXPORT TO VERTICA. For example, the following statement exports all table data in customer_dimension to a table of the same name in target database testdb:

    
    => EXPORT TO VERTICA testdb.customer_dimension FROM customer_dimension;
    Rows Exported
    ---------------
             23416
    (1 row)
    
  3. DISCONNECT disconnects from the target database when all export and import operations are complete:

    => DISCONNECT testdb;
    DISCONNECT
    

Mapping between source and target columns

If you export all table data from one database to another as in the previous example, EXPORT TO VERTICA can omit specifying column lists. This is possible only if column definitions in both tables comply with the following conditions:

  • Same number of columns

  • Identical column names

  • Same sequence of columns

  • Matching or compatible column data types

If any of these conditions is not true, the EXPORT TO VERTICA statement must include column lists that explicitly map source and target columns to each other, as follows:

  • Contain the same number of columns.

  • List source and target columns in the same order.

  • Pair columns with the same (or compatible) data types.

For example:

=> EXPORT TO VERTICA testdb.people (name, gender, age)
   FROM customer_dimension (customer_name, customer_gender, customer_age);

Exporting subsets of table data

In general, you can export a subset of table data in two ways:

  • Export data of specific source table columns.

  • Export the result set of a query (including historical queries) on the source table.

In both cases, the EXPORT TO VERTICA statement typically must specify column lists for the source and target tables.

The following example exports data from three columns in the source table to three columns in the target table. Accordingly, the EXPORT TO VERTICA statement specifies a column list for each table. The order of columns in each list determines how Vertica maps target columns to source columns. In this case, target columns name, gender, and age map to source columns customer_name, customer_gender, and customer_age, respectively:

=> EXPORT TO VERTICA testdb.people (name, gender, age) FROM customer_dimension
(customer_name, customer_gender, customer_age);
Rows Exported
---------------
         23416
(1 row)

The next example queries source table customer_dimension, and exports the result set to table ma_customers in target database testdb:

=> EXPORT TO VERTICA testdb.ma_customers(customer_key, customer_name, annual_income)
   AS SELECT customer_key, customer_name, annual_income FROM customer_dimension WHERE customer_state = 'MA';
Rows Exported
---------------
          3429
(1 row)

Exporting IDENTITY columns

You can export tables (or columns) that contain IDENTITY values, but the sequence values are not incremented automatically at the target table. You must use ALTER SEQUENCE to make updates.

Export IDENTITY columns as follows:

  • If both source and destination tables have an IDENTITY column and configuration parameter CopyFromVerticaWithIdentity is set to true (1), you do not need to list them.

  • If source table has an IDENTITY column, but target table does not, you must explicitly list the source and target columns.

By default, EXPORT TO VERTICA exports all IDENTITY columns . To disable this behavior globally, set the CopyFromVerticaWithIdentity configuration parameter.

2.3 - Copying data from another Vertica database

COPY FROM VERTICA imports table data from one Vertica database to another.

COPY FROM VERTICA imports table data from one Vertica database to another. The following requirements apply:

  • You already opened a connection to the target database with CONNECT TO VERTICA.

  • The source database is no more than one major release behind the target database.

  • The table in the target database must exist.

  • Source and target table columns must have the same or compatible data types.

Import process

Importing is a three-step process:

  1. Connect to the source database with CONNECT TO VERTICA. For example:

    => CONNECT TO VERTICA vmart USER dbadmin PASSWORD '' ON 'VertTest01',5433;
    CONNECT
    
  2. Import the desired data with COPY FROM VERTICA. For example, the following statement imports all table data in customer_dimension to a table of the same name:

    
    => COPY customer_dimension FROM  VERTICA vmart.customer_dimension;
     Rows Loaded
    -------------
          500000
    (1 row)
    => DISCONNECT vmart;
    DISCONNECT
    
  3. DISCONNECT disconnects from the source database when all import and export operations are complete:

    => DISCONNECT vmart;
    DISCONNECT
    

Importing IDENTITY columns

You can import IDENTITY columns as follows:

  • If both source and destination tables have an IDENTITY column and configuration parameter CopyFromVerticaWithIdentity is set to true (1), you do not need to list them.

  • If source table has an IDENTITY column, but target table does not, you must explicitly list the source and target columns.

After importing the columns, the IDENTITY column values do not increment automatically. Use ALTER SEQUENCE to make updates.

The default behavior for this statement is to import IDENTITY columns by specifying them directly in the source table. To disable this behavior globally, set the CopyFromVerticaWithIdentity configuration parameter.

2.4 - Copy and export data on AWS

There are common issues that occur when exporting or copying on AWS clusters, as described below.

There are common issues that occur when exporting or copying on AWS clusters, as described below. Except for these specific issues as they relate to AWS, copying and exporting data works as documented in Database export and import.

To copy or export data on AWS:

  1. Verify that all nodes in source and destination clusters have their own elastic IPs (or public IPs) assigned.

    If your destination cluster is located within the same VPC as your source cluster, proceed to step 3. Each node in one cluster must be able to communicate with each node in the other cluster. Thus, each source and destination node needs an elastic IP (or public IP) assigned.

  2. (For non-CloudFormation Template installs) Create an S3 gateway endpoint.

    If you aren't using a CloudFormation Template (CFT) to install Vertica, you must create an S3 gateway endpoint in your VPC. For more information, see the AWS documentation.

    For example, the Vertica CFT has the following VPC endpoint:

    "S3Enpoint" : {
        "Type" : "AWS::EC2::VPCEndpoint",
        "Properties" : {
        "PolicyDocument" : {
            "Version":"2012-10-17",
            "Statement":[{
            "Effect":"Allow",
            "Principal": "*",
            "Action":["*"],
            "Resource":["*"]
            }]
        },
        "RouteTableIds" : [ {"Ref" : "RouteTable"} ],
        "ServiceName" : { "Fn::Join": [ "", [ "com.amazonaws.", { "Ref": "AWS::Region" }, ".s3" ] ] },
        "VpcId" : {"Ref" : "VPC"}
    }
    

  3. Verify that your security group allows the AWS clusters to communicate.

    Check your security groups for both your source and destination AWS clusters. Verify that ports 5433 and 5434 are open. If one of your AWS clusters is on a separate VPC, verify that your network access control list (ACL) allows communication on port 5434.

  4. If there are one or more elastic load balancers (ELBs) between the clusters, verify that port 5433 is open between the ELBs and clusters.

  5. If you use the Vertica client to connect to one or more ELBs, the ELBs only distribute incoming connections. The data transmission path occurs between clusters.

2.5 - Changing node export addresses

You can change the export address for your Vertica cluster.

You can change the export address for your Vertica cluster. You might need to do so to export data between clusters in different network subnets.

  1. Create a subnet for importing and exporting data between Vertica clusters. The CREATE SUBNET statement identifies the public network IP addresses residing on the same subnet.

    => CREATE SUBNET kv_subnet with '10.10.10.0';
    
  2. Alter the database to specify the subnet name of a public network for import/export.

    => ALTER DATABASE DEFAULT EXPORT ON kv_subnet;
    
  3. Create network interfaces for importing and exporting data from individual nodes to other Vertica clusters. The CREATE NETWORK INTERFACE statement identifies the public network IP addresses residing on multiple subnets.

    => CREATE NETWORK INTERFACE kv_node1 on v_VMartDB_node0001 with '10.10.10.1';
    => CREATE NETWORK INTERFACE kv_node2 on v_VMartDB_node0002 with '10.10.10.2';
    => CREATE NETWORK INTERFACE kv_node3 on v_VMartDB_node0003 with '10.10.10.3';
    => CREATE NETWORK INTERFACE kv_node4 on v_VMartDB_node0004 with '10.10.10.4';
    

    For users on Amazon Web Services (AWS) or using Network Address Translation (NAT), refer to Vertica on Amazon Web Services.

  4. Alter the node settings to change the export address. When used with the EXPORT ON clause, the ALTER NODE specifies the network interface of the public network on individual nodes for importing and exporting data.

    => ALTER NODE v_VMartDB_node0001 export on kv_node1;
    => ALTER NODE v_VMartDB_node0002 export on kv_node2;
    => ALTER NODE v_VMartDB_node0003 export on kv_node3;
    => ALTER NODE v_VMartDB_node0004 export on kv_node4;
    
  5. Verify if the node address and the export address are different on different network subnets of the Vertica cluster.

    => SELECT node_name, node_address, export_address FROM nodes;
         node_name     | node_address    | export_address
    -------------------+-----------------+----------------
    v_VMartDB_node0001 | 192.168.100.101 | 10.10.10.1
    v_VMartDB_node0002 | 192.168.100.102 | 10.10.10.2
    v_VMartDB_node0003 | 192.168.100.103 | 10.10.10.3
    v_VMartDB_node0004 | 192.168.100.104 | 10.10.10.4
    

Creating a network interface and altering the node settings to change the export address takes precedence over creating a subnet and altering the database for import/export.

2.6 - Using public and private IP networks

In many configurations, Vertica cluster hosts use two network IP addresses as follows:.

In many configurations, Vertica cluster hosts use two network IP addresses as follows:

  • A private address for communication between the cluster hosts.

  • A public IP address for communication with client connections.

By default, importing from and exporting to another Vertica database uses the private network.

To use the public network address for copy and export activities, as well as moving large amounts of data, configure the system to use the public network to support exporting to or importing from another Vertica cluster:

Vertica encrypts data during transmission (if you have configured a certificate). Vertica attempts to also encrypt plan metadata but, by default, falls back to plaintext if needed. You can configure Vertica to require encryption for metadata too; see Configuring connection security between clusters.

In certain instances, both public and private addresses exceed the demand capacity of a single Local Area Network (LAN). If you encounter this type of scenario, then configure your Vertica cluster to use two LANs: one for public network traffic and one for private network traffic.

2.6.1 - Identify the public network to Vertica

To be able to import to or export from a public network, Vertica needs to be aware of the IP addresses of the nodes or clusters on the public network that will be used for import/export activities.

To be able to import to or export from a public network, Vertica needs to be aware of the IP addresses of the nodes or clusters on the public network that will be used for import/export activities. Your public network might be configured in either of these ways:

  • Public network IP addresses reside on the same subnet (create a subnet)

  • Public network IP addresses are on multiple subnets (create a network interface)

To identify public network IP addresses residing on the same subnet:

  • Use the CREATE SUBNET statement provide your subnet with a name and to identify the subnet routing prefix.

To identify public network IP addresses residing on multiple subnets:

After you've identified the subnet or network interface to be used for import/export, you must Identify the Database Or Nodes Used For Import/Export.

See also

2.6.2 - Identify the database or nodes used for import/export

After you identify the public network to Vertica, you can configure a database and its nodes to use it for import and export operations:.

After you identify the public network to Vertica, you can configure a database and its nodes to use it for import and export operations:

  • Use ALTER DATABASE to specify a subnet on the public network for the database. After doing so, all nodes in the database automatically use the network interface on the subnet for import/export operations.

  • On each database node, use ALTER NODE to specify a network interface of the public network.

See also

2.7 - Handling node failure during copy/export

When an export (EXPORT TO VERTICA) or import from Vertica (COPY FROM VERTICA) task is in progress, and a non-initiator node fails, Vertica does not complete the task automatically.

When an export (EXPORT TO VERTICA) or import from Vertica (COPY FROM VERTICA) task is in progress, and a non-initiator node fails, Vertica does not complete the task automatically. A non-initiator node is any node that is not the source or target node in your export or import statement. To complete the task, you must run the statement again.

You address the problem of a non-initiator node failing during an import or export as follows:

  1. You export or import from one cluster to another using the EXPORT TO VERTICA or COPY FROM VERTICA statement.

    During the export or import, a non-initiating node on the target or source cluster fails. Vertica issues an error message that indicates possible node failure, one of the following:

    • ERROR 4534: Receive on v_tpchdb1_node0002: Message receipt from v_tpchdb2_node0005 failed

    • WARNING 4539: Received no response from v_tpchdb1_node0004 in abandon plan

    • ERROR 3322: [tpchdb2] Execution canceled by operator

  2. Complete your import or export by running the statement again. The failed node does not need to be up for Vertica to successfully complete the export or import.

2.8 - Using EXPORT functions

Vertica provides several EXPORT_ functions that let you recreate a database, or specific schemas and tables, in a target database.

Vertica provides several EXPORT_ functions that let you recreate a database, or specific schemas and tables, in a target database. For example, you can use the EXPORT_ functions to transfer some or all of the designs and objects you create in a development or test environment to a production database.

The EXPORT_ functions create SQL scripts that you can run to generate the exported database designs or objects. These functions serve different purposes to the export statements, COPY FROM VERTICA (pull data) and EXPORT TO VERTICA (push data). These statements transfer data directly from source to target database across a network connection between both. They are dynamic actions and do not generate SQL scripts.

The EXPORT_ functions appear in the following table. Depending on what you need to export, you can use one or more of the functions. EXPORT_CATALOG creates the most comprehensive SQL script, while EXPORT_TABLES and EXPORT_OBJECTS are subsets of that function to narrow the export scope.

Use this function... To recreate...
EXPORT_CATALOG

These catalog items:

  • An existing schema design, tables, projections, constraints, views, and stored procedures.

  • The Database Designer-created schema design, tables, projections, constraints, and views

  • A design on a different cluster.

EXPORT_TABLES Non-virtual objects up to, and including, the schema of one or more tables.
EXPORT_OBJECTS Catalog objects in order dependency for replication.

The designs and object definitions that the script creates depend on the EXPORT_ function scope you specify. The following sections give examples of the commands and output for each function and the scopes it supports.

Saving scripts for export functions

All of the examples in this section were generated using the standard Vertica VMART database, with some additional test objects and tables. One output directory was created for all SQL scripts that the functions created:

/home/dbadmin/xtest

If you specify the destination argument as an empty string (''), the function writes the export results to STDOUT.

2.8.1 - Exporting the catalog

Vertica function EXPORT_CATALOG generates a SQL script for copying a database design to another cluster.

Vertica function EXPORT_CATALOG generates a SQL script for copying a database design to another cluster. This script replicates the physical schema design of the source database. You call this function as follows:

EXPORT_CATALOG ( ['[destination]' [, '[scope]']] )

The SQL script conforms to the following requirements:

  • Only includes objects to which the user has access.

  • Orders CREATE statements according to object dependencies so they can be recreated in the correct sequence. For example, if a table is in a non-PUBLIC schema, the required CREATE SCHEMA statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.

  • If possible, creates projections with their KSAFE clause, if any, otherwise with their OFFSET clause.

Setting export scope

If you omit specifying a scope, EXPORT_CATALOG exports all objects. You can set the scope of the export operation to one of the following levels:

Scope Exports...
TABLES Tables, schemas, and table-dependent objects: constraints, and access policies
DESIGN All catalog objects: schemas, tables, constraints, views, access policies, projections, SQL macros, and stored procedures.
DIRECTED_QUERIES All directed queries that are stored in the catalog. For details, see Managing directed queries.

Exporting table objects

Use the TABLES scope to generate a script that recreates all tables and the objects that they depend on: schemas, sequences, constraints, and access policies:

=> SELECT EXPORT_CATALOG (
     '/home/dbadmin/xtest/sql_cat_tables.sql',
     'TABLES');
           EXPORT_CATALOG
-------------------------------------
 Catalog data exported successfully
(1 row)

The SQL script can include the following statements:

  • CREATE SCHEMA

  • CREATE TABLE

  • ALTER TABLE (constraints)

  • CREATE SEQUENCE

  • CREATE ACCESS POLICY

  • CREATE PROCEDURE (Stored)

Exporting all catalog objects

Use the DESIGN scope to export all design elements of a source database in order of their dependencies. This scope exports all catalog objects, including schemas, tables, constraints, projections, views, and access policies.

=> SELECT EXPORT_CATALOG(
     '/home/dbadmin/xtest/sql_cat_design.sql',
     'DESIGN' );
           EXPORT_CATALOG
-------------------------------------
 Catalog data exported successfully
(1 row)

The SQL script include statements that are required to recreate the database:

  • CREATE SCHEMA

  • CREATE TABLE

  • ALTER TABLE (constraints)

  • CREATE VIEW

  • CREATE SEQUENCE

  • CREATE ACCESS

  • CREATE PROJECTION (with ORDER BY and SEGMENTED BY)

Projection considerations

If a projection to export was created with no ORDER BY clause, the SQL script reflects the default behavior for projections. Vertica implicitly creates projections using a sort order based on the SELECT columns in the projection definition.

The EXPORT_CATALOG script is portable if all projections are created using UNSEGMENTED ALL NODES or SEGMENTED ALL NODES.

See also

2.8.2 - Exporting tables

Vertica function EXPORT_TABLES exports DDL for tables and related objects in the current database.

Vertica function EXPORT_TABLES exports DDL for tables and related objects in the current database. The generated SQL includes all non-virtual table objects to which you have access. You can use this SQL to recreate tables and related non-virtual objects on a different cluster.

You execute EXPORT_TABLES as follows:

EXPORT_TABLES( ['[destination]' [, '[scope]']] )

The SQL script conforms to the following requirements:

  • Only includes objects to which the user has access.

  • Orders CREATE statements according to object dependencies so they can be recreated in the correct sequence. For example, if a table references a named sequence, a CREATE SEQUENCE statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.

Setting export scope

The EXPORT_TABLES scope argument specifies the scope of the export operation:

To export... Set scope to...
All tables to which you have access, including constraints Empty string (' ')
One or more named tables

Comma-delimited list of table objects. For example:

myschema.newtable, yourschema.oldtable

You can optionally qualify the schema with the name of the current database:

mydb.myschema.newtable

A named table object in the current search path: a schema, table, or sequence. If you specify a schema, EXPORT_TABLES exports all table objects in that schema to which you have access.

Table object's name and, optionally, its path:

VMart.myschema

Exporting all table objects

If you set the scope parameter to an empty string (''), EXPORT_TABLES exports all tables and their related objects. For example, the following call to EXPORT_TABLES exports all table objects in the VMart database to the specified output file.

 => SELECT EXPORT_TABLES(
      '/home/dbadmin/xtest/sql_tables_empty.sql', '');
            EXPORT_TABLES
-------------------------------------
 Catalog data exported successfully
(1 row)

The exported SQL includes the following types of statements, depending on what is required to recreate the tables and related objects, such as schemas, sequences and access policies:

  • CREATE SCHEMA

  • CREATE TABLE

  • CREATE SEQUENCE

  • CREATE ACCESS POLICY

  • ALTER TABLE (to add foreign key constraints)

Exporting individual table objects

EXPORT_TABLES can specify a comma-separated list of tables and table-related objects such as sequences or schemas to export. The generated SQL script includes the CREATE statements for the specified objects and their dependent objects:

  • CREATE SCHEMA

  • CREATE TABLE

  • CREATE SEQUENCE

  • CREATE ACCESS POLICY

  • ALTER TABLE (to add foreign keys)

For example, the following call to EXPORT_TABLES exports two VMart tables: store.store_sales_fact and store.store_dimension:

=> SELECT export_tables('','store.store_sales_fact, store.store_dimension');
                              export_tables
-------------------------------------------------------------------------

CREATE TABLE store.store_dimension
(
    store_key int NOT NULL,
    store_name varchar(64),
    store_number int,
    store_address varchar(256),
    store_city varchar(64),
    store_state char(2),
    store_region varchar(64),
    floor_plan_type varchar(32),
    photo_processing_type varchar(32),
    financial_service_type varchar(32),
    selling_square_footage int,
    total_square_footage int,
    first_open_date date,
    last_remodel_date date,
    number_of_employees int,
    annual_shrinkage int,
    foot_traffic int,
    monthly_rent_cost int,
    CONSTRAINT C_PRIMARY PRIMARY KEY (store_key) DISABLED
);

CREATE TABLE store.store_sales_fact
(
    date_key int NOT NULL,
    product_key int NOT NULL,
    product_version int NOT NULL,
    store_key int NOT NULL,
    promotion_key int NOT NULL,
    customer_key int NOT NULL,
    employee_key int NOT NULL,
    pos_transaction_number int NOT NULL,
    sales_quantity int,
    sales_dollar_amount int,
    cost_dollar_amount int,
    gross_profit_dollar_amount int,
    transaction_type varchar(16),
    transaction_time time,
    tender_type varchar(8)
);

ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_date FOREIGN KEY (date_key) references public.date_dimension (date_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_promotion FOREIGN KEY (promotion_key) references public.promotion_dimension (promotion_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_customer FOREIGN KEY (customer_key) references public.customer_dimension (customer_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);

The following call to EXPORT_TABLES specifies to export all tables in the VMart schema store :

=> select export_tables('','store');
                          export_tables
-----------------------------------------------------------------
CREATE SCHEMA store;

CREATE TABLE store.store_dimension
(
 ...
);

CREATE TABLE store.store_sales_fact
(
 ...
);

CREATE TABLE store.store_orders_fact
(
  ...
);

ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_date FOREIGN KEY (date_key) references public.date_dimension (date_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_promotion FOREIGN KEY (promotion_key) references public.promotion_dimension (promotion_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_customer FOREIGN KEY (customer_key) references public.customer_dimension (customer_key);
ALTER TABLE store.store_sales_fact ADD CONSTRAINT fk_store_sales_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_product FOREIGN KEY (product_key, product_version) references public.product_dimension (product_key, product_version);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_store FOREIGN KEY (store_key) references store.store_dimension (store_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_vendor FOREIGN KEY (vendor_key) references public.vendor_dimension (vendor_key);
ALTER TABLE store.store_orders_fact ADD CONSTRAINT fk_store_orders_employee FOREIGN KEY (employee_key) references public.employee_dimension (employee_key);

(1 row)

See also

2.8.3 - Exporting objects

The Vertica function EXPORT_OBJECTS generates a SQL script that you can use to recreate non-virtual catalog objects on a different cluster, as follows:.

The Vertica function EXPORT_OBJECTS generates a SQL script that you can use to recreate non-virtual catalog objects on a different cluster, as follows:

EXPORT_OBJECTS( ['[destination]' [, '[scope]'] [, 'mark-ksafe']] )

The SQL script conforms to the following requirements:

  • Only includes objects to which the user has access.

  • Orders CREATE statements according to object dependencies so they can be recreated in the correct sequence. For example, if a table is in a non-PUBLIC schema, the required CREATE SCHEMA statement precedes the CREATE TABLE statement. Similarly, a table's CREATE ACCESS POLICY statement follows the table's CREATE TABLE statement.

  • If possible, creates projections with their KSAFE clause, if any, otherwise with their OFFSET clause.

Setting export scope

The EXPORT_OBJECTS scope argument specifies the scope of the export operation:

To export... Set scope to...
All objects to which you have access Empty string ('')
One or more named database objects and related objects

Comma-delimited list of objects. For example:

myschema.newtable, yourschema.my-sequence

You can optionally qualify the schema with the name of the current database:

mydb.myschema.newtable

If you specify a schema, EXPORT_TABLES exports all objects in that schema to which you have access. If you name a table that references a sequence, the generated script shows the sequence, then the table that references the sequence, and finally any projections of that table.

Exporting all objects

If you set the scope parameter to an empty string (''), Vertica exports all non-virtual objects from the source database in order of dependencies. Running the generated SQL script on another cluster creates all referenced objects and their dependent objects.

By default, the function's KSAFE argument is set to true. In this case, the generated script calls MARK_DESIGN_KSAFE, which replicates K-safety of the original database.

=> SELECT EXPORT_OBJECTS(
     '/home/dbadmin/xtest/sql_objects_all.sql',
     '',
     'true');
           EXPORT_OBJECTS
-------------------------------------
 Catalog data exported successfully
(1 row)

The SQL script includes the following types of statements:

The following output includes the start and end of the output SQL file, including the MARK_DESIGN_KSAFE statement:

CREATE SCHEMA store;
CREATE SCHEMA online_sales;
CREATE SEQUENCE public.my_seq ;
CREATE TABLE public.customer_dimension
(
    customer_key int NOT NULL,
    customer_type varchar(16),
    customer_name varchar(256),
    customer_gender varchar(8),
    title varchar(8),
    household_id int,
...

);
...
SELECT MARK_DESIGN_KSAFE(1);

Exporting individual objects

You can specify one or more objects as the function scope, where multiple objects are specified in a comma-delimited list. The names of objects in any non-PUBLIC schema must include their respective schemas. The objects to export can include schemas, tables, views, and sequences. Accordingly, the SQL script includes the following statements, depending on what objects you list and their dependencies:

  • CREATE SCHEMA

  • CREATE TABLE

  • ALTER TABLE (to add constraints)

  • CREATE VIEW

  • CREATE SEQUENCE

  • CREATE ACCESS POLICY

  • CREATE PROCEDURE (Stored)

  • CREATE SCHEDULE

  • CREATE TRIGGER

If listed objects have overlapping scopes—for example, the list includes a table and one of its projections—EXPORT_OBJECTS exports the projection only once:


=> select export_objects ('','customer_dimension, customer_dimension_super');

                              export_objects
--------------------------------------------------------------------------
CREATE TABLE public.customer_dimension
(
    customer_key int NOT NULL,
    customer_type varchar(16),
    customer_name varchar(256),
    ...
    CONSTRAINT C_PRIMARY PRIMARY KEY (customer_key) DISABLED
);

CREATE ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address CASE WHEN enabled_role('administrator') THEN customer_address ELSE '**************' END ENABLE;

CREATE PROJECTION public.customer_dimension_super /*+basename(customer_dimension),createtype(L)*/
(
 customer_key,
 customer_type,
 customer_name,
  ...
)
AS
 SELECT customer_dimension.customer_key,
        customer_dimension.customer_type,
        customer_dimension.customer_name,
        ...
 FROM public.customer_dimension
 ORDER BY customer_dimension.customer_key
SEGMENTED BY hash(customer_dimension.customer_key) ALL NODES OFFSET 0;

SELECT MARK_DESIGN_KSAFE(0);

You can export stored procedures by specifying their name and the types of their formal parameters. For stored procedures with the same name but different formal parameters, you can export all implementations by exporting its parent schema.

To export a particular implementation, specify either the types or both the names and types of its formal parameters. The following example specifies the types:

=> SELECT EXPORT_OBJECTS('','raiseXY(int, int)');
    EXPORT_OBJECTS
----------------------

CREATE PROCEDURE public.raiseXY(x int, y int)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';

SELECT MARK_DESIGN_KSAFE(0);

(1 row)

To export all implementations of the overloaded stored procedure raiseXY, export its parent schema:

=> SELECT EXPORT_OBJECTS('','public');
    EXPORT_OBJECTS
----------------------

...

CREATE PROCEDURE public.raiseXY(x int, y varchar)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';

CREATE PROCEDURE public.raiseXY(x int, y int)
LANGUAGE 'PL/vSQL'
SECURITY INVOKER
AS '
BEGIN
RAISE NOTICE ''x = %'', x;
RAISE NOTICE ''y = %'', y;
-- some processing statements
END
';

SELECT MARK_DESIGN_KSAFE(0);

(1 row)

See also

Exporting tables