Syntax
Note
EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED have the same general syntax. This documentation refers to them collectively as EXPORT statements, not to be confused with EXPORT TO VERTICA.Use the EXPORT statements to export data specified by a SELECT statement, as in the following example:
=> EXPORT TO PARQUET(directory='webhdfs:///data/sales_data')
AS SELECT * FROM public.sales;
Rows Exported
---------------
14336
(1 row)
The directory
parameter specifies where to write the files and is required. You must have permission to write to the output directory. You can use the ifDirExists
parameter to specify what to do if the directory already exists:
-
fail
(default) -
overwrite
: replace the entire directory -
append
: export new files into the existing directory
You can export to HDFS, S3, GCS, Azure, or the Linux file system. For additional considerations specific to object stores, see Exporting to object stores. If you export to the local Linux file system (not a shared NFS mount) you must take some additional steps; see Exporting to the Linux file system.
You can use EXPORT statements to write queries across multiple tables in Vertica and export the results. With this approach you can take advantage of powerful, fast query execution in Vertica while making the results available to other clients:
=> EXPORT TO ORC(directory='webhdfs:///data/sales_by_region')
AS SELECT sale.price, sale.date, store.region
FROM public.sales sale
JOIN public.vendor store ON sale.distribID = store.ID;
Rows Exported
---------------
23301
(1 row)
Restrictions
-
If you perform more than one concurrent export to the same directory, only one succeeds. If you perform concurrent exports with
ifDirExists=overwrite
, the results will be incorrect. -
Exports cannot be run in fenced mode. If the ForceUDxFencedMode configuration parameter is set to true to support other UDxs, you can set it to false at the session level before performing the export.
Data types
All exporters can export scalar types. All except EXPORT TO DELIMITED also support primitive arrays. The JSON and Parquet exporters also support the ARRAY and ROW types in any combination:
=> SELECT * FROM restaurants;
name | cuisine | location_city | menu
-------------------+---------+----------------------------+------------------------------------------------------------------------------
Bob's pizzeria | Italian | ["Cambridge","Pittsburgh"] | [{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]
Bakersfield Tacos | Mexican | ["Pittsburgh"] | [{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]
(2 rows)
=> EXPORT TO JSON (directory='/output/json') AS SELECT * FROM restaurants;
Rows Exported
---------------
2
(1 row)
=> \! cat /output/json/*.json
{"name":"Bob's pizzeria","cuisine":"Italian","location_city":["Cambridge","Pittsburgh"],"menu":[{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]}
{"name":"Bakersfield Tacos","cuisine":"Mexican","location_city":["Pittsburgh"],"menu":[{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]}
Query
EXPORT statements rewrite the query you specify, because the export is done by a user-defined transform function (UDTF). Because of this rewrite, there are some restrictions on the query you supply.
The query can contain only a single outer SELECT statement. For example, you cannot use UNION:
=> EXPORT TO PARQUET(directory = '/mnt/shared_nfs/accounts/rm')
OVER(PARTITION BY hash)
AS
SELECT 1 as account_id, '{}' as json, 0 hash
UNION ALL
SELECT 2 as account_id, '{}' as json, 1 hash;
ERROR 8975: Only a single outer SELECT statement is supported
HINT: Please use a subquery for multiple outer SELECT statements
Instead, rewrite the query to use a subquery:
=> EXPORT TO PARQUET(directory = '/mnt/shared_nfs/accounts/rm')
OVER(PARTITION BY hash)
AS
SELECT
account_id,
json
FROM
(
SELECT 1 as account_id, '{}' as json, 0 hash
UNION ALL
SELECT 2 as account_id, '{}' as json, 1 hash
) a;
Rows Exported
---------------
2
(1 row)
To use composite statements such as UNION, INTERSECT, and EXCEPT, rewrite them as subqueries.
When exporting data you can use an OVER() clause to partition and sort the data as described in Partitioning and sorting data. Partitioning and sorting can improve query performance. If you partition the data, you cannot specify schema and table names in the SELECT statement. In this case, specify only the column name.
All exporters have a required directory
parameter. All allow you to specify a compression type, and each exporter has format-specific parameters. See the parameter descriptions on the individual reference pages: EXPORT TO PARQUET, EXPORT TO ORC, EXPORT TO JSON, and EXPORT TO DELIMITED.