EXPORT TO DELIMITED
Exports a table, columns from a table, or query results to delimited files. The files can be read back in using DELIMITED. Several exporter parameters have corresponding parser parameters, allowing you to change delimiters, null indicators, and other formatting.
There are some limitations on the queries you can use in an export statement. See Query Restrictions.
You can export data stored in Vertica in ROS format and data from external tables.
This statement returns the number of rows written and logs information about exported files in a system table. See Monitoring exports.
During an export to HDFS or an NFS mount point, Vertica writes files to a temporary directory in the same location as the destination and renames the directory when the export is complete. Do not attempt to use the files in the temporary directory. During an export to S3, GCS, or Azure, Vertica writes files directly to the destination path, so you must wait for the export to finish before reading the files. For more information, see Exporting to object stores.
Syntax
EXPORT [ /*+LABEL (label-string)*/ ] TO DELIMITED
( directory='path'[, param=value[,...] ] )
[ OVER (over-clause ) ] AS SELECT query-expression
Arguments
-
LABEL
- Assigns a label to a statement to identify it for profiling and debugging.
over-clause
- Specifies how to partition table data using PARTITION BY. Within partitions you can sort using ORDER BY. See SQL analytics. This clause may contain column references but not expressions.
If you partition data, Vertica creates a partition directory structure, transforming column names to lowercase. See Partitioned file paths for a description of the directory structure. If you use the
fileName
parameter, you cannot use partitioning.If you omit this clause, Vertica optimizes for maximum parallelism.
query-expression
- Specifies the data to export. See Query Restrictions for important limitations.
Parameters
directory
The destination directory for the output files. The current user must have permission to write it. The destination can be on any of the following file systems:
- HDFS file system
- S3 object store
- Google Cloud Storage (GCS) object store
- Azure Blob Storage object store
-
Linux file system, either an NFS mount or local storage on each node
See also:
ifDirExists
.ifDirExists
What to do if
directory
already exists, one of:-
fail
(default) -
overwrite
: replace the entire directory -
append
: export new files into the existing directory
If you specify
overwrite
for an export to an object store, the existing directory is deleted recursively at the beginning of the operation and is not restored if the operation fails. Be careful not to export to a directory containing data you want to keep. For an export to a Linux file system or HDFS, the directory is only overwritten if the export succeeds.Do not do concurrent exports to the same directory. In particular, if you do so with a value of
overwrite
, all operations appear to succeed, but the results are incorrect.When using
append
, be careful to use the same table schema. Otherwise, queries of external tables using this data path could fail.-
filename
If specified, all output is written to a single file of this name in the location specified by
directory
. While the query can be processed by multiple nodes, only a single node generates the output data. ThefileSizeMB
parameter is ignored, and the query cannot use partitioning in the OVER() clause.addHeader
- If true, add a header row to the beginning of each file.
Default: false
delimiter
- Column delimiter character. To produce CSV in accordance with RFC 4180, set the delimiter character to
,
(comma).Default:
|
(vertical bar) escapeDelimitersInsideEnclosures
- If true, delimiter characters are escaped within values that are enclosed by
enclosedBy
. This parameter is most relevant whenenclosedBy
andescapeAs
have the same value. A value of false only applies whenenclosedBy
has a value (is not an empty string).Default: true
recordTerminator
- Character that marks the record end.
Default:
\n
enclosedBy
- Character to use to enclose string and date/time data. If you omit this parameter, no character encloses these data types.
Default:
''
(empty string, no enclosing character) escapeAs
- Character to use to escape values in exported data that must be escaped, including the
enclosedBy
value.Default:
\
(backslash) nullAs
- String to represent null values in the data. If this parameter is included, the exporter exports all null values as this value. Otherwise, the exporter exports null values as zero-length strings.
binaryTypesFormat
- Format for exported binary data type (BINARY, VARBINARY, and LONG VARBINARY) values, one of the following:
-
Default
: Printable ASCII characters where possible and escaped octal representations of the non-printable bytes. The DELIMITED parser reads this format. -
Hex
: Base 16 (hexadecimal) representation; value is preceded by '0x' and bytes are not escaped. -
Octal
: Base 8 (octal) representation, without escaping. -
Bitstring
: Binary representation, without escaping.
For example, the value
a\000b\001c
can be exported as follows:-
Default (assuming an escape character of
\
):a\\000b\\001c
-
Hex:
0x6100620163
-
Octal:
141000142001143
-
Bitstring:
0110000100000000011000100000000101100011
-
compression
- Compression type, one of:
-
Uncompressed
-
BZip
-
GZip
Default: Uncompressed
-
fileExtension
- Output file extension. If using compression, a compression-specific extension such as
.bz2
is appended.Default:
csv
fileSizeMB
The maximum file size of a single output file. This value is a hint, not a hard limit. A value of 0 specifies no limit. If
filename
is also specified,fileSizeMB
is ignored.This value affects the size of individual output files, not the total output size. For smaller values, Vertica divides the output into more files; all data is still exported.
Default: 10GB
fileMode
For writes to HDFS only, permission to apply to all exported files. You can specify the value in Unix octal format (such as
665
) oruser
-
group
-
other
format—for example,rwxr-xr-x
. The value must be formatted as a string even if using the octal format.Valid octal values range between
0
and1777
, inclusive. See HDFS Permissions in the Apache Hadoop documentation.When writing files to any destination other than HDFS, this parameter has no effect.
Default:
660
, regardless of the value offs.permissions.umask-mode
inhdfs-site.xml
.dirMode
For writes to HDFS only, permission to apply to all exported directories. Values follow the same rules as those for
fileMode
. Further, you must give the Vertica HDFS user full permission, at leastrwx------
or700
.When writing files to any destination other than HDFS, this parameter has no effect.
Default:
755
, regardless of the value offs.permissions.umask-mode
inhdfs-site.xml
.
Privileges
Non-superusers:
-
Source table: SELECT
-
Source table schema: USAGE
-
Destination directory: Write
Query restrictions
You must provide an alias column label for selected column targets that are expressions.
If you partition the output, you cannot specify schema and table names in the SELECT statement. Specify only the column name.
The query can contain only a single outer SELECT statement. For example, you cannot use UNION:
=> EXPORT TO DELIMITED(directory = '/mnt/shared_nfs/accounts/rm')
OVER(PARTITION BY hash)
AS
SELECT 1 as account_id, '{}' as json, 0 hash
UNION ALL
SELECT 2 as account_id, '{}' as json, 1 hash;
ERROR 8975: Only a single outer SELECT statement is supported
HINT: Please use a subquery for multiple outer SELECT statements
Instead, rewrite the query to use a subquery:
=> EXPORT TO DELIMITED(directory = '/mnt/shared_nfs/accounts/rm')
OVER(PARTITION BY hash)
AS
SELECT
account_id,
json
FROM
(
SELECT 1 as account_id, '{}' as json, 0 hash
UNION ALL
SELECT 2 as account_id, '{}' as json, 1 hash
) a;
Rows Exported
---------------
2
(1 row)
To use composite statements such as UNION, INTERSECT, and EXCEPT, rewrite them as subqueries.
Data types
EXPORT TO DELIMITED does not support ARRAY, ROW, and SET types.
This operation exports raw Flex columns as binary data.
Output
The export operation always creates (or appends to) an output directory, even if all output is written to a single file or the query produces zero rows.
Output file names follow the pattern: prefix
-
nodename
-
threadId
[-
sequenceNumber
].
fileExtension
. prefix
is typically an 8-character hash, but can be longer if the export appended to an existing directory. A sequence number is added if an export needs to be broken into pieces to satisfy fileSizeMB
.
Column names in partition directories are lowercase.
Files exported to a local file system by any Vertica user are owned by the Vertica superuser. Files exported to HDFS or object stores are owned by the Vertica user who exported the data.
Making concurrent exports to the same output destination is an error and can produce incorrect results.
Exports to the local file system can be to an NFS mount (shared) or to the Linux file system on each node (non-shared). For details, see Exporting to the Linux file system. Exports to non-shared local file systems have the following restrictions:
-
The output directory must not exist on any node.
-
You must have a USER storage location or superuser privileges.
-
You cannot override the permissions mode of 700 for directories and 600 for files.
Exports to object-store file systems are not atomic. Be careful to wait for the export to finish before using the data. For details, see Exporting to object stores.
Examples
The following example exports uncompressed comma-separated values (CSV) with a header row in each file:
=> EXPORT TO DELIMITED(directory='webhdfs:///user1/data', delimiter=',', addHeader='true')
AS SELECT * FROM public.sales;