Parameters
COPY parameters and their descriptions are divided into the following sections:
Target options
The following options apply to the target tables and their columns:
LABEL
- Assigns a label to a statement to identify it for profiling and debugging.
{ database | namespace }
- Name of the database or namespace that contains table:
  - Database name: If specified, it must be the current database.
  - Namespace name (Eon Mode only): You must specify the namespace of objects in non-default namespaces. If no namespace is provided, Vertica assumes the object is in the default namespace.
schema
- Name of the schema, by default public. If you specify the namespace or database name, you must provide the schema name, even if the schema is public. If you specify a schema name that contains a period, the part before the period cannot be the same as the name of an existing namespace.
Note: COPY ignores schema when used in CREATE EXTERNAL TABLE or CREATE FLEX EXTERNAL TABLE statements.
target-table
- The target columnar or flexible table for loading new data. Vertica loads the data into all projections that include columns from the schema table.
column-as-expression
- An expression used to compute values for the target column, which must not be of a complex type. For example:
=> COPY t(year AS TO_CHAR(k, 'YYYY')) FROM 'myfile.dat'
Use this option to transform data when it is loaded into the target database.
For details, see Transforming data during loads.
column
- Restricts the load to one or more specified columns in the table. If you omit specifying columns, COPY loads all columns by default.
Table columns that you omit from the column list are assigned their DEFAULT or SET USING values, if any; otherwise, COPY inserts NULL.
If you leave the column parameter blank to load all columns in the table, you can use the optional parameter COLUMN OPTION to specify parsing options for specific columns. The data file must contain the same number of columns as the COPY command's column list.
COLUMN OPTION
- Specifies load metadata for one or more columns declared in the table column list. For example, you can specify that a column has its own DELIMITER, ENCLOSED BY, or NULL AS expression, and so on. You do not have to specify every column name explicitly in the COLUMN OPTION list, but each column you specify must correspond to a column in the table column list.
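For example, the following sketch (using a hypothetical table t with columns k and v) gives one column its own null indicator:
-- t is a hypothetical two-column table; only v uses 'n/a' for nulls
=> COPY t (k, v) COLUMN OPTION (v NULL AS 'n/a') FROM '/data/input.dat';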
Column options
Depending on how they are specified, the following COPY options can qualify specific columns or all columns. Some parser-specific options can also apply to either specific columns or all columns. For details about these two modes, see Global and column-specific options.
ENFORCELENGTH
- If specified, COPY rejects data rows of type CHAR, VARCHAR, BINARY, and VARBINARY, or elements of those types in collections, if they are larger than the declared size.
By default, COPY truncates offending rows of these data types and elements of these types in collections, but does not reject the rows. For more details, see Handling messy data.
If a collection does not fit with all of its elements, COPY rejects the row without truncating. It does not reduce the number of elements. This can happen if each element is individually within limits but the number of elements causes the collection to exceed the maximum size for the column.
FILLER datatype
- Reads but does not copy the data of an input column. Use filler columns to ignore input columns that do not have columns in the table. You can also use filler columns to transform data (see Examples and Transforming data during loads). Filler columns cannot be of complex types.
If a filler column has the same name as a column in the table, you must explicitly scope other uses of the name in the COPY statement. To refer to an ambiguous filler column, prefix the name with "*FILLER*". For example:
=> COPY names(first FILLER VARCHAR, last, full AS "*FILLER*".first||' '||last) FROM ...;
FORMAT 'format'
- Input format, one of the following:
  - octal
  - hex
  - bitstream
See Binary (native) data to learn more about these formats.
When loading date/time columns, using FORMAT significantly improves load performance. COPY supports the same formats as the TO_DATE function. See Template patterns for date/time formatting.
If you specify invalid format strings, the COPY operation returns an error.
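For example, the following sketch (using a hypothetical table orders whose source file stores dates as MM/DD/YYYY) applies FORMAT to a single date column:
-- orders is a hypothetical table; dt is a DATE column
=> COPY orders (dt FORMAT 'MM/DD/YYYY', amount) FROM '/data/orders.dat';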
NULL [AS]
- The string representing a null value. The default is an empty string (''). You can specify a null value as any ASCII value in the range E'\000' to E'\177', inclusive. You cannot use the same character for both the DELIMITER and NULL options. For details, see Delimited data.
Input options
The following options are available for specifying source data:
LOCAL
- Loads data files (up to 4,294,967,295 files) on a client system, rather than on a cluster host. LOCAL can qualify the STDIN and path-to-data arguments. For details, see COPY LOCAL.
Restrictions: Invalid for CREATE EXTERNAL TABLE AS COPY
STDIN
- Reads from the client standard input instead of a file. STDIN takes one input source only. To load multiple input sources, use path-to-data.
The user must have INSERT privileges on the table and USAGE privileges on its schema.
Restrictions: Invalid for CREATE EXTERNAL TABLE AS COPY
path-to-data
- Specifies the absolute path of the file (or files) containing the data, which can be from multiple input sources.
  - If the file is stored in HDFS, path-to-data is a URI in the webhdfs scheme, typically [[s]web]hdfs://[nameservice]/path. See HDFS file system.
  - If the file is stored in an S3 bucket, path-to-data is a URI in the format s3://bucket/path. See S3 object store.
  - If the file is stored in Google Cloud Storage, path-to-data is a URI in the format gs://bucket/path. See Google Cloud Storage (GCS) object store.
  - If the file is stored in Azure Blob Storage, path-to-data is a URI in the format azb://account/container/path. See Azure Blob Storage object store.
  - If the file is on the local Linux file system or an NFS mount, path-to-data is a local absolute file path.
path-to-data can optionally contain wildcards to match more than one file. The file or files must be accessible to the local client or the host on which the COPY statement runs. COPY skips empty files in the file list. A file list that includes directories causes the query to fail. See Specifying where to load data from. The supported wildcard patterns are specified in the Linux manual page for glob(7), and for ADO.NET platforms, through the .NET Directory.GetFiles method. You can use variables to construct the pathname as described in Using load scripts.
If path-to-data resolves to a storage location on a local file system, and the user invoking COPY is not a superuser, the following requirements apply:
  - The storage location must have been created with CREATE LOCATION...USAGE USER.
  - The user must already have READ access to the file storage location.
Further, if a user has privileges but is not a superuser, and invokes COPY from that storage location, Vertica ensures that symbolic links do not result in unauthorized access.
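For example, the following sketch (using a hypothetical table sales and matching files on the initiator node) loads several files with a glob:
-- sales and the /data/sales directory are hypothetical
=> COPY sales FROM '/data/sales/2024-*.dat';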
PARTITION COLUMNS column[,...]
- Columns whose values are specified in the directory structure and not in the data itself.
  - For Hive-style partitioning, paths contain directory names of the form colname=value, and COPY parses values for the specified partition columns. If a value is missing or cannot be coerced to the column data type, the source path is rejected.
  - For other partition schemes, column expressions specify how to extract values using the CURRENT_LOAD_SOURCE function. If the expression cannot be evaluated, the source path is rejected.
For more information, see Partitioned data.
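For example, the following sketch (assuming Hive-style paths such as /data/sales/region=east/f1.dat) reads the region value from the directory names:
-- sales and the partitioned directory layout are hypothetical
=> COPY sales FROM '/data/sales/*/*' PARTITION COLUMNS region;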
ON nodename
- Specifies the node on which the data to copy resides and the node that should parse the load file. If you omit nodename, the location of the input file defaults to the initiator node. Use nodename to copy and parse a load file from a node other than the COPY initiator node.
Note: nodename is invalid with STDIN and LOCAL.
ON (nodeset)
- Specifies a set of nodes on which to perform the load. The same data must be available for load on all named nodes.
nodeset is a comma-separated list of node names in parentheses. For example:
=> COPY t FROM 'file1.txt' ON (v_vmart_node0001, v_vmart_node0002);
Vertica apportions the load among all of the specified nodes. If you also specify ERROR TOLERANCE or REJECTMAX, Vertica instead chooses a single node on which to perform the load.
If the data is available on all nodes, you usually use ON ANY NODE, which is the default for loads from HDFS and cloud object stores. However, you can use ON nodeset to do manual load-balancing among concurrent loads.
ON ANY NODE
- Specifies that the data to load is available on all nodes, so COPY opens the path and parses it from any node in the cluster. For an Eon Mode database, COPY uses nodes within the same subcluster as the initiator.
Caution: The data must be the same on all nodes. If the data differs on two nodes, an incorrect or incomplete result is returned, with no error or warning.
Vertica attempts to apportion the load among several nodes if a file is large enough to benefit from apportioning. It chooses a single node if ERROR TOLERANCE or REJECTMAX is specified.
You can use a wildcard or glob (such as *.dat) to load multiple input files, combined with the ON ANY NODE clause. If you use a glob, COPY distributes the list of files to all cluster nodes and spreads the workload.
ON ANY NODE is invalid with STDIN and LOCAL: STDIN can only use the client host, and LOCAL indicates a client node.
ON ANY NODE is the default for loads from all paths other than Linux (HDFS and cloud object stores).
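For example, the following sketch (assuming identical files are visible at /data on every node) combines a glob with ON ANY NODE to spread the parsing workload:
-- t and the shared /data path are hypothetical
=> COPY t FROM '/data/*.dat' ON ANY NODE;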
ON EACH NODE
- Loads data from the specified path on each node. Use this option when the path exists on all nodes but the data files it contains are different on each node. If the path is not valid on all nodes, COPY loads the valid paths and produces a warning. If the path is a shared location, COPY loads it only once as for ON ANY NODE.
compression
- The input compression type, one of the following:
  - UNCOMPRESSED (default)
  - BZIP
  - GZIP
  - LZO
  - ZSTD
Input files can be of any format. If you use wildcards, all qualifying input files must be in the same format. To load different file formats, specify the format types specifically.
The following requirements and restrictions apply:
  - When using concatenated BZIP or GZIP files, verify that all source files terminate with a record terminator before concatenating them.
  - Concatenated BZIP and GZIP files are not supported for NATIVE (binary) and NATIVE VARCHAR formats.
  - LZO files are assumed to be compressed with lzop. Vertica supports the following lzop arguments:
    - --no-checksum / -F
    - --crc32
    - --adler32
    - --no-name / -n
    - --name / -N
    - --no-mode
    - --no-time
    - --fast
    - --best
    - Numbered compression levels
  - BZIP, GZIP, ZSTD, and LZO compression cannot be used with ORC format.
VERTICA
- See COPY FROM VERTICA.
[WITH] UDL-clause[...]
- Specifies one or more user-defined load functions—one source, and optionally one or more filters and one parser, as follows:
SOURCE source( [arg=value[,...] ] ) [ FILTER filter( [arg=value[,...] ] ) ]... [ PARSER parser( [arg=value[,...] ] ) ]
To use a flex table parser for column tables, use the PARSER parameter followed by a flex table parser argument. For supported flex table parsers, see Bulk loading data into flex tables.
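For example, the following sketch (using a hypothetical flex table mountains) names the fjsonparser flex parser in the PARSER clause:
-- mountains is a hypothetical flex table
=> COPY mountains FROM '/data/mountains.json' PARSER fjsonparser();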
Handling options
The following options control how COPY handles different contingencies:
ABORT ON ERROR
- Specifies that COPY stops if any row is rejected. The statement is rolled back and no data is loaded.
COLSIZES (integer[,...])
- Specifies column widths when loading fixed-width data. COPY requires that you specify COLSIZES when using the FIXEDWIDTH parser. COLSIZES and the list of integers must correspond to the columns listed in the table column list. For details, see Fixed-width format data.
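For example, the following sketch (using a hypothetical three-column table fw) pairs the FIXEDWIDTH parser with COLSIZES:
-- fw is a hypothetical table with columns of widths 4, 10, and 8 bytes
=> COPY fw FROM STDIN FIXEDWIDTH COLSIZES (4, 10, 8);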
ERROR TOLERANCE
- Specifies that COPY treats each source during execution independently when loading data. The statement is not rolled back if a single source is invalid. The invalid source is skipped and the load continues.
Using this parameter disables apportioned load.
Restrictions: Invalid for ORC or Parquet data
EXCEPTIONS 'path' [ ON nodename ] [,...]
- Exceptions, used in combination with REJECTED DATA, describe why each rejected row was rejected. Each exception row corresponds to a rejection row. This option specifies the file name or absolute path of the output file for exceptions. If the file already exists, it is overwritten.
Files are written on the node or nodes executing the load or, if ON is used, on the designated node. You cannot specify a path on a shared file system or qualify the path with ON ANY NODE. For more information, see Saving load exceptions (EXCEPTIONS).
To collect exceptions together with the rejected rows, use REJECTED DATA AS TABLE. You cannot use EXCEPTIONS with REJECTED DATA AS TABLE.
If you use this option with COPY...ON ANY NODE, you must still specify the individual nodes for the exception files, as in the following example:
EXCEPTIONS '/home/ex01.txt' ON v_db_node0001, '/home/ex02.txt' ON v_db_node0002, '/home/ex03.txt' ON v_db_node0003
If path resolves to a storage location, non-superusers have the following requirements:
  - The storage location must be created with the USER option (see CREATE LOCATION).
  - The user must have READ access to the storage location where the files exist, as described in GRANT (storage location).
REJECTED DATA { 'path' [ ON nodename ] [,...] | AS TABLE reject-table }
- Specifies where to write each row that failed to load. If this option is specified, records that failed due to parsing errors are always written. Records that failed due to an error during a transformation are written only if the CopyFaultTolerantExpressions configuration parameter is set.
Vertica can write rejected data to the local file system or to a table:
  - Local file system: Vertica writes rejected row data to the specified path on the node or nodes executing the load or, if ON is used, on the designated node. You cannot specify a path on a shared file system or qualify the path with ON ANY NODE. To make rejected data available on all nodes, write rejections to a table.
    The value of path can be a directory or a file prefix. If there are multiple load sources, path is always treated as a directory. If there are not multiple load sources but path ends with '/', or if a directory of that name already exists, it is also treated as a directory. Otherwise, path is treated as a file prefix.
    If the output file already exists, it is overwritten.
    When REJECTED DATA with a path is used with LOCAL, the output is written to the client.
  - Table: Vertica writes rejected row data and exceptions to a table. You cannot use this option with EXCEPTIONS.
    Writing rejections to a table is generally the better option, because rejected rows can then be queried from any node. If rejected data needs to be available from outside of Vertica, you can write it to a table and then export that table to a shared file system.
For more information about both options, see Handling messy data.
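For example, the following sketch (using a hypothetical table loader) saves rejections to a table that can be queried from any node:
-- loader and loader_rejects are hypothetical names
=> COPY loader FROM '/data/input.dat' REJECTED DATA AS TABLE loader_rejects;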
REJECTMAX integer
- The maximum number of logical records that can be rejected before a load fails. For details, see Handling messy data.
REJECTMAX disables apportioned load.
SKIP integer
- The number of records to skip in a load file. For example, you can use the SKIP option to omit table header information.
Restrictions: Invalid for ORC or Parquet data
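For example, the following sketch (assuming a hypothetical CSV file whose first row is a header) skips that row:
-- t and the input file are hypothetical
=> COPY t FROM '/data/input.csv' DELIMITER ',' SKIP 1;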
STREAM NAME
- Supplies a COPY load stream identifier. Using a stream name helps to quickly identify a particular load. The STREAM NAME value that you supply in the load statement appears in the STREAM_NAME column of system tables LOAD_STREAMS and LOAD_SOURCES.
A valid stream name can contain any combination of alphanumeric or special characters up to 128 bytes in length.
For example:
=> COPY mytable FROM myfile DELIMITER '|' STREAM NAME 'My stream name';
WITH parser
- Specifies the parser to use when bulk loading columnar tables, one of the following:
  - NATIVE
  - NATIVE VARCHAR
  - FIXEDWIDTH
  - ORC
  - PARQUET
By default, COPY uses the DELIMITER parser for UTF-8 format, delimited text input data. You do not specify the DELIMITER parser directly; absence of a specific parser indicates the default.
To use a flex table parser for column tables, use the PARSER parameter followed by a flex table parser argument. For supported flex table parsers, see Bulk loading data into flex tables.
When loading into flex tables, you must use a compatible parser. For supported flex table parsers, see Bulk loading data into flex tables.
COPY LOCAL does not support the NATIVE, NATIVE VARCHAR, ORC, and PARQUET parsers.
For parser support for complex data types, see the documentation of the specific parser.
For parser details, see Data formats in Data load.
NO COMMIT
- Prevents the COPY statement from committing its transaction automatically when it finishes copying data. This option must be the last COPY statement parameter. For details, see Using transactions to stage a load.
This option is ignored by CREATE EXTERNAL TABLE AS COPY.
Parser-specific options
The following options apply only when using specific parsers.
DELIMITED parser
DELIMITER
- Indicates the single ASCII character used to separate columns within each record of a file. You can use any ASCII value in the range E'\000' to E'\177', inclusive. You cannot use the same character for both the DELIMITER and NULL parameters. For more information, see Delimited data.
Default: Vertical bar ('|').
ENCLOSED [BY]
- Sets the quote character within which to enclose data, allowing delimiter characters to be embedded in string values. You can choose any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NULL: E'\000'). By default, ENCLOSED BY has no value, meaning data is not enclosed by any sort of quote character.
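For example, the following sketch (assuming comma-delimited input whose string values are wrapped in double quotes) sets both options:
-- t is a hypothetical table
=> COPY t FROM STDIN DELIMITER ',' ENCLOSED BY '"';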
ESCAPE [AS]
- Sets the escape character. Once set, the character following the escape character is interpreted literally, rather than as a special character. You can define an escape character using any ASCII value in the range E'\001' to E'\177', inclusive (any ASCII character except NULL: E'\000').
The COPY statement does not interpret the data it reads in as string literals. It also does not follow the same escape rules as other SQL statements (including the COPY parameters). When reading data, COPY interprets only the characters defined by these options as special values:
  - ESCAPE [AS]
  - DELIMITER
  - ENCLOSED [BY]
  - RECORD TERMINATOR
  - All COLLECTION options
Default: Backslash ('\').
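For example, the following sketch (assuming input in which a tilde precedes literal delimiter characters) overrides the default escape character:
-- t is a hypothetical table
=> COPY t FROM STDIN ESCAPE AS '~';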
NO ESCAPE
- Eliminates escape-character handling. Use this option if you do not need any escape character and you want to prevent characters in your data from being interpreted as escape sequences.
RECORD TERMINATOR
- Specifies the literal character string indicating the end of a data file record. For more information about using this parameter, see Delimited data.
TRAILING NULLCOLS
- Specifies that if Vertica encounters a record with insufficient data to match the columns in the table column list, COPY inserts the missing columns with NULL values. For other information and examples, see Fixed-width format data.
COLLECTIONDELIMITER
- For columns of collection types, indicates the single ASCII character used to separate elements within each collection. You can use any ASCII value in the range E'\000' to E'\177', inclusive. No COLLECTION option may have the same value as any other COLLECTION option. For more information, see Delimited data.
Default: Comma (',').
COLLECTIONOPEN, COLLECTIONCLOSE
- For columns of collection types, these options indicate the characters that mark the beginning and end of the collection. It is an error to use these characters elsewhere within the list of elements without escaping them. No COLLECTION option may have the same value as any other COLLECTION option.
Default: Square brackets ('[' and ']').
COLLECTIONNULLELEMENT
- The string representing a null element value in a collection. You can specify a null value as any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII value except NULL: E'\000'). No COLLECTION option may have the same value as any other COLLECTION option. For more information, see Delimited data.
Default: 'null'
COLLECTIONENCLOSE
- For columns of collection types, sets the quote character within which to enclose individual elements, allowing delimiter characters to be embedded in string values. You can choose any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NULL: E'\000'). No COLLECTION option may have the same value as any other COLLECTION option.
Default: Double quote ('"').
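For example, the following sketch (using a hypothetical table with an array column whose input wraps collections in braces and separates elements with semicolons) overrides the collection defaults:
-- t is a hypothetical table with an array column
=> COPY t FROM STDIN COLLECTIONOPEN '{' COLLECTIONCLOSE '}' COLLECTIONDELIMITER ';';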
FIXEDWIDTH parser
SKIP BYTES integer
- The total number of bytes in a record to skip.
TRIM
- Trims the number of bytes you specify from a column. This option is only available when loading fixed-width data. You can set TRIM at the table level for a column, or as part of the COLUMN OPTION parameter.