This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

COPY

COPY; Load data;.

1: Examples
2: Parameters
3: Parsers

3.1: DELIMITED
3.2: FAVROPARSER
3.3: FCEFPARSER
3.4: FCSVPARSER
3.5: FDELIMITEDPAIRPARSER
3.6: FDELIMITEDPARSER
3.7: FJSONPARSER
3.8: FREGEXPARSER
3.9: ORC
3.10: PARQUET

4: Restrictions

COPY bulk-loads data into a Vertica database. By default, COPY automatically commits itself and any current transaction except when loading temporary tables. If COPY is terminated or interrupted Vertica rolls it back.

COPY reads data as UTF-8 encoding.

For information on loading one or more files or pipes on a cluster host or on a client system, see COPY LOCAL.

Syntax

COPY [ /*+ LABEL (label-string)*/ ] [[database.]schema-name.]target-table
   [ ( { column-as-expression | column }
       [ DELIMITER [ AS ] 'char' ]
       [ ENCLOSED [ BY ] 'char' ]
       [ ENFORCELENGTH ]
       [ ESCAPE [ AS ] 'char' | NO ESCAPE ]
       [ FILLER datatype]
       [ FORMAT 'format' ]
       [ NULL [ AS ] 'string' ]
       [ TRIM 'byte' ]
       [,...] ) ]
   [ COLUMN OPTION (column
       [ DELIMITER [ AS ] 'char' ]
       [ ENCLOSED [ BY ] 'char' ]
       [ ENFORCELENGTH ]
       [ ESCAPE [ AS ] 'char' | NO ESCAPE ]
       [ FORMAT 'format' ]
       [ NULL [ AS ] 'string' ]
       [ TRIM 'byte' ]
     [,...] ) ]
FROM {
   [ LOCAL ] STDIN [ compression ]
   | { 'path-to-data'
       [ ON { nodename | (nodeset) | ANY NODE | EACH NODE } ] [ compression ] }[,...]
     [ PARTITION COLUMNS column[,...] ]
   | LOCAL 'path-to-data' [ compression ] [,...]
   | VERTICA source-database.[source-schema.]source-table[( source-column[,...] ) ]
  }
  [ NATIVE
    | FIXEDWIDTH COLSIZES {( integer )[,...]}
    | NATIVE VARCHAR
    | ORC
    | PARQUET
  ]
  | [ WITH ] UDL-clause[...]
}
   [ ABORT ON ERROR ]
   [ DELIMITER [ AS ] 'char' ]
   [ ENCLOSED [ BY ] 'char'
   [ ENFORCELENGTH ]
   [ ERROR TOLERANCE ]
   [ ESCAPE [ AS ] 'char' | NO ESCAPE ]
   [ EXCEPTIONS 'path' [ ON nodename] [,...]
   [ NULL [ AS ] 'string' ]
   [ RECORD TERMINATOR 'string' ]
   [ REJECTED DATA {'path' [ ON nodename] [,...] | AS TABLE reject-table} ]
   [ REJECTMAX integer ]
   [ SKIP integer ]
   [ SKIP BYTES integer ]
   [ STREAM NAME  'streamName']
   [ TRAILING NULLCOLS ]
   [ TRIM 'byte' ]
   [ [ WITH ] PARSER parser ([ arg=value[,...] ]) ] ]
   [ NO COMMIT ]

Parameters

See Parameters.

Restrictions

See Restrictions.

Privileges

Superusers have full COPY privileges. The following requirements apply to non-superusers:

INSERT privilege on table
USAGE privilege on schema
USER-accessible storage location
Applicable READ or WRITE privileges granted to the storage location where files are read or written

COPY can specify a path to store rejected data and exceptions. If the path resolves to a storage location, the following privileges apply to non-superusers:

The storage location was created with the USER option (see CREATE LOCATION).
The user must has READ access to the storage location, as described in GRANT (storage location)

1 - Examples

For additional COPY examples, see the reference pages for specific parsers, including: DELIMITED (Parser), ORC (Parser), PARQUET (Parser), FJSONPARSER (Parser), and FAVROPARSER (Parser).

For additional COPY examples, see the reference pages for specific parsers, including: DELIMITED, ORC, PARQUET, FJSONPARSER, and FAVROPARSER.

Specifying string options

Use COPY with FORMAT, DELIMITER, NULL, and ENCLOSED BY options:

=> COPY public.customer_dimension (customer_since FORMAT 'YYYY')
   FROM STDIN
   DELIMITER ','
   NULL AS 'null'
   ENCLOSED BY '"';

Use COPY with DELIMITER and NULL options. This example sets and references a vsql variable for the input file:

=> \set input_file ../myCopyFromLocal/large_table.gzip
=> COPY store.store_dimension
   FROM :input_file
   DELIMITER '|'
   NULL ''
   RECORD TERMINATOR E'\f';

Including multiple source files

Create a table and then copy multiple source files to it:

=> CREATE TABLE sampletab (a int);
CREATE TABLE

=> COPY sampletab FROM '/home/dbadmin/one.dat', 'home/dbadmin/two.dat';
 Rows Loaded
-------------
           2
(1 row)

Use wildcards to indicate a group of files:

=> COPY myTable FROM 'webhdfs:///mydirectory/ofmanyfiles/*.dat';

Wildcards can include regular expressions:

=> COPY myTable FROM 'webhdfs:///mydirectory/*_[0-9]';

Specify multiple paths in a single COPY statement:

=> COPY myTable FROM 'webhdfs:///data/sales/01/*.dat', 'webhdfs:///data/sales/02/*.dat',
    'webhdfs:///data/sales/historical.dat';

Distributing a load

Load data that is shared across all nodes. Vertica distributes the load across all nodes, if possible:

=> COPY sampletab FROM '/data/file.dat' ON ANY NODE;

Load data from two files. Because the first load file does not specify nodes (or ON ANY NODE), the initiator performs the load. Loading the second file is distributed across all nodes:

=> COPY sampletab FROM '/data/file1.dat', '/data/file2.dat' ON ANY NODE;

Specify different nodes for each load file:

=> COPY sampletab FROM '/data/file1.dat' ON (v_vmart_node0001, v_vmart_node0002),
    '/data/file2.dat' ON (v_vmart_node0003, v_vmart_node0004);

Loading data from shared storage

To load data from shared storage, use URLs in the corresponding schemes:

HDFS: [[s]web]hdfs://[nameservice]/path
S3: s3://bucket/path
Google Cloud: gs://bucket/path
Azure: azb://account/container/path

Note

Loads from HDFS, S3, GCS, and Azure default to ON ANY NODE; you do not need to specify it.

Load a file stored in HDFS using the default name node or name service:

=> COPY t FROM 'webhdfs:///opt/data/file1.dat';

Load data from a particular HDFS name service (testNS). You specify a name service if your database is configured to read from more than one HDFS cluster:

=> COPY t FROM 'webhdfs://testNS/opt/data/file2.csv';

Load data from an S3 bucket:

=> COPY t FROM 's3://AWS_DataLake/*' ORC;

Partitioned data

Data files can be partitioned using the directory structure, such as:

path/created=2016-11-01/region=northeast/*
path/created=2016-11-01/region=central/*
path/created=2016-11-01/region=southeast/*
path/created=2016-11-01/...
path/created=2016-11-02/region=northeast/*
path/created=2016-11-02/region=central/*
path/created=2016-11-02/region=southeast/*
path/created=2016-11-02/...
path/created=2016-11-03/...
path/...

Load partition columns using the PARTITION COLUMNS option:

=> CREATE EXTERNAL TABLE records (id int, name varchar(50), created date, region varchar(50))
   AS COPY FROM 'webhdfs:///path/*/*/*'
   PARTITION COLUMNS created, region;

Using filler columns

In the following example, the table has columns for first name, last name, and full name, but the data being loaded contains columns for first, middle, and last names. The COPY statement reads all of the source data but only loads the source columns for first and last names. It constructs the data for the full name by concatenating each of the source data columns, including the middle name. The middle name is read as a FILLER column so it can be used in the concatenation, but is ignored otherwise. (There is no table column for middle name.)

=> CREATE TABLE names(first VARCHAR(20), last VARCHAR(20), full VARCHAR(60));
CREATE TABLE
=> COPY names(first,
              middle FILLER VARCHAR(20),
              last,
              full AS first||' '||middle||' '||last)
      FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> Marc|Gregory|Smith
>> Sue|Lucia|Temp
>> Jon|Pete|Hamilton
>> \.
=> SELECT * from names;
 first |   last   |        full
-------+----------+--------------------
 Jon   | Hamilton | Jon Pete Hamilton
 Marc  | Smith    | Marc Gregory Smith
 Sue   | Temp     | Sue Lucia Temp
(3 rows)

Loading data into a flex table

Create a Flex table and copy JSON data into it using FJSONPARSER:

=> CREATE FLEX TABLE darkdata();
CREATE TABLE
=> COPY tweets FROM '/myTest/Flexible/DATA/tweets_12.json' PARSER FJSONPARSER();
 Rows Loaded
-------------
          12
(1 row)

Using named pipes

COPY supports named pipes that follow the same naming conventions as file names on the given file system. Permissions are open, write, and close.

Create named pipe, pipe1, and set two vsql variables:

=> \! mkfifo  pipe1
=> \set dir `pwd`/
=> \set file '''':dir'pipe1'''

Copy an uncompressed file from the named pipe:

=> \! cat pf1.dat > pipe1 &
=> COPY large_tbl FROM :file delimiter '|';
=> SELECT * FROM large_tbl;
=> COMMIT;

Loading compressed data

Copy a GZIP file from a named pipe and uncompress it:

=> \! gzip pf1.dat
=> \! cat pf1.dat.gz > pipe1 &
=> COPY large_tbl FROM :file ON site01 GZIP delimiter '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \!gunzip pf1.dat.gz

2 - Parameters

COPY parameters and their descriptions are divided into the following sections:.

COPY parameters and their descriptions are divided into the following sections:

Target Options
Column Options
Input Options
Handling Options
Parser-Specific Options

Target options

The following options apply to the target tables and their columns:

LABEL

Assigns a label to a statement to identify it for profiling and debugging.

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

COPY ignores schema-name when used in CREATE EXTERNAL TABLE or CREATE FLEX EXTERNAL TABLE statements.

target-table

The target columnar or flexible table for loading new data. Vertica loads the data into all projections that include columns from the schema table.

column-as-expression

An expression used to compute values for the target column, which must not be of a complex type. For example:

=> COPY t(year AS TO_CHAR(k, 'YYYY')) FROM 'myfile.dat'

Use this option to transform data when it is loaded into the target database.

For details, see Transforming data during loads.

column

Restricts the load to one or more specified columns in the table. If you omit specifying columns, COPY loads all columns by default.

Table columns that you omit from the column list are assigned their DEFAULT or SET USING values, if any; otherwise, COPY inserts NULL.

If you leave the column parameter blank to load all columns in the table, you can use the optional parameter COLUMN OPTION to specify parsing options for specific columns.

The data file must contain the same number of columns as the COPY command's column list.

COLUMN OPTION

Specifies load metadata for one or more columns declared in the table column list. For example, you can specify that a column has its own DELIMITER, ENCLOSED BY, or NULL AS expression, and so on. You do not have to specify every column name explicitly in the COLUMN OPTION list, but each column you specify must correspond to a column in the table column list.

Column options

Depending on how they are specified, the following COPY options can qualify specific columns or all columns. Some parser-specific options can also apply to either specific columns or all columns. See Global and column-specific options For details about these two modes.

ENFORCELENGTH

If specified, COPY rejects data rows of type CHAR, VARCHAR, BINARY, and VARBINARY, or elements of those types in collections, if they are larger than the declared size.

By default, COPY truncates offending rows of these data types and elements of these types in collections, but does not reject the rows. For more details, see Handling Messy Data.

If a collection does not fit with all of its elements, COPY rejects the row without truncating. It does not reduce the number of elements. This can happen if each element is individually within limits but the number of elements causes the collection to exceed the maximum size for the column.

FILLER datatype

Reads but does not copy the data of an input column. Use filler columns to ignore input columns that do not have columns in the table. You can also use filler columns to transform data (see Examples and Transforming data during loads). Filler columns cannot be of complex types.

FORMAT 'format'

Input format, one of the following:

octal
hex
bitstream

See Binary (native) data to learn more about these formats.

When loading date/time columns, using FORMAT significantly improves load performance. COPY supports the same formats as the TO_DATE function. See Template patterns for date/time formatting.

If you specify invalid format strings, the COPY operation returns an error.

NULL [AS]

The string representing a null value. The default is an empty string (''). You can specify a null value as any ASCII value in the range E'\000' to E'\177' inclusive. You cannot use the same character for both the DELIMITER and NULL options. For details, see Delimited data.

Input options

The following options are available for specifying source data:

LOCAL

Loads data files (up to 65,535 files) on a client system, rather than on a cluster host. LOCAL can qualify the STDIN and [path-to-data](#pathToData) parameters. For details, see COPY LOCAL.

Restrictions: Invalid for CREATE EXTERNAL TABLE AS COPY

STDIN

Reads from the client a standard input instead of a file. STDIN takes one input source only. To load multiple input sources, use [path-to-data](#pathToData).

User must have INSERT privileges on the table and USAGE privileges on its schema.

Restrictions: Invalid for CREATE EXTERNAL TABLE AS COPY

path-to-data

Specifies the absolute path of the file (or files) containing the data, which can be from multiple input sources.

If the file is stored in HDFS, path-to-data is a URI in the webhdfs scheme, typically [[s]web]hdfs://[nameservice]/path. See HDFS file system.
If the file is stored in an S3 bucket, path-to-data is a URI in the format s3://bucket/path. See S3 object store.
If the file is stored in Google Cloud Storage, path-to-data is a URI in the format gs://bucket/path. See Google Cloud Storage (GCS) object store.
If the file is stored in Azure Blob Storage, path-to-data is a URI in the format azb://account/container/path. See Azure Blob Storage object store.
If the file is on the local Linux file system or an NFS mount, path-to-data is a local absolute file path.

path-to-data can optionally contain wildcards to match more than one file. The file or files must be accessible to the local client or the host on which the COPY statement runs. COPY skips empty files in the file list. A file list that includes directories causes the query to fail. See Specifying where to load data from. The supported patterns for wildcards are specified in the Linux Manual Page for Glob (7), and for ADO.net platforms, through the .NET Directory.getFiles method.

You can use variables to construct the pathname as described in Using load scripts.

If path-to-data resolves to a storage location on a local file system, and the user invoking COPY is not a superuser, the following requirements apply:

The storage location was created with CREATE LOCATION...USAGE USER.
The user must already have READ access to the file storage location.

Further, if a user has privileges but is not a superuser, and invokes COPY from that storage location, Vertica ensures that symbolic links do not result in unauthorized access.

PARTITION COLUMNS column[,...]

Columns whose values are specified in the directory structure and not in the data itself. This option is valid if the path contains directory names of the form colname=value, such as:

/data/created=2016-01-01/*.dat
/data/created=2016-01-02/*.dat
/data/created=2016-01-03/*.dat

The value is parsed from the value part of the directory name. If it cannot be coerced to the column data type, it is rejected. If the value is missing, COPY loads it as null.

The following example loads these dates into a table column, along with other columns in the data files:

=> CREATE EXTERNAL TABLE records
    (id int, name varchar(50), created date)
   AS COPY FROM 'webhdfs:///path/*/*'
   PARTITION COLUMNS created;

For more information, see Partitioned file paths.

ON nodename

Specifies the node on which the data to copy resides and the node that should parse the load file. If you omit nodename, the location of the input file defaults to the initiator node. Use nodename to copy and parse a load file from a node other than the COPY initiator node.

Note

nodename is invalid with STDIN and LOCAL.

ON (nodeset)

Specifies a set of nodes on which to perform the load. The same data must be available for load on all named nodes. nodeset is a comma-separated list of node names in parentheses. For example:

=> COPY t FROM 'file1.txt' ON (v_vmart_node0001, v_vmart_node0002);

Vertica apportions the load among all of the specified nodes. If you also specify ERROR TOLERANCE or REJECTMAX, Vertica instead chooses a single node on which to perform the load.

If the data is available on all nodes, you usually use ON ANY NODE, which is the default for loads from HDFS and cloud object stores. However, you can use ON nodeset to do manual load-balancing among concurrent loads.

ON ANY NODE

Specifies that the data to load is available on all nodes, so COPY opens the path and parses it from any node in the cluster. For an Eon Mode database, COPY uses nodes within the same subcluster as the initiator.

Caution

The data must be the same on all nodes. If the data differs on two nodes, an incorrect or incomplete result is returned, with no error or warning.

Vertica attempts to apportion the load among several nodes if a file is large enough to benefit from apportioning. It chooses a single node if ERROR TOLERANCE or REJECTMAX is specified.

You can use a wildcard or glob (such as *.dat) to load multiple input files, combined with the ON ANY NODE clause. If you use a glob, COPY distributes the list of files to all cluster nodes and spreads the workload.

ON ANY NODE is invalid with STDIN and LOCAL. STDIN can only use the client host, and LOCAL indicates a client node.

ON ANY NODE is the default for loads from all paths other than Linux (HDFS and cloud object stores).

ON EACH NODE

Loads data from the specified path on each node. Use this option when the path exists on all nodes but the data files it contains are different on each node. If the path is not valid on all nodes, COPY loads the valid paths and produces a warning. If the path is a shared location, COPY loads it only once as for ON ANY NODE.

compression

The input compression type, one of the following:

UNCOMPRESSED (default)
BZIP
GZIP
LZO
ZSTD

Input files can be of any format. If you use wildcards, all qualifying input files must be in the same format. To load different file formats, specify the format types specifically.

The following requirements and restrictions apply:

When using concatenated BZIP or GZIP files, verify that all source files terminate with a record terminator before concatenating them.
Concatenated BZIP and GZIP files are not supported for NATIVE (binary) and NATIVE VARCHAR formats.
LZO files are assumed to be compressed with lzop. Vertica supports the following lzop arguments:
- --no-checksum / -F
- --crc32
- --adler32
- --no-name / -n
- --name / -N
- --no-mode
- --no-time
- --fast
- --best
- Numbered compression levels
BZIP, GZIP, ZSTD, and LZO compression cannot be used with ORC format.

VERTICA

See COPY FROM VERTICA.

[WITH] UDL-clause[...]

Specifies one or more user-defined load functions—one source, and optionally one or more filters and one parser, as follows:

SOURCE source( [arg=value[,...] ]
[ FILTER filter( [arg=value[,...] ] ) ]...
[ PARSER parser( [arg=value[,...] ] ) ]

To use a flex table parser for column tables, use the PARSER parameter followed by a flex table parser argument. For supported flex table parsers, see Bulk loading data into flex tables.

Handling options

The following options control how COPY handles different contingencies:

ABORT ON ERROR

Specifies that COPY stops if any row is rejected. The statement is rolled back and no data is loaded.

COLSIZES (integer[,...])

Specifies column widths when loading fixed-width data. COPY requires that you specify COLSIZES when using the FIXEDWIDTH parser. COLSIZES and the list of integers must correspond to the columns listed in the table column list. For details, see Fixed-width format data.

ERROR TOLERANCE

Specifies that COPY treats each source during execution independently when loading data. The statement is not rolled back if a single source is invalid. The invalid source is skipped and the load continues.

Using this parameter disables apportioned load.

Restrictions: Invalid for ORC or Parquet data

EXCEPTIONS

Specifies the file name or absolute path of the file in which to write exceptions, as follows:

EXCEPTIONS 'path' [ ON nodename[,...]]

Exceptions describe why each rejected row was rejected. Each exception describes the corresponding record in the file specified by the REJECTED DATA option.

Files are written on the node or nodes executing the load. If the file already exists, it is overwritten.

To collect all exceptions in one place, use the REJECTED DATA AS TABLE clause and exceptions are automatically listed in the table's rejected_reason column.

Note

EXCEPTIONS is incompatible with REJECTED DATA AS TABLE.

The ON nodename clause moves existing exceptions files on nodename to the indicated path on the same node. For details, see Saving load exceptions (EXCEPTIONS).

If you use this parameter with COPY...ON ANY NODE, you must still specify the individual nodes for the exception files, as in the following example:

EXCEPTIONS '/home/ex01.txt' on v_db_node0001,'/home/ex02.txt'
on v_db_node0002,'/home/ex03.txt' on v_db_node0003

If path resolves to a storage location, the following privileges apply to non-superusers:

The storage location must be created with the USER option (see CREATE LOCATION).
The user must have READ access to the storage location where the files exist, as described in GRANT (storage location).

REJECTED DATA

Specifies where to write each row that failed to load. If this parameter is specified, records that failed due to parsing errors are always written. Records that failed due to an error during a transformation are written only if configuration parameter CopyFaultTolerantExpressions is set.

The syntax for this parameter is:

REJECTED DATA
{ 'path' [ ON nodename ] [,...] | AS TABLE reject-table }

Vertica can write rejected data to the specified path or to a table:

'path' [ON nodename]: Copies the rejected row data to the specified path on the node executing the load. If qualified by ON nodename, Vertica moves existing rejected data files on nodename to path on the same node.

The value of path can be a directory or a file prefix. If there are multiple load sources, path is always treated as a directory. If there are not multiple load sources but path ends with '/', or if a directory of that name already exists, it is also treated as a directory. Otherwise, path is treated as a file prefix.

Files are written on the node or nodes executing the load. If the file already exists, it is overwritten.

When this parameter is used with LOCAL, the output is written to the client.

Note
Do not qualify path with ON ANY NODE. To collect all rejected data in one place regardless of how the load is distributed, use a table.
AS TABLE reject-table: Saves rejected rows to reject-table.

Note
REJECTED DATA AS TABLE is incompatible with EXCEPTIONS.

For details about both options, see Handling messy data.

REJECTMAX integer

The maximum number of logical records that can be rejected before a load fails. For details, see Handling messy data.

REJECTMAX disables apportioned load.

SKIP integer

The number of records to skip in a load file. For example, you can use the SKIP option to omit table header information.

Restrictions: Invalid for ORC or Parquet data

STREAM NAME

Supplies a COPY load stream identifier. Using a stream name helps to quickly identify a particular load. The STREAM NAME value that you supply in the load statement appears in the STREAM_NAME column of system tables LOAD_STREAMS and LOAD_SOURCES.

A valid stream name can contain any combination of alphanumeric or special characters up to 128 bytes in length.

For example:

=> COPY mytable FROM myfile 
   DELIMITER '|' STREAM NAME 'My stream name';

WITH parser

Specifies the parser to use when bulk loading columnar tables, one of the following:

NATIVE
NATIVE VARCHAR
FIXEDWIDTH
ORC
PARQUET

By default, COPY uses the DELIMITER parser for UTF-8 format, delimited text input data. You do not specify the DELIMITER parser directly; absence of a specific parser indicates the default.

To use a flex table parser for column tables, use the PARSER parameter followed by a flex table parser argument. For supported flex table parsers, see Bulk loading data into flex tables.

When loading into flex tables, you must use a compatible parser. For supported flex table parsers, see Bulk loading data into flex tables.

COPY LOCAL does not support the NATIVE, NATIVE VARCHAR, ORC, and PARQUET parsers.

For parser support for complex data types, see the documentation of the specific parser.

For parser details, see Data formats in Data load.

NO COMMIT

Prevents the COPY statement from committing its transaction automatically when it finishes copying data. This option must be the last COPY statement parameter.

For details, see Using transactions to stage a load.

Restrictions: Invalid for ORC or Parquet data, ignored by CREATE EXTERNAL TABLE AS COPY

Parser-specific options

The following options apply only when using specific parsers.

DELIMITED parser

DELIMITER

Indicates the single ASCII character used to separate columns within each record of a file. You can use any ASCII value in the range E'\000' to E'\177', inclusive. You cannot use the same character for both the DELIMITER and NULL parameters. For more information, see Delimited data.

Default: Vertical bar ('|').

ENCLOSED [BY]

Sets the quote character within which to enclose data, allowing delimiter characters to be embedded in string values. You can choose any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NULL: E'\000'). By default, ENCLOSED BY has no value, meaning data is not enclosed by any sort of quote character.

ESCAPE [AS]

Sets the escape character. Once set, the character following the escape character is interpreted literally, rather than as a special character. You can define an escape character using any ASCII value in the range E'\001' to E'\177', inclusive (any ASCII character except NULL: E'\000').

The COPY statement does not interpret the data it reads in as String literals. It also does not follow the same escape rules as other SQL statements (including the COPY parameters). When reading data, COPY interprets only the characters defined by these options as special values:

ESCAPE [AS]
DELIMITER
ENCLOSED [BY]
RECORD TERMINATOR
All COLLECTION options

Default: Backslash ('\').

NO ESCAPE

Eliminates escape-character handling. Use this option if you do not need any escape character and you want to prevent characters in your data from being interpreted as escape sequences.

RECORD TERMINATOR

Specifies the literal character string indicating the end of a data file record. For more information about using this parameter, see Delimited data.

TRAILING NULLCOLS

Specifies that if Vertica encounters a record with insufficient data to match the columns in the table column list, COPY inserts the missing columns with NULL values. For other information and examples, see Fixed-width format data.

COLLECTIONDELIMITER

For columns of collection types, indicates the single ASCII character used to separate elements within each collection. You can use any ASCII value in the range E'\000' to E'\177', inclusive. No COLLECTION option may have the same value as any other COLLECTION option. For more information, see Delimited data.

Default: Comma (',').

COLLECTIONOPEN, COLLECTIONCLOSE

For columns of collection types, these options indicate the characters that mark the beginning and end of the collection. It is an error to use these characters elsewhere within the list of elements without escaping them. No COLLECTION option may have the same value as any other COLLECTION option.

Default: Square brackets ('[' and ']').

COLLECTIONNULLELEMENT

The string representing a null element value in a collection. You can specify a null value as any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII value except NULL: E'\000'). No COLLECTION option may have the same value as any other COLLECTION option. For more information, see Delimited data.

Default: 'null'

COLLECTIONENCLOSE

For columns of collection types, sets the quote character within which to enclose individual elements, allowing delimiter characters to be embedded in string values. You can choose any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NULL: E'\000').

No COLLECTION option may have the same value as any other COLLECTION option.

Default: double quote ('"')

FIXEDWIDTH parser

SKIP BYTES integer: The total number of bytes in a record to skip.
TRIM: Trims the number of bytes you specify from a column. This option is only available when loading fixed-width data. You can set TRIM at the table level for a column, or as part of the COLUMN OPTION parameter.

3 - Parsers

Vertica supports several parsers to load different types of data.

Vertica supports several parsers to load different types of data. Some parsers are for use only with flex tables, as noted.

3.1 - DELIMITED

Use the DELIMITED parser, which is the default, to load delimited text data using COPY.

Use the DELIMITED parser, which is the default, to load delimited text data using COPY. You can specify the delimiter, escape characters, how to handle null values, and other parameters.

The DELIMITED parser supports both apportioned load and cooperative parse.

COPY options

The following options are specific to this parser. See Parameters for other applicable options.

DELIMITER

Default: Vertical bar ('|').

ENCLOSED [BY]

ESCAPE [AS]

ESCAPE [AS]
DELIMITER
ENCLOSED [BY]
RECORD TERMINATOR
All COLLECTION options

Default: Backslash ('\').

NO ESCAPE

Eliminates escape-character handling. Use this option if you do not need any escape character and you want to prevent characters in your data from being interpreted as escape sequences.

RECORD TERMINATOR

Specifies the literal character string indicating the end of a data file record. For more information about using this parameter, see Delimited data.

TRAILING NULLCOLS

COLLECTIONDELIMITER

Default: Comma (',').

COLLECTIONOPEN, COLLECTIONCLOSE

Default: Square brackets ('[' and ']').

COLLECTIONNULLELEMENT

Default: 'null'

COLLECTIONENCLOSE

No COLLECTION option may have the same value as any other COLLECTION option.

Default: double quote ('"')

Data types

The DELIMITED parser supports reading one-dimensional collections (arrays or sets) of scalar types.

If the total size of an array exceeds the size defined by the target table, the parser rejects the row.

Examples

The following example shows the default behavior, in which the delimiter character is '|'

=> CREATE TABLE employees (id INT, name VARCHAR(50), department VARCHAR(50));
CREATE TABLE

=> COPY employees FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 42|Sheldon Cooper|Physics
>> 17|Howard Wolowitz|Astronomy
>> \.

=> SELECT * FROM employees;
 id |      name       |  department
----+-----------------+--------------
 17 | Howard Wolowitz | Astrophysics
 42 | Sheldon Cooper  | Physics
(2 rows)

The following example shows loading array values with the default options.

=> CREATE TABLE researchers (id INT, name VARCHAR, grants ARRAY[VARCHAR], values ARRAY[INT]);
CREATE TABLE

=> COPY researchers FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 42|Sheldon Cooper|[US-7376,DARPA-1567]|[65000,135000]
>> 17|Howard Wolowitz|[NASA-1683,NASA-7867,SPX-76]|[16700,85000,45000]
>> \.

=> SELECT * FROM researchers;
 id |      name       |               grants               |       values
----+-----------------+------------------------------------+---------------------
 17 | Howard Wolowitz | ["NASA-1683","NASA-7867","SPX-76"] | [16700,85000,45000]
 42 | Sheldon Cooper  | ["US-7376","DARPA-1567"]           | [65000,135000]
(2 rows)

In the following example, collections are enclosed in braces and delimited by periods, and the arrays contain null values.

=> COPY researchers FROM STDIN COLLECTIONOPEN '{' COLLECTIONCLOSE '}' COLLECTIONDELIMITER '.';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 19|Leonard|{"us-1672".null."darpa-1963"}|{16200.null.16700}
>> \.

=> SELECT * FROM researchers;
 id |      name       |               grants               |       values
----+-----------------+------------------------------------+---------------------
 17 | Howard Wolowitz | ["NASA-1683","NASA-7867","SPX-76"] | [16700,85000,45000]
 42 | Sheldon Cooper  | ["US-7376","DARPA-1567"]           | [65000,135000]
 19 | Leonard         | ["us-1672",null,"darpa-1963"]      | [16200,null,16700]
(3 rows)

3.2 - FAVROPARSER

Parses data from an Avro file.

Parses data from an Avro file. The following requirements apply:

Avro files must be encoded in the Avro binary serialization encoding format, described in the Apache Avro standard. The parser also supports Snappy and deflate compression.
FAVROPARSER does not support Avro files with separate schema files. The Avro file must include the schema.

You can load complex types in the Avro source (arrays, structs, or combinations) with strong typing or as flexible complex types. A flexible complex type is loaded into a VMap column, as in flex tables. To load complex types as VMap columns, specify a column type of LONG VARBINARY. To preserve the indexing in complex types, set flatten_maps to false.

This parser can notify you if it finds keys in the data that are not part of the table definition. See Unmatched Keys.

When loading into a flex table, Vertica loads all data into the __raw__ (VMap) column, including complex types found in the data.

This parser does not support apportioned load or cooperative parse.

Syntax

FAVROPARSER ( [parameter=value[,...]] )

Parameters

flatten_maps

Boolean, whether to flatten all Avro maps. Key names are concatenated with nested levels. This value is recursive and affects all data in the load.

This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.

Default: true

flatten_arrays

Boolean, whether to flatten all Avro arrays. Key names are concatenated with nested levels. This value is recursive and affects all data in the load.

This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.

Default: false

flatten_records

Boolean, whether to flatten all Avro records. Key names are concatenated with nested levels. This value is recursive and affects all data in the load.

This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.

Default: true

reject_on_materialized_type_error

Boolean, whether to reject a data row that contains a materialized column value that cannot be coerced into a compatible data type. If the value is false and the type cannot be coerced, the parser sets the value in that column to NULL.

If the column is an array and the data to be loaded is too large, then false sets the column value to NULL and true rejects the row.

If the column is a strongly-typed complex type, as opposed to a flexible complex type, then a type mismatch anywhere in the complex type causes the entire column to be treated as a mismatch. The parser does not partially load complex types.

Default: false

suppress_warnings

String, which warnings to suppress:

unmatched_key (see Unmatched Keys)
true or t (suppress all warnings)
false or f (do not suppress warnings)

Default: false

Primitive data types

FAVROPARSER supports the following primitive data types, including as element types and field values in complex types.

AVRO Data Type	Vertica Data Type	Value
NULL	NULL value	No value
boolean	Boolean data type	A binary value
int	INTEGER	32-bit signed integer
long	INTEGER	64-bit signed integer
float	DOUBLE PRECISION (FLOAT) Synonymous with 64-bit IEEE FLOAT	Single precision (32-bit) IEEE 754 floating-point number
double	DOUBLE PRECISION (FLOAT)	Double precision (64-bit) IEEE 754 floating-point number
bytes	VARBINARY	Sequence of 8-bit unsigned bytes
string	VARCHAR	Unicode character sequence

Note

Vertica does not have an explicit 4-byte (32-bit integer) or smaller types. Instead, Vertica encoding and compression automatically eliminate the storage overhead of values that require less than 64 bits.

Avro logical types

FAVROPARSER supports the following Avro logical types. The target column must use a Vertica data type that supports the logical type. When you attempt to load data using an invalid logical type, the logical type is ignored and the underlying Avro type is used.

AVRO Logical Type	Base Avro Type	Supported Vertica Data Types
`decimal` 0 < `precision` ≤ 1024 0 ≤ `scale` ≤ `precision`	`bytes` or `fixed`	NUMERIC, Character Vertica rejects the value if: The Avro precision setting is greater than the precision setting for the target column. For fixed types, the precision value is greater than what is allowed by the `size` attribute. If the data type for the target column uses the default precision setting, the precision setting in the Avro schema overrides the default.
`date`	`integer`	DATE, Character
`time-micros`	`long`	TIME/TIMETZ, Character The time logical type does not provide a time zone value. For target columns that use the TIMETZ data type, Vertica uses UTC as the default.
`time-millis`	`int`
`timestamp-micros`	`long`	TIMESTAMP/TIMESTAMPTZ, TIME/TIMETZ For `timestamp-millis` only, the timezone is included and is represented as an offset to UTC. Additionally, the millisecond values are right-extended with padded zeros.
`timestamp-millis`	`long`
`duration`	`fixed`	INTERVAL, Character

Avro complex data types

The Avro format supports several complex data types. When loading into strongly-typed columns, you can use the ROW and ARRAY types to represent them. For example, Avro Record and Enums are structs (ROWs); see the Avro specification.

You can use ARRAY[ROW] to match an Avro map. You must name the ROW fields key and value. These are the names that the Avro format uses for those fields in the data, and the parser relies on field names to match data to table columns.

If the total size of an array exceeds the size defined by the target table, the parser sets the value to null.

When loading into flex tables or using flexible complex types, this parser handles Avro complex types as follows:

Record

The name of each field is used as a virtual column name. If flatten_records is true and several nesting levels are present, Vertica concatenates the record names to create the key name.

Map

The value of each map key is used as a virtual column name. If flatten_maps is true and several nesting levels are present, Vertica concatenates the key names to create the key name.

Enum

Vertica treats Avro Enums like records, with the name of the Enum as the key and the value as the value.

Array

Vertica treats Avro Arrays as key/value pairs. By default, the index of each element is the key. In the following example, product_detail is a Record with a field, product_category, that is an Array:

=> CREATE FLEX TABLE products;
CREATE TABLE

=> COPY products FROM :datafile WITH PARSER FAVROPARSER();
 Rows Loaded
-------------
           2
(1 row)

=> SELECT MAPTOSTRING(__raw__) FROM products ORDER BY __identity__;
                    maptostring
--------------------------------------------------------------------------------
 {
    "__name__": "Order",
    "customer_id": "111222",
    "order_details": {
        "0.__name__": "OrderDetail",
        "0.product_detail.__name__": "Product",
        "0.product_detail.price": "46.21",
        "0.product_detail.product_category": {
            "0": "electronics",
            "1": "printers",
            "2": "computers"
        },
        "0.product_detail.product_description": "hp printer X11ew description :\
P",
        "0.product_detail.product_hash": "\u0000\u0001\u0002\u0003\u0004",
        "0.product_detail.product_id": "999012",
        "0.product_detail.product_map.one": "1.1",
        "0.product_detail.product_map.two": "1.1",
        "0.product_detail.product_name": "hp printer X11ew",
        "0.product_detail.product_status": "ONLY_FEW_LEFT",
        "0.quantity": "3",
        "0.total": "354.34"
    },
    "order_id": "2389646",
    "total": "132.43"
}
...

If flatten_arrays is true and several nesting levels are present, Vertica concatenates the indices to create the key name.

=> COPY products FROM :datafile WITH PARSER FAVROPARSER(flatten_arrays=true);
 Rows Loaded
-------------
           2
(1 row)

=> SELECT MAPTOSTRING(__raw__) FROM products ORDER BY __identity__;
                    maptostring
--------------------------------------------------------------------------------

 {
    "__name__": "Order",
    "customer_id": "111222",
    "order_details.0.__name__": "OrderDetail",
    "order_details.0.product_detail.__name__": "Product",
    "order_details.0.product_detail.price": "46.21",
    "order_details.0.product_detail.product_category.0": "electronics",
    "order_details.0.product_detail.product_category.1": "printers",
    "order_details.0.product_detail.product_category.2": "computers",
    "order_details.0.product_detail.product_description": "hp printer X11ew des\
cription :P",
    "order_details.0.product_detail.product_hash": "\u0000\u0001\u0002\u0003\u0\
004",
    "order_details.0.product_detail.product_id": "999012",
    "order_details.0.product_detail.product_map.one": "1.1",
    "order_details.0.product_detail.product_map.two": "1.1",
    "order_details.0.product_detail.product_name": "hp printer X11ew",
    "order_details.0.product_detail.product_status": "ONLY_FEW_LEFT",
    "order_details.0.quantity": "3",
    "order_details.0.total": "354.34",
    "order_id": "2389646",
    "total": "132.43"
}
...

Union

Vertica treats Avro Unions as arrays.

Unmatched keys

Data being loaded can contain keys that are not part of the table definition. If you are loading into a flex table (or a flexible complex type column), no data is lost. For a table with strongly-defined columns, however, new keys cannot be loaded because the table does not have a place to put them.

This parser emits warnings if it finds new keys and if both of the following are true:

The target table is not a flex table.
The new key is not nested within a flexible complex type column.

New keys are logged in the UDX_EVENTS system table. If a new key is a complex type with nested keys, only the top-level key is logged. When you see a warning about unmatched keys, you can query this table and then use ALTER TABLE to modify your table definition for future loads.

Querying an external table loads data and thus can trigger these warnings. To prevent them, set the suppress_warnings parameter to 'unmatched_keys' or 'true':

=> CREATE EXTERNAL TABLE restaurants(
            name VARCHAR(50),
            menu ARRAY[ROW(item VARCHAR(50), price NUMERIC(8,2)),100])
   AS COPY FROM '/data/rest.json'
   PARSER FAVROPARSER(suppress_warnings='unmatched_key');

Examples

This example shows how to create and load a flex table with Avro data using favroparser. After loading the data, you can query virtual columns:

=> CREATE FLEX TABLE avro_basic();
CREATE TABLE

=> COPY avro_basic FROM '/home/dbadmin/data/weather.avro' PARSER FAVROPARSER();
Rows Loaded
-------------
5
(1 row)

=> SELECT station, temp, time FROM avro_basic;
station | temp |     time
---------+------+---------------
mohali  | 0    | -619524000000
lucknow | 22   | -619506000000
norwich | -11  | -619484400000
ams     | 111  | -655531200000
baddi   | 78   | -655509600000
(5 rows)

For more information, see Avro data.

3.3 - FCEFPARSER

Parses ArcSight Common Event Format (CEF) log files.

Parses ArcSight Common Event Format (CEF) log files. This parser loads values directly into any table column with a column name that matches a source data key. The parser stores the data loaded into a flex table in a single VMap.

This parser is for use in Flex tables only. All flex parsers store the data as a single VMap in the LONG VARBINAR_raw__ column. If a data row is too large to fit in the column, it is rejected. Vertica supports null values for loading data with NULL-specified columns.

Syntax

FCEFPARSER ( [parameter-name='value'[,...]] )

Parameters

delimiter: Single-character delimiter.
Default: ' '
record_terminator: Single-character record terminator.
**Default ****value: **newline
trim: Boolean, specifies whether to trim white space from header names and key values.
Default: true
reject_on_unescaped_delimiter: Boolean, specifies whether to reject rows containing unescaped delimiters. The CEF standard does not permit them.
Default: false

Examples

The following example illustrates creating a sample flex table for CEF data, with two real columns, eventId and priority.

Create a flex table cefdata:

=> create flex table cefdata();
CREATE TABLE

Load some basic CEF data, using the flex parser fcefparser:

=> copy cefdata from stdin parser fcefparser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> CEF:0|ArcSight|ArcSight|2.4.1|machine:20|New alert|High|
>> \.

Use the maptostring() function to view the contents of your cefdata flex table:

=> select maptostring(__raw__) from cefdata;
                  maptostring
-------------------------------------------------------------
 {
   "deviceproduct" : "ArcSight",
   "devicevendor" : "ArcSight",
   "deviceversion" : "2.4.1",
   "name" : "New alert",
   "severity" : "High",
   "signatureid" : "machine:20",
   "version" : "0"
}


(1 row)

Select some virtual columns from the cefdata flex table:
```
= select deviceproduct, severity, deviceversion from cefdata;
 deviceproduct | severity | deviceversion
---------------+----------+---------------
 ArcSight      | High     | 2.4.1
(1 row)
```
For more information, see Common event format (CEF) data

See also

3.4 - FCSVPARSER

Parses CSV format (comma-separated values) data.

Parses CSV format (comma-separated values) data. Use this parser to load CSV data into columnar, flex, and hybrid tables. All data must be encoded in Unicode UTF-8 format. The fcsvparser parser supports the RFC 4180 standard for CSV data, and other options, to accommodate variations in CSV file format definitions. Invalid records are rejected. For more information about data formats, see Handling Non-UTF-8 input.

Syntax

FCSVPARSER ( [parameter='value'[,...]] )

Parameters

type

The default parameter values for the parser, one of the following strings:

rfc4180
traditional

You do not have to use the type parameter when loading data that conforms to the RFC 4180 standard (such as MS Excel files). See Loading CSV data for the RFC4180 default parameters, and other options you can specify for traditional CSV files.

Default: RFC4180

delimiter

A single-character value used to separate fields in the CSV data.

Default: , (for rfc4180 and traditional)

escape

A single-character value used as an escape character to interpret the next character in the data literally.

Default:

rfc4180: "
traditional: \

enclosed_by

A single-character value. Use enclosed_by to include a value that is identical to the delimiter, but should be interpreted literally. For example, if the data delimiter is a comma (,), and you want to use a comma within the data ("my name is jane, and his is jim").

Default: "

record_terminator

A single-character value used to specify the end of a record.

Default:

rfc4180: \n
traditional: \r\n

header

Boolean, specifies whether to use the first row of data as a header column. When header=true (default), and no header exists, fcsvparser uses a default column heading. The default header consists of ucoln, where n is the column offset number, starting with 0 for the first column. You can specify custom column heading names using the header_names parameter, described next.

If you specify header=false, the fcsvparser parses the first row of input as data, rather than as column headers.

Default: true

header_names

A list of column header names, delimited by the character defined by the parser's delimiter parameter. Use this parameter to specify header names in a CSV file without a header row, or to override the column names present in the CSV source. To override one or more existing column names, specify the header names to use. This parameter overrides any header row in the data.

trim

Boolean, specifies whether to trim white space from header names and key values.

Default: true

omit_empty_keys

Boolean, specifies how the parser handles header keys without values. If true, keys with an empty value in the header row are not loaded.

Default: false

reject_on_duplicate

Boolean, specifies whether to ignore duplicate records (false), or to reject duplicates (true). In either case, the load continues.

Default: false

reject_on_empty_key

Boolean, specifies whether to reject any row containing a key without a value.

Default: false

reject_on_materialized_type_error

Boolean, specifies whether to reject any materialized column value that the parser cannot coerce into a compatible data type. See Loading CSV data.

Default: false

Examples

This example shows how you can use fcsvparser to load a flex table, build a view, and then query that view.

Create a flex table for CSV data:

=> CREATE FLEX TABLE rfc();
CREATE TABLE

Use fcsvparser to load the data from STDIN. Specify that no header exists, and enter some data as shown:

=> COPY rfc FROM stdin PARSER fcsvparser(header='false');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 10,10,20
>> 10,"10",30
>> 10,"20""5",90
>> \.

Run the compute_flextable_keys_and_build_view function, and query the rfc_view. Notice that the default enclosed_by character permits an escape character (") within a field ("20""5"). Thus, the resulting value was parsed correctly. Since no header existed in the input data, the function added ucoln for each column:

=> SELECT compute_flextable_keys_and_build_view('rfc');
                           compute_flextable_keys_and_build_view
--------------------------------------------------------------------------------------------
 Please see public.rfc_keys for updated keys
The view public.rfc_view is ready for querying
(1 row)

=> SELECT * FROM rfc_view;
 ucol0 | ucol1 | ucol2
-------+-------+-------
 10    | 10    | 20
 10    | 10    | 30
 10    | 20"5  | 90
(3 rows)

For more information and examples using other parameters of this parser, see Loading CSV data.

3.5 - FDELIMITEDPAIRPARSER

Parses delimited data files.

Parses delimited data files. This parser provides a subset of the functionality in the parser fdelimitedparser. Use the fdelimitedpairparser when the data you are loading specifies pairs of column names with data in each row.

Syntax

FDELIMITEDPAIRPARSER ( [parameter-name='value'[,...]] )

Parameters

delimiter: Specifies a single-character delimiter.
Default: ' '
record_terminator: Specifies a single-character record terminator.
Default: newline
trim: Boolean specifies whether to trim white space from header names and key values.
Default: true

Examples

The following example illustrates creating a sample flex table for simple delimited data, with two real columns, eventId and priority.

Create a table:

=> create flex table CEFData(eventId int default(eventId::int), priority int default(priority::int) );
CREATE TABLE

Load a sample delimited OpenText ArcSight log file into the CEFData table, using the fcefparser:

=> copy CEFData from '/home/release/kmm/flextables/sampleArcSight.txt' parser fdelimitedpairparser();
Rows Loaded | 200

After loading the sample data file, use maptostring() to display the virtual columns in the __raw__ column of CEFData:

=> select maptostring(__raw__) from CEFData limit 1;                                                                                                                                                                                                                                        maptostring
-----------------------------------------------------------
   "agentassetid" : "4-WwHuD0BABCCQDVAeX21vg==",
   "agentzone" : "3083",
   "agt" : "265723237",
   "ahost" : "svsvm0176",
   "aid" : "3tGoHuD0BABCCMDVAeX21vg==",
   "art" : "1099267576901",
   "assetcriticality" : "0",
   "at" : "snort_db",
   "atz" : "America/Los_Angeles",
   "av" : "5.3.0.19524.0",
   "cat" : "attempted-recon",
   "categorybehavior" : "/Communicate/Query",
   "categorydevicegroup" : "/IDS/Network",
   "categoryobject" : "/Host",
   "categoryoutcome" : "/Attempt",
   "categorysignificance" : "/Recon",
   "categorytechnique" : "/Scan",
   "categorytupledescription" : "An IDS observed a scan of a host.",
   "cnt" : "1",
   "cs2" : "3",
   "destinationgeocountrycode" : "US",
   "destinationgeolocationinfo" : "Richardson",
   "destinationgeopostalcode" : "75082",
   "destinationgeoregioncode" : "TX",
   "destinationzone" : "3133",
   "device product" : "Snort",
   "device vendor" : "Snort",
   "device version" : "1.8",
   "deviceseverity" : "2",
   "dhost" : "198.198.121.200",
   "dlat" : "329913940429",
   "dlong" : "-966644973754",
   "dst" : "3334896072",
   "dtz" : "America/Los_Angeles",
   "dvchost" : "unknown:eth1",
   "end" : "1364676323451",
   "eventid" : "1219383333",
   "fdevice product" : "Snort",
   "fdevice vendor" : "Snort",
   "fdevice version" : "1.8",
   "fdtz" : "America/Los_Angeles",
   "fdvchost" : "unknown:eth1",
   "lblstring2label" : "sig_rev",
   "locality" : "0",
   "modelconfidence" : "0",
   "mrt" : "1364675789222",
   "name" : "ICMP PING NMAP",
   "oagentassetid" : "4-WwHuD0BABCCQDVAeX21vg==",
   "oagentzone" : "3083",
   "oagt" : "265723237",
   "oahost" : "svsvm0176",
   "oaid" : "3tGoHuD0BABCCMDVAeX21vg==",
   "oat" : "snort_db",
   "oatz" : "America/Los_Angeles",
   "oav" : "5.3.0.19524.0",
   "originator" : "0",
   "priority" : "8",
   "proto" : "ICMP",
   "relevance" : "10",
   "rt" : "1099267573000",
   "severity" : "8",
   "shost" : "198.198.104.10",
   "signature id" : "[1:469]",
   "slat" : "329913940429",
   "slong" : "-966644973754",
   "sourcegeocountrycode" : "US",
   "sourcegeolocationinfo" : "Richardson",
   "sourcegeopostalcode" : "75082",
   "sourcegeoregioncode" : "TX",
   "sourcezone" : "3133",
   "src" : "3334891530",
   "start" : "1364676323451",
   "type" : "0"
}

(1 row)

Select the eventID and priority real columns, along with two virtual columns, atz and destinationgeoregioncode:


=>  select eventID, priority, atz, destinationgeoregioncode from CEFData limit 10;
  eventID   | priority |         atz         | destinationgeoregioncode
------------+----------+---------------------+--------------------------
 1218325417 |        5 | America/Los_Angeles |
 1219383333 |        8 | America/Los_Angeles | TX
 1219533691 |        9 | America/Los_Angeles | TX
 1220034458 |        5 | America/Los_Angeles | TX
 1220034578 |        9 | America/Los_Angeles |
 1220067119 |        5 | America/Los_Angeles | TX
 1220106960 |        5 | America/Los_Angeles | TX
 1220142122 |        5 | America/Los_Angeles | TX
 1220312009 |        5 | America/Los_Angeles | TX
 1220321355 |        5 | America/Los_Angeles | CA
(10 rows)

3.6 - FDELIMITEDPARSER

Parses data using a delimiter character to separate values.

Parses data using a delimiter character to separate values. The fdelimitedparser loads delimited data, storing it in a single-value VMap.

Note

By default, fdelimitedparser treats empty fields as NULL, rather than as an empty string (''). This behavior makes casting easier. Casting a NULL to an integer (NULL::int) is valid, while casting an empty string to an integer (''::int) is not. If required, use the treat_empty_val_as_null parameter to change the default behavior of fdelimitedparser.

Syntax

FDLIMITEDPARSER ( [parameter-name='value'[,...]] )

Parameters

delimiter: Single character delimiter.
Default:|
record_terminator: Single-character record terminator.
Default:\n
trim: Boolean, specifies whether to trim white space from header names and key values.
Default: true
header: Boolean, specifies that a header column exists. The parser uses col### for the column names if you use this parameter but no header exists.
Default:true
omit_empty_keys: Boolean, specifies how the parser handles header keys without values. If omit_empty_keys=true, keys with an empty value in the headerrow are not loaded.
Default: false
reject_on_duplicate: Boolean, specifies whether to ignore duplicate records (false), or to reject duplicates (true). In either case, the load continues.
Default:false
reject_on_empty_key: Boolean, specifies whether to reject any row containing a key without a value.
Default:false
reject_on_materialized_type_error: Boolean, specifies whether to reject any row value for a materialized column that the parser cannot coerce into a compatible data type. See Using flex table parsers.
Default:false
treat_empty_val_as_null: Boolean, specifies that empty fields become NULLs, rather than empty strings ('').
Default: true

Examples

Create a flex table for delimited data:

t=> CREATE FLEX TABLE delim_flex ();
CREATE TABLE

Use the fdelimitedparser to load some delimited data from STDIN, specifying a comma (,) column delimiter:

=> COPY delim_flex FROM STDIN parser fdelimitedparser (delimiter=',');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> deviceproduct, severity, deviceversion
>> ArcSight, High, 2.4.1
>> \.

You can now query virtual columns in the delim_flex flex table:

=> SELECT deviceproduct, severity, deviceversion from delim_flex;
 deviceproduct | severity | deviceversion
---------------+----------+---------------
 ArcSight      | High     | 2.4.1
(1 row)

3.7 - FJSONPARSER

Parses and loads a JSON file.

Parses and loads a JSON file. This file can contain either repeated JSON data objects (including nested maps), or an outer list of JSON elements.

When loading into a flex or hybrid table, the parser stores the JSON data in a single-value VMap. When loading into a hybrid or columnar table, the parser loads data directly into any table column with a column name that matches a key in the JSON source data.

You can load complex types in the JSON source (arrays, structs, or combinations) with strong typing or as flexible complex types. A flexible complex type is loaded into a VMap column, as in flex tables. To load complex types as VMap columns, specify a column type of LONG VARBINARY. To preserve the indexing in complex types, set flatten_maps to false.

This parser can notify you if it finds keys in the data that are not part of the table definition. See Unmatched Keys.

FJSONPARSER supports cooperative parse only if record_terminator is specified. It does not support apportioned load.

Syntax

FJSONPARSER ( [parameter=value[,...]] )

Parameters

flatten_maps

Boolean, whether to flatten sub-maps within the JSON data, separating map levels with a period (.). This value affects all data in the load, including nested maps.

This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.

Default: true

flatten_arrays

Boolean, whether to convert lists to sub-maps with integer keys. When lists are flattened, key names are concatenated as for maps. Lists are not flattened by default. This value affects all data in the load, including nested lists.

This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.

Default: false

reject_on_duplicate

Boolean, whether to ignore duplicate records (false), or to reject duplicates (true). In either case, the load continues.

Default: false

reject_on_empty_key

Boolean, whether to reject any row containing a field key without a value.

Default: false

omit_empty_keys

Boolean, whether to omit any field key from the data that does not have a value. Other fields in the same record are loaded.

Default: false

record_terminator

When set, any invalid JSON records are skipped and parsing continues with the next record. Records must be terminated uniformly. For example, if your input file has JSON records terminated by newline characters, set this parameter to E'\n'). If any invalid JSON records exist, parsing continues after the next record_terminator.

Even if the data does not contain invalid records, specifying an explicit record terminator can improve load performance by allowing cooperative parse and apportioned load to operate more efficiently.

When you omit this parameter, parsing ends at the first invalid JSON record.

reject_on_materialized_type_error

If the column is an array and the data to be loaded is too large, then false sets the column value to NULL and true rejects the row.

Default: false

start_point

String, the name of a key in the JSON load data at which to begin parsing. The parser ignores all data before the start_point value. The value is loaded for each object in the file. The parser processes data after the first instance, and up to the second, ignoring any remaining data.

start_point_occurrence

Integer, the nth occurrence of the value you specify with start_point. Use in conjunction with start_point when the data has multiple start values and you know the occurrence at which to begin parsing.

Default: 1

suppress_nonalphanumeric_key_chars

Boolean, whether to suppress non-alphanumeric characters in JSON key values. The parser replaces these characters with an underscore (_) when this parameter is true.

Default: false

key_separator

Character for the parser to use when concatenating key names.

Default: period (.)

suppress_warnings

String, which warnings to suppress:

unmatched_key (see Unmatched Keys)
true or t (suppress all warnings)
false or f (do not suppress warnings)

Default: false

Data types

If the total size of an array exceeds the size defined by the target table, the parser sets the value to null.

Unmatched keys

This parser emits warnings if it finds new keys and if both of the following are true:

The target table is not a flex table.
The new key is not nested within a flexible complex type column.

Querying an external table loads data and thus can trigger these warnings. To prevent them, set the suppress_warnings parameter to 'unmatched_keys' or 'true':

=> CREATE EXTERNAL TABLE restaurants(
            name VARCHAR(50),
            menu ARRAY[ROW(item VARCHAR(50), price NUMERIC(8,2)),100])
   AS COPY FROM '/data/rest.json'
   PARSER FJSONPARSER(suppress_warnings='unmatched_key');

Examples

The following example loads JSON data from STDIN using the default parameters:

=> CREATE TABLE people(age INT, name VARCHAR);
CREATE TABLE

=> COPY people FROM STDIN PARSER FJSONPARSER();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"age": 5, "name": "Tim"}
>>  {"age": 3}
>>  {"name": "Fred"}
>>  {"name": "Bob", "age": 10}
>> \.
=> SELECT * FROM people;
 age | name
-----+------
     | Fred
  10 | Bob
   5 | Tim
   3 |
(4 rows)

The following example uses the reject_on_duplicate parameter to reject duplicate values:

=> CREATE FLEX TABLE json_dupes();
CREATE TABLE
=> COPY json_dupes FROM stdin PARSER fjsonparser(reject_on_duplicate=true)
exceptions '/home/dbadmin/load_errors/json_e.out'
rejected data '/home/dbadmin/load_errors/json_r.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"a":"1","a":"2","b":"3"}
>> \.
=>  \!cat /home/dbadmin/load_errors/json_e.out
COPY: Input record 1 has been rejected (Rejected by user-defined parser).
Please see /home/dbadmin/load_errors/json_r.out, record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.

The following example loads array data:

$ cat addrs.json
 {"number": 301, "street": "Grant", "attributes": [1, 2, 3, 4]}

=> CREATE EXTERNAL TABLE customers(number INT, street VARCHAR, attributes ARRAY[INT])
    AS COPY FROM 'addrs.json' PARSER fjsonparser();

=> SELECT number, street, attributes FROM customers;
 num | street| attributes
-----+-----------+---------------
301  | Grant | [1,2,3,4]
(1 row)

The following example loads a flexible complex type, rejecting rows that have empty keys within the nested records. Notice that while the data has two restaurants, one has a key name that is an empty string. This one is rejected:

$ cat rest1.json
{
    "name" : "Bob's pizzeria",
    "cuisine" : "Italian",
    "location_city" : ["Cambridge", "Pittsburgh"],
    "menu" : [{"item" : "cheese pizza", "" : "$8.25"},
              {"item" : "spinach pizza", "price" : "$10.50"}]
}
{
    "name" : "Bakersfield Tacos",
    "cuisine" : "Mexican",
    "location_city" : ["Pittsburgh"],
    "menu" : [{"item" : "veggie taco", "price" : "$9.95"},
              {"item" : "steak taco", "price" : "$10.95"}]
}

=> CREATE TABLE rest (name VARCHAR, cuisine VARCHAR, location_city LONG VARBINARY, menu LONG VARBINARY);

=> COPY rest FROM '/data/rest1.json'
    PARSER fjsonparser(flatten_maps=false, reject_on_empty_key=true);
Rows Loaded
------------
       1
(1 row)

=> SELECT maptostring(location_city), maptostring(menu) FROM rest;
        maptostring        |                          maptostring
---------------------------+-------------------------------------------------------
 {
    "0": "Pittsburgh"
} | {
    "0": {
        "item": "veggie taco",
        "price": "$9.95"
    },
    "1": {
        "item": "steak taco",
        "price": "$10.95"
    }
}
(1 row)

To instead load partial data, use omit_empty_keys to bypass the missing keys while loading everything else:



=> COPY rest FROM '/data/rest1.json'
    PARSER fjsonparser(flatten_maps=false, omit_empty_keys=true);
 Rows Loaded
-------------
           2
(1 row)

=> SELECT maptostring(location_city), maptostring(menu) from rest;
                   maptostring                   |               maptostring
-------------------------------------------------+---------------------------------
 {
    "0": "Pittsburgh"
}                       | {
    "0": {
        "item": "veggie taco",
        "price": "$9.95"
    },
    "1": {
        "item": "steak taco",
        "price": "$10.95"
    }
}
 {
    "0": "Cambridge",
    "1": "Pittsburgh"
} | {
    "0": {
        "item": "cheese pizza"
    },
    "1": {
        "item": "spinach pizza",
        "price": "$10.50"
    }
}
(2 rows)

To instead load this data with strong typing, define the complex types in the table:

=> CREATE EXTERNAL TABLE restaurants
  (name VARCHAR, cuisine VARCHAR,
   location_city ARRAY[VARCHAR(80)],
   menu ARRAY[ ROW(item VARCHAR(80), price FLOAT) ]
  )
 AS COPY FROM '/data/rest.json' PARSER fjsonparser();

=> SELECT * FROM restaurants;
       name        | cuisine |       location_city        |                    \
                menu
-------------------+---------+----------------------------+--------------------\
--------------------------------------------------------
 Bob's pizzeria    | Italian | ["Cambridge","Pittsburgh"] | [{"item":"cheese pi\
zza","price":0.0},{"item":"spinach pizza","price":0.0}]
 Bakersfield Tacos | Mexican | ["Pittsburgh"]             | [{"item":"veggie ta\
co","price":0.0},{"item":"steak taco","price":0.0}]
(2 rows)

In the following example, the data contains two new fields. One is a top-level field (a new column), and the other is a new field on an existing struct. The new fields are recorded in the UDX_EVENTS system table:

=> COPY rest FROM '/data/rest2.json' PARSER FJSONPARSER();
WARNING 10596:  Warning in UDx call in user-defined object [FJSONParser], code: 0, message:
Data source contained keys which did not match table schema
HINT:  SELECT key, sum(num_instances) FROM v_monitor.udx_events WHERE event_type = 'UNMATCHED_KEY' GROUP BY key
 Rows Loaded
-------------
           2
(1 row)

=> SELECT key, SUM(num_instances) FROM v_monitor.UDX_EVENTS
   WHERE event_type = 'UNMATCHED_KEY' GROUP BY key;
          key           | SUM
------------------------+-----
 chain                  |   1
 menu.elements.calories |   7
(2 rows)

For other examples, see JSON data.

3.8 - FREGEXPARSER

Parses a regular expression, matching columns to the contents of the named regular expression groups.

Syntax

FREGEXPARSER ( pattern=[parameter-name='value'[,...]] )

Parameters

pattern: Specifies the regular expression of data to match.
Default: Empty string ("")
use_jit: Boolean, specifies whether to use just-in-time compiling when parsing the regular expression.
Default: false
record_terminator: Specifies the character used to separate input records.
Default: \n
logline_column: A string that captures the destination column containing the full string that the regular expression matched.
Default: Empty string ("")

Example

These examples use the following regular expression, which searches for information that includes the timestamp, date, thread_name, and thread_id strings.

Caution

For display purposes, this sample regular expression adds new line characters to split long lines of text. To use this expression in a query, first copy and edit the example to remove any new line characters.

This example expression loads any thread_id hex value, regardless of whether it has a 0x prefix, (<thread_id>(?:0x)?[0-9a-f]+).

'^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'

Create a flex table (vlog) to contain the results of a Vertica log file. For this example, we made a copy of a log file in the directory /home/dbadmin/data/vertica.log:
```
=> CREATE FLEX TABLE vlog1();
CREATE TABLE
```

Use the fregexparser with the sample regular expression to load data from the log file. Be sure to remove any line characters before using this expression shown here:

=>  COPY vlog1 FROM '/home/dbadmin/tempdat/KMvertica.log'
PARSER FREGEXPARSER(pattern='^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'


);

 Rows Loaded
-------------
       31049
(1 row)

After successfully loading data, use the MAPTOSTRING() function with the table's __raw__ column. The four rows (limt 4) that the query returns are regular expression results of the KMvertica.log file, parsed with fregexparser. The output shows thread_id values with a preceding 0x or without:

=> SELECT maptostring(__raw__) FROM vlog1 LIMIT 4;
                        maptostring
-------------------------------------------------------------------------------------
{
  "text" : " [Init] <INFO> Log /home/dbadmin/VMart/v_vmart_node0001_catalog/vertica.log 
opened; #2",
  "thread_id" : "0x7f2157e287c0",
  "thread_name" : "Main",
  "time" : "2017-03-21 23:30:01.704"
}

{
  "text" : " [Init] <INFO> Processing command line: /opt/vertica/bin/vertica -D 
/home/dbadmin/VMart/v_vmart_node0001_catalog -C VMart -n v_vmart_node0001 -h 
10.20.100.247 -p 5433 -P 4803 -Y ipv4",
  "thread_id" : "0x7f2157e287c0",
  "thread_name" : "Main",
  "time" : "2017-03-21 23:30:01.704"
}

{
  "text" : " [Init] <INFO> Starting up Vertica Analytic Database v8.1.1-20170321",
  "thread_id" : "7f2157e287c0",
  "thread_name" : "Main",
  "time" : "2017-03-21 23:30:01.704"
}

{
  "text" : " [Init] <INFO> Compiler Version: 4.8.2 20140120 (Red Hat 4.8.2-15)",
  "thread_id" : "7f2157e287c0",
  "thread_name" : "Main",
  "time" : "2017-03-21 23:30:01.704"
}
(4 rows)

3.9 - ORC

Use the ORC clause with the COPY FROM statement to load data in the ORC format.

Use the ORC clause with the COPY statement to load data in the ORC format. When loading data into Vertica, you can read all primitive types, UUIDs, and complex types.

When loading ORC data, you must account for all columns in the data; you cannot select only some columns.

If the table definition includes columns of primitive types and those columns are not in the data, the parser fills those columns with NULL. If the table definition includes columns of complex types, those columns must be present in the data.

This parser does not support apportioned load or cooperative parse.

Syntax

ORC ( [ parameter=value[,...] ] )

Parameters

All parameters are optional.

hive_partition_cols: Comma-separated list of columns that are partition columns in the data.

Deprecated
Instead, use COPY PARTITION COLUMNS. See Partitioned file paths. If you use both this parameter and PARTITION COLUMNS, COPY ignores the parameter.
allow_no_match: Whether to accept a path containing a glob with no matching files and report zero rows in query results. If this parameter is not set, Vertica returns an error if the path in the FROM clause does not match at least one file.

Data types

This parser can read all primitive types, UUIDs, and complex types.

If the total size of an array exceeds the size defined by the target table, the parser rejects the row.

Examples

The ORC clause does not use the PARSER option:

=> CREATE EXTERNAL TABLE orders (...)
   AS COPY FROM 's3://DataLake/orders.orc' ORC;

You can read a map column as an array of rows, as in the following example:

=> CREATE EXTERNAL TABLE orders
 (orderkey INT,
  custkey INT,
  prods ARRAY[ROW(key VARCHAR(10), value DECIMAL(12,2))],
  orderdate DATE
 ) AS COPY FROM '...' ORC;

3.10 - PARQUET

Use the PARQUET parser with the COPY FROM statement to load data in the Parquet format.

Use the PARQUET parser with the COPY statement to load data in the Parquet format. When loading data into Vertica you can read all primitive types, UUIDs, and complex types.

By default, the Parquet parser uses strong schema matching, meaning that columns in the data must exactly match the columns in the table using the data. You can optionally use Loose Schema Matching.

When loading Parquet data, Vertica caches the Parquet metadata to improve efficiency. This cache uses local TEMP storage and is not used if TEMP is remote. See the ParquetMetadataCacheSizeMB configuration parameter to change the size of the cache.

This parser does not support apportioned load or cooperative parse.

Syntax

PARQUET ( [ parameter=value[,...] ] )

Parameters

All parameters are optional.

hive_partition_cols: Comma-separated list of columns that are partition columns in the data.

Deprecated
Instead, use COPY PARTITION COLUMNS. See Partitioned file paths. If you use both this parameter and PARTITION COLUMNS, COPY ignores the parameter.
allow_no_match: Boolean. Whether to accept a path containing a glob with no matching files and report zero rows in query results. If this parameter is not set, Vertica returns an error if the path in the FROM clause does not match at least one file.
allow_long_varbinary_match_complex_type: Boolean. Whether to enable flexible column types (see Flexible complex types). If true, the Parquet parser allows a complex type in the data to match a table column defined as LONG VARBINARY. If false, the Parquet parser requires strong typing of complex types. With the parameter set you can still use strong typing. Set this parameter to false if you want use of flexible columns to be treated as an error.
do_soft_schema_match_by_name: Boolean. Whether to enable loose schema matching (true) instead of the strict matching based on column order in the table definition and parquet file (false, default). See Loose Schema Matching for more information.
reject_on_materialized_type_error: Boolean, applies only if do_soft_schema_match_by_name is true. Specifies what to do when loose schema matching is being used and a value cannot be coerced from the data to the target column type. A value of true (default) means to reject the row; a value of false means to use NULL for the value or, for strings that are too long, truncate. See the table of type coercions for coercible type mappings.

Loose schema matching

By default, the Parquet parser uses strong schema matching. This means that all columns in the Parquet data must be loaded, in the same order as in the data. However, there are times when you only want to pull certain columns, or you want to be able to accommodate future changes in the Parquet schema.

Use the do_soft_schema_match_by_name parameter to enable soft schema matching. This setting has the following effects:

Columns in the data are matched to columns in the table by their names. Names must exactly match but are case-insensitive.
Columns that exist in the Parquet data but are not part of the table definition are ignored.
Columns that exist in the table definition but not the Parquet data are filled with NULL. The parser logs an UNMATCHED_TABLE_COLUMNS_PARQUETPARSER event in QUERY_EVENTS.
If the same case-insensitive column name occurs more than once in the Parquet data, the parser uses the last one. (This situation can arise when using data written by tools that are case-sensitive.)
Column types do not need to exactly match, so long as the data type in the Parquet file can be coerced to the type used by the table. If a type cannot be coerced, the parser logs a TYPE_MISMATCH_COLUMNS_PARQUETPARSER event in QUERY_EVENTS. If reject_on_materialized_type_error is true then the parser rejects the row. If it is false, the parser uses NULL or, for string values that are too long, truncates the value.
Columns using complex types (other than one-dimensional arrays of primitive types) can be defined but not queried.

Data types

The Parquet parser maps Parquet data types to Vertica data types as follows.

Parquet Logical Type	Vertica Data Type
StringLogicalType	VARCHAR
MapLogicalType	ARRAY[ROW]
ListLogicalType	ARRAY/SET
IntLogicalType	INT/NUMERIC
DecimalLogicalType	NUMERIC
DateLogicalType	DATE
TimeLogicalType	TIME
TimestampLogicalType	TIMESTAMP
UUIDLogicalType	UUID

If the total size of an array exceeds the size defined by the target table, the parser rejects the row.

The following logical types are not supported:

EnumLogicalType
IntervalLogicalType
JSONLogicalType
BSONLogicalType
UnknownLogicalType

The Parquet parser supports the following mappings of physical types:

Parquet Physical Type	Vertica Data Type
BOOLEAN	BOOLEAN
INT32/INT64	INT
INT96	Supported only for TIMESTAMP
FLOAT	DOUBLE
DOUBLE	DOUBLE
BYTE_ARRAY	VARBINARY
FIXED_LEN_BYTE_ARRAY	BINARY

The following mappings are supported with type coercion and loose schema matching.

Parquet Physical Type	Coercible to Vertica Data Type
BOOLEAN	BOOLEAN
INT32, INT64, BOOLEAN	INT
FLOAT, DOUBLE	DOUBLE
INT32, INT96	DATE
INT64, INT96	TIMESTAMP, TIMESTAMPTZ
INT64 If precision > 0: INT32, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY	Numeric
BYTE_ARRAY	CHAR, VARCHAR, LONG VARCHAR, BINARY, VARBINARY, LONG VARBINARY
FIXED_LEN_BYTE_ARRAY	UUID

Vertica supports only 3-level-encoded arrays, not 2-level-encoded.

Examples

The PARQUET clause does not use the PARSER option:

=> COPY sales FROM 's3://DataLake/sales.parquet' PARQUET;

In the following example, the data directory contains no files:

=> CREATE EXTERNAL TABLE customers (...)
    AS COPY FROM 'webhdfs:///data/*.parquet' PARQUET;
=> SELECT COUNT(*) FROM customers;
ERROR 7869: No files match when expanding glob: [webhdfs:///data/*.parquet]

To read zero rows instead of producing an error, use the allow_no_match parameter:

=> CREATE EXTERNAL TABLE customers (...)
    AS COPY FROM 'webhdfs:///data/*.parquet'
       PARQUET(allow_no_match='true');
=> SELECT COUNT(*) FROM customers;
 count
-------
     0
(1 row)

To allow reading a complex type (menu, in this example) as a flexible column type, use the allow_long_varbinary_match_complex_type parameter:

=> CREATE EXTERNAL TABLE restaurants
    (name VARCHAR, cuisine VARCHAR, location_city ARRAY[VARCHAR], menu LONG VARBINARY)
    AS COPY FROM '/data/rest*.parquet'
    PARQUET(allow_long_varbinary_match_complex_type='True');

To read only some columns from the restaurant data, use loose schema matching:

=> CREATE EXTERNAL TABLE restaurants(name VARCHAR, cuisine VARCHAR)
    AS COPY FROM '/data/rest*.parquet'
    PARQUET(allow_long_varbinary_match_complex_type='True',
            do_soft_schema_match_by_name='True');

=> SELECT * from restaurant;
       name        | cuisine
-------------------+----------
 Bob's pizzeria    | Italian
 Bakersfield Tacos | Mexican
(2 rows)

4 - Restrictions

COPY has the following restrictions:.

COPY has the following restrictions:

Invalid data

COPY considers the following data invalid:

Missing columns (an input line has fewer columns than the recipient table).
Extra columns (an input line has more columns than the recipient table).
Empty columns for an INTEGER or DATE/TIME data type. If a column is empty for either of these types, COPY does not use the default value that was defined by CREATE TABLE. However, if you do not supply a column option as part of the COPY statement, the default value is used.
Incorrect representation of a data type. For example, trying to load a non-numeric value into an INTEGER column is invalid.

Constraint violations

If primary key, unique key, or check constraints are enabled for automatic enforcement in the target table, Vertica enforces those constraints when you load new data. If a violation occurs, Vertica rolls back the operation and returns an error.

Empty line handling

When COPY encounters an empty line while loading data, the line is neither inserted nor rejected, but COPY increments the line record number. Consider this behavior when evaluating rejected records. If you return a list of rejected records and COPY encountered an empty row while loading data, the position of rejected records is not incremented by one, as demonstrated in the following example.

The example first loads values into a table that defines the first column as INT. Note the errors on rows 3, 4, and 8:

=> \! cat -n /home/dbadmin/test.txt
     1 1|A|2
     2 2|B|4
     3 A|D|7
     4 A|E|7
     5
     6
     7 6|A|3
     8 B|A|3

The empty rows (5 and 6) shift the reporting of the error on row 8:

=> SELECT row_number, rejected_data, rejected_reason FROM test_bad;
 row_number | rejected_data | rejected_reason
------------+---------------+----------------------------------------------
          3 | A|D|7 | Invalid integer format 'A' for column 1 (c1)
          4 | A|E|7 | Invalid integer format 'A' for column 1 (c1)
          6 | B|A|3 | Invalid integer format 'B' for column 1 (c1)
(3 rows)

Compressed file errors

When loading compressed files, COPY might abort and report an error, if the file seems to be corrupted. For example, this behavior can occur if reading the header block fails.

Disk quota

Tables and schemas can have disk quotas. If a load would violate either quota, the operation fails. For more information, see Disk quotas.

COPY

Syntax

Parameters

Restrictions

Privileges

1 - Examples

Specifying string options

Including multiple source files

Distributing a load

Loading data from shared storage

Note

Partitioned data

Using filler columns

Loading data into a flex table

Using named pipes

Loading compressed data

2 - Parameters

Target options

Column options

Input options

Note

Caution

Handling options

Note

Note

Note

Parser-specific options

DELIMITED parser

FIXEDWIDTH parser

3 - Parsers

3.1 - DELIMITED

COPY options

Data types

Examples

3.2 - FAVROPARSER

Syntax

Parameters

Primitive data types

Note

Avro logical types

Avro complex data types

Record

Map

Enum

Array

Union

Unmatched keys

Examples

3.3 - FCEFPARSER

Syntax

Parameters

Examples

See also

3.4 - FCSVPARSER

Syntax

Parameters

Examples

See also

3.5 - FDELIMITEDPAIRPARSER

Syntax

Parameters

Examples

See also

3.6 - FDELIMITEDPARSER

Note

Syntax

Parameters

Examples

See also

3.7 - FJSONPARSER

Syntax

Parameters

Data types

Unmatched keys

Examples

3.8 - FREGEXPARSER

Syntax

Parameters

Example

Caution