COPY rejected data and exception files

When executing a COPY statement, and parallel processing is ON (the default setting), COPY creates separate threads to process load files.

When executing a COPY statement, and parallel processing is ON (the default setting), COPY creates separate threads to process load files. Typically, the number of threads depends on the number of node cores in the system. Each node processes part of the load data. If the load succeeds overall, any parser rejections that occur during load processing are written to that node's specific rejected data and exceptions files. If the load fails, the rejected data file contents can be incomplete, or empty. If you do not specify a file name explicitly, COPY uses a default name and location for rejected data files. See the next topic for specifying your own rejected data and exception files.

Both rejected data and exceptions files are saved and stored on a per-node basis. This example uses multiple files as COPY inputs. Since the statement does not include either the REJECTED DATA or EXCEPTIONS parameters, rejected data and exceptions files are written to the default location, the database catalog subdirectory, CopyErrorLogs, on each node:

\set dir `pwd`/data/ \set remote_dir /vertica/test_dev/tmp_ms/
\set file1 '''':dir'C1_large_tbl.dat'''
\set file2 '''':dir'C2_large_tbl.dat'''
\set file3 '''':remote_dir'C3_large_tbl.dat'''
\set file4 '''':remote_dir'C4_large_tbl.dat'''
=>COPY large_tbl FROM :file1 ON site01,:file2 ON site01,
               :file3 ON site02,
               :file4 ON site02
               DELIMITER '|';

Specifying rejected data and exceptions files

The optional COPY REJECTED DATA and EXCEPTIONS parameters 'path' element lets you specify a non-default path in which to store the files.

If path resolves to a storage location, and the user invoking COPY is not a superuser, these are the required permissions:

Both parameters also have an optional ON nodename clause that uses the specified path:

...[ EXCEPTIONS 'path' [ ON nodename ] [, ...] ]...[ REJECTED DATA 'path' [ ON nodename ] [, ...] ]

While 'path' specifies the location of the rejected data and exceptions files (with their corresponding parameters), the optional ON nodename clause moves any existing rejected data and exception files on the node to the specified path on the same node.

Saving rejected data and exceptions files to a single server

The COPY statement does not have a facility to merge exception and rejected data files after COPY processing is complete. To see the contents of exception and rejected data files requires accessing each node's specific files.

Using VSQL variables for rejected data and exceptions files

This example uses vsql variables to specify the path and file names to use with the exceptions and rejected data parameters (except_s1 and reject_s1). The COPY statement specifies a single input file (large_tbl) on the initiator node:

\set dir `pwd`/data/ \set file1 '''':dir'C1_large_tbl.dat'''
\set except_s1 '''':dir'exceptions'''
\set reject_s1 '''':dir'rejections'''

COPY large_tbl FROM :file1 ON site01 DELIMITER '|'
REJECTED DATA :reject_s1 ON site01
EXCEPTIONS :except_s1 ON site01;

This example uses variables to specify exception and rejected date files (except_s2 and reject_s2) on a remote node. The COPY statement consists of a single input file on a remote node (site02):

\set remote_dir /vertica/test_dev/tmp_ms/\set except_s2 '''':remote_dir'exceptions'''
\set reject_s2 '''':remote_dir'rejections'''

COPY large_tbl FROM :file1 ON site02 DELIMITER '|'
REJECTED DATA :reject_s2 ON site02
EXCEPTIONS :except_s2 ON site02;

This example uses variables to specify that the exception and rejected data files are on a remote node (indicated by :remote_dir). The inputs to the COPY statement consist of multiple data files on two nodes (site01 and site02). The exceptions and rejected data options use the ON nodename clause with the variables to indicate where the files reside (site01 and site02):

\set dir `pwd`/data/ \set remote_dir /vertica/test_dev/tmp_ms/
\set except_s1 '''':dir''''
\set reject_s1 '''':dir''''
\set except_s2 '''':remote_dir''''
\set reject_s2 '''':remote_dir''''
COPY large_tbl FROM :file1 ON site01,
               :file2 ON site01,
               :file3 ON site02,
               :file4 ON site02
               DELIMITER '|'
               REJECTED DATA :reject_s1 ON site01, :reject_s2 ON site02
               EXCEPTIONS :except_s1 ON site01, :except_s2 ON site02;