COPY rejected data and exception files
When executing a COPY
statement, and parallel processing is ON (the default setting), COPY creates separate threads to process load files. Typically, the number of threads depends on the number of node cores in the system. Each node processes part of the load data. If the load succeeds overall, any parser rejections that occur during load processing are written to that node's specific rejected data and exceptions files. If the load fails, the rejected data file contents can be incomplete, or empty. If you do not specify a file name explicitly, COPY uses a default name and location for rejected data files. See the next topic for specifying your own rejected data and exception files.
Both rejected data and exceptions files are saved and stored on a per-node basis. This example uses multiple files as COPY
inputs. Since the statement does not include either the REJECTED DATA
or EXCEPTIONS
parameters, rejected data and exceptions files are written to the default location, the database catalog subdirectory, CopyErrorLogs
, on each node:
\set dir `pwd`/data/ \set remote_dir /vertica/test_dev/tmp_ms/
\set file1 '''':dir'C1_large_tbl.dat'''
\set file2 '''':dir'C2_large_tbl.dat'''
\set file3 '''':remote_dir'C3_large_tbl.dat'''
\set file4 '''':remote_dir'C4_large_tbl.dat'''
=>COPY large_tbl FROM :file1 ON site01,:file2 ON site01,
:file3 ON site02,
:file4 ON site02
DELIMITER '|';
Specifying rejected data and exceptions files
The optional COPY
REJECTED DATA
and EXCEPTIONS
parameters 'path'
element lets you specify a non-default path in which to store the files.
If path resolves to a storage location, and the user invoking COPY is not a superuser, these are the required permissions:
-
The storage location must have been created (or altered) with the USER option (see CREATE LOCATION and ALTER_LOCATION_USE)
-
The user must already have been granted READ access to the storage location where the file(s) exist, as described in GRANT (storage location)
Both parameters also have an optional ON
nodename clause that uses the specified path:
...[ EXCEPTIONS 'path' [ ON nodename ] [, ...] ]...[ REJECTED DATA 'path' [ ON nodename ] [, ...] ]
While 'path' specifies the location of the rejected data and exceptions files (with their corresponding parameters), the optional ON
nodename clause moves any existing rejected data and exception files on the node to the specified path on the same node.
Saving rejected data and exceptions files to a single server
The COPY
statement does not have a facility to merge exception and rejected data files after COPY
processing is complete. To see the contents of exception and rejected data files requires accessing each node's specific files.
Note
To save all exceptions and rejected data files on a network host, be sure to give each node's files unique names, so that different cluster nodes do not overwrite other nodes' files. For instance, if you set up a server with two directories (/vertica/exceptions
and /vertica/rejections
), specify file names for each Vertica cluster node to identify each node, such as node01_exceptions.txt
and node02_exceptions.txt
. This way, each cluster node's files are easily distinguishable in the exceptions and rejections directories.
Using VSQL variables for rejected data and exceptions files
This example uses vsql
variables to specify the path and file names to use with the exceptions
and rejected data
parameters (except_s1
and reject_s1
). The COPY
statement specifies a single input file (large_tbl
) on the initiator node:
\set dir `pwd`/data/ \set file1 '''':dir'C1_large_tbl.dat'''
\set except_s1 '''':dir'exceptions'''
\set reject_s1 '''':dir'rejections'''
COPY large_tbl FROM :file1 ON site01 DELIMITER '|'
REJECTED DATA :reject_s1 ON site01
EXCEPTIONS :except_s1 ON site01;
This example uses variables to specify exception and rejected date files (except_s2
and reject_s2
) on a remote node. The COPY statement consists of a single input file on a remote node (site02
):
\set remote_dir /vertica/test_dev/tmp_ms/\set except_s2 '''':remote_dir'exceptions'''
\set reject_s2 '''':remote_dir'rejections'''
COPY large_tbl FROM :file1 ON site02 DELIMITER '|'
REJECTED DATA :reject_s2 ON site02
EXCEPTIONS :except_s2 ON site02;
This example uses variables to specify that the exception and rejected data files are on a remote node (indicated by :remote_dir
). The inputs to the COPY statement consist of multiple data files on two nodes (site01
and site02
). The exceptions
and rejected data
options use the ON
nodename
clause with the variables to indicate where the files reside (site01
and site02
):
\set dir `pwd`/data/ \set remote_dir /vertica/test_dev/tmp_ms/
\set except_s1 '''':dir''''
\set reject_s1 '''':dir''''
\set except_s2 '''':remote_dir''''
\set reject_s2 '''':remote_dir''''
COPY large_tbl FROM :file1 ON site01,
:file2 ON site01,
:file3 ON site02,
:file4 ON site02
DELIMITER '|'
REJECTED DATA :reject_s1 ON site01, :reject_s2 ON site02
EXCEPTIONS :except_s1 ON site01, :except_s2 ON site02;