Saving rejected data to a table
Use the REJECTED DATA
parameter with the AS TABLE
clause to specify a table in which to save rejected data. Saving rejected data to a file is mutually exclusive with using the AS TABLE
clause.
When you use the AS TABLE
clause, Vertica creates a new table if one does not exist, or appends to an existing table. If no parsing rejections occur during a load, the table exists but is empty. The next time you load data, Vertica inserts any rejected rows to the existing table.
The load rejection tables are a special type of table with the following capabilities and limitations:
-
Support
SELECT
statements -
Can use
DROP TABLE
-
Cannot be created outside of a
COPY
statement -
Do not support DML and DDL activities
-
Are not K-safe
To make the data in a rejected table K-safe, you can do one of the following:
-
Write a
CREATE TABLE..AS
statement, such as this example:=> CREATE TABLE new_table AS SELECT * FROM rejected_table;
-
Create a table to store rejected records, and run
INSERT..SELECT
operations into the new table
Using COPY NO COMMIT
If the COPY
statement includes options NO COMMIT
and REJECTED DATA AS TABLE
, and the reject-table
does not already exist, Vertica Analytic Database saves the rejected data table as a LOCAL TEMP table and returns a message that a LOCAL TEMP table is being created.
Rejected-data tables are useful for Extract-Load-Transform workflows, where you will likely use temporary tables more frequently. The rejected-data tables let you quickly load data and identify which records failed to load. If you load data into a temporary table that you created using the ON COMMIT DELETE
clause, the COPY
operation will not commit.
Location of rejected data table records
When you save rejected records to a table, using the REJECTED DATA AS TABLE
table_name
option, the data for the table is saved in a database data subdirectory, RejectionTableData
. For example, for a VMart
database, table data files reside here:
/home/dbadmin/VMart/v_vmart_node0001_data/RejectionTableData
Rejected data tables include both rejected data and the reason for the rejection (exceptions), along with other data columns, described next. Vertica suggests that you periodically drop any rejected data tables that you no longer require.
Querying a rejected data table
When you specify a rejected data table when loading data with COPY
, you can query that table for information about rejected data after the load operation is complete. For example:
-
Create the
loader
table:=> CREATE TABLE loader(a INT) CREATE TABLE
-
Use
COPY
to load values, saving rejected data to a table,loader_rejects
:=> COPY loader FROM STDIN REJECTED DATA AS TABLE loader_rejects; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> 1 >> 2 >> 3 >> a >> \.
-
Query the
loader
table after loading data:=> SELECT * FROM loader; x --- 1 2 3 (3 rows)
-
Query the
loader_rejects
table to see its column rows:=> SELECT * FROM loader_rejects; -[ RECORD 1 ]-------------+-------------------------------------------- node_name | v_vmart_node0001 file_name | STDIN session_id | v_vmart_node0001.example.-24016:0x3439 transaction_id | 45035996274080923 statement_id | 1 batch_number | 0 row_number | 4 rejected_data | a rejected_data_orig_length | 1 rejected_reason | Invalid integer format 'a' for column 1 (x)
The rejected data table has the following columns:
Column | Data Type | Description |
---|---|---|
node_name |
VARCHAR |
The name of the Vertica node on which the input load file was located. |
file_name |
VARCHAR |
The name of the file being loaded, which applies if you loaded a file (as opposed to using STDIN). |
session_id |
VARCHAR |
The session ID number in which the COPY statement occurred. |
transaction_id |
INTEGER |
Identifier for the transaction within the session, if any; otherwise NULL. |
|
INTEGER |
The unique identification number of the statement within the transaction that included the rejected data. TipYou can use the |
batch_number |
INTEGER |
INTERNAL USE. Represents which batch (chunk) the data comes from. |
row_number |
INTEGER |
The rejected row number from the input file, or -1 if it could not be determined. The value can be -1 when using cooperative parse. Each parse operation resets the row number, so in an apportioned load, there can be several entries with the same row number but different rows. |
rejected_data |
LONG VARCHAR |
The data that was not loaded. |
rejected_data_orig_length |
INTEGER |
The length of the rejected data. |
rejected_reason |
VARCHAR |
The error that caused the rejected row. This column returns the same message that exists in a load exceptions file when you do not save to a table. |
Exporting the rejected records table
You can export the contents of the column rejected_data
to a file to capture only the data rejected during the first COPY statement. Then, correct the data in the file, save it, and load the updated file.
To export rejected records:
-
Create a sample table:
=> CREATE TABLE t (i int); CREATE TABLE
-
Copy data directly into the table, using a table to store rejected data:
=> COPY t FROM STDIN REJECTED DATA AS TABLE t_rejects; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> 1 >> 2 >> 3 >> 4 >> a >> b >> c >> \.
-
Show only tuples and set the output format:
=> \t Showing only tuples. => \a Output format is unaligned.
-
Output to a file:
=> \o rejected.txt => select rejected_data from t_rejects; => \o
-
Use the
cat
command on the saved file:=> \! cat rejected.txt a b c
After a file exists, you can fix load errors and use the corrected file as load input to the COPY
statement.