Saving rejected data to a table

Use the REJECTED DATA parameter with the AS TABLE clause to specify a table in which to save rejected data.

Use the REJECTED DATA AS TABLE option to collect rejected data, including exceptions, in a table that can be queried. REJECTED DATA AS TABLE is incompatable with rejection or exception paths.

When you use REJECTED DATA AS TABLE, Vertica creates a new table if one does not exist, or appends to an existing table from a previous load. Vertica always creates the table if it does not already exist, even if the load produces no rejections. You cannot create the table directly.

Load rejection tables do not support DML and DDL operations and are not K-safe. You can query and drop them.

To make the data in a rejected table K-safe, you can copy it to another table using CREATE TABLE AS:

=> CREATE TABLE save_rejections AS SELECT * FROM rejected_data_table;

Alternatively, you can copy data to another table using INSERT SELECT.

Location of rejected data table records

When you save rejected records to a table, the data for the table is saved in a database data subdirectory, RejectionTableData. For example, a VMart database uses a path like /home/dbadmin/VMart/v_vmart_node0001_data/RejectionTableData.

If you need to make rejected data available from outside of Vertica, you can export the rejections table to a shared file system.

Vertica suggests that you periodically drop any rejected data tables that you no longer require.

Using COPY NO COMMIT

If the COPY statement includes the NO COMMIT option, and the rejections table does not already exist, Vertica saves the rejected data table as a LOCAL TEMP table.

Rejected-data tables are useful for Extract-Load-Transform workflows, where you will likely use temporary tables more frequently. The rejected-data tables let you quickly load data and identify which records failed to load. If you load data into a temporary table that you created using the ON COMMIT DELETE clause, the COPY operation will not commit.

Querying a rejected data table

After a load, you can query the corresponding rejections table for information about failures. For example, consider the following table definition and COPY statement:

=> CREATE TABLE loader(x INT)
CREATE TABLE
    
=> COPY loader FROM STDIN REJECTED DATA AS TABLE loader_rejects;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> a
>> \.

Three rows were successfully loaded:

=> SELECT * FROM loader;
 x
---
 1
 2
 3
(3 rows)

The loader_rejects table records the failed row. The data that was rejected is in the rejected_data column and the reason for the rejection is in rejected_reason:

=> SELECT * FROM loader_rejects;
-[ RECORD 1 ]-------------+--------------------------------------------
node_name                 | v_vmart_node0001
file_name                 | STDIN
session_id                | v_vmart_node0001.example.-24016:0x3439
transaction_id            | 45035996274080923
statement_id              | 1
batch_number              | 0
row_number                | 4
rejected_data             | a
rejected_data_orig_length | 1
rejected_reason           | Invalid integer format 'a' for column 1 (x)

The rejected data table has the following columns:

Column Data Type Description
node_name VARCHAR The name of the Vertica node on which the input load file was located.
file_name VARCHAR The name of the file being loaded, which applies if you loaded a file (as opposed to using STDIN).
session_id VARCHAR The session ID number in which the COPY statement occurred.
transaction_id INTEGER Identifier for the transaction within the session, if any; otherwise NULL.
statement_id INTEGER

The unique identification number of the statement within the transaction that included the rejected data.

batch_number INTEGER INTERNAL USE. Represents which batch (chunk) the data comes from.
row_number INTEGER

The rejected row number from the input file, or -1 if it could not be determined. The value can be -1 when using cooperative parse.

Each parse operation resets the row number, so in an apportioned load, there can be several entries with the same row number but different rows.

rejected_data LONG VARCHAR The data that was not loaded.
rejected_data_orig_length INTEGER The length of the rejected data.
rejected_reason VARCHAR The error that caused the rejected row. This column returns the same message that exists in a load exceptions file when you do not save to a table.

Correcting rejected data

If a load rejects data, you can save the rejected data in a file, correct the errors, and reload the updated file. First, direct vsql to show only tuples and set the output format:

=> \t
Showing only tuples.
=> \a
Output format is unaligned.

Then query the rejections table and direct output to a file:

=> \o rejected.txt
=> SELECT rejected_data FROM loader_rejects;
=> \o

The saved file has the following output:

=> \! cat rejected.txt
a

You can edit the file to correct the errors and then use COPY to load the updated file.