Saving rejected data to a table
Use the REJECTED DATA AS TABLE
option to collect rejected data, including exceptions, in a table that can be queried. REJECTED DATA AS TABLE
is incompatable with rejection or exception paths.
When you use REJECTED DATA AS TABLE
, Vertica creates a new table if one does not exist, or appends to an existing table from a previous load. Vertica always creates the table if it does not already exist, even if the load produces no rejections. You cannot create the table directly.
Load rejection tables do not support DML and DDL operations and are not K-safe. You can query and drop them.
To make the data in a rejected table K-safe, you can copy it to another table using CREATE TABLE AS
:
=> CREATE TABLE save_rejections AS SELECT * FROM rejected_data_table;
Alternatively, you can copy data to another table using INSERT SELECT
.
Location of rejected data table records
When you save rejected records to a table, the data for the table is saved in a database data subdirectory, RejectionTableData
. For example, a VMart database uses a path like /home/dbadmin/VMart/v_vmart_node0001_data/RejectionTableData
.
If you need to make rejected data available from outside of Vertica, you can export the rejections table to a shared file system.
Vertica suggests that you periodically drop any rejected data tables that you no longer require.
Using COPY NO COMMIT
If the COPY
statement includes the NO COMMIT
option, and the rejections table does not already exist, Vertica saves the rejected data table as a LOCAL TEMP table.
Rejected-data tables are useful for Extract-Load-Transform workflows, where you will likely use temporary tables more frequently. The rejected-data tables let you quickly load data and identify which records failed to load. If you load data into a temporary table that you created using the ON COMMIT DELETE
clause, the COPY
operation will not commit.
Querying a rejected data table
After a load, you can query the corresponding rejections table for information about failures. For example, consider the following table definition and COPY
statement:
=> CREATE TABLE loader(x INT)
CREATE TABLE
=> COPY loader FROM STDIN REJECTED DATA AS TABLE loader_rejects;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> a
>> \.
Three rows were successfully loaded:
=> SELECT * FROM loader;
x
---
1
2
3
(3 rows)
The loader_rejects
table records the failed row. The data that was rejected is in the rejected_data
column and the reason for the rejection is in rejected_reason
:
=> SELECT * FROM loader_rejects;
-[ RECORD 1 ]-------------+--------------------------------------------
node_name | v_vmart_node0001
file_name | STDIN
session_id | v_vmart_node0001.example.-24016:0x3439
transaction_id | 45035996274080923
statement_id | 1
batch_number | 0
row_number | 4
rejected_data | a
rejected_data_orig_length | 1
rejected_reason | Invalid integer format 'a' for column 1 (x)
The rejected data table has the following columns:
Column | Data Type | Description |
---|---|---|
node_name |
VARCHAR |
The name of the Vertica node on which the input load file was located. |
file_name |
VARCHAR |
The name of the file being loaded, which applies if you loaded a file (as opposed to using STDIN). |
session_id |
VARCHAR |
The session ID number in which the COPY statement occurred. |
transaction_id |
INTEGER |
Identifier for the transaction within the session, if any; otherwise NULL. |
statement_id |
INTEGER |
The unique identification number of the statement within the transaction that included the rejected data. TipYou can use the |
batch_number |
INTEGER |
INTERNAL USE. Represents which batch (chunk) the data comes from. |
row_number |
INTEGER |
The rejected row number from the input file, or -1 if it could not be determined. The value can be -1 when using cooperative parse. Each parse operation resets the row number, so in an apportioned load, there can be several entries with the same row number but different rows. |
rejected_data |
LONG VARCHAR |
The data that was not loaded. |
rejected_data_orig_length |
INTEGER |
The length of the rejected data. |
rejected_reason |
VARCHAR |
The error that caused the rejected row. This column returns the same message that exists in a load exceptions file when you do not save to a table. |
Correcting rejected data
If a load rejects data, you can save the rejected data in a file, correct the errors, and reload the updated file. First, direct vsql
to show only tuples and set the output format:
=> \t
Showing only tuples.
=> \a
Output format is unaligned.
Then query the rejections table and direct output to a file:
=> \o rejected.txt
=> SELECT rejected_data FROM loader_rejects;
=> \o
The saved file has the following output:
=> \! cat rejected.txt
a
You can edit the file to correct the errors and then use COPY
to load the updated file.