Saving rejected data to a table

Use the REJECTED DATA parameter with the AS TABLE clause to specify a table in which to save rejected data.

Use the REJECTED DATA parameter with the AS TABLE clause to specify a table in which to save rejected data. Saving rejected data to a file is mutually exclusive with using the AS TABLE clause.

When you use the AS TABLE clause, Vertica creates a new table if one does not exist, or appends to an existing table. If no parsing rejections occur during a load, the table exists but is empty. The next time you load data, Vertica inserts any rejected rows to the existing table.

The load rejection tables are a special type of table with the following capabilities and limitations:

  • Support SELECT statements

  • Can use DROP TABLE

  • Cannot be created outside of a COPY statement

  • Do not support DML and DDL activities

  • Are not K-safe

To make the data in a rejected table K-safe, you can do one of the following:

  • Write a CREATE TABLE..AS statement, such as this example:

    => CREATE TABLE new_table AS SELECT * FROM rejected_table;
    
  • Create a table to store rejected records, and run INSERT..SELECT operations into the new table

Using COPY NO COMMIT

If the COPY statement includes options NO COMMIT and REJECTED DATA AS TABLE, and the reject-table does not already exist, Vertica Analytic Database saves the rejected data table as a LOCAL TEMP table and returns a message that a LOCAL TEMP table is being created.

Rejected-data tables are useful for Extract-Load-Transform workflows, where you will likely use temporary tables more frequently. The rejected-data tables let you quickly load data and identify which records failed to load. If you load data into a temporary table that you created using the ON COMMIT DELETE clause, the COPY operation will not commit.

Location of rejected data table records

When you save rejected records to a table, using the REJECTED DATA AS TABLE table_name option, the data for the table is saved in a database data subdirectory, RejectionTableData. For example, for a VMart database, table data files reside here:

/home/dbadmin/VMart/v_vmart_node0001_data/RejectionTableData

Rejected data tables include both rejected data and the reason for the rejection (exceptions), along with other data columns, described next. Vertica suggests that you periodically drop any rejected data tables that you no longer require.

Querying a rejected data table

When you specify a rejected data table when loading data with COPY, you can query that table for information about rejected data after the load operation is complete. For example:

  1. Create the loader table:

    => CREATE TABLE loader(a INT)
    CREATE TABLE
    
  2. Use COPY to load values, saving rejected data to a table, loader_rejects:

    => COPY loader FROM STDIN REJECTED DATA AS TABLE loader_rejects;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 1
    >> 2
    >> 3
    >> a
    >> \.
    
  3. Query the loader table after loading data:

    => SELECT * FROM loader;
     x
    ---
     1
     2
     3
    (3 rows)
    
  4. Query the loader_rejects table to see its column rows:

    => SELECT * FROM loader_rejects;
    -[ RECORD 1 ]-------------+--------------------------------------------
    node_name                 | v_vmart_node0001
    file_name                 | STDIN
    session_id                | v_vmart_node0001.example.-24016:0x3439
    transaction_id            | 45035996274080923
    statement_id              | 1
    batch_number              | 0
    row_number                | 4
    rejected_data             | a
    rejected_data_orig_length | 1
    rejected_reason           | Invalid integer format 'a' for column 1 (x)
    

The rejected data table has the following columns:

Column Data Type Description
node_name VARCHAR The name of the Vertica node on which the input load file was located.
file_name VARCHAR The name of the file being loaded, which applies if you loaded a file (as opposed to using STDIN).
session_id VARCHAR The session ID number in which the COPY statement occurred.
transaction_id INTEGER Identifier for the transaction within the session, if any; otherwise NULL.
statement_id INTEGER

The unique identification number of the statement within the transaction that included the rejected data.

batch_number INTEGER INTERNAL USE. Represents which batch (chunk) the data comes from.
row_number INTEGER

The rejected row number from the input file, or -1 if it could not be determined. The value can be -1 when using cooperative parse.

Each parse operation resets the row number, so in an apportioned load, there can be several entries with the same row number but different rows.

rejected_data LONG VARCHAR The data that was not loaded.
rejected_data_orig_length INTEGER The length of the rejected data.
rejected_reason VARCHAR The error that caused the rejected row. This column returns the same message that exists in a load exceptions file when you do not save to a table.

Exporting the rejected records table

You can export the contents of the column rejected_data to a file to capture only the data rejected during the first COPY statement. Then, correct the data in the file, save it, and load the updated file.

To export rejected records:

  1. Create a sample table:

    => CREATE TABLE t (i int);
    CREATE TABLE
    
  2. Copy data directly into the table, using a table to store rejected data:

    => COPY t FROM STDIN REJECTED DATA AS TABLE t_rejects;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 1
    >> 2
    >> 3
    >> 4
    >> a
    >> b
    >> c
    >> \.
    
  3. Show only tuples and set the output format:

    => \t
    Showing only tuples.
    => \a
    Output format is unaligned.
    
  4. Output to a file:

    => \o rejected.txt
    => select rejected_data from t_rejects;
    => \o
    
  5. Use the catcommand on the saved file:

    
    => \! cat rejected.txt
    a
    b
    c
    

After a file exists, you can fix load errors and use the corrected file as load input to the COPY statement.