Loading delimited data

You can load flex tables with one of two delimited parsers, fdelimitedparser or fdelimitedpairparser.

You can load flex tables with one of two delimited parsers, fdelimitedparser or fdelimitedpairparser.

  • Use fdelimitedpairparser when the data specifies column names with the data in each row.

  • Use fdelimitedparser when the data does not specify column names or has a header row for column names.

This section describes using some options that fdelimitedpairparser and fdelimitedparser support.

Rejecting duplicate values

You can reject duplicate values using the reject_on_duplicate=true option with the fdelimitedparser. The load continues after it rejects a duplicate value. The next example shows how to use this parameter and then displays the specified exception and rejected data files. Saving rejected data to a table, rather than a file, includes both the data and its exception.

=> CREATE FLEX TABLE delim_dupes();
CREATE TABLE
=> COPY delim_dupes FROM stdin PARSER fdelimitedparser(reject_on_duplicate=true)
exceptions '/home/dbadmin/load_errors/except.out' rejected data '/home/dbadmin/load_errors/reject.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> A|A
>> 1|2
>> \.
=> \! cat /home/dbadmin/load_errors/reject.out
A|A
=> \! cat /home/dbadmin/load_errors/except.out
COPY: Input record 1 has been rejected (Processed a header row with duplicate keys with
reject_on_duplicate specified; rejecting.).  Please see /home/dbadmin/load_errors/reject.out,
record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.

Rejecting materialized column type errors

Both the fjsonparser and fdelimitedparser parsers have a boolean parameter, reject_on_materialized_type_error. Setting this parameter to true causes rows to be rejected if both the following conditions exist in the input data:

  • Includes keys matching an existing materialized column

  • Has a value that cannot be coerced into the materialized column's data type

Suppose the flex table has a materialized column, OwnerPercent, declared as a FLOAT. Trying to load a row with an OwnerPercent key that has a VARCHAR value causes fdelimitedparser to reject the data row.

The following examples illustrate setting this parameter.

  1. Create a table, reject_true_false, with two real columns:

    => CREATE FLEX TABLE reject_true_false(one VARCHAR, two INT);
    CREATE TABLE
    
  2. Load JSON data into the table (from STDIN), using the fjsonparser with reject_on_materialized_type_error=false. While false is the default value, the following example specifies it explicitly for illustration:

    => COPY reject_true_false FROM stdin PARSER fjsonparser(reject_on_materialized_type_error=false);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> {"one": 1, "two": 2}
    >> {"one": "one", "two": "two"}
    >> {"one": "one", "two": 2}
    >> \.
    
  3. Invoke maptostring to display the table values after loading data:

    =>  SELECT maptostring(__raw__), one, two FROM reject_true_false;
                   maptostring        | one | two
    ----------------------------------+-----+-----
     {
       "one" : "one",
       "two" : "2"
    }
       | one |   2
    {
       "one" : "1",
       "two" : "2"
    }
       | 1   |   2
     {
       "one" : "one",
       "two" : "two"
    }
       | one |
    (3 rows)
    
  4. Truncate the table:

    => TRUNCATE TABLE reject_true_false;
    
  5. Reload the same data again, but this time, set reject_on_materialized_type_error=true:

    => COPY reject_true_false FROM stdin PARSER fjsonparser(reject_on_materialized_type_error=true);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> {"one": 1, "two": 2}
    >> {"one": "one", "two": "two"}
    >> {"one": "one", "two": 2}
    >> \.
    
  6. Call maptostring to display the table contents. Only two rows were loaded, whereas the previous results had three rows:

    => SELECT maptostring(__raw__), one, two FROM reject_true_false;
                  maptostring              | one | two
    ---------------------------------------+-----+-----
     {
       "one" : "1",
       "two" : "2"
    }
       | 1   |   2
     {
       "one" : "one",
       "two" : "2"
    }
     | one |   2
    (2 rows)