Loading CSV data

Use the fcsvparser to load data in CSV format (comma-separated values).

Use the fcsvparser to load data in CSV format (comma-separated values). Because no formal CSV standard exists, Vertica supports the RFC 4180 standard as the default behavior for fcsvparser. Other parser parameters simplify various combinations of CSV options into columnar or flex tables. Using fcsvparser parses the following CSV data formats:

  • RFC 4180: The RFC4180 CSV format parser for Vertica flex tables. The parameters for this format are fixed and cannot be changed.

  • Traditional: The traditional CSV parser lets you specify the parameter values such as delimiter or record terminator. For a detailed list of parameters, see FCSVPARSER.

Using default parser settings

These fixed parameter settings apply to the RCF4180 format.

You may use the same value for enclosed_by and escape. Other values must be unique.

Parameter Data Type Fixed Value (RCF4180) Default Value (Traditional)
delimiter CHAR , ,
enclosed_by CHAR " "
escape CHAR " \
record_terminator CHAR \n or \r\n \n or \r\n

Use the type parameter to indicate either an RFC 4180-compliant file or a traditional-compliant file. You can specify type as RCF4180. However, you must first verify that the data is compatible with the preceding fixed values for parameters of the RFC4180 format. The default value of the type parameter is RFC4180.

Loading CSV data (RFC4180)

This example shows how you can use fcsvparser to load a flex table, build a view, and then query that view.

  1. Create a flex table for CSV data:

    => CREATE FLEX TABLE rfc();
    CREATE TABLE
    
  2. Use fcsvparser to load the data from STDIN. Specify that no header exists, and enter some data as shown:

    => COPY rfc FROM stdin PARSER fcsvparser(header='false');
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 10,10,20
    >> 10,"10",30
    >> 10,"20""5",90
    >> \.
    
  3. Run the compute_flextable_keys_and_build_view function, and query the rfc_view. Notice that the default enclosed_by character permits an escape character (") within a field ("20""5"). Thus, the resulting value was parsed correctly. Since no header existed in the input data, the function added ucoln for each column:

    => SELECT compute_flextable_keys_and_build_view('rfc');
                               compute_flextable_keys_and_build_view
    --------------------------------------------------------------------------------------------
     Please see public.rfc_keys for updated keys
    The view public.rfc_view is ready for querying
    (1 row)
    
    => SELECT * FROM rfc_view;
     ucol0 | ucol1 | ucol2
    -------+-------+-------
     10    | 10    | 20
     10    | 10    | 30
     10    | 20"5  | 90
    (3 rows)
    

Loading CSV data (traditional)

Follow these steps to use fcsvparser to load data in traditional CSV data format using fcsvparser.

In this example, the CSV file uses $ as a delimiter and # as a record_terminator. The sample CSV file to load has the following contents:

$ more /home/dbadmin/flex/flexData1.csv
sno$name$age$gender#
1$John$14$male#
2$Mary$23$female#
3$Mark$35$male#
  1. Create a flex table:

    => CREATE FLEX TABLE csv_basic();
    CREATE TABLE
    
  2. Load the data in flex table using fscvparser with parameters type='traditional', delimiter='$' and record_terminator='#' :

    => COPY csv_basic FROM '/home/dbadmin/flex/flexData2.csv' PARSER fcsvparser(type='traditional',
    delimiter='$', record_terminator='#');
    Rows Loaded
    -------------
    3
    (1 row)
    
  3. View the data loaded in the flex table:

    => SELECT maptostring(__raw__) FROM csv_basic;
    maptostring
    ----------------------------------------------------------------------------------
    {
    "age" : "14",
    "gender" : "male",
    "name" : "John",
    "sno" : "1"
    }
    {
    "age" : "23",
    "gender" : "female",
    "name" : "Mary",
    "sno" : "2"
    }
    {
    "age" : "35",
    "gender" : "male",
    "name" : "Mark",
    "sno" : "3"
    }
    (3 rows)
    

Rejecting duplicate values

You can reject duplicate values using the reject_on_duplicate=true option with the fcsvparser. The load continues after it rejects a duplicate value. The next example shows how to use this parameter and then displays the specified exception and rejected data files. Saving rejected data to a table, rather than a file, includes both the data and its exception.

=> CREATE FLEX TABLE csv_basic();
CREATE TABLE
=> COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_duplicate=true)
exceptions '/home/dbadmin/load_errors/except.out' rejected data '/home/dbadmin/load_errors/reject.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> A|A
>> 1|2
>> \.
=> \! cat /home/dbadmin/load_errors/reject.out
A|A
=> \! cat /home/dbadmin/load_errors/except.out
COPY: Input record 1 has been rejected (Processed a header row with duplicate keys with
reject_on_duplicate specified; rejecting.).  Please see /home/dbadmin/load_errors/reject.out,
record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.

Rejecting data on materialized column type errors

The fcsvparser parser has a Boolean parameter, reject_on_materialized_type_error. Setting this parameter to true causes rows to be rejected if both the following conditions exist in the input data:

  • Includes keys matching an existing materialized column

  • Has a key value that cannot be coerced into the materialized column's data type

The following examples illustrate setting this parameter.

  1. Create a table, reject_true_false, with two real columns:

    => CREATE FLEX TABLE reject_true_false(one int, two int);
    CREATE TABLE
    
  2. Load CSV data into the table (from STDIN), using the fcsvparser with reject_on_materialized_type_error=false. While false is the default value, you can specify it explicitly, as shown. Additionally, set the parameter header=true to specify the columns for input values:

    => COPY reject_true_false FROM stdin PARSER fcsvparser(reject_on_materialized_type_error=false,header=true);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> one,two
    >> 1,2
    >> "3","four"
    >> "five",6
    >> 7,8
    >> \.
    
  3. Invoke maptostring to display the table values after loading data:

    => SELECT maptostring(__raw__), one, two FROM reject_true_false;
    maptostring               | one | two
    ----------------------------------------+-----+-----
    {
    "one" : "1",
    "two" : "2"
    }
    |   1 |   2
    {
    "one" : "3",
    "two" : "four"
    }
    |   3 |
    {
    "one" : "five",
    "two" : "6"
    }
    |     |   6
    {
    "one" : "7",
    "two" : "8"
    }
    |   7 |   8
    (4 rows)
    
  4. Truncate the table to empty the data stored in the table:

    => TRUNCATE TABLE reject_true_false;
    TRUNCATE TABLE
    
  5. Reload the same data again, but this time, set reject_on_materialized_type_error=true:

    => COPY reject_true_false FROM stdin PARSER fcsvparser(reject_on_materialized_type_error=true,header=true);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> one,two
    >> 1,2
    >> "3","four"
    >> "five",6
    >> 7,8
    >> \.
    
  6. Call maptostring to display the table contents. Only two rows are currently loaded, whereas the previous results had four rows. The rows having input values with incorrect data type have been rejected:

    => SELECT maptostring(__raw__), one, two FROM reject_true_false;
    maptostring             | one | two
    -------------------------------------+-----+-----
    {
    "one" : "1",
    "two" : "2"
    }
    |   1 |   2
    {
    "one" : "7",
    "two" : "8"
    }
    |   7 |   8
    (2 rows)
    

Rejecting or omitting empty rows

Valid CSV files can include empty key and value pairs. Such rows are invalid for SQL. You can control the behavior for empty rows by either rejecting or omitting them, using two boolean FCSVPARSER parameters:

  • reject_on_empty_key
  • omit_empty_keys

The following example illustrates how to set these parameters:

  1. Create a flex table:

     => CREATE FLEX TABLE csv_basic();
    CREATE TABLE
    
  2. Load CSV data into the table (from STDIN), using the fcsvparser with reject_on_empty_key=false. While false is the default value, you can specify it explicitly, as shown. Additionally, set the parameter header=true to specify the columns for input values:

    => COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_empty_key=false,header=true);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> ,num
    >> 1,2
    >> \.
    
  3. Invoke maptostring to display the table values after loading data:

    =>SELECT maptostring(__raw__) FROM csv_basic;
    maptostring
    ----------------------------------
    {
    "" : "1",
    "num" : "2"
    }
    
    (1 row)
    
  4. Truncate the table to empty the data stored in the table:

    => TRUNCATE TABLE csv_basic;
    TRUNCATE TABLE
    
  5. Reload the same data again, but this time, set reject_on_empty_key=true:

    => COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_empty_key=true,header=true);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> ,num
    >> 1,2
    >> \.
    
  6. Call maptostring to display the table contents. No rows are loaded because one of the keys is empty:

    => SELECT maptostring(__raw__) FROM csv_basic;
    maptostring
    -------------
    (0 rows)
    
  7. Truncate the table to empty the data stored in the table:

    => TRUNCATE TABLE csv_basic;
    TRUNCATE TABLE
    
  8. Reload the same data again, but this time, set omit_empty_keys=true:

    => COPY csv_basic FROM stdin PARSER fcsvparser(omit_empty_keys=true,header=true);
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> ,num
    >> 1,2
    >> \.
    
  9. Call maptostring to display the table contents. One row is now loaded, and the rows with empty keys are omitted:

    => SELECT maptostring(__raw__) FROM csv_basic;
    maptostring
    ---------------------
    {
    "num" : "2"
    }
    (1 row)
    

Using the NULL parameter

Use the COPY NULL metadata parameter with fcsvparser to load NULL values into a flex table.

The next example uses this parameter:

  1. Create a flex table:

    => CREATE FLEX TABLE fcsv(c1 int);
    CREATE TABLE
    
  2. Load CSV data in the flex table using STDIN and the NULL parameter:

    => COPY fcsv FROM STDIN PARSER fcsvparser() NULL 'NULL' ;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> a,b,c1
    >> 10,20,NULL
    >> 20,30,50
    >> 20,30,40
    >> \.
    
  3. Use the compute_flextable_keys_and_build_view function to compute keys and build the flex view:

    => SELECT compute_flextable_keys_and_build_view('fcsv');
    compute_flextable_keys_and_build_view
    --------------------------------------------------------------------------------
    Please see public.fcsv_keys for updated keys
    The view public.fcsv_view is ready for querying
    (1 row)
    
  4. View the flex view and replace the NULL values:

    => SELECT * FROM public.fcsv_view;
    a  | b  | c1
    ----+----+----
    20 | 30 | 50
    10 | 20 |
    20 | 30 | 40
    (3 rows)
    => SELECT a,b, ISNULL(c1,-1) from public.fcsv_view;
    a  | b  | ISNULL
    ----+----+--------
    20 | 30 |     50
    10 | 20 |     -1
    20 | 30 |     40
    (3 rows)
    

Handling column headings

The fcsvparser lets you specify your own column headings with the HEADER_NAMES= parameter. This parameter entirely replaces column names in the CSV source header row.

For example, to use these six column headings for a CSV file you are loading, use the fcsvparser parameter as follows:

HEADER_NAMES='FIRST, LAST, SOCIAL_SECURITY, TOWN, STATE, COUNTRY'

Supplying fewer header names than existing data columns causes fcsvparser to use default names after those you supply. Default header names consist of ucoln, where n is the column offset number, starting at 0 for the first column. For example, if you supply four header names for a 6-column table, fcsvparser supplies the default names ucol4 and ucol5, following the fourth header name you provide.

If you supply more headings than the existing table columns, any additional headings remain unused.