FCSVPARSER

Parses CSV format (comma-separated values) data.

Parses CSV format (comma-separated values) data. Use this parser to load CSV data into columnar, flex, and hybrid tables. All data must be encoded in Unicode UTF-8 format. The fcsvparser parser supports the RFC 4180 standard for CSV data, and other options, to accommodate variations in CSV file format definitions. Invalid records are rejected. For more information about data formats, see Handling Non-UTF-8 input.

This parser is for use in Flex tables only. All flex parsers store the data as a single VMap in the LONG VARBINAR_raw__ column. If a data row is too large to fit in the column, it is rejected. Vertica supports null values for loading data with NULL-specified columns.

Syntax

FCSVPARSER ( [parameter='value'[,...]] )

Parameters

type
The default parameter values for the parser, one of the following strings:
  • rfc4180

  • traditional

You do not have to use the type parameter when loading data that conforms to the RFC 4180 standard (such as MS Excel files). See Loading CSV data for the RFC4180 default parameters, and other options you can specify for traditional CSV files.

Default: RFC4180

delimiter
A single-character value used to separate fields in the CSV data.

Default: , (for rfc4180 and traditional)

escape
A single-character value used as an escape character to interpret the next character in the data literally.

Default:

  • rfc4180: "

  • traditional: \

enclosed_by
A single-character value. Use enclosed_by to include a value that is identical to the delimiter, but should be interpreted literally. For example, if the data delimiter is a comma (,), and you want to use a comma within the data ("my name is jane, and his is jim").

Default: "

record_terminator
A single-character value used to specify the end of a record.

Default:

  • rfc4180: \n

  • traditional: \r\n

header
Boolean, specifies whether to use the first row of data as a header column. When header=true (default), and no header exists, fcsvparser uses a default column heading. The default header consists of ucoln, where n is the column offset number, starting with 0 for the first column. You can specify custom column heading names using the header_names parameter, described next.

If you specify header=false, the fcsvparser parses the first row of input as data, rather than as column headers.

Default: true

header_names
A list of column header names, delimited by the character defined by the parser's delimiter parameter. Use this parameter to specify header names in a CSV file without a header row, or to override the column names present in the CSV source. To override one or more existing column names, specify the header names to use. This parameter overrides any header row in the data.
trim
Boolean, specifies whether to trim white space from header names and key values.

Default: true

omit_empty_keys
Boolean, specifies how the parser handles header keys without values. If true, keys with an empty value in the header row are not loaded.

Default: false

reject_on_duplicate
Boolean, specifies whether to ignore duplicate records (false), or to reject duplicates (true). In either case, the load continues.

Default: false

reject_on_empty_key
Boolean, specifies whether to reject any row containing a key without a value.

Default: false

reject_on_materialized_type_error
Boolean, specifies whether to reject any materialized column value that the parser cannot coerce into a compatible data type. See Loading CSV data.

Default: false

Examples

This example shows how you can use fcsvparser to load a flex table, build a view, and then query that view.

  1. Create a flex table for CSV data:

    => CREATE FLEX TABLE rfc();
    CREATE TABLE
    
  2. Use fcsvparser to load the data from STDIN. Specify that no header exists, and enter some data as shown:

    => COPY rfc FROM stdin PARSER fcsvparser(header='false');
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 10,10,20
    >> 10,"10",30
    >> 10,"20""5",90
    >> \.
    
  3. Run the compute_flextable_keys_and_build_view function, and query the rfc_view. Notice that the default enclosed_by character permits an escape character (") within a field ("20""5"). Thus, the resulting value was parsed correctly. Since no header existed in the input data, the function added ucoln for each column:

    => SELECT compute_flextable_keys_and_build_view('rfc');
                               compute_flextable_keys_and_build_view
    --------------------------------------------------------------------------------------------
     Please see public.rfc_keys for updated keys
    The view public.rfc_view is ready for querying
    (1 row)
    
    => SELECT * FROM rfc_view;
     ucol0 | ucol1 | ucol2
    -------+-------+-------
     10    | 10    | 20
     10    | 10    | 30
     10    | 20"5  | 90
    (3 rows)
    

    For more information and examples using other parameters of this parser, see Loading CSV data.

See also