Loading CSV data
Use the fcsvparser
to load data in CSV format (comma-separated values). Because no formal CSV standard exists, Vertica supports the RFC 4180 standard as the default behavior for fcsvparser
. Other parser parameters simplify various combinations of CSV options into columnar or flex tables. Using fcsvparser
parses the following CSV data formats:
-
RFC 4180: The RFC4180 CSV format parser for Vertica flex tables. The parameters for this format are fixed and cannot be changed.
-
Traditional: The traditional CSV parser lets you specify the parameter values such as delimiter or record terminator. For a detailed list of parameters, see FCSVPARSER.
Using default parser settings
These fixed parameter settings apply to the RCF4180 format.
You may use the same value for enclosed_by
and escape
. Other values must be unique.
Parameter | Data Type | Fixed Value (RCF4180) | Default Value (Traditional) |
---|---|---|---|
delimiter |
CHAR |
, |
, |
enclosed_by |
CHAR |
" |
" |
escape |
CHAR |
" |
\ |
record_terminator |
CHAR |
\n or \r\n |
\n or \r\n |
Use the type
parameter to indicate either an RFC 4180-compliant file or a traditional-compliant file. You can specify type
as RCF4180
. However, you must first verify that the data is compatible with the preceding fixed values for parameters of the RFC4180 format. The default value of the type
parameter is RFC4180.
Loading CSV data (RFC4180)
This example shows how you can use fcsvparser
to load a flex table, build a view, and then query that view.
-
Create a flex table for CSV data:
-
Use
fcsvparser
to load the data from STDIN. Specify that no header exists, and enter some data as shown: -
Run the
compute_flextable_keys_and_build_view
function, and query therfc_view
. Notice that the defaultenclosed_by
character permits an escape character ("
) within a field ("20""5"
). Thus, the resulting value was parsed correctly. Since no header existed in the input data, the function addeducol
n
for each column:
Loading CSV data (traditional)
Follow these steps to use fcsvparser
to load data in traditional CSV data format using fcsvparser
.
In this example, the CSV file uses $
as a delimiter
and #
as a record_terminator
. The sample CSV file to load has the following contents:
-
Create a flex table:
-
Load the data in flex table using
fscvparser
with parameterstype='traditional'
,delimiter='$'
andrecord_terminator='#'
: -
View the data loaded in the flex table:
Rejecting duplicate values
You can reject duplicate values using the reject_on_duplicate=true
option with the fcsvparser
. The load continues after it rejects a duplicate value. The next example shows how to use this parameter and then displays the specified exception and rejected data files. Saving rejected data to a table, rather than a file, includes both the data and its exception.
Rejecting data on materialized column type errors
The fcsvparser
parser has a Boolean parameter, reject_on_materialized_type_error
. Setting this parameter to true
causes rows to be rejected if both the following conditions exist in the input data:
-
Includes keys matching an existing materialized column
-
Has a key value that cannot be coerced into the materialized column's data type
The following examples illustrate setting this parameter.
-
Create a table,
reject_true_false
, with two real columns: -
Load CSV data into the table (from STDIN), using the
fcsvparser
withreject_on_materialized_type_error=false
. Whilefalse
is the default value, you can specify it explicitly, as shown. Additionally, set the parameterheader=true
to specify the columns for input values: -
Invoke
maptostring
to display the table values after loading data: -
Truncate the table to empty the data stored in the table:
-
Reload the same data again, but this time, set
reject_on_materialized_type_error=true
: -
Call
maptostring
to display the table contents. Only two rows are currently loaded, whereas the previous results had four rows. The rows having input values with incorrect data type have been rejected:
Note
The parserfcsvparser
uses null
values if there is a type mismatch and you set the reject_on_materialized_type_error
parameter to false
.
Rejecting or omitting empty rows
Valid CSV files can include empty key and value pairs. Such rows are invalid for SQL. You can control the behavior for empty rows by either rejecting or omitting them, using two boolean FCSVPARSER
parameters:
reject_on_empty_key
omit_empty_keys
The following example illustrates how to set these parameters:
-
Create a flex table:
-
Load CSV data into the table (from STDIN), using the
fcsvparser
withreject_on_empty_key=false
. Whilefalse
is the default value, you can specify it explicitly, as shown. Additionally, set the parameterheader=true
to specify the columns for input values: -
Invoke
maptostring
to display the table values after loading data: -
Truncate the table to empty the data stored in the table:
-
Reload the same data again, but this time, set
reject_on_empty_key=true
: -
Call
maptostring
to display the table contents. No rows are loaded because one of the keys is empty: -
Truncate the table to empty the data stored in the table:
-
Reload the same data again, but this time, set
omit_empty_keys=true
: -
Call
maptostring
to display the table contents. One row is now loaded, and the rows with empty keys are omitted:
Note
If no header names exist,fcsvparser
uses a default header of ucol
n
, where n is the column offset number. If a table header name and key name match, the parser loads the column with values associated with the matching key name.
Using the NULL parameter
Use the COPY NULL
metadata parameter with fcsvparser
to load NULL values into a flex table.
The next example uses this parameter:
-
Create a flex table:
-
Load CSV data in the flex table using STDIN and the NULL parameter:
-
Use the
compute_flextable_keys_and_build_view
function to compute keys and build the flex view: -
View the flex view and replace the NULL values:
Handling column headings
The fcsvparser lets you specify your own column headings with the HEADER_NAMES=
parameter. This parameter entirely replaces column names in the CSV source header row.
For example, to use these six column headings for a CSV file you are loading, use the fcsvparser parameter as follows:
HEADER_NAMES='FIRST, LAST, SOCIAL_SECURITY, TOWN, STATE, COUNTRY'
Supplying fewer header names than existing data columns causes fcsvparser to use default names after those you supply. Default header names consist of ucol
n
, where n is the column offset number, starting at 0
for the first column. For example, if you supply four header names for a 6-column table, fcsvparser supplies the default names ucol4
and ucol5
, following the fourth header name you provide.
If you supply more headings than the existing table columns, any additional headings remain unused.