Loading CSV data
Use the fcsvparser
to load data in CSV format (comma-separated values). Because no formal CSV standard exists, Vertica supports the RFC 4180 standard as the default behavior for fcsvparser
. Other parser parameters simplify various combinations of CSV options into columnar or flex tables. Using fcsvparser
parses the following CSV data formats:
-
RFC 4180: The RFC4180 CSV format parser for Vertica flex tables. The parameters for this format are fixed and cannot be changed.
-
Traditional: The traditional CSV parser lets you specify the parameter values such as delimiter or record terminator. For a detailed list of parameters, see FCSVPARSER.
Using default parser settings
These fixed parameter settings apply to the RCF4180 format.
You may use the same value for enclosed_by
and escape
. Other values must be unique.
Parameter | Data Type | Fixed Value (RCF4180) | Default Value (Traditional) |
---|---|---|---|
delimiter |
CHAR |
, |
, |
enclosed_by |
CHAR |
" |
" |
escape |
CHAR |
" |
\ |
record_terminator |
CHAR |
\n or \r\n |
\n or \r\n |
Use the type
parameter to indicate either an RFC 4180-compliant file or a traditional-compliant file. You can specify type
as RCF4180
. However, you must first verify that the data is compatible with the preceding fixed values for parameters of the RFC4180 format. The default value of the type
parameter is RFC4180.
Loading CSV data (RFC4180)
This example shows how you can use fcsvparser
to load a flex table, build a view, and then query that view.
-
Create a flex table for CSV data:
=> CREATE FLEX TABLE rfc(); CREATE TABLE
-
Use
fcsvparser
to load the data from STDIN. Specify that no header exists, and enter some data as shown:=> COPY rfc FROM stdin PARSER fcsvparser(header='false'); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> 10,10,20 >> 10,"10",30 >> 10,"20""5",90 >> \.
-
Run the
compute_flextable_keys_and_build_view
function, and query therfc_view
. Notice that the defaultenclosed_by
character permits an escape character ("
) within a field ("20""5"
). Thus, the resulting value was parsed correctly. Since no header existed in the input data, the function addeducol
n
for each column:=> SELECT compute_flextable_keys_and_build_view('rfc'); compute_flextable_keys_and_build_view -------------------------------------------------------------------------------------------- Please see public.rfc_keys for updated keys The view public.rfc_view is ready for querying (1 row) => SELECT * FROM rfc_view; ucol0 | ucol1 | ucol2 -------+-------+------- 10 | 10 | 20 10 | 10 | 30 10 | 20"5 | 90 (3 rows)
Loading CSV data (traditional)
Follow these steps to use fcsvparser
to load data in traditional CSV data format using fcsvparser
.
In this example, the CSV file uses $
as a delimiter
and #
as a record_terminator
. The sample CSV file to load has the following contents:
$ more /home/dbadmin/flex/flexData1.csv
sno$name$age$gender#
1$John$14$male#
2$Mary$23$female#
3$Mark$35$male#
-
Create a flex table:
=> CREATE FLEX TABLE csv_basic(); CREATE TABLE
-
Load the data in flex table using
fscvparser
with parameterstype='traditional'
,delimiter='$'
andrecord_terminator='#'
:=> COPY csv_basic FROM '/home/dbadmin/flex/flexData2.csv' PARSER fcsvparser(type='traditional', delimiter='$', record_terminator='#'); Rows Loaded ------------- 3 (1 row)
-
View the data loaded in the flex table:
=> SELECT maptostring(__raw__) FROM csv_basic; maptostring ---------------------------------------------------------------------------------- { "age" : "14", "gender" : "male", "name" : "John", "sno" : "1" } { "age" : "23", "gender" : "female", "name" : "Mary", "sno" : "2" } { "age" : "35", "gender" : "male", "name" : "Mark", "sno" : "3" } (3 rows)
Rejecting duplicate values
You can reject duplicate values using the reject_on_duplicate=true
option with the fcsvparser
. The load continues after it rejects a duplicate value. The next example shows how to use this parameter and then displays the specified exception and rejected data files. Saving rejected data to a table, rather than a file, includes both the data and its exception.
=> CREATE FLEX TABLE csv_basic();
CREATE TABLE
=> COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_duplicate=true)
exceptions '/home/dbadmin/load_errors/except.out' rejected data '/home/dbadmin/load_errors/reject.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> A|A
>> 1|2
>> \.
=> \! cat /home/dbadmin/load_errors/reject.out
A|A
=> \! cat /home/dbadmin/load_errors/except.out
COPY: Input record 1 has been rejected (Processed a header row with duplicate keys with
reject_on_duplicate specified; rejecting.). Please see /home/dbadmin/load_errors/reject.out,
record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.
Rejecting data on materialized column type errors
The fcsvparser
parser has a Boolean parameter, reject_on_materialized_type_error
. Setting this parameter to true
causes rows to be rejected if both the following conditions exist in the input data:
-
Includes keys matching an existing materialized column
-
Has a key value that cannot be coerced into the materialized column's data type
The following examples illustrate setting this parameter.
-
Create a table,
reject_true_false
, with two real columns:=> CREATE FLEX TABLE reject_true_false(one int, two int); CREATE TABLE
-
Load CSV data into the table (from STDIN), using the
fcsvparser
withreject_on_materialized_type_error=false
. Whilefalse
is the default value, you can specify it explicitly, as shown. Additionally, set the parameterheader=true
to specify the columns for input values:=> COPY reject_true_false FROM stdin PARSER fcsvparser(reject_on_materialized_type_error=false,header=true); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> one,two >> 1,2 >> "3","four" >> "five",6 >> 7,8 >> \.
-
Invoke
maptostring
to display the table values after loading data:=> SELECT maptostring(__raw__), one, two FROM reject_true_false; maptostring | one | two ----------------------------------------+-----+----- { "one" : "1", "two" : "2" } | 1 | 2 { "one" : "3", "two" : "four" } | 3 | { "one" : "five", "two" : "6" } | | 6 { "one" : "7", "two" : "8" } | 7 | 8 (4 rows)
-
Truncate the table to empty the data stored in the table:
=> TRUNCATE TABLE reject_true_false; TRUNCATE TABLE
-
Reload the same data again, but this time, set
reject_on_materialized_type_error=true
:=> COPY reject_true_false FROM stdin PARSER fcsvparser(reject_on_materialized_type_error=true,header=true); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> one,two >> 1,2 >> "3","four" >> "five",6 >> 7,8 >> \.
-
Call
maptostring
to display the table contents. Only two rows are currently loaded, whereas the previous results had four rows. The rows having input values with incorrect data type have been rejected:=> SELECT maptostring(__raw__), one, two FROM reject_true_false; maptostring | one | two -------------------------------------+-----+----- { "one" : "1", "two" : "2" } | 1 | 2 { "one" : "7", "two" : "8" } | 7 | 8 (2 rows)
Note
The parserfcsvparser
uses null
values if there is a type mismatch and you set the reject_on_materialized_type_error
parameter to false
.
Rejecting or omitting empty rows
Valid CSV files can include empty key and value pairs. Such rows are invalid for SQL. You can control the behavior for empty rows by either rejecting or omitting them, using two boolean FCSVPARSER
parameters:
reject_on_empty_key
omit_empty_keys
The following example illustrates how to set these parameters:
-
Create a flex table:
=> CREATE FLEX TABLE csv_basic(); CREATE TABLE
-
Load CSV data into the table (from STDIN), using the
fcsvparser
withreject_on_empty_key=false
. Whilefalse
is the default value, you can specify it explicitly, as shown. Additionally, set the parameterheader=true
to specify the columns for input values:=> COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_empty_key=false,header=true); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> ,num >> 1,2 >> \.
-
Invoke
maptostring
to display the table values after loading data:=>SELECT maptostring(__raw__) FROM csv_basic; maptostring ---------------------------------- { "" : "1", "num" : "2" } (1 row)
-
Truncate the table to empty the data stored in the table:
=> TRUNCATE TABLE csv_basic; TRUNCATE TABLE
-
Reload the same data again, but this time, set
reject_on_empty_key=true
:=> COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_empty_key=true,header=true); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> ,num >> 1,2 >> \.
-
Call
maptostring
to display the table contents. No rows are loaded because one of the keys is empty:=> SELECT maptostring(__raw__) FROM csv_basic; maptostring ------------- (0 rows)
-
Truncate the table to empty the data stored in the table:
=> TRUNCATE TABLE csv_basic; TRUNCATE TABLE
-
Reload the same data again, but this time, set
omit_empty_keys=true
:=> COPY csv_basic FROM stdin PARSER fcsvparser(omit_empty_keys=true,header=true); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> ,num >> 1,2 >> \.
-
Call
maptostring
to display the table contents. One row is now loaded, and the rows with empty keys are omitted:=> SELECT maptostring(__raw__) FROM csv_basic; maptostring --------------------- { "num" : "2" } (1 row)
Note
If no header names exist,fcsvparser
uses a default header of ucol
n
, where n is the column offset number. If a table header name and key name match, the parser loads the column with values associated with the matching key name.
Using the NULL parameter
Use the COPY NULL
metadata parameter with fcsvparser
to load NULL values into a flex table.
The next example uses this parameter:
-
Create a flex table:
=> CREATE FLEX TABLE fcsv(c1 int); CREATE TABLE
-
Load CSV data in the flex table using STDIN and the NULL parameter:
=> COPY fcsv FROM STDIN PARSER fcsvparser() NULL 'NULL' ; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> a,b,c1 >> 10,20,NULL >> 20,30,50 >> 20,30,40 >> \.
-
Use the
compute_flextable_keys_and_build_view
function to compute keys and build the flex view:=> SELECT compute_flextable_keys_and_build_view('fcsv'); compute_flextable_keys_and_build_view -------------------------------------------------------------------------------- Please see public.fcsv_keys for updated keys The view public.fcsv_view is ready for querying (1 row)
-
View the flex view and replace the NULL values:
=> SELECT * FROM public.fcsv_view; a | b | c1 ----+----+---- 20 | 30 | 50 10 | 20 | 20 | 30 | 40 (3 rows) => SELECT a,b, ISNULL(c1,-1) from public.fcsv_view; a | b | ISNULL ----+----+-------- 20 | 30 | 50 10 | 20 | -1 20 | 30 | 40 (3 rows)
Handling column headings
The fcsvparser lets you specify your own column headings with the HEADER_NAMES=
parameter. This parameter entirely replaces column names in the CSV source header row.
For example, to use these six column headings for a CSV file you are loading, use the fcsvparser parameter as follows:
HEADER_NAMES='FIRST, LAST, SOCIAL_SECURITY, TOWN, STATE, COUNTRY'
Supplying fewer header names than existing data columns causes fcsvparser to use default names after those you supply. Default header names consist of ucol
n
, where n is the column offset number, starting at 0
for the first column. For example, if you supply four header names for a 6-column table, fcsvparser supplies the default names ucol4
and ucol5
, following the fourth header name you provide.
If you supply more headings than the existing table columns, any additional headings remain unused.