You can load flex tables with one of several parsers, using the options that these parsers support.
In addition to the parsers listed in this section, the following parsers described in Data formats in Data load support flex tables:
You can use any of the flex parsers to load data into columnar tables. Using the flex table parsers with columnar tables lets you mix data loads in one table. For example, you can load JSON data in one session and delimited data in another.
The following basic examples illustrate how you can use flex parsers with columnar tables.
Create a columnar table, super, with two columns, age and name:
=> CREATE TABLE super(age INT, name VARCHAR);
CREATE TABLE
Enter JSON values from STDIN, using fjsonparser():
=> COPY super FROM stdin PARSER fjsonparser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"age": 5, "name": "Tim"}
>> {"age": 3}
>> {"name": "Fred"}
>> {"name": "Bob", "age": 10}
>> \.
Query the table to see the values you entered:
=> SELECT * FROM super;
age | name
-----+------
| Fred
10 | Bob
5 | Tim
3 |
(4 rows)
Enter some delimited values from STDIN, using fdelimitedparser():
=> COPY super FROM stdin PARSER fdelimitedparser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> name |age
>> Tim|50
>> |30
>> Fred|
>> Bob|100
>> \.
Query the table. Both the JSON and delimited data are saved in the same columnar table, super.
=> SELECT * FROM super;
age | name
-----+------
50 | Tim
30 |
3 |
5 | Tim
100 | Bob
| Fred
10 | Bob
| Fred
(8 rows)
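The mixed-format load above can be sketched outside the database: each parser maps its own input format onto the same column set, so rows from both sessions land in one table. The following is an illustrative Python sketch; the function names are hypothetical, not part of Vertica:

```python
import json

# Each parser turns one input line into the same (age, name) record
# shape; missing fields become None, as in the SQL output above.
COLUMNS = ("age", "name")

def parse_json_row(line):
    """Parse one JSON object into an (age, name) tuple."""
    obj = json.loads(line)
    return tuple(obj.get(col) for col in COLUMNS)

def parse_delimited_row(line, header, delimiter="|"):
    """Parse one delimited line, using a header row for column order.
    Delimited values stay strings; empty fields become None."""
    values = dict(zip(header, line.split(delimiter)))
    return tuple(values.get(col) or None for col in COLUMNS)

rows = []
rows.append(parse_json_row('{"age": 5, "name": "Tim"}'))
rows.append(parse_json_row('{"name": "Fred"}'))
header = ["name", "age"]
rows.append(parse_delimited_row("Tim|50", header))
rows.append(parse_delimited_row("Fred|", header))
print(rows)  # [(5, 'Tim'), (None, 'Fred'), ('50', 'Tim'), (None, 'Fred')]
```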
Use the reject_on_materialized_type_error parameter to avoid loading data with a type mismatch. If reject_on_materialized_type_error is set to false, the flex parser accepts data with type mismatches. Consider the following example:
Assume that the JSON file to be loaded has the following sample contents:
$ cat json.dat
{"created_by":"system","site_source":"flipkart_india_kol","updated_by":"system1","invoice_id":"INVDPKOL100",
"vendor_id":"VEN15731","total_quantity":12,"created_at":"2012-01-09 23:15:52.0"}
{"created_by":"system","site_source":"flipkart_india_kol","updated_by":"system2","invoice_id":"INVDPKOL101",
"vendor_id":"VEN15732","total_quantity":14,"created_at":"hello"}
Create a columnar table.
=> CREATE TABLE hdfs_test (
     site_source VARCHAR(200),
     total_quantity INT,
     vendor_id VARCHAR(200),
     invoice_id VARCHAR(200),
     updated_by VARCHAR(200),
     created_by VARCHAR(200),
     created_at TIMESTAMP
   );
Load JSON data.
=> COPY hdfs_test FROM '/home/dbadmin/json.dat' PARSER fjsonparser() ABORT ON ERROR;
Rows Loaded
-------------
2
(1 row)
View the contents.
=> SELECT * FROM hdfs_test;
site_source | total_quantity | vendor_id | invoice_id | updated_by | created_by | created_at
--------------------+----------------+-----------+-------------+------------+------------+---------------------
flipkart_india_kol | 12 | VEN15731 | INVDPKOL100 | system1 | system | 2012-01-09 23:15:52
flipkart_india_kol | 14 | VEN15732 | INVDPKOL101 | system2 | system |
(2 rows)
If the reject_on_materialized_type_error parameter is set to true, you receive errors when loading the sample JSON data:
=> COPY hdfs_test FROM '/home/dbadmin/json.dat' PARSER fjsonparser(reject_on_materialized_type_error=true) ABORT ON ERROR;
ERROR 2035: COPY: Input record 2 has been rejected (Rejected by user-defined parser)
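The rejection behavior above can be sketched as follows. This is an illustrative Python model of the reject_on_materialized_type_error logic, not Vertica's implementation; the column names follow the hdfs_test example:

```python
from datetime import datetime

# Hypothetical model: a row is rejected only when a key matches a
# materialized column AND its value cannot be coerced to that
# column's type.
MATERIALIZED = {
    "total_quantity": int,
    "created_at": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f"),
}

def load_row(record, reject_on_type_error):
    row = {}
    for key, value in record.items():
        coerce = MATERIALIZED.get(key)
        if coerce is None:
            row[key] = value           # non-materialized keys load as-is
            continue
        try:
            row[key] = coerce(value)
        except (ValueError, TypeError):
            if reject_on_type_error:
                raise ValueError(f"record rejected: bad value for {key!r}")
            row[key] = None            # default behavior: load NULL instead
    return row

ok = load_row({"created_at": "hello", "total_quantity": 14},
              reject_on_type_error=False)
print(ok["created_at"])   # None -- the mismatch became NULL
```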
Use fcsvparser to load data in CSV format (comma-separated values). Because no formal CSV standard exists, Vertica supports the RFC 4180 standard as the default behavior for fcsvparser. Other parser parameters simplify various combinations of CSV options into columnar or flex tables. fcsvparser parses the following CSV data formats:
RFC 4180: The RFC4180 CSV format parser for Vertica flex tables. The parameters for this format are fixed and cannot be changed.
Traditional: The traditional CSV parser lets you specify the parameter values such as delimiter or record terminator. For a detailed list of parameters, see FCSVPARSER.
These fixed parameter settings apply to the RFC 4180 format. You may use the same value for enclosed_by and escape. Other values must be unique.
Parameter | Data Type | Fixed Value (RFC 4180) | Default Value (Traditional)
---|---|---|---
delimiter | CHAR | , | ,
enclosed_by | CHAR | " | "
escape | CHAR | " | \
record_terminator | CHAR | \n or \r\n | \n or \r\n
Use the type parameter to indicate either an RFC 4180-compliant file or a traditional-compliant file. You can specify type as RFC4180; however, you must first verify that the data is compatible with the preceding fixed parameter values of the RFC 4180 format. The default value of the type parameter is RFC4180.
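The difference between the two modes can be demonstrated with Python's csv module as a stand-in: the RFC 4180 settings escape an embedded quote by doubling it, while the traditional defaults use a backslash escape. This is only an analogy to fcsvparser's behavior, not its implementation:

```python
import csv
import io

# Dialect settings mirroring the parameter table above.
rfc4180 = dict(delimiter=",", quotechar='"', doublequote=True)
traditional = dict(delimiter=",", quotechar='"', doublequote=False,
                   escapechar="\\")

def parse(text, dialect):
    """Parse CSV text with the given dialect settings."""
    return list(csv.reader(io.StringIO(text), **dialect))

# RFC 4180: a quote inside a quoted field is escaped by doubling it.
print(parse('10,"20""5",90\r\n', rfc4180))     # [['10', '20"5', '90']]
# Traditional: the same field uses a backslash escape instead.
print(parse('10,"20\\"5",90\n', traditional))  # [['10', '20"5', '90']]
```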
This example shows how you can use fcsvparser to load a flex table, build a view, and then query that view.
Create a flex table for CSV data:
=> CREATE FLEX TABLE rfc();
CREATE TABLE
Use fcsvparser to load the data from STDIN. Specify that no header exists, and enter some data as shown:
=> COPY rfc FROM stdin PARSER fcsvparser(header='false');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 10,10,20
>> 10,"10",30
>> 10,"20""5",90
>> \.
Run the compute_flextable_keys_and_build_view function, and query rfc_view. Notice that the default enclosed_by character permits an escape character (") within a field ("20""5"). Thus, the resulting value is parsed correctly. Because no header existed in the input data, the function added ucoln for each column, where n is the column offset number:
=> SELECT compute_flextable_keys_and_build_view('rfc');
compute_flextable_keys_and_build_view
--------------------------------------------------------------------------------------------
Please see public.rfc_keys for updated keys
The view public.rfc_view is ready for querying
(1 row)
=> SELECT * FROM rfc_view;
ucol0 | ucol1 | ucol2
-------+-------+-------
10 | 10 | 20
10 | 10 | 30
10 | 20"5 | 90
(3 rows)
Follow these steps to use fcsvparser to load data in the traditional CSV format.
In this example, the CSV file uses $ as a delimiter and # as a record_terminator. The sample CSV file to load has the following contents:
$ more /home/dbadmin/flex/flexData1.csv
sno$name$age$gender#
1$John$14$male#
2$Mary$23$female#
3$Mark$35$male#
Create a flex table:
=> CREATE FLEX TABLE csv_basic();
CREATE TABLE
Load the data into the flex table using fcsvparser with the parameters type='traditional', delimiter='$', and record_terminator='#':
=> COPY csv_basic FROM '/home/dbadmin/flex/flexData1.csv' PARSER fcsvparser(type='traditional',
delimiter='$', record_terminator='#');
Rows Loaded
-------------
3
(1 row)
View the data loaded in the flex table:
=> SELECT maptostring(__raw__) FROM csv_basic;
maptostring
----------------------------------------------------------------------------------
{
"age" : "14",
"gender" : "male",
"name" : "John",
"sno" : "1"
}
{
"age" : "23",
"gender" : "female",
"name" : "Mary",
"sno" : "2"
}
{
"age" : "35",
"gender" : "male",
"name" : "Mark",
"sno" : "3"
}
(3 rows)
You can reject duplicate values using the reject_on_duplicate=true option with fcsvparser. The load continues after it rejects a duplicate value. The next example shows how to use this parameter and then displays the specified exception and rejected data files. Saving rejected data to a table, rather than a file, includes both the data and its exception.
=> CREATE FLEX TABLE csv_basic();
CREATE TABLE
=> COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_duplicate=true)
exceptions '/home/dbadmin/load_errors/except.out' rejected data '/home/dbadmin/load_errors/reject.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> A|A
>> 1|2
>> \.
=> \! cat /home/dbadmin/load_errors/reject.out
A|A
=> \! cat /home/dbadmin/load_errors/except.out
COPY: Input record 1 has been rejected (Processed a header row with duplicate keys with
reject_on_duplicate specified; rejecting.). Please see /home/dbadmin/load_errors/reject.out,
record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.
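The duplicate-key check above can be sketched as follows. This is an illustrative model (check_header is a hypothetical name), not fcsvparser's implementation:

```python
# A header row that repeats a key is rejected before any data loads.
def check_header(fields):
    seen = set()
    for name in fields:
        if name in seen:
            raise ValueError(
                f"duplicate key {name!r}: header row rejected")
        seen.add(name)
    return fields

print(check_header(["one", "two"]))   # a valid header passes through
try:
    check_header(["A", "A"])          # mirrors the rejected 'A|A' header
except ValueError as err:
    print(err)
```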
The fcsvparser parser has a Boolean parameter, reject_on_materialized_type_error. Setting this parameter to true causes rows to be rejected if both of the following conditions exist in the input data:
Includes keys matching an existing materialized column
Has a key value that cannot be coerced into the materialized column's data type
The following examples illustrate setting this parameter.
Create a flex table, reject_true_false, with two real columns:
=> CREATE FLEX TABLE reject_true_false(one int, two int);
CREATE TABLE
Load CSV data into the table (from STDIN), using fcsvparser with reject_on_materialized_type_error=false. While false is the default value, you can specify it explicitly, as shown. Additionally, set the parameter header=true to specify the columns for input values:
=> COPY reject_true_false FROM stdin PARSER fcsvparser(reject_on_materialized_type_error=false,header=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> one,two
>> 1,2
>> "3","four"
>> "five",6
>> 7,8
>> \.
Invoke maptostring to display the table values after loading data:
=> SELECT maptostring(__raw__), one, two FROM reject_true_false;
maptostring | one | two
----------------------------------------+-----+-----
{
"one" : "1",
"two" : "2"
}
| 1 | 2
{
"one" : "3",
"two" : "four"
}
| 3 |
{
"one" : "five",
"two" : "6"
}
| | 6
{
"one" : "7",
"two" : "8"
}
| 7 | 8
(4 rows)
Truncate the table to remove the stored data:
=> TRUNCATE TABLE reject_true_false;
TRUNCATE TABLE
Reload the same data, but this time set reject_on_materialized_type_error=true:
=> COPY reject_true_false FROM stdin PARSER fcsvparser(reject_on_materialized_type_error=true,header=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> one,two
>> 1,2
>> "3","four"
>> "five",6
>> 7,8
>> \.
Call maptostring to display the table contents. Only two rows are now loaded, whereas the previous results had four rows. The rows whose input values have the wrong data type have been rejected:
=> SELECT maptostring(__raw__), one, two FROM reject_true_false;
maptostring | one | two
-------------------------------------+-----+-----
{
"one" : "1",
"two" : "2"
}
| 1 | 2
{
"one" : "7",
"two" : "8"
}
| 7 | 8
(2 rows)
fcsvparser uses null values if there is a type mismatch and you set the reject_on_materialized_type_error parameter to false.
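That fallback can be modeled in a short sketch: the raw map keeps the original strings, while materialization coerces each declared column and substitutes NULL (None here) on failure. The names are illustrative, not Vertica internals:

```python
def materialize(raw, column_types):
    """Coerce raw string values into typed columns; failures become None."""
    cols = {}
    for name, typ in column_types.items():
        try:
            cols[name] = typ(raw[name])
        except (KeyError, ValueError):
            cols[name] = None          # type mismatch -> NULL column value
    return raw, cols

raw, cols = materialize({"one": "3", "two": "four"},
                        {"one": int, "two": int})
print(raw)    # {'one': '3', 'two': 'four'}  -- raw map unchanged
print(cols)   # {'one': 3, 'two': None}      -- mismatch becomes NULL
```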
Valid CSV files can include empty key and value pairs. Such rows are invalid for SQL. You can control the behavior for empty rows by either rejecting or omitting them, using two Boolean FCSVPARSER parameters:
reject_on_empty_key
omit_empty_keys
The following example illustrates how to set these parameters:
Create a flex table:
=> CREATE FLEX TABLE csv_basic();
CREATE TABLE
Load CSV data into the table (from STDIN), using fcsvparser with reject_on_empty_key=false. While false is the default value, you can specify it explicitly, as shown. Additionally, set the parameter header=true to specify the columns for input values:
=> COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_empty_key=false,header=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ,num
>> 1,2
>> \.
Invoke maptostring to display the table values after loading data:
=> SELECT maptostring(__raw__) FROM csv_basic;
maptostring
----------------------------------
{
"" : "1",
"num" : "2"
}
(1 row)
Truncate the table to remove the stored data:
=> TRUNCATE TABLE csv_basic;
TRUNCATE TABLE
Reload the same data, but this time set reject_on_empty_key=true:
=> COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_empty_key=true,header=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ,num
>> 1,2
>> \.
Call maptostring to display the table contents. No rows are loaded because one of the keys is empty:
=> SELECT maptostring(__raw__) FROM csv_basic;
maptostring
-------------
(0 rows)
Truncate the table to remove the stored data:
=> TRUNCATE TABLE csv_basic;
TRUNCATE TABLE
Reload the same data, but this time set omit_empty_keys=true:
=> COPY csv_basic FROM stdin PARSER fcsvparser(omit_empty_keys=true,header=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ,num
>> 1,2
>> \.
Call maptostring to display the table contents. One row is now loaded, and the row with an empty key is omitted:
=> SELECT maptostring(__raw__) FROM csv_basic;
maptostring
---------------------
{
"num" : "2"
}
(1 row)
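The three behaviors shown above can be summarized in a sketch. The parameter names mirror the FCSVPARSER options, but the logic itself is an illustrative model:

```python
def load_record(header, values,
                reject_on_empty_key=False, omit_empty_keys=False):
    """Build one record from a header row and a data row."""
    record = dict(zip(header, values))
    if reject_on_empty_key and "" in record:
        return None                    # whole row rejected
    if omit_empty_keys:
        record.pop("", None)           # drop only the empty-key pair
    return record

header, values = ["", "num"], ["1", "2"]
print(load_record(header, values))                           # {'': '1', 'num': '2'}
print(load_record(header, values, reject_on_empty_key=True)) # None (rejected)
print(load_record(header, values, omit_empty_keys=True))     # {'num': '2'}
```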
fcsvparser uses a default header of ucoln, where n is the column offset number. If a table header name and key name match, the parser loads the column with values associated with the matching key name.
Use the COPY NULL metadata parameter with fcsvparser to load NULL values into a flex table. The next example uses this parameter:
Create a flex table:
=> CREATE FLEX TABLE fcsv(c1 int);
CREATE TABLE
Load CSV data in the flex table using STDIN and the NULL parameter:
=> COPY fcsv FROM STDIN PARSER fcsvparser() NULL 'NULL' ;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a,b,c1
>> 10,20,NULL
>> 20,30,50
>> 20,30,40
>> \.
Use the compute_flextable_keys_and_build_view function to compute keys and build the flex view:
=> SELECT compute_flextable_keys_and_build_view('fcsv');
compute_flextable_keys_and_build_view
--------------------------------------------------------------------------------
Please see public.fcsv_keys for updated keys
The view public.fcsv_view is ready for querying
(1 row)
Query the flex view, and then replace the NULL values using ISNULL:
=> SELECT * FROM public.fcsv_view;
a | b | c1
----+----+----
20 | 30 | 50
10 | 20 |
20 | 30 | 40
(3 rows)
=> SELECT a,b, ISNULL(c1,-1) from public.fcsv_view;
a | b | ISNULL
----+----+--------
20 | 30 | 50
10 | 20 | -1
20 | 30 | 40
(3 rows)
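The NULL substitution can be modeled in a sketch: fields matching the null marker load as SQL NULL, and ISNULL supplies a replacement value at query time. This is an analogy to the COPY NULL parameter, not its implementation:

```python
NULL_MARKER = "NULL"   # the marker string given to COPY ... NULL 'NULL'

def load_field(text):
    """A field equal to the null marker loads as SQL NULL (None here)."""
    return None if text == NULL_MARKER else text

row = [load_field(f) for f in "10,20,NULL".split(",")]
print(row)                                          # ['10', '20', None]
print([v if v is not None else -1 for v in row])    # ISNULL(c1, -1) analogue
```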
The fcsvparser lets you specify your own column headings with the HEADER_NAMES= parameter. This parameter entirely replaces column names in the CSV source header row.
For example, to use these six column headings for a CSV file you are loading, use the fcsvparser parameter as follows:
HEADER_NAMES='FIRST, LAST, SOCIAL_SECURITY, TOWN, STATE, COUNTRY'
Supplying fewer header names than existing data columns causes fcsvparser to use default names after those you supply. Default header names consist of ucoln, where n is the column offset number, starting at 0 for the first column. For example, if you supply four header names for a 6-column table, fcsvparser supplies the default names ucol4 and ucol5, following the fourth header name you provide.
If you supply more headings than the existing table columns, any additional headings remain unused.
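The padding rule can be sketched as follows (resolve_headers is a hypothetical name; the behavior mirrors the description above):

```python
def resolve_headers(supplied, n_columns):
    """Pad missing header names with ucol<n> defaults (n = column
    offset, starting at 0); extra supplied names remain unused."""
    names = list(supplied[:n_columns])
    names += [f"ucol{i}" for i in range(len(names), n_columns)]
    return names

print(resolve_headers(["FIRST", "LAST", "SOCIAL_SECURITY", "TOWN"], 6))
# ['FIRST', 'LAST', 'SOCIAL_SECURITY', 'TOWN', 'ucol4', 'ucol5']
```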
You can load flex tables with one of two delimited parsers, fdelimitedparser or fdelimitedpairparser:
Use fdelimitedpairparser when the data specifies column names with the data in each row.
Use fdelimitedparser when the data does not specify column names, or has a header row for column names.
This section describes using some of the options that fdelimitedpairparser and fdelimitedparser support.
You can reject duplicate values using the reject_on_duplicate=true option with fdelimitedparser. The load continues after it rejects a duplicate value. The next example shows how to use this parameter and then displays the specified exception and rejected data files. Saving rejected data to a table, rather than a file, includes both the data and its exception.
=> CREATE FLEX TABLE delim_dupes();
CREATE TABLE
=> COPY delim_dupes FROM stdin PARSER fdelimitedparser(reject_on_duplicate=true)
exceptions '/home/dbadmin/load_errors/except.out' rejected data '/home/dbadmin/load_errors/reject.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> A|A
>> 1|2
>> \.
=> \! cat /home/dbadmin/load_errors/reject.out
A|A
=> \! cat /home/dbadmin/load_errors/except.out
COPY: Input record 1 has been rejected (Processed a header row with duplicate keys with
reject_on_duplicate specified; rejecting.). Please see /home/dbadmin/load_errors/reject.out,
record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.
Both the fjsonparser and fdelimitedparser parsers have a Boolean parameter, reject_on_materialized_type_error. Setting this parameter to true causes rows to be rejected if both of the following conditions exist in the input data:
Includes keys matching an existing materialized column
Has a value that cannot be coerced into the materialized column's data type
Suppose the flex table has a materialized column, OwnerPercent, declared as a FLOAT. Trying to load a row with an OwnerPercent key that has a VARCHAR value causes fdelimitedparser to reject the data row.
The following examples illustrate setting this parameter.
Create a flex table, reject_true_false, with two real columns:
=> CREATE FLEX TABLE reject_true_false(one VARCHAR, two INT);
CREATE TABLE
Load JSON data into the table (from STDIN), using fjsonparser with reject_on_materialized_type_error=false. While false is the default value, the following example specifies it explicitly for illustration:
=> COPY reject_true_false FROM stdin PARSER fjsonparser(reject_on_materialized_type_error=false);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"one": 1, "two": 2}
>> {"one": "one", "two": "two"}
>> {"one": "one", "two": 2}
>> \.
Invoke maptostring to display the table values after loading data:
=> SELECT maptostring(__raw__), one, two FROM reject_true_false;
maptostring | one | two
----------------------------------+-----+-----
{
"one" : "one",
"two" : "2"
}
| one | 2
{
"one" : "1",
"two" : "2"
}
| 1 | 2
{
"one" : "one",
"two" : "two"
}
| one |
(3 rows)
Truncate the table:
=> TRUNCATE TABLE reject_true_false;
Reload the same data, but this time set reject_on_materialized_type_error=true:
=> COPY reject_true_false FROM stdin PARSER fjsonparser(reject_on_materialized_type_error=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"one": 1, "two": 2}
>> {"one": "one", "two": "two"}
>> {"one": "one", "two": 2}
>> \.
Call maptostring to display the table contents. Only two rows were loaded, whereas the previous results had three rows:
=> SELECT maptostring(__raw__), one, two FROM reject_true_false;
maptostring | one | two
---------------------------------------+-----+-----
{
"one" : "1",
"two" : "2"
}
| 1 | 2
{
"one" : "one",
"two" : "2"
}
| one | 2
(2 rows)