You can load flex tables with one of several parsers, using the options that these parsers support.
In addition to the parsers listed in this section, the following parsers described in Data formats in Data load support flex tables:
You can use any of the flex parsers to load data into columnar tables. Using the flex table parsers with columnar tables lets you mix data loads in one table. For example, you can load JSON data in one session and delimited data in another.
The following basic examples illustrate how you can use flex parsers with columnar tables.
Create a columnar table, super, with two columns, age and name:
=> CREATE TABLE super(age INT, name VARCHAR);
CREATE TABLE
Enter JSON values from STDIN, using fjsonparser():
=> COPY super FROM stdin PARSER fjsonparser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"age": 5, "name": "Tim"}
>> {"age": 3}
>> {"name": "Fred"}
>> {"name": "Bob", "age": 10}
>> \.
Query the table to see the values you entered:
=> SELECT * FROM super;
age | name
-----+------
| Fred
10 | Bob
5 | Tim
3 |
(4 rows)
Enter some delimited values from STDIN, using fdelimitedparser():
=> COPY super FROM stdin PARSER fdelimitedparser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> name |age
>> Tim|50
>> |30
>> Fred|
>> Bob|100
>> \.
Query the table. Both the JSON and delimited data are saved in the same columnar table, super.
=> SELECT * FROM super;
age | name
-----+------
50 | Tim
30 |
3 |
5 | Tim
100 | Bob
| Fred
10 | Bob
| Fred
(8 rows)
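The mixed-format load above can be sketched outside the database: each parser maps its own input format onto the same column set, so rows from both sessions land in one table. The following is an illustrative Python sketch; the function names are hypothetical, not part of Vertica:

```python
import json

# Each parser turns one input line into the same (age, name) record
# shape; missing fields become None, as in the SQL output above.
COLUMNS = ("age", "name")

def parse_json_row(line):
    """Parse one JSON object into an (age, name) tuple."""
    obj = json.loads(line)
    return tuple(obj.get(col) for col in COLUMNS)

def parse_delimited_row(line, header, delimiter="|"):
    """Parse one delimited line, using a header row for column order.
    Delimited values stay strings; empty fields become None."""
    values = dict(zip(header, line.split(delimiter)))
    return tuple(values.get(col) or None for col in COLUMNS)

rows = []
rows.append(parse_json_row('{"age": 5, "name": "Tim"}'))
rows.append(parse_json_row('{"name": "Fred"}'))
header = ["name", "age"]
rows.append(parse_delimited_row("Tim|50", header))
rows.append(parse_delimited_row("Fred|", header))
print(rows)  # [(5, 'Tim'), (None, 'Fred'), ('50', 'Tim'), (None, 'Fred')]
```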
Use the reject_on_materialized_type_error parameter to avoid loading data with a type mismatch. If reject_on_materialized_type_error is set to false, the flex parser accepts data with type mismatches. Consider the following example:
Assume that the JSON file to be loaded has the following sample contents:
$ cat json.dat
{"created_by":"system","site_source":"flipkart_india_kol","updated_by":"system1","invoice_id":"INVDPKOL100",
"vendor_id":"VEN15731","total_quantity":12,"created_at":"2012-01-09 23:15:52.0"}
{"created_by":"system","site_source":"flipkart_india_kol","updated_by":"system2","invoice_id":"INVDPKOL101",
"vendor_id":"VEN15732","total_quantity":14,"created_at":"hello"}
Create a columnar table.
=> CREATE TABLE hdfs_test (
     site_source VARCHAR(200),
     total_quantity INT,
     vendor_id VARCHAR(200),
     invoice_id VARCHAR(200),
     updated_by VARCHAR(200),
     created_by VARCHAR(200),
     created_at TIMESTAMP
   );
Load JSON data.
=> COPY hdfs_test FROM '/home/dbadmin/json.dat' PARSER fjsonparser() ABORT ON ERROR;
Rows Loaded
-------------
2
(1 row)
View the contents.
=> SELECT * FROM hdfs_test;
site_source | total_quantity | vendor_id | invoice_id | updated_by | created_by | created_at
--------------------+----------------+-----------+-------------+------------+------------+---------------------
flipkart_india_kol | 12 | VEN15731 | INVDPKOL100 | system1 | system | 2012-01-09 23:15:52
flipkart_india_kol | 14 | VEN15732 | INVDPKOL101 | system2 | system |
(2 rows)
If the reject_on_materialized_type_error parameter is set to true, you receive errors when loading the sample JSON data:
=> COPY hdfs_test FROM '/home/dbadmin/json.dat' PARSER fjsonparser(reject_on_materialized_type_error=true) ABORT ON ERROR;
ERROR 2035: COPY: Input record 2 has been rejected (Rejected by user-defined parser)
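The rejection behavior above can be sketched as follows. This is an illustrative Python model of the reject_on_materialized_type_error logic, not Vertica's implementation; the column names follow the hdfs_test example:

```python
from datetime import datetime

# Hypothetical model: a row is rejected only when a key matches a
# materialized column AND its value cannot be coerced to that
# column's type.
MATERIALIZED = {
    "total_quantity": int,
    "created_at": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f"),
}

def load_row(record, reject_on_type_error):
    row = {}
    for key, value in record.items():
        coerce = MATERIALIZED.get(key)
        if coerce is None:
            row[key] = value           # non-materialized keys load as-is
            continue
        try:
            row[key] = coerce(value)
        except (ValueError, TypeError):
            if reject_on_type_error:
                raise ValueError(f"record rejected: bad value for {key!r}")
            row[key] = None            # default behavior: load NULL instead
    return row

ok = load_row({"created_at": "hello", "total_quantity": 14},
              reject_on_type_error=False)
print(ok["created_at"])   # None -- the mismatch became NULL
```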
Use fcsvparser to load data in CSV format (comma-separated values). Because no formal CSV standard exists, Vertica supports the RFC 4180 standard as the default behavior for fcsvparser. Other parser parameters simplify various combinations of CSV options into columnar or flex tables. fcsvparser parses the following CSV data formats:
RFC 4180: The RFC4180 CSV format parser for Vertica flex tables. The parameters for this format are fixed and cannot be changed.
Traditional: The traditional CSV parser lets you specify the parameter values such as delimiter or record terminator. For a detailed list of parameters, see FCSVPARSER.
These fixed parameter settings apply to the RFC 4180 format. You may use the same value for enclosed_by and escape. Other values must be unique.
Parameter | Data Type | Fixed Value (RFC 4180) | Default Value (Traditional)
---|---|---|---
delimiter | CHAR | , | ,
enclosed_by | CHAR | " | "
escape | CHAR | " | \
record_terminator | CHAR | \n or \r\n | \n or \r\n
Use the type parameter to indicate either an RFC 4180-compliant file or a traditional-compliant file. You can specify type as RFC4180; however, you must first verify that the data is compatible with the preceding fixed parameter values of the RFC 4180 format. The default value of the type parameter is RFC4180.
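The difference between the two modes can be demonstrated with Python's csv module as a stand-in: the RFC 4180 settings escape an embedded quote by doubling it, while the traditional defaults use a backslash escape. This is only an analogy to fcsvparser's behavior, not its implementation:

```python
import csv
import io

# Dialect settings mirroring the parameter table above.
rfc4180 = dict(delimiter=",", quotechar='"', doublequote=True)
traditional = dict(delimiter=",", quotechar='"', doublequote=False,
                   escapechar="\\")

def parse(text, dialect):
    """Parse CSV text with the given dialect settings."""
    return list(csv.reader(io.StringIO(text), **dialect))

# RFC 4180: a quote inside a quoted field is escaped by doubling it.
print(parse('10,"20""5",90\r\n', rfc4180))     # [['10', '20"5', '90']]
# Traditional: the same field uses a backslash escape instead.
print(parse('10,"20\\"5",90\n', traditional))  # [['10', '20"5', '90']]
```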
This example shows how you can use fcsvparser to load a flex table, build a view, and then query that view.
Create a flex table for CSV data:
=> CREATE FLEX TABLE rfc();
CREATE TABLE
Use fcsvparser to load the data from STDIN. Specify that no header exists, and enter some data as shown:
=> COPY rfc FROM stdin PARSER fcsvparser(header='false');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 10,10,20
>> 10,"10",30
>> 10,"20""5",90
>> \.
Run the compute_flextable_keys_and_build_view function, and query rfc_view. Notice that the default enclosed_by character permits an escape character (") within a field ("20""5"). Thus, the resulting value is parsed correctly. Because no header existed in the input data, the function added ucoln for each column, where n is the column offset number:
=> SELECT compute_flextable_keys_and_build_view('rfc');
compute_flextable_keys_and_build_view
--------------------------------------------------------------------------------------------
Please see public.rfc_keys for updated keys
The view public.rfc_view is ready for querying
(1 row)
=> SELECT * FROM rfc_view;
ucol0 | ucol1 | ucol2
-------+-------+-------
10 | 10 | 20
10 | 10 | 30
10 | 20"5 | 90
(3 rows)
Follow these steps to use fcsvparser to load data in the traditional CSV format.
In this example, the CSV file uses $ as a delimiter and # as a record_terminator. The sample CSV file to load has the following contents:
$ more /home/dbadmin/flex/flexData1.csv
sno$name$age$gender#
1$John$14$male#
2$Mary$23$female#
3$Mark$35$male#
Create a flex table:
=> CREATE FLEX TABLE csv_basic();
CREATE TABLE
Load the data into the flex table using fcsvparser with the parameters type='traditional', delimiter='$', and record_terminator='#':
=> COPY csv_basic FROM '/home/dbadmin/flex/flexData1.csv' PARSER fcsvparser(type='traditional',
delimiter='$', record_terminator='#');
Rows Loaded
-------------
3
(1 row)
View the data loaded in the flex table:
=> SELECT maptostring(__raw__) FROM csv_basic;
maptostring
----------------------------------------------------------------------------------
{
"age" : "14",
"gender" : "male",
"name" : "John",
"sno" : "1"
}
{
"age" : "23",
"gender" : "female",
"name" : "Mary",
"sno" : "2"
}
{
"age" : "35",
"gender" : "male",
"name" : "Mark",
"sno" : "3"
}
(3 rows)
You can reject duplicate values using the reject_on_duplicate=true option with fcsvparser. The load continues after it rejects a duplicate value. The next example shows how to use this parameter and then displays the specified exception and rejected data files. Saving rejected data to a table, rather than a file, includes both the data and its exception.
=> CREATE FLEX TABLE csv_basic();
CREATE TABLE
=> COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_duplicate=true)
exceptions '/home/dbadmin/load_errors/except.out' rejected data '/home/dbadmin/load_errors/reject.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> A|A
>> 1|2
>> \.
=> \! cat /home/dbadmin/load_errors/reject.out
A|A
=> \! cat /home/dbadmin/load_errors/except.out
COPY: Input record 1 has been rejected (Processed a header row with duplicate keys with
reject_on_duplicate specified; rejecting.). Please see /home/dbadmin/load_errors/reject.out,
record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.
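The duplicate-key check above can be sketched as follows. This is an illustrative model (check_header is a hypothetical name), not fcsvparser's implementation:

```python
# A header row that repeats a key is rejected before any data loads.
def check_header(fields):
    seen = set()
    for name in fields:
        if name in seen:
            raise ValueError(
                f"duplicate key {name!r}: header row rejected")
        seen.add(name)
    return fields

print(check_header(["one", "two"]))   # a valid header passes through
try:
    check_header(["A", "A"])          # mirrors the rejected 'A|A' header
except ValueError as err:
    print(err)
```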
The fcsvparser parser has a Boolean parameter, reject_on_materialized_type_error. Setting this parameter to true causes rows to be rejected if both of the following conditions exist in the input data:
Includes keys matching an existing materialized column
Has a key value that cannot be coerced into the materialized column's data type
The following examples illustrate setting this parameter.
Create a flex table, reject_true_false, with two real columns:
=> CREATE FLEX TABLE reject_true_false(one int, two int);
CREATE TABLE
Load CSV data into the table (from STDIN), using fcsvparser with reject_on_materialized_type_error=false. While false is the default value, you can specify it explicitly, as shown. Additionally, set the parameter header=true to specify the columns for input values:
=> COPY reject_true_false FROM stdin PARSER fcsvparser(reject_on_materialized_type_error=false,header=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> one,two
>> 1,2
>> "3","four"
>> "five",6
>> 7,8
>> \.
Invoke maptostring to display the table values after loading data:
=> SELECT maptostring(__raw__), one, two FROM reject_true_false;
maptostring | one | two
----------------------------------------+-----+-----
{
"one" : "1",
"two" : "2"
}
| 1 | 2
{
"one" : "3",
"two" : "four"
}
| 3 |
{
"one" : "five",
"two" : "6"
}
| | 6
{
"one" : "7",
"two" : "8"
}
| 7 | 8
(4 rows)
Truncate the table to remove the stored data:
=> TRUNCATE TABLE reject_true_false;
TRUNCATE TABLE
Reload the same data, but this time set reject_on_materialized_type_error=true:
=> COPY reject_true_false FROM stdin PARSER fcsvparser(reject_on_materialized_type_error=true,header=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> one,two
>> 1,2
>> "3","four"
>> "five",6
>> 7,8
>> \.
Call maptostring to display the table contents. Only two rows are now loaded, whereas the previous results had four rows. The rows whose input values have the wrong data type have been rejected:
=> SELECT maptostring(__raw__), one, two FROM reject_true_false;
maptostring | one | two
-------------------------------------+-----+-----
{
"one" : "1",
"two" : "2"
}
| 1 | 2
{
"one" : "7",
"two" : "8"
}
| 7 | 8
(2 rows)
fcsvparser uses null values if there is a type mismatch and you set the reject_on_materialized_type_error parameter to false.
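That fallback can be modeled in a short sketch: the raw map keeps the original strings, while materialization coerces each declared column and substitutes NULL (None here) on failure. The names are illustrative, not Vertica internals:

```python
def materialize(raw, column_types):
    """Coerce raw string values into typed columns; failures become None."""
    cols = {}
    for name, typ in column_types.items():
        try:
            cols[name] = typ(raw[name])
        except (KeyError, ValueError):
            cols[name] = None          # type mismatch -> NULL column value
    return raw, cols

raw, cols = materialize({"one": "3", "two": "four"},
                        {"one": int, "two": int})
print(raw)    # {'one': '3', 'two': 'four'}  -- raw map unchanged
print(cols)   # {'one': 3, 'two': None}      -- mismatch becomes NULL
```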
Valid CSV files can include empty key and value pairs. Such rows are invalid for SQL. You can control the behavior for empty rows by either rejecting or omitting them, using two Boolean FCSVPARSER parameters:
reject_on_empty_key
omit_empty_keys
The following example illustrates how to set these parameters:
Create a flex table:
=> CREATE FLEX TABLE csv_basic();
CREATE TABLE
Load CSV data into the table (from STDIN), using fcsvparser with reject_on_empty_key=false. While false is the default value, you can specify it explicitly, as shown. Additionally, set the parameter header=true to specify the columns for input values:
=> COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_empty_key=false,header=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ,num
>> 1,2
>> \.
Invoke maptostring to display the table values after loading data:
=> SELECT maptostring(__raw__) FROM csv_basic;
maptostring
----------------------------------
{
"" : "1",
"num" : "2"
}
(1 row)
Truncate the table to remove the stored data:
=> TRUNCATE TABLE csv_basic;
TRUNCATE TABLE
Reload the same data, but this time set reject_on_empty_key=true:
=> COPY csv_basic FROM stdin PARSER fcsvparser(reject_on_empty_key=true,header=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ,num
>> 1,2
>> \.
Call maptostring to display the table contents. No rows are loaded because one of the keys is empty:
=> SELECT maptostring(__raw__) FROM csv_basic;
maptostring
-------------
(0 rows)
Truncate the table to remove the stored data:
=> TRUNCATE TABLE csv_basic;
TRUNCATE TABLE
Reload the same data, but this time set omit_empty_keys=true:
=> COPY csv_basic FROM stdin PARSER fcsvparser(omit_empty_keys=true,header=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> ,num
>> 1,2
>> \.
Call maptostring to display the table contents. One row is now loaded, and the row with an empty key is omitted:
=> SELECT maptostring(__raw__) FROM csv_basic;
maptostring
---------------------
{
"num" : "2"
}
(1 row)
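The three behaviors shown above can be summarized in a sketch. The parameter names mirror the FCSVPARSER options, but the logic itself is an illustrative model:

```python
def load_record(header, values,
                reject_on_empty_key=False, omit_empty_keys=False):
    """Build one record from a header row and a data row."""
    record = dict(zip(header, values))
    if reject_on_empty_key and "" in record:
        return None                    # whole row rejected
    if omit_empty_keys:
        record.pop("", None)           # drop only the empty-key pair
    return record

header, values = ["", "num"], ["1", "2"]
print(load_record(header, values))                           # {'': '1', 'num': '2'}
print(load_record(header, values, reject_on_empty_key=True)) # None (rejected)
print(load_record(header, values, omit_empty_keys=True))     # {'num': '2'}
```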
fcsvparser uses a default header of ucoln, where n is the column offset number. If a table header name and key name match, the parser loads the column with values associated with the matching key name.
Use the COPY NULL metadata parameter with fcsvparser to load NULL values into a flex table. The next example uses this parameter:
Create a flex table:
=> CREATE FLEX TABLE fcsv(c1 int);
CREATE TABLE
Load CSV data in the flex table using STDIN and the NULL parameter:
=> COPY fcsv FROM STDIN PARSER fcsvparser() NULL 'NULL' ;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a,b,c1
>> 10,20,NULL
>> 20,30,50
>> 20,30,40
>> \.
Use the compute_flextable_keys_and_build_view function to compute keys and build the flex view:
=> SELECT compute_flextable_keys_and_build_view('fcsv');
compute_flextable_keys_and_build_view
--------------------------------------------------------------------------------
Please see public.fcsv_keys for updated keys
The view public.fcsv_view is ready for querying
(1 row)
Query the flex view, and then replace the NULL values using ISNULL:
=> SELECT * FROM public.fcsv_view;
a | b | c1
----+----+----
20 | 30 | 50
10 | 20 |
20 | 30 | 40
(3 rows)
=> SELECT a,b, ISNULL(c1,-1) from public.fcsv_view;
a | b | ISNULL
----+----+--------
20 | 30 | 50
10 | 20 | -1
20 | 30 | 40
(3 rows)
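The NULL substitution can be modeled in a sketch: fields matching the null marker load as SQL NULL, and ISNULL supplies a replacement value at query time. This is an analogy to the COPY NULL parameter, not its implementation:

```python
NULL_MARKER = "NULL"   # the marker string given to COPY ... NULL 'NULL'

def load_field(text):
    """A field equal to the null marker loads as SQL NULL (None here)."""
    return None if text == NULL_MARKER else text

row = [load_field(f) for f in "10,20,NULL".split(",")]
print(row)                                          # ['10', '20', None]
print([v if v is not None else -1 for v in row])    # ISNULL(c1, -1) analogue
```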
The fcsvparser lets you specify your own column headings with the HEADER_NAMES= parameter. This parameter entirely replaces column names in the CSV source header row.
For example, to use these six column headings for a CSV file you are loading, use the fcsvparser parameter as follows:
HEADER_NAMES='FIRST, LAST, SOCIAL_SECURITY, TOWN, STATE, COUNTRY'
Supplying fewer header names than existing data columns causes fcsvparser to use default names after those you supply. Default header names consist of ucoln, where n is the column offset number, starting at 0 for the first column. For example, if you supply four header names for a 6-column table, fcsvparser supplies the default names ucol4 and ucol5, following the fourth header name you provide.
If you supply more headings than the existing table columns, any additional headings remain unused.
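The padding rule can be sketched as follows (resolve_headers is a hypothetical name; the behavior mirrors the description above):

```python
def resolve_headers(supplied, n_columns):
    """Pad missing header names with ucol<n> defaults (n = column
    offset, starting at 0); extra supplied names remain unused."""
    names = list(supplied[:n_columns])
    names += [f"ucol{i}" for i in range(len(names), n_columns)]
    return names

print(resolve_headers(["FIRST", "LAST", "SOCIAL_SECURITY", "TOWN"], 6))
# ['FIRST', 'LAST', 'SOCIAL_SECURITY', 'TOWN', 'ucol4', 'ucol5']
```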
You can load flex tables with one of two delimited parsers, fdelimitedparser or fdelimitedpairparser:
Use fdelimitedpairparser when the data specifies column names with the data in each row.
Use fdelimitedparser when the data does not specify column names, or has a header row for column names.
This section describes using some of the options that fdelimitedpairparser and fdelimitedparser support.
You can reject duplicate values using the reject_on_duplicate=true option with fdelimitedparser. The load continues after it rejects a duplicate value. The next example shows how to use this parameter and then displays the specified exception and rejected data files. Saving rejected data to a table, rather than a file, includes both the data and its exception.
=> CREATE FLEX TABLE delim_dupes();
CREATE TABLE
=> COPY delim_dupes FROM stdin PARSER fdelimitedparser(reject_on_duplicate=true)
exceptions '/home/dbadmin/load_errors/except.out' rejected data '/home/dbadmin/load_errors/reject.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> A|A
>> 1|2
>> \.
=> \! cat /home/dbadmin/load_errors/reject.out
A|A
=> \! cat /home/dbadmin/load_errors/except.out
COPY: Input record 1 has been rejected (Processed a header row with duplicate keys with
reject_on_duplicate specified; rejecting.). Please see /home/dbadmin/load_errors/reject.out,
record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.
Both the fjsonparser and fdelimitedparser parsers have a Boolean parameter, reject_on_materialized_type_error. Setting this parameter to true causes rows to be rejected if both of the following conditions exist in the input data:
Includes keys matching an existing materialized column
Has a value that cannot be coerced into the materialized column's data type
Suppose the flex table has a materialized column, OwnerPercent, declared as a FLOAT. Trying to load a row with an OwnerPercent key that has a VARCHAR value causes fdelimitedparser to reject the data row.
The following examples illustrate setting this parameter.
Create a flex table, reject_true_false, with two real columns:
=> CREATE FLEX TABLE reject_true_false(one VARCHAR, two INT);
CREATE TABLE
Load JSON data into the table (from STDIN), using fjsonparser with reject_on_materialized_type_error=false. While false is the default value, the following example specifies it explicitly for illustration:
=> COPY reject_true_false FROM stdin PARSER fjsonparser(reject_on_materialized_type_error=false);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"one": 1, "two": 2}
>> {"one": "one", "two": "two"}
>> {"one": "one", "two": 2}
>> \.
Invoke maptostring to display the table values after loading data:
=> SELECT maptostring(__raw__), one, two FROM reject_true_false;
maptostring | one | two
----------------------------------+-----+-----
{
"one" : "one",
"two" : "2"
}
| one | 2
{
"one" : "1",
"two" : "2"
}
| 1 | 2
{
"one" : "one",
"two" : "two"
}
| one |
(3 rows)
Truncate the table:
=> TRUNCATE TABLE reject_true_false;
Reload the same data, but this time set reject_on_materialized_type_error=true:
=> COPY reject_true_false FROM stdin PARSER fjsonparser(reject_on_materialized_type_error=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"one": 1, "two": 2}
>> {"one": "one", "two": "two"}
>> {"one": "one", "two": 2}
>> \.
Call maptostring to display the table contents. Only two rows were loaded, whereas the previous results had three rows:
=> SELECT maptostring(__raw__), one, two FROM reject_true_false;
maptostring | one | two
---------------------------------------+-----+-----
{
"one" : "1",
"two" : "2"
}
| 1 | 2
{
"one" : "one",
"two" : "2"
}
| one | 2
(2 rows)