Bulk loading data into flex tables
You bulk load data into a flex table with a COPY statement, specifying one of the flex parsers:
-
FAVROPARSER
-
FCEFPARSER
-
FCSVPARSER
-
FDELIMITEDPAIRPARSER
-
FDELIMITEDPARSER
-
FJSONPARSER
-
FREGEXPARSER
All flex parsers store the data as a single-value VMap. They reside in the VARBINARY __raw__
column, which is a real column with a NOT NULL constraint. The VMap is encoded into a single binary value for storage in the __raw__
column. The encoding places the value strings in a contiguous block, followed by the key strings. Vertica supports null values within the VMap for keys with NULL-specified columns. The key and value strings represent the virtual columns and their values in your flex table.
If a flex table data row is too large to fit in the __raw__
column, it is rejected. By default, the rejected data and exceptions files are stored in the standard CopyErrorLogs
location, a subdirectory of the catalog directory:
v_mart_node003_catalog/CopyErrorLogs/trans-STDIN-copy-from-exceptions.1
v_mart_node003_catalog/CopyErrorLogs/trans-STDIN-copy-rejections.1
Flex tables do not copy any rejected data, due to disk space considerations. The rejected data file exists, but it contains only a new line character for every rejected record. The corresponding exceptions file lists the reason why each record was rejected.
You can specify a different path and file for the rejected data and exceptions files. To do so, use the COPY options REJECTED DATA and EXCEPTIONS, respectively. You can also save load rejections and exceptions in a table. For more information, see Data load.
Basic flex table load and query
Loading data into a flex table is similar to loading data into a regular columnar table. The difference is that you must use the PARSER option with one of the flex parsers:
=> COPY darkdata FROM '/home/dbadmin/data/tweets_12.json' PARSER fjsonparser();
Rows Loaded
-------------
12
(1 row)
Note
You can use many additional COPY parameters as required but not all are supported.Loading data into flex table real columns
If you create a hybrid flex table with one or more real column definitions, COPY evaluates each virtual column key name during data load and automatically populates real columns with the values from their virtual column counterparts. For columns of scalar types, COPY also loads the keys and values as part of the VMap data in the __raw__
column. For columns of complex types, COPY does not add the values to the __raw__
column.
Note
Over time, storing values in both column types can impact your licensed data limits. For more information about Vertica licenses, see Managing licenses.For example, continuing with the JSON example:
-
Create a flex table with a column definition of one of the keys in the data you will load:
=> CREATE FLEX TABLE darkdata1 ("user.lang" VARCHAR); CREATE TABLE
-
Load data into the table:
=> COPY darkdata1 FROM '/test/vertica/flextable/DATA/tweets_12.json' PARSER fjsonparser(); Rows Loaded ------------- 12 (1 row)
-
Query the real column directly:
=> SELECT "user.lang" FROM darkdata1; user.lang ----------- es es tr it en en en en (12 rows)
Empty column rows indicate NULL values. For more information about how NULLs are handled in flex tables, see NULL value.
-
You can query for other virtual columns with similar results:
=> SELECT "user.name" FROM darkdata1; user.name --------------------- I'm Toaster⥠Flu Beach seydo shi The End Uptown gentleman. ~G A B R I E L A ⿠Frederick Danjou laughing at clouds. (12 rows)
Note
While the results for these two queries are similar, the difference in accessing the keys and their values is significant. Data foruser.lang
has been materialized into a real table column, whileuser.name
remains a virtual column. For production-level data usage (rather than test data sets), materializing flex table data improves query performance significantly.
Handling default values during loading
You can create a flex table with a real column, named for a virtual column that exists in your incoming data. For example, if the data you load has a user.lang
virtual column, define the flex table with that column. You can also specify a default column value when creating the flex table with real columns. The next example shows how to define a real column (user.lang
), which has a default value from a virtual column (user.name
):
=> CREATE FLEX TABLE darkdata1 ("user.lang" LONG VARCHAR default "user.name");
When you load data into a flex table, COPY uses values from the flex table data, ignoring the default column definition. Loading data into a flex table requires MAPLOOKUP to find keys that match any real column names. A match exists when the incoming data has a virtual column with the same name as a real column. When COPY detects a match, it populates the column with values. COPY returns either a value or NULL for each row, so real columns always have values.
For example, after creating the flex table described in the previous example, load data with COPY:
=> COPY darkdata1 FROM '/test/vertica/flextable/DATA/tweets_12.json' PARSER fjsonparser();
Rows Loaded
-------------
12
(1 row)
If you query the table after loading, the data shows that the values for the user.lang
column were extracted:
-
From the data being loaded — values for the
user.lang
virtual column -
With NULL — rows without values
In this case, the table column default value for user.lang
was ignored:
=> SELECT "user.lang" FROM darkdata1;
user.lang
-----------
it
en
es
en
en
es
tr
en
(12 rows)
Using COPY to specify default column values
You can add an expression to a COPY statement to specify default column values when loading data. For flex tables, specifying any column information requires that you list the __raw__
column explicitly. The following example shows how to use an expression for the default column value. In this case, loading populates the defined user.lang
column with data from the input data user.name
values:
=> COPY darkdata1(__raw__, "user.lang" as "user.name"::VARCHAR)
FROM '/test/vertica/flextable/DATA/tweets_12.json' PARSER fjsonparser();
Rows Loaded
-------------
12
(1 row)
=> SELECT "user.lang" FROM darkdata1;
user.lang
---------------------
laughing at clouds.
Avita Desai
I'm Toasterâ¥
Uptown gentleman.
~G A B R I E L A â¿
Flu Beach
seydo shi
The End
(12 rows)
You can specify default values when adding columns, as described in Altering Flex Tables. When you do so, a different behavior results. For more information about using COPY, its expressions and parameters, see Getting Data into Vertica and the COPY reference page.