Bulk loading data into flex tables

You bulk load data into a flex table with a COPY statement, specifying one of the flex parsers:.

You bulk load data into a flex table with a COPY statement, specifying one of the flex parsers:

  • FAVROPARSER
  • FCEFPARSER
  • FCSVPARSER
  • FDELIMITEDPAIRPARSER
  • FDELIMITEDPARSER
  • FJSONPARSER
  • FREGEXPARSER

All flex parsers store the data as a single-value VMap. They reside in the VARBINARY __raw__ column, which is a real column with a NOT NULL constraint. The VMap is encoded into a single binary value for storage in the __raw__ column. The encoding places the value strings in a contiguous block, followed by the key strings. Vertica supports null values within the VMap for keys with NULL-specified columns. The key and value strings represent the virtual columns and their values in your flex table.

If a flex table data row is too large to fit in the __raw__ column, it is rejected. By default, the rejected data and exceptions files are stored in the standard CopyErrorLogs location, a subdirectory of the catalog directory:

v_mart_node003_catalog/CopyErrorLogs/trans-STDIN-copy-from-exceptions.1
v_mart_node003_catalog/CopyErrorLogs/trans-STDIN-copy-rejections.1

Flex tables do not copy any rejected data, due to disk space considerations. The rejected data file exists, but it contains only a new line character for every rejected record. The corresponding exceptions file lists the reason why each record was rejected.

You can specify a different path and file for the rejected data and exceptions files. To do so, use the COPY options REJECTED DATA and EXCEPTIONS, respectively. You can also save load rejections and exceptions in a table. For more information, see Data load.

Basic flex table load and query

Loading data into a flex table is similar to loading data into a regular columnar table. The difference is that you must use the PARSER option with one of the flex parsers:

=> COPY darkdata FROM '/home/dbadmin/data/tweets_12.json' PARSER fjsonparser();
 Rows Loaded
-------------
          12
(1 row)

Loading data into flex table real columns

If you create a hybrid flex table with one or more real column definitions, COPY evaluates each virtual column key name during data load and automatically populates real columns with the values from their virtual column counterparts. For columns of scalar types, COPY also loads the keys and values as part of the VMap data in the __raw__ column. For columns of complex types, COPY does not add the values to the __raw__ column.

For example, continuing with the JSON example:

  1. Create a flex table with a column definition of one of the keys in the data you will load:

    => CREATE FLEX TABLE darkdata1 ("user.lang" VARCHAR);
    CREATE TABLE
    
  2. Load data into the table:

    => COPY darkdata1 FROM '/test/vertica/flextable/DATA/tweets_12.json' PARSER fjsonparser();
     Rows Loaded
    -------------
              12
    (1 row)
    
  3. Query the real column directly:

    => SELECT "user.lang" FROM darkdata1;
     user.lang
    -----------
     es
     es
     tr
     it
     en
     en
     en
     en
    (12 rows)
    

    Empty column rows indicate NULL values. For more information about how NULLs are handled in flex tables, see NULL value.

  4. You can query for other virtual columns with similar results:

    => SELECT "user.name" FROM darkdata1;
          user.name
    ---------------------
     I'm Toasterâ¥
     Flu Beach
     seydo shi
     The End
     Uptown gentleman.
     ~G A B R I E L A â¿
     Frederick Danjou
     laughing at clouds.
    (12 rows)
    

Handling default values during loading

You can create a flex table with a real column, named for a virtual column that exists in your incoming data. For example, if the data you load has a user.lang virtual column, define the flex table with that column. You can also specify a default column value when creating the flex table with real columns. The next example shows how to define a real column (user.lang), which has a default value from a virtual column (user.name):

=> CREATE FLEX TABLE darkdata1 ("user.lang" LONG VARCHAR default "user.name");

When you load data into a flex table, COPY uses values from the flex table data, ignoring the default column definition. Loading data into a flex table requires MAPLOOKUP to find keys that match any real column names. A match exists when the incoming data has a virtual column with the same name as a real column. When COPY detects a match, it populates the column with values. COPY returns either a value or NULL for each row, so real columns always have values.

For example, after creating the flex table described in the previous example, load data with COPY:


=> COPY darkdata1 FROM '/test/vertica/flextable/DATA/tweets_12.json' PARSER fjsonparser();
 Rows Loaded
-------------
          12
(1 row)

If you query the table after loading, the data shows that the values for the user.lang column were extracted:

  • From the data being loaded — values for the user.lang virtual column

  • With NULL — rows without values

In this case, the table column default value for user.lang was ignored:

=> SELECT "user.lang" FROM darkdata1;
 user.lang
-----------
 it
 en
 es
 en
 en
 es
 tr
 en
(12 rows)

Using COPY to specify default column values

You can add an expression to a COPY statement to specify default column values when loading data. For flex tables, specifying any column information requires that you list the __raw__ column explicitly. The following example shows how to use an expression for the default column value. In this case, loading populates the defined user.lang column with data from the input data user.name values:

=> COPY darkdata1(__raw__, "user.lang" as "user.name"::VARCHAR)
   FROM '/test/vertica/flextable/DATA/tweets_12.json' PARSER fjsonparser();
 Rows Loaded
-------------
          12
(1 row)
=> SELECT "user.lang" FROM darkdata1;
      user.lang
---------------------
 laughing at clouds.
 Avita Desai
 I'm Toasterâ¥
 Uptown gentleman.
 ~G A B R I E L A â¿
 Flu Beach
 seydo shi
 The End
(12 rows)

You can specify default values when adding columns, as described in Altering Flex Tables. When you do so, a different behavior results. For more information about using COPY, its expressions and parameters, see Getting Data into Vertica and the COPY reference page.