Understanding flex tables

The term (sometimes called or data) does not indicate that the data you load into flex tables is entirely without structure.

The term unstructured data (sometimes called semi-structured or dark data) does not indicate that the data you load into flex tables is entirely without structure. However, you may not know the data's composition or the inconsistencies of its design. In some cases, the data might not be relational.

Your data might have some structure (like JSON and delimited data). Data might be semi-structured or stringently structured, but in ways that you either do not know about or do not expect. The term flexible data encompasses these and other kinds of data. You can load your flexible data directly into a flex table and query its contents.

To summarize, you can load data first, without knowing its structure, and then query its content after a few simple transformations. In some cases, you already know the data structure, such as some keys in the data. If so, you can query these values explicitly as soon as you load the data.

Storing flex table data

While you can store unstructured data in a flex table __raw__ column, that column is implemented as a real column.

Vertica compresses __raw__ column data by about one half (1/2). While this factor is less than the compression rate for real columns, the reduction is significant for large amounts (more than 1TB) of unstructured data. After compression is complete, Vertica writes the data to disk. This approach maintains K-safety in your cluster and supports standard recovery processes should node failures occur. Flex tables are included in full backups (or, if you choose, in object-level backups).

What happens when you create flex tables?

Whenever you execute a CREATE FLEX TABLE statement, Vertica creates three objects, as follows:

  • The flexible table (flex_table)

  • An associated keys table (flex_table_keys)

  • A default view for the main table (flex_table_view)

The _keys and _view objects are dependents of the parent, flex_table. Dropping the flex table also removes its dependents, although you can drop the _keys or _view objects independently.

You can create a flex table without specifying any column definitions. When you do so, Vertica automatically creates two tables, the named flex table (such as darkdata) and its associated keys table, darkdata_keys:

=> CREATE flex table darkdata();
=> \dt dark*
                List of tables
 Schema |     Name      | Kind  |  Owner  | Comment
 public | darkdata      | table | dbadmin |
 public | darkdata_keys | table | dbadmin |
(2 rows)

Each flex table has two default columns, __raw__ and __identity__. The __raw__ column exists in every flex table to hold the data you load. The __identity__ column is auto-incrementing. Vertica uses the __identity__ column for segmentation and sort order when no other column definitions exist. The flex keys table has three columns:

=> SELECT * FROM darkdata_keys;
 key_name | frequency | data_type_guess
(0 rows)

If you define columns when creating a flex table, Vertica still creates the __raw__ column. However, the table has no __identity__ column because columns are specified for segmentation and sort order. Two tables are created automatically, as shown in the following example:

=> CREATE FLEX TABLE darkdata1 (name VARCHAR);

=> SELECT * FROM darkdata1;
 __raw__ | name
(0 rows)

=> \d darkdata1*
                                            List of Fields by Tables
 Schema |   Table   | Column  |          Type          |  Size  | Default | Not Null | Primary Key | Foreign Key
 public | darkdata1 | __raw__ | long varbinary(130000) | 130000 |         | t        | f           |
 public | darkdata1 | name    | varchar(80)            |     80 |         | f        | f           |
(2 rows)

=> \dt darkdata1*
                   List of tables
 Schema |      Name      | Kind  |  Owner  | Comment
 public | darkdata1      | table | dbadmin |
 public | darkdata1_keys | table | dbadmin |
(2 rows)

For more examples, see Creating flex tables.

Creating superprojections automatically

In addition to creating two tables for each flex table, Vertica creates superprojections for both the main flex table and its associated keys table. Using the \dj command, you can display the projections created automatically for the darkdata and darkdata1 tables in this set of examples:

=> \dj darkdata*
                           List of projections
 Schema |          Name           |  Owner  |       Node       | Comment
 public | darkdata1_b0            | dbadmin |                  |
 public | darkdata1_b1            | dbadmin |                  |
 public | darkdata1_keys_super    | dbadmin | v_vmart_node0001 |
 public | darkdata1_keys_super    | dbadmin | v_vmart_node0003 |
 public | darkdata1_keys_super    | dbadmin | v_vmart_node0004 |
 public | darkdata_b0             | dbadmin |                  |
 public | darkdata_b1             | dbadmin |                  |
 public | darkdata_keys__super    | dbadmin | v_vmart_node0001 |
 public | darkdata_keys_super     | dbadmin | v_vmart_node0003 |
 public | darkdata_keys_super     | dbadmin | v_vmart_node0004 |
(10 rows)

Default flex table view

When you create a flex table, you also create a default view. This default view uses the table name with a _view suffix. If you query the default view, you are prompted to use the COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW function. This view enables you to update the view after you load data so that it includes all keys and values:

=> \dv darkdata*
                  List of View Fields
 Schema |      View      | Column |     Type     | Size
 public | darkdata_view  | status | varchar(124) |  124
 public | darkdata1_view | status | varchar(124) |  124
(2 rows)

For more information, see Updating flex table views.

Flex functions

Three sets of functions support flex tables and extracting data into VMaps:

Using clients with flex tables

You can use the Vertica-supported client drivers with flex tables as follows:

  • To load data into a flex table, you can use the INSERT or COPY LOCAL statement with the appropriate flex parser.

  • The driver metadata APIs return only real columns. For example, if you select * from a flex table with a single materialized column, the statement returns that column and the __raw__. However, it does not return virtual columns from within __raw__. To access virtual columns and their values, query the associated _keys table, just as you would in vsql.