Understanding flex tables
The term unstructured data (sometimes called semi-structured or dark data) does not indicate that the data you load into flex tables is entirely without structure. However, you may not know the data's composition or the inconsistencies of its design. In some cases, the data might not be relational.
Your data might have some structure (like JSON and delimited data). Data might be semi-structured or stringently structured, but in ways that you either do not know about or do not expect. The term flexible data encompasses these and other kinds of data. You can load your flexible data directly into a flex table and query its contents.
To summarize, you can load data first, without knowing its structure, and then query its content after a few simple transformations. In some cases, you already know the data structure, such as some keys in the data. If so, you can query these values explicitly as soon as you load the data.
Storing flex table data
While you can store unstructured data in a flex table __raw__
column, that column is implemented as a real column.
Vertica compresses __raw__
column data by about one half (1/2). While this factor is less than the compression rate for real columns, the reduction is significant for large amounts (more than 1TB) of unstructured data. After compression is complete, Vertica writes the data to disk. This approach maintains K-safety in your cluster and supports standard recovery processes should node failures occur. Flex tables are included in full backups (or, if you choose, in object-level backups).
What happens when you create flex tables?
Whenever you execute a CREATE FLEX TABLE statement, Vertica creates three objects, as follows:
-
The flexible table (
flex_table
) -
An associated keys table (
flex_table
_keys
) -
A default view for the main table (
flex_table
_view
)
The _keys
and _view
objects are dependents of the parent, flex_table
. Dropping the flex table also removes its dependents, although you can drop the _keys
or _view
objects independently.
You can create a flex table without specifying any column definitions. When you do so, Vertica automatically creates two tables, the named flex table (such as darkdata
) and its associated keys table, darkdata_keys
:
=> CREATE flex table darkdata();
CREATE TABLE
=> \dt dark*
List of tables
Schema | Name | Kind | Owner | Comment
--------+---------------+-------+---------+---------
public | darkdata | table | dbadmin |
public | darkdata_keys | table | dbadmin |
(2 rows)
Each flex table has two default columns, __raw__
and __identity__
. The __raw__
column exists in every flex table to hold the data you load. The __identity__
column is auto-incrementing. Vertica uses the __identity__
column for segmentation and sort order when no other column definitions exist. The flex keys table has three columns:
=> SELECT * FROM darkdata_keys;
key_name | frequency | data_type_guess
----------+-----------+-----------------
(0 rows)
If you define columns when creating a flex table, Vertica still creates the __raw__
column. However, the table has no __identity__
column because columns are specified for segmentation and sort order. Two tables are created automatically, as shown in the following example:
=> CREATE FLEX TABLE darkdata1 (name VARCHAR);
CREATE TABLE
=> SELECT * FROM darkdata1;
__raw__ | name
---------+------
(0 rows)
=> \d darkdata1*
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-----------+---------+------------------------+--------+---------+----------+-------------+-------------
public | darkdata1 | __raw__ | long varbinary(130000) | 130000 | | t | f |
public | darkdata1 | name | varchar(80) | 80 | | f | f |
(2 rows)
=> \dt darkdata1*
List of tables
Schema | Name | Kind | Owner | Comment
--------+----------------+-------+---------+---------
public | darkdata1 | table | dbadmin |
public | darkdata1_keys | table | dbadmin |
(2 rows)
For more examples, see Creating flex tables.
Creating superprojections automatically
In addition to creating two tables for each flex table, Vertica creates superprojections for both the main flex table and its associated keys table. Using the \dj
command, you can display the projections created automatically for the darkdata
and darkdata1
tables in this set of examples:
=> \dj darkdata*
List of projections
Schema | Name | Owner | Node | Comment
--------+-------------------------+---------+------------------+---------
public | darkdata1_b0 | dbadmin | |
public | darkdata1_b1 | dbadmin | |
public | darkdata1_keys_super | dbadmin | v_vmart_node0001 |
public | darkdata1_keys_super | dbadmin | v_vmart_node0003 |
public | darkdata1_keys_super | dbadmin | v_vmart_node0004 |
public | darkdata_b0 | dbadmin | |
public | darkdata_b1 | dbadmin | |
public | darkdata_keys__super | dbadmin | v_vmart_node0001 |
public | darkdata_keys_super | dbadmin | v_vmart_node0003 |
public | darkdata_keys_super | dbadmin | v_vmart_node0004 |
(10 rows)
Default flex table view
When you create a flex table, you also create a default view. This default view uses the table name with a _view
suffix. If you query the default view, you are prompted to use the COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW function. This view enables you to update the view after you load data so that it includes all keys and values:
=> \dv darkdata*
List of View Fields
Schema | View | Column | Type | Size
--------+----------------+--------+--------------+------
public | darkdata_view | status | varchar(124) | 124
public | darkdata1_view | status | varchar(124) | 124
(2 rows)
For more information, see Updating flex table views.
Flex functions
Three sets of functions support flex tables and extracting data into VMaps:
Using clients with flex tables
You can use the Vertica-supported client drivers with flex tables as follows:
-
To load data into a flex table, you can use the INSERT or COPY LOCAL statement with the appropriate flex parser.
-
The driver metadata APIs return only real columns. For example, if you select
*
from a flex table with a single materialized column, the statement returns that column and the__raw__
. However, it does not return virtual columns from within__raw__
. To access virtual columns and their values, query the associated_keys
table, just as you would invsql
.