Getting started

The following tutorial demonstrates the basics of creating, exploring, and using flex tables.

The following tutorial demonstrates the basics of creating, exploring, and using flex tables.

These examples use a data file named mountains.json with the following contents:

{"name": "Everest", "type":"mountain", "height":29029, "hike_safety": 34.1}
{"name": "Mt St Helens", "type":"volcano", "height":29029, "hike_safety": 15.4}
{"name": "Denali", "type":"mountain", "height":17000, "hike_safety": 12.2}
{"name": "Kilimanjaro", "type":"mountain", "height":14000 }
{"name": "Mt Washington", "type":"mountain", "hike_safety": 50.6}

Create a flex table

Create a flex table and load the sample data as follows:

=> CREATE FLEX TABLE mountains();

=> COPY mountains FROM '/home/dbadmin/data/mountains.json'
    PARSER FJSONPARSER();
 Rows Loaded
-------------
           5
(1 row)

You can now query this table, just as if you had created the table with explicit columns:

=> SELECT name, type, height FROM mountains;
     name      |   type   | height
---------------+----------+--------
 Everest       | mountain | 29029
 Mt St Helens  | volcano  | 29029
 Denali        | mountain | 17000
 Kilimanjaro   | mountain | 14000
 Mt Washington | mountain |
(5 rows)

The __raw__ column

Data loaded into a flex table is added to a special column named __raw__, which holds a map of all the keys and values. The following example shows this column, with carriage returns added for readability:

=> \x
Expanded display is on.
=> SELECT * FROM mountains;
[ RECORD 1 ]+----------------------------------------------------------------------------
__identity__ | 1
__raw__      | \001\000\000\000,\000\000\000\004\000\000\000\024\000\000\000\031\000\000\000\
035\000\000\000$\000\000\0002902934.1Everestmountain\004\000\000\000\024\000\000\000\032\000\
000\000%\000\000\000)\000\000\000heighthike_safetynametype
[ RECORD 2 ]+----------------------------------------------------------------------------
__identity__ | 2
__raw__      | \001\000\000\0000\000\000\000\004\000\000\000\024\000\000\000\031\000\000\000\
035\000\000\000)\000\000\0002902915.4Mt St Helensvolcano\004\000\000\000\024\000\000\000\032\000\
000\000%\000\000\000)\000\000\000heighthike_safetynametype
[ RECORD 3 ]+----------------------------------------------------------------------------
__identity__ | 3
__raw__      | \001\000\000\000+\000\000\000\004\000\000\000\024\000\000\000\031\000\000\000\
035\000\000\000#\000\000\0001700012.2Denalimountain\004\000\000\000\024\000\000\000\032\000\000
\000%\000\000\000)\000\000\000heighthike_safetynametype
[ RECORD 4 ]+----------------------------------------------------------------------------
__identity__ | 4
__raw__      | \001\000\000\000(\000\000\000\003\000\000\000\020\000\000\000\025\000\000\000\
000\000\00014000Kilimanjaromountain\003\000\000\000\020\000\000\000\026\000\000\000\032\000\
000\000heightnametype
[ RECORD 5 ]+----------------------------------------------------------------------------
__identity__ | 5
__raw__      | \001\000\000\000)\000\000\000\003\000\000\000\020\000\000\000\024\000\000\000\
000\000\00050.6Mt Washingtonmountain\003\000\000\000\020\000\000\000\033\000\000\000\037\000\
000\000hike_safetynametype

As this example demonstrates, querying __raw__ directly is not generally helpful. Instead, you can use functions to extract values from it. Use the MAPTOSTRING function to see this column in a more readable JSON format:

=> SELECT MAPTOSTRING(__raw__) FROM mountains;
                                         MAPTOSTRING
------------------------------------------------------------------------------------------
  {
   "height" : "29029",
   "hike_safety" : "34.1",
   "name" : "Everest",
   "type" : "mountain"
}

 {
   "height" : "29029",
   "hike_safety" : "15.4",
   "name" : "Mt St Helens",
   "type" : "volcano"
}

 {
   "height" : "17000",
   "hike_safety" : "12.2",
   "name" : "Denali",
   "type" : "mountain"
}

 {
   "height" : "14000",
   "name" : "Kilimanjaro",
   "type" : "mountain"
}

 {
   "hike_safety" : "50.6",
   "name" : "Mt Washington",
   "type" : "mountain"
}

You can use the COMPUTE_FLEXTABLE_KEYS function to extract information about the keys in the __raw__ column into a new table. The new table shows the key names, frequencies, and inferred data types:

=> SELECT COMPUTE_FLEXTABLE_KEYS('mountains');
    COMPUTE_FLEXTABLE_KEYS
---------------------------------------------------
Please see public.mountains_keys for updated keys
(1 row)

=> SELECT * FROM public.mountains_keys;
  key_name   | frequency | data_type_guess
-------------+-----------+-----------------
 height      |         4 | Integer
 hike_safety |         4 | Numeric(6,2)
 name        |         5 | Varchar(26)
 type        |         5 | Varchar(20)
(4 rows)

Vertica generates this table automatically when you create a flex table.

Review data types

Vertica infers data types from the raw data. You can adjust these types, either to be more specific or more general. The example that follows changes the NUMERIC value to FLOAT. Note also that VARCHAR sizes are based on the strings in the data and might not be large enough for future data. You should review and adjust data types before loading more data into a new flex table.

When you create a flex table, you can create a corresponding view. If you make changes in the data types in the keys table, regenerate the view. The following example demonstrates this process.

First, create the view:

=> SELECT BUILD_FLEXTABLE_VIEW('mountains');
                 BUILD_FLEXTABLE_VIEW
------------------------------------------------------
 The view public.mountains_view is ready for querying
(1 row)

=> SELECT * FROM public.mountains_view;
     name      |   type   | height | hike_safety
---------------+----------+--------+-------------
 Everest       | mountain |  29029 |       34.10
 Mt St Helens  | volcano  |  29029 |       15.40
 Denali        | mountain |  17000 |       12.20
 Kilimanjaro   | mountain |  14000 |
 Mt Washington | mountain |        |       50.60
(5 rows)

Query the VIEW_COLUMNS system table to review the columns and data types:

=> SELECT column_name, data_type FROM VIEW_COLUMNS
   WHERE table_name = 'mountains_view';
 column_name |  data_type
-------------+--------------
 name        | varchar(26)
 type        | varchar(20)
 height      | int
 hike_safety | numeric(6,2)
(4 rows)

The view has the same data types as the keys table. Now update one of these data types in the keys table. After making the change, rebuild the view:

=> UPDATE mountains_keys SET data_type_guess = 'float' WHERE key_name = 'hike_safety';
 OUTPUT
--------
      1
(1 row)

=> COMMIT;

=> SELECT BUILD_FLEXTABLE_VIEW('mountains');
                 BUILD_FLEXTABLE_VIEW
------------------------------------------------------
 The view public.mountains_view is ready for querying
(1 row)

Now, repeat the VIEW_COLUMNS query and note the changed type:

=> SELECT column_name, data_type FROM VIEW_COLUMNS WHERE table_name = 'mountains_view';
 column_name |  data_type
-------------+-------------
 name        | varchar(26)
 type        | varchar(20)
 height      | int
 hike_safety | float
(4 rows)

Create a hybrid flex table

If you already know that some of the data you load and query regularly needs full Vertica performance and support, you can create a hybrid flex table. A hybrid flex table has one or more real columns that you define, as well as the __raw__ column that holds all of the data. Querying real columns is faster than querying the __raw__ column. You can define default values for the columns.

When creating a hybrid flex table, specify real columns as you would for other tables:

=> CREATE FLEX TABLE mountains_hybrid(
    name VARCHAR(41) DEFAULT name::VARCHAR(41),
    hike_safety FLOAT DEFAULT hike_safety::float);

=> COPY mountains_hybrid FROM '/home/dbadmin/Downloads/mountains.json'
    PARSER FJSONPARSER();
 Rows Loaded
-------------
           5
(1 row)

You can compute keys and build the view in a single step:

=> SELECT COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW('mountains_hybrid');
                                        COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW
------------------------------------------------------------------------------------------------
 Please see public.mountains_hybrid_keys for updated keys
The view public.mountains_hybrid_view is ready for querying
(1 row)

The keys table shows the data types you declared for the real columns and inferred types for other fields:

=> SELECT * FROM mountains_hybrid_keys;
  key_name   | frequency | data_type_guess
-------------+-----------+-----------------
 height      |         4 | Integer
 hike_safety |         4 | float
 name        |         5 | varchar(41)
 type        |         5 | Varchar(20)
(4 rows)

If you create a basic flex table, you can later promote one or more virtual columns to real columns by using ALTER TABLE. See Materializing flex tables.

Materialize virtual columns in a hybrid flex table

After you explore your flex table data, you can promote one or more virtual columns in your flex table to real columns. You do not need to create a separate columnar table.

The MATERIALIZE_FLEXTABLE_COLUMNS function creates real columns for a specified number of virtual columns, starting with the most-used:

=> SELECT MATERIALIZE_FLEXTABLE_COLUMNS('mountains_hybrid', 3);
                                 MATERIALIZE_FLEXTABLE_COLUMNS
--------------------------------------------------------------------------------------------
 The following columns were added to the table public.mountains_hybrid:
    type
    height
For more details, run the following query:
SELECT * FROM v_catalog.materialize_flextable_columns_results WHERE table_schema = 'public' and table_name = 'mountains_hybrid';

(1 row)

The call specified three columns to materialize, but the table was created with two real columns, leaving only two more virtual columns. After materializing columns, you can check the results in the MATERIALIZE_FLEXTABLE_COLUMNS_RESULTS system table. The following output shows two columns with a status of ADDED and one with a status of EXISTS:

=> \x
Expanded display is on.
=> SELECT table_id, table_schema, table_name, key_name, status, message FROM MATERIALIZE_FLEXTABLE_COLUMNS_RESULTS
WHERE table_name = 'mountains_hybrid';
-[ RECORD 1 ]+-------------------------------------------------------
table_id     | 45035996273708192
table_schema | public
table_name   | mountains_hybrid
key_name     | type
status       | ADDED
message      | Added successfully
-[ RECORD 2 ]+-------------------------------------------------------
table_id     | 45035996273708192
table_schema | public
table_name   | mountains_hybrid
key_name     | height
status       | ADDED
message      | Added successfully
-[ RECORD 3 ]+-------------------------------------------------------
table_id     | 45035996273708192
table_schema | public
table_name   | mountains_hybrid
key_name     | name
status       | EXISTS
message      | Column of same name already exists in table definition

View the definition of the hybrid table after these changes:

=> \d mountains_hybrid
List of Fields by Tables
-[ RECORD 1 ]-------------------------------------------------------------------
Schema      | public
Table       | mountains_hybrid
Column      | __raw__
Type        | long varbinary(130000)
Size        | 130000
Default     |
Not Null    | t
Primary Key | f
Foreign Key |
-[ RECORD 2 ]-------------------------------------------------------------------
Schema      | public
Table       | mountains_hybrid
Column      | name
Type        | varchar(41)
Size        | 41
Default     | (public.MapLookup(mountains_hybrid.__raw__, 'name'))::varchar(41)
Not Null    | f
Primary Key | f
Foreign Key |
-[ RECORD 3 ]-------------------------------------------------------------------
Schema      | public
Table       | mountains_hybrid
Column      | hike_safety
Type        | float
Size        | 8
Default     | (public.MapLookup(mountains_hybrid.__raw__, 'hike_safety'))::float
Not Null    | f
Primary Key | f
Foreign Key |
-[ RECORD 4 ]-------------------------------------------------------------------
Schema      | public
Table       | mountains_hybrid
Column      | type
Type        | varchar(20)
Size        | 20
Default     | (public.MapLookup(mountains_hybrid.__raw__, 'type'))::!varchar(20)
Not Null    | f
Primary Key | f
Foreign Key |
-[ RECORD 5 ]-------------------------------------------------------------------
Schema      | public
Table       | mountains_hybrid
Column      | height
Type        | int
Size        | 8
Default     | (public.MapLookup(mountains_hybrid.__raw__, 'height'))::!int
Not Null    | f
Primary Key | f
Foreign Key |