Flexible complex types

When defining tables, you can use strongly-typed complex types to fully describe any combination of structs and arrays.

When defining tables, you can use strongly-typed complex types to fully describe any combination of structs and arrays. However, there are times when you might prefer not to:

  • If the data contains a struct with a very large number of fields, and in your queries you will need only a few of them, you can avoid having to enumerate the rest in the table DDL. Further, a deeply-nested set of structs could exceed the nesting limit for the table if fully specified.

  • If the data schema is still evolving, you can delay finalizing strongly-typed DDL.

  • If you anticipate the introduction of new fields in the data, you can use flexible types to discover them. A table with strong typing, on the other hand, would silently ignore those values. For an example of using flexible types to discover new fields, see Strong and Flexible Typing.

Flexible types are a way to store complex or unstructured data as a binary blob in one column, in a way that allows access to individual elements of that data. This is the same approach that Vertica uses with flex tables, which support loading unstructured or semi-structured data. In a flex table, all data from a source is loaded into a single VMap column named __raw__. From this column you can materialize other columns, such as a specific field in JSON data, or use special lookup functions in queries to read values directly out of the __raw__ column.

Vertica uses a similar approach with complex types. You can describe the types fully using the ROW and ARRAY types in your table definition, or you can instead treat a complex type as a flexible type and not fully describe it. Each complex type that you choose to treat this way becomes its own flex-style column. You are not limited to a single column containing all data as in flex tables; instead, you can treat any complex type column, no matter how deeply it nests other types, as one flex-like column.

Defining flexible columns

To use a flexible complex type, declare the column as LONG VARBINARY. You might also need to set other parameters in the parser, as described in the parser documentation.

Consider a Parquet file with a restaurants table and the following columns:

  • name: varchar

  • cuisine type: varchar

  • location (cities): array[varchar]

  • menu: array of structs, each struct having an item name and a price

This data contains two complex columns, location (an array) and menu (an array of structs). The following example defines both columns as flexible columns by using LONG VARBINARY:

=> CREATE EXTERNAL TABLE restaurants(name VARCHAR, cuisine VARCHAR,
        location_city LONG VARBINARY, menu LONG VARBINARY)
    AS COPY FROM '/data/rest*.parquet'
    PARQUET(allow_long_varbinary_match_complex_type='True');

The allow_long_varbinary_match_complex_type parameter is specific to the Parquet parser. It is required if you define any column as a flexible type. Without this parameter, Vertica tries to match the LONG VARBINARY declaration in the table to a VARBINARY column in the Parquet file, finds a complex type instead, and reports a data-type mismatch.

You need not treat all complex columns as flexible types. The following definition is also valid:

=> CREATE EXTERNAL TABLE restaurants(
            name VARCHAR, cuisine VARCHAR,
            location_city ARRAY[VARCHAR,50],
            menu LONG VARBINARY)
    AS COPY FROM '/data/rest*.parquet'
    PARQUET(allow_long_varbinary_match_complex_type='True');

For many common data formats, you can use the INFER_TABLE_DDL function to derive a table definition from a data file. This function uses strong typing for complex types in almost all cases.

Querying flexible columns

Flexible columns are stored as LONG VARBINARY, so selecting them directly produces unhelpful results. Instead, use the flex mapping functions to extract values from these columns. The MAPTOSTRING function translates the complex type to JSON, as shown in the following example:

=> SELECT name, location_city, MAPTOSTRING(menu) AS menu FROM restaurants;
       name        |       location_city        |                    menu
-------------------+---------+------------------+----------------------------------------
 Bob's pizzeria    | ["Cambridge","Pittsburgh"] | {
    "0": {
        "item": "cheese pizza",
        "price": "$8.25"
    },
    "1": {
        "item": "spinach pizza",
        "price": "$10.50"
    }
}
 Bakersfield Tacos | ["Pittsburgh"]             | {
    "0": {
        "item": "veggie taco",
        "price": "$9.95"
    },
    "1": {
        "item": "steak taco",
        "price": "$10.95"
    }
}
(2 rows)

The menu column is an array of structs. Notice that the output is a set of key/value pairs, with the key being the array index. Bob's Pizzeria has two items on its menu, and each value is a struct. The first item ("0") is a struct with an "item" value of "cheese pizza" and a "price" of "$8.25".

You can use keys to access specific values. The following example selects the first menu item from each restaurant. Note that all keys are strings, even array indexes:

=> SELECT name, location_city, menu['0']['item'] AS item, menu['0']['price'] AS price FROM restaurants;
       name        |       location_city        |     item     | price
-------------------+----------------------------+--------------+-------
 Bob's pizzeria    | ["Cambridge","Pittsburgh"] | cheese pizza | $8.25
 Bakersfield Tacos | ["Pittsburgh"]             | veggie taco  | $9.95
(2 rows)

Instead of accessing specific indexes, you can use the MAPITEMS function in a subquery to explode a flexible type, as in the following example:

=>  SELECT name, location_city, menu_items['item'], menu_items['price']
    FROM (SELECT mapitems(menu, name, location_city) OVER(PARTITION BEST)
         AS (indexes, menu_items, name, location_city)
    FROM restaurants) explode_menu;
       name        |       location_city        |  menu_items   | menu_items
-------------------+----------------------------+---------------+------------
 Bob's pizzeria    | ["Cambridge","Pittsburgh"] | cheese pizza  | $8.25
 Bob's pizzeria    | ["Cambridge","Pittsburgh"] | spinach pizza | $10.50
 Bakersfield Tacos | ["Pittsburgh"]             | veggie taco   | $9.95
 Bakersfield Tacos | ["Pittsburgh"]             | steak taco    | $10.95
(4 rows)

For a complete list of flex mapping functions, see Flex data functions.

JSON and Avro flexible types

The parsers for JSON and Avro support both flexible and strong types for complex types. When using flexible complex types or loading into a flex table, use the flatten_maps and flatten_arrays parameters to control how the parser handles complex data. These parsers ignore these parameters for strongly-typed complex types.

The following example demonstrates the use of flexible complex types. Consider a JSON file containing the following data:

{
    "name" : "Bob's pizzeria",
    "cuisine" : "Italian",
    "location_city" : ["Cambridge", "Pittsburgh"],
    "menu" : [{"item" : "cheese pizza", "price" : "$8.25"},
              {"item" : "spinach pizza", "price" : "$10.50"}]
}
{
    "name" : "Bakersfield Tacos",
    "cuisine" : "Mexican",
    "location_city" : ["Pittsburgh"],
    "menu" : [{"item" : "veggie taco", "price" : "$9.95"},
              {"item" : "steak taco", "price" : "$10.95"}]
}

Create a table, using LONG VARBINARY for the flexible complex types, and load data specifying these parameters:

=> CREATE TABLE restaurant(name VARCHAR, cuisine VARCHAR,
        location_city LONG VARBINARY, menu LONG VARBINARY);

=> COPY restaurant FROM '/data/restaurant.json'
   PARSER FJSONPARSER(flatten_maps=false, flatten_arrays=false);

You can use Flex functions and direct access (through indexes) to return readable values:

=> SELECT MAPTOSTRING(location_city), MAPTOSTRING(menu) FROM restaurant;
                   maptostring                   |             maptostring
-------------------------------------------------+--------------------------------------------------------
 {
    "0": "Cambridge",
    "1": "Pittsburgh"
} | {
    "0": {
        "item": "cheese pizza",
        "price": "$8.25"
    },
    "1": {
        "item": "spinach pizza",
        "price": "$10.50"
    }
}
 {
    "0": "Pittsburgh"
}                       | {
    "0": {
        "item": "veggie taco",
        "price": "$9.95"
    },
    "1": {
        "item": "steak taco",
        "price": "$10.95"
    }
}
(2 rows)

=> SELECT menu['0']['item'] FROM restaurant;
     menu
--------------
 cheese pizza
 veggie taco
(2 rows)

The COPY statement shown in this example sets flatten_maps to false. Without that change, the keys for the complex columns would not work as expected, because record and array keys would be "flattened" at the top level. Querying menu['0']['item'] would produce no results. Instead, query flattened values as in the following example:

=> SELECT menu['0.item'] FROM restaurant;
     menu
--------------
 veggie taco
 cheese pizza
(2 rows)

Flattening directives apply to the entire COPY statement. You cannot flatten some columns and not others, or prevent flattening values in a complex column that is itself within a flattened flex table. Because flexible complex types and strongly-typed complex types require different values for flattening, you cannot combine strong and flexible complex types in the same load operation.