FAVROPARSER
Parses data from an Avro file. The following requirements apply:
-
Avro files must be encoded in the Avro binary serialization encoding format, described in the Apache Avro standard. The parser also supports Snappy and deflate compression.
-
FAVROPARSER does not support Avro files with separate schema files. The Avro file must include the schema.
You can load complex types in the Avro source (arrays, structs, or combinations) with strong typing or as flexible complex types. A flexible complex type is loaded into a VMap column, as in flex tables. To load complex types as VMap columns, specify a column type of LONG VARBINARY. To preserve the indexing in complex types, set flatten_maps
to false.
This parser can notify you if it finds keys in the data that are not part of the table definition. See Unmatched Keys.
When loading into a flex table, Vertica loads all data into the __raw__
(VMap) column, including complex types found in the data.
This parser does not support apportioned load or cooperative parse.
Syntax
FAVROPARSER ( [parameter=value[,...]] )
Parameters
flatten_maps
- Boolean, whether to flatten all Avro maps. Key names are concatenated with nested levels. This value is recursive and affects all data in the load.
This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.
Default: true
flatten_arrays
- Boolean, whether to flatten all Avro arrays. Key names are concatenated with nested levels. This value is recursive and affects all data in the load.
This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.
Default: false
flatten_records
- Boolean, whether to flatten all Avro records. Key names are concatenated with nested levels. This value is recursive and affects all data in the load.
This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.
Default: true
reject_on_materialized_type_error
Boolean, whether to reject a data row that contains a materialized column value that cannot be coerced into a compatible data type. If the value is false and the type cannot be coerced, the parser sets the value in that column to NULL.
If the column is an array and the data to be loaded is too large, then false sets the column value to NULL and true rejects the row.
If the column is a strongly-typed complex type, as opposed to a flexible complex type, then a type mismatch anywhere in the complex type causes the entire column to be treated as a mismatch. The parser does not partially load complex types.
Default: false
suppress_warnings
- String, which warnings to suppress:
-
unmatched_key
(see Unmatched Keys) -
true
ort
(suppress all warnings) -
false
orf
(do not suppress warnings)
Default: false
-
Primitive data types
FAVROPARSER supports the following primitive data types, including as element types and field values in complex types.
AVRO Data Type | Vertica Data Type | Value |
---|---|---|
NULL | NULL value | No value |
boolean | Boolean data type | A binary value |
int | INTEGER | 32-bit signed integer |
long | INTEGER | 64-bit signed integer |
float |
Synonymous with 64-bit IEEE FLOAT |
Single precision (32-bit) IEEE 754 floating-point number |
double | DOUBLE PRECISION (FLOAT) | Double precision (64-bit) IEEE 754 floating-point number |
bytes | VARBINARY | Sequence of 8-bit unsigned bytes |
string | VARCHAR | Unicode character sequence |
Note
Vertica does not have an explicit 4-byte (32-bit integer) or smaller types. Instead, Vertica encoding and compression automatically eliminate the storage overhead of values that require less than 64 bits.Avro logical types
FAVROPARSER supports the following Avro logical types. The target column must use a Vertica data type that supports the logical type. When you attempt to load data using an invalid logical type, the logical type is ignored and the underlying Avro type is used.
AVRO Logical Type | Base Avro Type | Supported Vertica Data Types |
---|---|---|
0 < 0 ≤ |
bytes or fixed |
Vertica rejects the value if:
If the data type for the target column uses the default precision setting, the precision setting in the Avro schema overrides the default. |
date |
integer |
DATE, Character |
time-micros |
long |
The time logical type does not provide a time zone value. For target columns that use the TIMETZ data type, Vertica uses UTC as the default. |
time-millis |
int | |
timestamp-micros |
long |
TIMESTAMP/TIMESTAMPTZ, TIME/TIMETZ For |
timestamp-millis |
long | |
duration |
fixed |
INTERVAL, Character |
Avro complex data types
The Avro format supports several complex data types. When loading into strongly-typed columns, you can use the ROW and ARRAY types to represent them. For example, Avro Record and Enums are structs (ROWs); see the Avro specification.
You can use ARRAY[ROW] to match an Avro map. You must name the ROW fields key
and value
. These are the names that the Avro format uses for those fields in the data, and the parser relies on field names to match data to table columns.
If the total size of an array exceeds the size defined by the target table, the parser sets the value to null.
When loading into flex tables or using flexible complex types, this parser handles Avro complex types as follows:
Record
The name of each field is used as a virtual column name. If flatten_records
is true and several nesting levels are present, Vertica concatenates the record names to create the key name.
Map
The value of each map key is used as a virtual column name. If flatten_maps
is true and several nesting levels are present, Vertica concatenates the key names to create the key name.
Enum
Vertica treats Avro Enums like records, with the name of the Enum as the key and the value as the value.
Array
Vertica treats Avro Arrays as key/value pairs. By default, the index of each element is the key. In the following example, product_detail
is a Record with a field, product_category
, that is an Array:
=> CREATE FLEX TABLE products;
CREATE TABLE
=> COPY products FROM :datafile WITH PARSER FAVROPARSER();
Rows Loaded
-------------
2
(1 row)
=> SELECT MAPTOSTRING(__raw__) FROM products ORDER BY __identity__;
maptostring
--------------------------------------------------------------------------------
{
"__name__": "Order",
"customer_id": "111222",
"order_details": {
"0.__name__": "OrderDetail",
"0.product_detail.__name__": "Product",
"0.product_detail.price": "46.21",
"0.product_detail.product_category": {
"0": "electronics",
"1": "printers",
"2": "computers"
},
"0.product_detail.product_description": "hp printer X11ew description :\
P",
"0.product_detail.product_hash": "\u0000\u0001\u0002\u0003\u0004",
"0.product_detail.product_id": "999012",
"0.product_detail.product_map.one": "1.1",
"0.product_detail.product_map.two": "1.1",
"0.product_detail.product_name": "hp printer X11ew",
"0.product_detail.product_status": "ONLY_FEW_LEFT",
"0.quantity": "3",
"0.total": "354.34"
},
"order_id": "2389646",
"total": "132.43"
}
...
If flatten_arrays
is true and several nesting levels are present, Vertica concatenates the indices to create the key name.
=> COPY products FROM :datafile WITH PARSER FAVROPARSER(flatten_arrays=true);
Rows Loaded
-------------
2
(1 row)
=> SELECT MAPTOSTRING(__raw__) FROM products ORDER BY __identity__;
maptostring
--------------------------------------------------------------------------------
{
"__name__": "Order",
"customer_id": "111222",
"order_details.0.__name__": "OrderDetail",
"order_details.0.product_detail.__name__": "Product",
"order_details.0.product_detail.price": "46.21",
"order_details.0.product_detail.product_category.0": "electronics",
"order_details.0.product_detail.product_category.1": "printers",
"order_details.0.product_detail.product_category.2": "computers",
"order_details.0.product_detail.product_description": "hp printer X11ew des\
cription :P",
"order_details.0.product_detail.product_hash": "\u0000\u0001\u0002\u0003\u0\
004",
"order_details.0.product_detail.product_id": "999012",
"order_details.0.product_detail.product_map.one": "1.1",
"order_details.0.product_detail.product_map.two": "1.1",
"order_details.0.product_detail.product_name": "hp printer X11ew",
"order_details.0.product_detail.product_status": "ONLY_FEW_LEFT",
"order_details.0.quantity": "3",
"order_details.0.total": "354.34",
"order_id": "2389646",
"total": "132.43"
}
...
Union
Vertica treats Avro Unions as arrays.
Unmatched keys
Data being loaded can contain keys that are not part of the table definition. If you are loading into a flex table (or a flexible complex type column), no data is lost. For a table with strongly-defined columns, however, new keys cannot be loaded because the table does not have a place to put them.
This parser emits warnings if it finds new keys and if both of the following are true:
-
The target table is not a flex table.
-
The new key is not nested within a flexible complex type column.
New keys are logged in the UDX_EVENTS system table. If a new key is a complex type with nested keys, only the top-level key is logged. When you see a warning about unmatched keys, you can query this table and then use ALTER TABLE to modify your table definition for future loads.
Querying an external table loads data and thus can trigger these warnings. To prevent them, set the suppress_warnings
parameter to 'unmatched_keys' or 'true':
=> CREATE EXTERNAL TABLE restaurants(
name VARCHAR(50),
menu ARRAY[ROW(item VARCHAR(50), price NUMERIC(8,2)),100])
AS COPY FROM '/data/rest.json'
PARSER FAVROPARSER(suppress_warnings='unmatched_key');
Examples
This example shows how to create and load a flex table with Avro data using favroparser
. After loading the data, you can query virtual columns:
=> CREATE FLEX TABLE avro_basic();
CREATE TABLE
=> COPY avro_basic FROM '/home/dbadmin/data/weather.avro' PARSER FAVROPARSER();
Rows Loaded
-------------
5
(1 row)
=> SELECT station, temp, time FROM avro_basic;
station | temp | time
---------+------+---------------
mohali | 0 | -619524000000
lucknow | 22 | -619506000000
norwich | -11 | -619484400000
ams | 111 | -655531200000
baddi | 78 | -655509600000
(5 rows)
For more information, see Avro data.