FAVROPARSER (parser)
Parses data from an Avro file. The input file must use binary serialization encoding. Use this parser to load data into columnar, flex, and hybrid tables.
You can load complex types in the Avro source (arrays, structs, or combinations) with strong typing or as flexible complex types. A flexible complex type is loaded into a VMap column, as in flex tables. To load complex types as VMap columns, specify a column type of LONG VARBINARY. To preserve the indexing in complex types, set flatten_maps
to false.
When loading into a flex table, Vertica loads all data into the __raw__ (VMap) column, including complex types found in the data.
Note
FAVROPARSER does not support Avro files with separate schema files. The Avro file must have its related schema in the file you are loading.Syntax
FAVROPARSER ( [parameter=value[,...]] )
Parameters
flatten_maps
- Boolean, whether to flatten all Avro maps. Key names are concatenated with nested levels. This value is recursive and affects all data in the load.
This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.
Default: true
flatten_arrays
- Boolean, whether to flatten all Avro arrays. Key names are concatenated with nested levels. This value is recursive and affects all data in the load.
This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.
Default: false
flatten_records
- Boolean, whether to flatten all Avro records. Key names are concatenated with nested levels. This value is recursive and affects all data in the load.
This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.
Default: true
reject_on_materialized_type_error
Boolean, whether to reject a data row that contains a materialized column value that cannot be coerced into a compatible data type. If the value is false and the type cannot be coerced, the parser sets the value in that column to null.
If the column is a strongly-typed complex type, as opposed to a flexible complex type, then a type mismatch anywhere in the complex type causes the entire column to be treated as a mismatch. The parser does not partially load complex types.
Default: false
Primitive data types
FAVROPARSER supports the following primitive data types, including as element types and field values in complex types.
AVRO Data Type | Vertica Data Type | Value |
---|---|---|
NULL | NULL value | No value |
boolean | Boolean data type | A binary value |
int | INTEGER | 32-bit signed integer |
long | INTEGER | 64-bit signed integer |
float |
Synonymous with 64-bit IEEE FLOAT |
Single precision (32-bit) IEEE 754 floating-point number |
double | DOUBLE PRECISION (FLOAT) | Double precision (64-bit) IEEE 754 floating-point number |
bytes | VARBINARY | Sequence of 8-bit unsigned bytes |
string | VARCHAR | Unicode character sequence |
Note
Vertica does not have an explicit 4-byte (32-bit integer) or smaller types. Instead, Vertica encoding and compression automatically eliminate the storage overhead of values that require less than 64 bits.Avro logical types
FAVROPARSER supports the following Avro logical types. The target column must use a Vertica data type that supports the logical type. When you attempt to load data using an invalid logical type, the logical type is ignored and the underlying Avro type is used.
AVRO Logical Type | Base Avro Type | Supported Vertica Data Types |
---|---|---|
0 < 0 ≤ |
bytes or fixed |
Vertica rejects the value if:
If the data type for the target column uses the default precision setting, the precision setting in the Avro schema overrides the default. |
date |
integer |
DATE, Character |
time-micros |
long |
The time logical type does not provide a time zone value. For target columns that use the TIMETZ data type, Vertica uses UTC as the default. |
time-millis |
int | |
timestamp-micros |
long |
TIMESTAMP/TIMESTAMPTZ, TIME/TIMETZ For |
timestamp-millis |
long | |
duration |
fixed |
INTERVAL, Character |
Avro complex data types
The Avro format supports several complex data types. When loading into strongly-typed columns, you can use the ROW and ARRAY types to represent them. For example, Avro Record and Enums are structs (ROWs); see the Avro specification.
You can use ARRAY[ROW] to match an Avro map. You must name the ROW fields key
and value
. These are the names that the Avro format uses for those fields in the data, and the parser relies on field names to match data to table columns.
When loading into flex tables or using flexible complex types, this parser handles Avro complex types as follows:
Record
The name of each field is used as a virtual column name. If flatten_records
is true and several nesting levels are present, Vertica concatenates the record names to create the key name.
Map
The value of each map key is used as a virtual column name. If flatten_maps
is true and several nesting levels are present, Vertica concatenates the key names to create the key name.
Enum
Vertica treats Avro Enums like records, with the name of the Enum as the key and the value as the value.
Array
Vertica treats Avro Arrays as key/value pairs. By default, the index of each element is the key. In the following example, product_detail
is a Record with a field, product_category
, that is an Array:
=> CREATE FLEX TABLE products;
CREATE TABLE
=> COPY products FROM :datafile WITH PARSER FAVROPARSER();
Rows Loaded
-------------
2
(1 row)
=> SELECT MAPTOSTRING(__raw__) FROM products ORDER BY __identity__;
maptostring
--------------------------------------------------------------------------------
{
"__name__": "Order",
"customer_id": "111222",
"order_details": {
"0.__name__": "OrderDetail",
"0.product_detail.__name__": "Product",
"0.product_detail.price": "46.21",
"0.product_detail.product_category": {
"0": "electronics",
"1": "printers",
"2": "computers"
},
"0.product_detail.product_description": "hp printer X11ew description :\
P",
"0.product_detail.product_hash": "\u0000\u0001\u0002\u0003\u0004",
"0.product_detail.product_id": "999012",
"0.product_detail.product_map.one": "1.1",
"0.product_detail.product_map.two": "1.1",
"0.product_detail.product_name": "hp printer X11ew",
"0.product_detail.product_status": "ONLY_FEW_LEFT",
"0.quantity": "3",
"0.total": "354.34"
},
"order_id": "2389646",
"total": "132.43"
}
...
If flatten_arrays
is true and several nesting levels are present, Vertica concatenates the indices to create the key name.
=> COPY products FROM :datafile WITH PARSER FAVROPARSER(flatten_arrays=true);
Rows Loaded
-------------
2
(1 row)
=> SELECT MAPTOSTRING(__raw__) FROM products ORDER BY __identity__;
maptostring
--------------------------------------------------------------------------------
{
"__name__": "Order",
"customer_id": "111222",
"order_details.0.__name__": "OrderDetail",
"order_details.0.product_detail.__name__": "Product",
"order_details.0.product_detail.price": "46.21",
"order_details.0.product_detail.product_category.0": "electronics",
"order_details.0.product_detail.product_category.1": "printers",
"order_details.0.product_detail.product_category.2": "computers",
"order_details.0.product_detail.product_description": "hp printer X11ew des\
cription :P",
"order_details.0.product_detail.product_hash": "\u0000\u0001\u0002\u0003\u0\
004",
"order_details.0.product_detail.product_id": "999012",
"order_details.0.product_detail.product_map.one": "1.1",
"order_details.0.product_detail.product_map.two": "1.1",
"order_details.0.product_detail.product_name": "hp printer X11ew",
"order_details.0.product_detail.product_status": "ONLY_FEW_LEFT",
"order_details.0.quantity": "3",
"order_details.0.total": "354.34",
"order_id": "2389646",
"total": "132.43"
}
...
Union
Vertica treats Avro Unions as arrays.
Examples
This example shows how to create and load a flex table with Avro data using favroparser
. After loading the data, you can query virtual columns:
=> CREATE FLEX TABLE avro_basic();
CREATE TABLE
=> COPY avro_basic FROM '/home/dbadmin/data/weather.avro' PARSER FAVROPARSER();
Rows Loaded
-------------
5
(1 row)
=> SELECT station, temp, time FROM avro_basic;
station | temp | time
---------+------+---------------
mohali | 0 | -619524000000
lucknow | 22 | -619506000000
norwich | -11 | -619484400000
ams | 111 | -655531200000
baddi | 78 | -655509600000
(5 rows)
For more information, see Avro data.