ORC data

The ORC (Optimized Row Columnar) format is a column-oriented file format.

The ORC (Optimized Row Columnar) format is a column-oriented file format. Vertica has a parser for this format that can take advantage of the columnar layout.

In the COPY statement, specify the parser as follows:

=> COPY tableName FROM path ORC[(...)];

The parser takes several optional parameters; see the ORC (parser) reference page. If the data is partitioned, see in particular the hive_partition_cols parameter and Using partition columns. While partitioning is primarily a benefit when using external tables, you can load partitioned data into Vertica tables using this parser.

Be aware that if you load from multiple files in the same COPY statement, and any of them is aborted, the entire load aborts. This behavior differs from that for delimited files, where the COPY statement loads what it can and ignores the rest.

Supported data types

Vertica can natively read columns of all Hive primitive data types. For a complete list, see HIVE Data Types (specifically the Numeric, Date/Time, String, and Misc lists). Vertica can also load UUIDs and complex types (arrays and structs in any combination).

The data types you specify for COPY or CREATE EXTERNAL TABLE AS COPY must exactly match the types in the data, though Vertica permits implicit casting among compatible numeric types.

Timestamps and time zones

To correctly report timestamps, Vertica must know what time zone the data was written in. Older versions of the ORC format do not record the time zone.

Hive version 1.2.0 and later records the writer time zone in the stripe footer. Vertica uses that time zone to make sure the timestamp values read into the database match the ones written in the source file. For ORC files that are missing this time zone information, Vertica assumes the values were written in the local time zone and logs an ORC_FILE_INFO event in the QUERY_EVENTS system table. Check for events of this type after your first query to verify that timestamps are being handled as you expected.