Data load

New features for loading data: automatic load pipelines.

Automatically load new files

A data loader automatically loads new files from a location, so that you do not have to add them to Vertica manually. Automatically loading new data into ROS tables is an alternative to using external tables and can save on API costs for object stores.

A data loader is tied to a path for data and a target table. When executed, the loader attempts to load files that it has not previously loaded. A loader has a retry limit to prevent malformed files from being tried over and over. Each loader records monitoring information in an associated table.
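
The behavior described above can be modeled in a few lines of Python. This is only a conceptual sketch, not Vertica's implementation or syntax: the class LoaderSketch, the helper parse_csv_ints, the file names, and the retry limit of 2 are all hypothetical, chosen to show how a loader might skip files it has already loaded, stop retrying a malformed file, and record each attempt.

```python
from dataclasses import dataclass, field


@dataclass
class LoaderSketch:
    """Toy model of a data loader: one source path, one target table."""
    path_prefix: str
    retry_limit: int = 3                           # assumed value, for illustration only
    target: list = field(default_factory=list)     # stands in for the target table
    loaded: set = field(default_factory=set)       # files loaded successfully so far
    attempts: dict = field(default_factory=dict)   # failed attempts per file
    events: list = field(default_factory=list)     # stands in for the monitoring table

    def execute(self, available_files, parse):
        """Try each file under the path that has not been loaded yet."""
        for name in sorted(available_files):
            if not name.startswith(self.path_prefix) or name in self.loaded:
                continue
            if self.attempts.get(name, 0) >= self.retry_limit:
                continue  # retry limit reached: stop trying this file
            try:
                rows = parse(available_files[name])
            except ValueError as err:
                self.attempts[name] = self.attempts.get(name, 0) + 1
                self.events.append((name, "failed", str(err)))
                continue
            self.target.extend(rows)
            self.loaded.add(name)
            self.events.append((name, "loaded", len(rows)))


def parse_csv_ints(text):
    """Hypothetical parser: rows of comma-separated integers."""
    return [tuple(int(v) for v in line.split(",")) for line in text.splitlines()]


files = {"/data/a.csv": "1,2\n3,4", "/data/bad.csv": "1,oops"}
loader = LoaderSketch(path_prefix="/data/", retry_limit=2)
loader.execute(files, parse_csv_ints)  # loads a.csv; bad.csv fails once
files["/data/b.csv"] = "5,6"
loader.execute(files, parse_csv_ints)  # loads only b.csv; bad.csv fails again
loader.execute(files, parse_csv_ints)  # nothing new; bad.csv is now skipped
```

In this model, each call to execute touches only files that are new since the last run, and the malformed file stops being attempted once it reaches the retry limit.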

To run a data loader periodically, you can use a scheduled stored procedure to execute the loader.

For details and an example, see Automatic load.

ORC parser supports loose schema matching

By default, the ORC parser uses strong schema matching: the load must consume every column in the data, in the order in which the columns occur. You can instead use loose schema matching, which lets you select the columns you want and ignore the rest. Loose schema matching depends on the names of the columns in the data rather than their order, so the column names in your table must match those in the data. Types must match or be coercible. Loose schema matching for ORC behaves the same way as it does for Parquet. For details on how to use loose schema matching, see the ORC reference page.
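
The difference between the two modes can be illustrated with a small Python sketch. It is a conceptual model only, not the parser's code: the function names match_strong and match_loose and the sample column names are hypothetical, names are compared exactly, type coercion is left out, and treating a table column with no match in the data as an error is an assumption of this sketch rather than documented behavior.

```python
def match_strong(table_cols, file_cols):
    """Strong matching: every column in the data is consumed, by position."""
    if len(table_cols) != len(file_cols):
        raise ValueError("table must define every column in the data")
    # Table column i reads data column i; names play no role.
    return list(range(len(file_cols)))


def match_loose(table_cols, file_cols):
    """Loose matching: table columns are looked up by name; extras are ignored."""
    positions = {name: i for i, name in enumerate(file_cols)}
    missing = [c for c in table_cols if c not in positions]
    if missing:
        # Assumption of this sketch: an unmatched table column is an error.
        raise ValueError(f"no column in the data named: {missing}")
    return [positions[c] for c in table_cols]


file_cols = ["id", "name", "price", "region"]  # hypothetical columns in the data

loose = match_loose(["price", "id"], file_cols)   # two columns, in any order
strong = match_strong(["id", "name", "price", "region"], file_cols)

try:
    match_strong(["price", "id"], file_cols)      # a subset is not allowed
    subset_ok = True
except ValueError:
    subset_ok = False
```

Here the loose match maps the table's price and id columns to positions 2 and 0 in the data, while the strong match accepts only a table that defines all four columns in order.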

Partitioned paths

Vertica previously supported partition pruning for Hive-style partitioned data. Vertica now supports loading and pruning from any partitioned path. For example, given paths like /data/2023/01, you can now read the year and month values from the path, and at query time Vertica automatically skips reading partition directories that are not needed. See Partitioned data.
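
The idea of reading values from a path and skipping unneeded directories can be sketched in Python. This is a conceptual illustration under stated assumptions, not Vertica's mechanism or syntax: the functions partition_values and prune are hypothetical, and the sketch assumes every path has the shape /data/<year>/<month>.

```python
def partition_values(path, root="/data"):
    """Read (year, month) from a path shaped like <root>/<year>/<month>/..."""
    parts = path[len(root):].strip("/").split("/")
    return int(parts[0]), int(parts[1])


def prune(dirs, year=None, month=None, root="/data"):
    """Keep only the directories whose path values satisfy the filter."""
    kept = []
    for d in dirs:
        y, m = partition_values(d, root)
        if (year is None or y == year) and (month is None or m == month):
            kept.append(d)
    return kept


dirs = ["/data/2022/12", "/data/2023/01", "/data/2023/02"]  # hypothetical layout

only_2023 = prune(dirs, year=2023)               # skips /data/2022/12
january_2023 = prune(dirs, year=2023, month=1)   # reads a single directory
```

A filter on year or month decides which directories to read before any file in them is opened, which is what makes pruning cheaper than scanning everything and filtering afterward.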