Loading data

Partitioned file paths

Data files are sometimes partitioned in the file system using the directory structure. Partitioning moves values out of the raw data, where they have to be included for each row, and into the directory structure, saving disk space. Partitioning can also improve query performance by allowing entire directories to be skipped.
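
To make the naming convention concrete, the following Python sketch pulls the name=value pairs out of the directory components of a path. It only illustrates how values live in the directory structure rather than in each row; it is not how COPY is implemented. The function name partition_values and the sample file name are hypothetical.

```python
from pathlib import PurePosixPath


def partition_values(path: str) -> dict:
    """Collect name=value pairs from the directory components of a path."""
    values = {}
    # Skip the final component, which is the data file itself.
    for part in PurePosixPath(path).parts[:-1]:
        name, sep, value = part.partition("=")
        # Only components of the form name=value count as partition directories.
        if sep and name:
            values[name] = value
    return values


# Hypothetical file under a two-level partition layout.
print(partition_values("/data/created=2022-01-01/region=north/part-0001.csv"))
```

Every row loaded from that file would share the same created and region values, which is why they do not need to be repeated inside the file.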

Previously, only the Parquet and ORC parsers could take advantage of partitioned file paths. Now COPY supports partitioned file paths for all parsers using the new PARTITION COLUMNS option. The hive_partition_cols parameter for the Parquet and ORC parsers is deprecated.

The hive_partition_cols parameter has the following behavior changes from previous releases:

  • Nested partition directories must appear in consistent order in the file system. The following path pattern is invalid:

    /data/created=2022-01-01/region=north
    /data/region=south/created=2022-01-02
    
  • If the column value cannot be parsed from the directory name, COPY rejects the path instead of treating the value as null.

  • If the path is missing a declared partition column, COPY always returns an error. Previously, if do_soft_schema_match_by_name was true in the Parquet parser, the parser filled the column with nulls.

  • Partition columns are no longer required to be the last columns in the table definition.
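
The ordering and missing-column rules above can be checked before a load with a small script. The Python sketch below is an assumption-laden pre-flight check, not COPY's own validation: the helper names partition_names and check_paths are hypothetical, and it does not test whether a value can be parsed into the column's data type.

```python
from pathlib import PurePosixPath


def partition_names(directory: str) -> list:
    """Return partition column names in the order they appear in a directory path."""
    names = []
    for part in PurePosixPath(directory).parts:
        name, sep, _ = part.partition("=")
        if sep and name:
            names.append(name)
    return names


def check_paths(directories, declared):
    """Return a list of problems found; an empty list means the layout passed."""
    problems = []
    expected_order = None
    for d in directories:
        names = partition_names(d)
        # Every declared partition column must be present in every path.
        missing = [c for c in declared if c not in names]
        if missing:
            problems.append(f"{d}: missing partition column(s) {missing}")
        # The first path sets the nesting order that the others must follow.
        if expected_order is None:
            expected_order = names
        elif names != expected_order:
            problems.append(f"{d}: order {names} differs from {expected_order}")
    return problems


# The invalid pattern from the first bullet: the nesting order is swapped.
paths = [
    "/data/created=2022-01-01/region=north",
    "/data/region=south/created=2022-01-02",
]
for problem in check_paths(paths, declared=["created", "region"]):
    print(problem)
```

The sketch treats the first path as the reference order, so it reports the second path as inconsistent; either ordering would be acceptable as long as every path uses the same one.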

See Partitioned file paths.