GET_METADATA

Returns the metadata of a Parquet file.

Metadata includes the number and sizes of row groups, column names, and information about chunks and compression. Metadata is returned as JSON.

This function inspects one file at a time. Parquet data usually spans many files in a single directory, so choose one of those files to inspect. The function does not accept a directory name as an argument.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

GET_METADATA( 'filename' )

Arguments

filename
The name of a Parquet file. Any path that is valid for COPY is valid for this function. This function does not operate on files in other formats.

Privileges

Superuser, or non-superuser with READ privileges on the USER-accessible storage location (see GRANT (storage location)).

Examples

You must call this function with a single file, not a directory or glob:

=> SELECT GET_METADATA('/data/emp-row.parquet');
                GET_METADATA
----------------------------------------------------------------------------------------------------
 schema:
required group field_id=-1 spark_schema {
  optional int32 field_id=-1 employeeID;
  optional group field_id=-1 personal {
    optional binary field_id=-1 name (String);
    optional group field_id=-1 address {
      optional binary field_id=-1 street (String);
      optional binary field_id=-1 city (String);
      optional int32 field_id=-1 zipcode;
    }
    optional int32 field_id=-1 taxID;
  }
  optional binary field_id=-1 department (String);
}

 data page version:
  data page v1

 metadata:
{
  "FileName": "/data/emp-row.parquet",
  "FileFormat": "Parquet",
  "Version": "1.0",
  "CreatedBy": "parquet-mr version 1.10.1 (build a89df8f9932b6ef6633d06069e50c9b7970bebd1)",
  "TotalRows": "4",
  "NumberOfRowGroups": "1",
  "NumberOfRealColumns": "3",
  "NumberOfColumns": "7",
  "Columns": [
     { "Id": "0", "Name": "employeeID", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} },
     { "Id": "1", "Name": "personal.name", "PhysicalType": "BYTE_ARRAY", "ConvertedType": "UTF8", "LogicalType": {"Type": "String"} },
     { "Id": "2", "Name": "personal.address.street", "PhysicalType": "BYTE_ARRAY", "ConvertedType": "UTF8", "LogicalType": {"Type": "String"} },
     { "Id": "3", "Name": "personal.address.city", "PhysicalType": "BYTE_ARRAY", "ConvertedType": "UTF8", "LogicalType": {"Type": "String"} },
     { "Id": "4", "Name": "personal.address.zipcode", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} },
     { "Id": "5", "Name": "personal.taxID", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} },
     { "Id": "6", "Name": "department", "PhysicalType": "BYTE_ARRAY", "ConvertedType": "UTF8", "LogicalType": {"Type": "String"} }
  ],
  "RowGroups": [
     {
       "Id": "0",  "TotalBytes": "642",  "TotalCompressedBytes": "0",  "Rows": "4",
       "ColumnChunks": [
          {"Id": "0", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "51513", "Min": "17103" },
           "Compression": "SNAPPY", "Encodings": "PLAIN RLE BIT_PACKED ", "UncompressedSize": "67", "CompressedSize": "69" },
          {"Id": "1", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "Sheldon Cooper", "Min": "Howard Wolowitz" },
           "Compression": "SNAPPY", "Encodings": "PLAIN RLE BIT_PACKED ", "UncompressedSize": "142", "CompressedSize": "145" },
          {"Id": "2", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "52 Broad St", "Min": "100 Main St Apt 4A" },
           "Compression": "SNAPPY", "Encodings": "PLAIN RLE BIT_PACKED ", "UncompressedSize": "139", "CompressedSize": "123" },
          {"Id": "3", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "Pasadena", "Min": "Pasadena" },
           "Compression": "SNAPPY", "Encodings": "RLE PLAIN_DICTIONARY BIT_PACKED ", "UncompressedSize": "95", "CompressedSize": "99" },
          {"Id": "4", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "91021", "Min": "91001" },
           "Compression": "SNAPPY", "Encodings": "PLAIN RLE BIT_PACKED ", "UncompressedSize": "68", "CompressedSize": "70" },
          {"Id": "5", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "4", "DistinctValues": "0", "Max": "0", "Min": "0" },
           "Compression": "SNAPPY", "Encodings": "PLAIN RLE BIT_PACKED ", "UncompressedSize": "28", "CompressedSize": "30" },
          {"Id": "6", "Values": "4", "StatsSet": "True", "Stats": {"NumNulls": "0", "DistinctValues": "0", "Max": "Physics", "Min": "Astronomy" },
           "Compression": "SNAPPY", "Encodings": "RLE PLAIN_DICTIONARY BIT_PACKED ", "UncompressedSize": "103", "CompressedSize": "107" }
        ]
     }
  ]
}

(1 row)
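
Because the metadata section is JSON, client code can post-process it. The following Python sketch is one possible approach, not part of the function itself. It assumes that the JSON object follows a "metadata:" label, as in the output above, and that each column chunk Id matches a column Id in the "Columns" list. The helper names extract_metadata and chunk_sizes and the abbreviated SAMPLE text are hypothetical and exist only for illustration.

```python
import json

# Abbreviated sample, trimmed from the example output above (illustration only).
SAMPLE = """ schema:
required group field_id=-1 spark_schema {
  optional int32 field_id=-1 employeeID;
}

 data page version:
  data page v1

 metadata:
{
  "FileName": "/data/emp-row.parquet",
  "Columns": [
     { "Id": "0", "Name": "employeeID", "PhysicalType": "INT32" }
  ],
  "RowGroups": [
     { "Id": "0", "Rows": "4",
       "ColumnChunks": [
          {"Id": "0", "Values": "4", "Compression": "SNAPPY",
           "UncompressedSize": "67", "CompressedSize": "69" }
       ] }
  ]
}

(1 row)
"""


def extract_metadata(output: str) -> dict:
    """Parse the JSON object that follows the 'metadata:' label."""
    # Assumption: the first '{' after the label opens the JSON object.
    start = output.index("{", output.index("metadata:"))
    # raw_decode parses one JSON value and ignores trailing text such as "(1 row)".
    meta, _ = json.JSONDecoder().raw_decode(output[start:])
    return meta


def chunk_sizes(meta: dict) -> list:
    """List (row group Id, column name, uncompressed bytes, compressed bytes)."""
    # Assumption: a chunk's Id refers to the column with the same Id.
    names = {col["Id"]: col["Name"] for col in meta["Columns"]}
    sizes = []
    for group in meta["RowGroups"]:
        for chunk in group["ColumnChunks"]:
            sizes.append((
                group["Id"],
                names[chunk["Id"]],
                int(chunk["UncompressedSize"]),  # values arrive as strings
                int(chunk["CompressedSize"]),
            ))
    return sizes


if __name__ == "__main__":
    for row in chunk_sizes(extract_metadata(SAMPLE)):
        print(row)
```

All numeric values in the metadata are JSON strings, so the sketch converts sizes with int() before any arithmetic.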