CREATE EXTERNAL TABLE ICEBERG
Creates an external table for data stored by Apache Iceberg. An Iceberg table consists of data files and metadata describing the schema. Unlike other external tables, an Iceberg external table need not specify column definitions (DDL). The information is read from Iceberg metadata at query time. For certain data types you can adjust column definitions, for example to specify VARCHAR sizes.
A single Iceberg table can have more than one metadata file, each describing a different version of the table. You can create an external table using either the base location of the table or a specific metadata file.
If a metadata file specifies columns or struct fields that are not present in the data, Vertica treats the missing values as NULL.
All Iceberg files, both data and metadata, must be accessible to all database nodes.
Iceberg can store data in several file formats. Vertica can read Iceberg data in the Parquet format only.
Syntax
CREATE EXTERNAL TABLE [[database.]schema.]table
STORED BY ICEBERG LOCATION { path | metadata-file }
[COLUMN TYPES (column-name type[,...])] ;
Arguments
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.table
- Name of the table to create, which must be unique among names of all sequences, tables, projections, views, and models within the schema.
STORED BY ICEBERG LOCATION {
path
|
metadata-file
}
- Location of Iceberg data, one of:
-
Base location of an Iceberg File System table. Vertica uses the latest metadata file.
-
Path to a metadata file with a name ending in
.metadata.json
. Vertica uses this metadata file even if it is not the latest.
On S3, an Iceberg table is not a File System table but a metastore. This means you cannot specify a base location on S3. You must specify the full path to a metadata file.
If the paths embedded in the Iceberg data are not accessible to Vertica, use the IcebergPathMapping configuration parameter to provide mappings. See Path Prefixes.
-
COLUMN TYPES (
column-name
type
[,...])
- Column names and types for VARCHAR, VARBINARY, or ARRAY columns or ROW fields only. You can specify types only to set lengths or array bounds, not type coercion. See Data Types. If you do not specify a type, the table uses the Vertica defaults.
You cannot specify any other column properties, such as defaults or constraints.
Columns that are specified but not found in the Iceberg schema are ignored.
Privileges
Superuser, or non-superuser with the following privileges:
-
READ privileges on the USER-accessible storage location. See GRANT (storage location)
-
Full access (including SELECT) to an external table that the user has privileges to create
Path prefixes
Iceberg tables store file paths in the metadata as absolute URIs (host and port). Sometimes this URI differs from the URI that Vertica can use to access the data. This can particularly be an issue for files stored on HDFS, where the metadata can use a different URI scheme and port number than what Vertica expects.
To change the URIs, set the IcebergPathMapping configuration parameter. The value is a list of one or more pairs of Iceberg URI prefixes and corresponding Vertica prefixes:
=> ALTER SESSION SET IcebergPathMapping=
'{"hdfs://node-196.example.com:9000":"webhdfs://node-196.example.com:9870"}';
Include only the URI prefix (up through the port), not complete paths. If IcebergPathMapping contains more than one mapping that could apply, Vertica uses the longest entry that matches.
You can set IcebergPathMapping at the database, session, or user level.
Data types
The following table shows the mappings of Iceberg data types to Vertica data types. For types that allow it, you can use the COLUMN TYPES clause to override these defaults.
Iceberg Type | Vertica Type | Allows Override? |
---|---|---|
boolean | BOOLEAN | No |
int (32-bit) | INT | No |
long (64-bit) | INT | No |
float (32-bit) | FLOAT | No |
double (64-bit) | FLOAT | No |
decimal(precision, scale) | NUMERIC with same precision and scale | No |
date | DATE | No |
time | TIME | No |
timestamp | TIMESTAMP | No |
timestamptz | TIMESTAMP WITH TIMEZONE | No |
string | VARCHAR(80) | VARCHAR or LONG VARCHAR with custom length |
uuid | UUID | No |
fixed(length) |
BINARY(length) if length <= 65000 LONG VARBINARY(length) otherwise |
No |
binary (variable length) | VARBINARY(80) | VARBINARY or LONG VARBINARY with custom length |
struct | ROW | No, but you can override individual fields if their types permit |
list | ARRAY (default bound) | ARRAY with custom bound |
map<key, value> | ARRAY[ROW(key, value)] (default bound) | ARRAY[ROW(key, value)] with custom bound |
Restrictions
The following restrictions apply to external tables backed by Iceberg:
-
Data files must be in Parquet format and have Iceberg field IDs.
-
Iceberg column defaults are not supported.
-
Iceberg delete files are not supported.
-
Malformed data is an error that aborts the load. You cannot reject bad data and continue.
-
VARCHAR values are not truncated. If a string is too long, it is treated as an error.
The following restrictions apply to queries of Iceberg tables:
-
You cannot use a column in an Iceberg table as a DEFAULT or SET USING option in another table. The following example is an error:
=> CREATE TABLE t( id INT DEFAULT (SELECT COUNT(*) FROM iceberg_table)); ERROR 0: Default and set using expressions cannot refer to external iceberg tables
-
Errors in Iceberg data or metadata, such as missing files or type mismatches, can manifest as query errors such as the following:
=> SELECT * FROM iceberg_table; ERROR 0: Problem reading metadata for table iceberg_table. Detail: Could not determine type of column a. User specified type: int. Iceberg type: boolean
Examples
The following example creates a table based on the Parquet data files with no overrides:
=> CREATE EXTERNAL TABLE sales
STORED BY ICEBERG LOCATION 's3:/sales/*';
In the following example, the data uses a struct for the shipping address, with fields for street address (string), city (string), and zip code (integer). The following table definition overrides the default VARCHAR lengths. Note that the zip code is not included in COLUMN TYPES overrides. The ROW column contains only the fields being changed, but all fields including the zip code are part of the table definition and are included in query results:
=> CREATE EXTERNAL TABLE sales
STORED BY ICEBERG LOCATION 's3:/sales/*'
COLUMN TYPES (address ROW(street VARCHAR(50), city VARCHAR(50)));
See also
EXTERNAL_DATA_FILES_PRUNED event in QUERY_EVENTS.