CREATE EXTERNAL TABLE ICEBERG

Creates an external table for data stored by Apache Iceberg.

Creates an external table for data stored by Apache Iceberg. An Iceberg table consists of data files and metadata describing the schema. Unlike other external tables, an Iceberg external table need not specify column definitions (DDL). The information is read from Iceberg metadata at query time. For certain data types you can adjust column definitions, for example to specify VARCHAR sizes.

A single Iceberg table can have more than one metadata file, each describing a different version of the table. Iceberg supports two types of tables based on how their metadata files are stored:

File System Tables - In this type, the writer is responsible for naming and organizing the metadata files. For file system tables, you can create an external table using either the base location of the table or a specific metadata file.
Metastore Tables - In this type, a separate metastore handles metadata management. Vertica supports the AWS Glue metastore and all metastores that comply with the Iceberg Standard Rest Open API, such as Project Nessie and others.

AWS Glue tables - You can create an external table by specifying the Glue database location and using the glue-db and glue-table parameters.
Rest Open API tables - You can create an external table by specifying the HTTP or HTTPs endpoint location and using the rest-auth parameter.

If a metadata file specifies columns or struct fields that are not present in the data, Vertica treats the missing values as NULL.

All Iceberg files, both data and metadata, must be accessible to all database nodes.

Iceberg can store data in several file formats. Vertica can read Iceberg data in the Parquet format only and with either version 1 or version 2 metadata. If the Parquet files encode field IDs, Vertica uses them directly. Otherwise, Vertica uses Iceberg fallback name mapping to read field IDs from the metadata (the schema.name-mapping.default property). If the metadata does not contain a mapping for a field, Vertica sets the column values to NULL.

Syntax

CREATE EXTERNAL TABLE [[{namespace. | database. }]schema.]table
   STORED BY ICEBERG LOCATION { path | metadata-file | glue-path | rest-endpoint }
   [GLUE_DB glue-db]
   [GLUE_TABLE glue-table]
   [REST_AUTH rest-auth]
   [COLUMN TYPES (column-name type[,...])] ;

Arguments

{ database | namespace }

Name of the database or namespace that contains table:

Database name: If specified, it must be the current database.
Namespace name (Eon Mode only): You must specify the namespace of objects in non-default namespaces. If no namespace is provided, Vertica assumes the object is in the default namespace.

schema

Name of the schema, by default public. If you specify the namespace or database name, you must provide the schema name, even if the schema is public.

If you specify a schema name that contains a period, the part before the period cannot be the same as the name of an existing namespace.

table

Name of the table to create, which must be unique among names of all sequences, tables, projections, views, and models within the schema.

glue-db

Name of the AWS Glue database. If specified glue-table must also be specified.

glue-table

Name of the AWS Glue table. If specified glue-db must also be specified.

rest-auth

A JSON parameter for authentication credentials to access an Iceberg table through a REST OpenAPI endpoint. It can include either a bearerToken (used for stateless authentication) or OAuth2 fields to acquire its own tokens.

oauthTokenUri: The URL used to obtain the OAuth2 token.
oauthClientId: The client ID used for OAuth2 authentication.
oauthClientSecret: The secret associated with the client ID.
oauthAllowedScopes: A list of scopes determining the access granted.

STORED BY ICEBERG LOCATION { path|metadata-file|glue-path|rest-endpoint }

Location of Iceberg data, one of:

Base location of an Iceberg File System table. Vertica uses the latest metadata file.
Path to a metadata file with a name ending in .metadata.json. Vertica uses this metadata file even if it is not the latest.
AWS Glue database path (glue-path): An S3 path where the specified Glue database (glue-db) is located, or the S3 bucket containing this path.
HTTP or HTTPS endpoint (rest-endpoint) of a table managed by an Iceberg metastore that follows Iceberg's REST OpenAPI specification.

If the paths embedded in the Iceberg data are not accessible to Vertica, use the IcebergPathMapping configuration parameter to provide mappings. See Path Prefixes.

COLUMN TYPES (column-name type[,...])

Column names and types for VARCHAR, VARBINARY, or ARRAY columns or ROW fields only. You can specify types only to set lengths or array bounds, not type coercion. See Data Types. If you do not specify a type, the table uses the Vertica defaults.

You cannot specify any other column properties, such as defaults or constraints.

Columns that are specified but not found in the Iceberg schema are ignored.

Privileges

Superuser, or non-superuser with the following privileges:

READ privileges on the USER-accessible storage location. See GRANT (storage location)
Full access (including SELECT) to an external table that the user has privileges to create

Path prefixes

Iceberg tables store file paths in the metadata as absolute URIs (host and port). Sometimes this URI differs from the URI that Vertica can use to access the data. This can particularly be an issue for files stored on HDFS, where the metadata can use a different URI scheme and port number than what Vertica expects.

To change the URIs, set the IcebergPathMapping configuration parameter. The value is a list of one or more pairs of Iceberg URI prefixes and corresponding Vertica prefixes:

=> ALTER SESSION SET IcebergPathMapping=
   '{"hdfs://node-196.example.com:9000":"webhdfs://node-196.example.com:9870"}';

Include only the URI prefix (up through the port), not complete paths. If IcebergPathMapping contains more than one mapping that could apply, Vertica uses the longest entry that matches.

You can set IcebergPathMapping at the database, session, or user level.

Data types

The following table shows the mappings of Iceberg data types to Vertica data types. For types that allow it, you can use the COLUMN TYPES clause to override these defaults.

Iceberg Type	Vertica Type	Allows Override?
boolean	BOOLEAN	No
int (32-bit)	INT	No
long (64-bit)	INT	No
float (32-bit)	FLOAT	No
double (64-bit)	FLOAT	No
decimal(precision, scale)	NUMERIC with same precision and scale	No
date	DATE	No
time	TIME	No
timestamp	TIMESTAMP	No
timestamptz	TIMESTAMP WITH TIMEZONE	No
string	VARCHAR(80)	VARCHAR or LONG VARCHAR with custom length
uuid	UUID	No
fixed(length)	BINARY(length) if length <= 65000 LONG VARBINARY(length) otherwise	No
binary (variable length)	VARBINARY(80)	VARBINARY or LONG VARBINARY with custom length
struct	ROW	No, but you can override individual fields if their types permit
list	ARRAY (default bound)	ARRAY with custom bound
map<key, value>	ARRAY[ROW(key, value)] (default bound)	ARRAY[ROW(key, value)] with custom bound

Restrictions

The following restrictions apply to external tables backed by Iceberg:

Data files must be in Parquet format.
Metadata files must use version 1 or version 2 format.
Iceberg column defaults are not supported.
Iceberg delete files are not supported.
Malformed data is an error that aborts the load. You cannot reject bad data and continue.
VARCHAR values are not truncated. If a string is too long, it is treated as an error.

The following restrictions apply to queries of Iceberg tables:

You cannot use a column in an Iceberg table as a DEFAULT or SET USING option in another table. The following example is an error:

=> CREATE TABLE t(
    id INT DEFAULT (SELECT COUNT(*) FROM iceberg_table));
ERROR 0:  Default and set using expressions cannot refer to external iceberg tables

Errors in Iceberg data or metadata, such as missing files or type mismatches, can manifest as query errors such as the following:

=> SELECT * FROM iceberg_table;
ERROR 0:  Problem reading metadata for table iceberg_table. Detail: Could not determine type of column a. User specified type: int. Iceberg type: boolean

Examples

The following example creates a table based on the Parquet data files with no overrides:

=> CREATE EXTERNAL TABLE sales
   STORED BY ICEBERG LOCATION 's3:/sales/*';

In the following example, the data uses a struct for the shipping address, with fields for street address (string), city (string), and zip code (integer). The following table definition overrides the default VARCHAR lengths. Note that the zip code is not included in COLUMN TYPES overrides. The ROW column contains only the fields being changed, but all fields including the zip code are part of the table definition and are included in query results:

=> CREATE EXTERNAL TABLE sales
   STORED BY ICEBERG LOCATION 's3:/sales/*'
   COLUMN TYPES (address ROW(street VARCHAR(50), city VARCHAR(50)));

The following example creates an external table named users_glue for the Iceberg table users in the Glue database testdb. The metadata is stored at s3://vertica-fleeting/my-iceberg:

=> CREATE EXTERNAL TABLE users_glue
   STORED BY ICEBERG LOCATION 's3://vertica-fleeting/my-iceberg'
   GLUE_DB 'testdb'
   GLUE_TABLE 'users';

The following example creates an external table named rest_t1 for the Iceberg table my_t1, managed by an Iceberg metastore through a REST OpenAPI endpoint. Authentication is provided using a bearerToken:

=> CREATE EXTERNAL TABLE rest_t1
   STORED BY ICEBERG LOCATION 'https://myhost:19120/iceberg/v1/main/namespaces/my_ice_db/tables/my_t1'
   REST_AUTH '{"bearerToken": "eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICIzWFRweHB"}';

The following example creates an external table named rest_t2 for the Iceberg table my_t2, managed by an Iceberg metastore through a REST OpenAPI endpoint. Authentication is provided using OAuth2 fields:

=> CREATE EXTERNAL TABLE rest_t2
   STORED BY ICEBERG LOCATION 'https://myhost:19120/iceberg/v1/main/namespaces/my_ice_db/tables/my_t2'
   REST_AUTH '{"oauthTokenUri": "http://myhost:8080/realms/iceberg/protocol/openid-connect/token",
         "oauthClientId": "client1",
         "oauthClientSecret": "s3cr3t",
         "oauthAllowedScopes": ["profile", "catalog"]}';