How external tables differ from native tables
You can use external tables in the same ways you use Vertica native tables. Because the data is external to the database, however, there are some differences in how external tables operate.
Data
The data for an external table can reside anywhere, so long as all database nodes can access it. S3, HDFS, and NFS mount points are common places to find external data. Naturally, querying external data can incur some latency compared to querying locally-stored ROS data, but Vertica has optimizations that can reduce the impact. For example, Vertica can take advantage of node and rack locality for HDFS data.
Because the data is external, Vertica loads external data each time you query it. Vertica is optimized to reduce the volume of read data, including predicate pushdown and partition pruning for formats that support partitioning. The ORC and Parquet formats support these optimizations.
Because the data is read at query time, you must ensure that your users have and retain permission to read the data in its original location. Depending on where the data is stored, you might need to take additional steps to manage access, such as creating AWS IAM roles on S3.
Because the data is not stored in Vertica, external tables do not use superprojections and buddy projections.
Resource consumption
External tables add very little to the Vertica catalog, which reduces the resources that queries consume. Because the data is not stored in Vertica, external tables are not affected by the Tuple Mover and do not cause ROS pushback. Vertica uses a small amount of memory when reading external table data, because the table contents are not part of your database and are parsed each time the external table is used.
Backup and restore
Because the data in external tables is managed outside of Vertica, only the external table definitions, not the data files, are included in database backups. Arrange for a separate backup process for your external table data.
DML support
External tables allow you to read external data. They do not allow you to modify it. Some DML operations are therefore not available for external tables, including:
-
DELETE FROM
-
INSERT INTO
-
SELECT...FOR UPDATE
Sequences and identity columns
The COPY statement definition for external tables can include identity columns and sequences. Whenever a select statement queries the external table, sequences and identity columns are re-evaluated. This results in changing the external table column values, even if the underlying external table data remains the same.