CREATE EXTERNAL TABLE AS COPY

CREATE EXTERNAL TABLE AS COPY creates a table definition for data external to your Vertica database.

CREATE EXTERNAL TABLE AS COPY creates a table definition for data external to your Vertica database. This statement is a combination of the CREATE TABLE and COPY statements, supporting a subset of each statement's parameters.

Canceling a CREATE EXTERNAL TABLE AS COPY statement can cause unpredictable results. If you need to make a change, allow the statement to complete, drop the table, and then retry.

You can use ALTER TABLE to change the data types of columns instead of dropping and recreating the table.

You can use CREATE EXTERNAL TABLE AS COPY with any types except types from the Place package.

Syntax

CREATE EXTERNAL TABLE [ IF NOT EXISTS ] [[database.]schema.]table-name 
    ( column-definition[,...] )
[{INCLUDE | EXCLUDE} [SCHEMA] PRIVILEGES]
AS COPY
    [ ( { column-as-expression | column }
       [ DELIMITER [ AS ] 'char' ]
       [ ENCLOSED [ BY ] 'char' ]
       [ ENFORCELENGTH ]
       [ ESCAPE [ AS ] 'char' | NO ESCAPE ]
       [ FILLER datatype ]
       [ FORMAT 'format' ]
       [ NULL [ AS ] 'string' ]
       [ TRIM 'byte' ]
    [,...] ) ]
    [ COLUMN OPTION ( column 
       [ DELIMITER [ AS ] 'char' ]
       [ ENCLOSED [ BY ] 'char' ]
       [ ENFORCELENGTH ]
       [ ESCAPE [ AS ] 'char' | NO ESCAPE ]
       [ FORMAT 'format' ]
       [ NULL [ AS ] 'string' ]
       [ TRIM 'byte' ]
    [,...] ) ]
FROM {
      'path-to-data' [ ON { nodename | (nodeset) | ANY NODE } ] [ input-format ] }[,...]
      | [ WITH ] SOURCE source( [arg=value[,...] ] )
}
      [ NATIVE
      | FIXEDWIDTH COLSIZES {( integer )[,...]}
      | NATIVE VARCHAR
      | ORC
      | PARQUET
      ]
   [ ABORT ON ERROR ]
   [ DELIMITER [ AS ] 'char' ]
   [ ENCLOSED BY 'char' [ AND 'char' ] ]
   [ ENFORCELENGTH ]
   [ ERROR TOLERANCE ]
   [ ESCAPE AS 'char' | NO ESCAPE ]
   [ EXCEPTIONS 'path' [ ON nodename ] [,...] ]
   [ [ WITH ] FILTER filter( [ arg=value[,...] ] ) ]
   [ NULL [ AS ] 'string' ]
   [ [ WITH ] PARSER parser([arg=value [,...] ]) ]
   [ RECORD TERMINATOR 'string' ]
   [ REJECTED DATA 'path' [ ON nodename ] [,...] ]
   [ REJECTMAX integer ]
   [ SKIP integer ]
   [ SKIP BYTES integer ]
   [ TRAILING NULLCOLS ]
   [ TRIM 'byte' ]

Parameters

For all supported parameters, see the CREATE TABLE and COPY statements. For information on using this statement with UDLs, see User-defined load (UDL).

For additional guidance on using COPY parameters, see Specifying where to load data from.

Privileges

Superuser, or non-superuser with the following privileges:

  • READ privileges on the USER-accessible storage location, see GRANT (storage location)

  • Full access (including SELECT) to an external table that the user has privileges to create

ORC and Parquet data

When using the ORC and Parquet formats, Vertica supports some additional options in the COPY statement and data structures for columns. See ORC (parser) and PARQUET (parser).

If ORC or Parquet data is partitioned, Vertica expects Hive-style partitioning. If you see unexpected results when reading data, verify that globs in your file paths correctly align with the partition structure. See Troubleshooting external tables.

Examples

The following example defines an external table for delimited data stored in HDFS:

=> CREATE EXTERNAL TABLE sales (itemID INT, date DATE, price FLOAT)
    AS COPY FROM 'hdfs:///dat/ext1.csv' DELIMITER ',';

The following example uses data in the ORC format that is stored in S3. The data has two partition columns, represented as directories in the file structure. For more information about partitions, see Using partition columns.

=> CREATE EXTERNAL TABLE transactions (id int, name varchar(50), created date, region varchar(50))
   AS COPY FROM 's3://datalake/sales/*/*/*'
   ORC(hive_partition_cols='created,region');

The following example shows how you can read from all Parquet files in a local directory, with no partitions and no globs:

=> CREATE EXTERNAL TABLE sales (itemID INT, date DATE, price FLOAT)
    AS COPY FROM '/data/sales/*.parquet' PARQUET;

The following example creates an external table using partitioned data in the ORC format. The table includes four columns. Two columns, "id" and "name", are in the data files. The other two, "created" and "region", are partition columns. For more about partition columns, see Using partition columns.

=> CREATE EXTERNAL TABLE t (id int, name varchar(50), created date, region varchar(50))
   AS COPY FROM 'hdfs:///path/*/*/*'
   ORC(hive_partition_cols='created,region');

The following example creates an external table from data in Google Cloud Storage:

=> CREATE EXTERNAL TABLE sales (itemID INT, date DATE, price FLOAT)
    AS COPY FROM 'gs://data/sales/*.csv';

The following example creates an external table for data containing arrays:

=> CREATE EXTERNAL TABLE cust (cust_custkey int, cust_custname varchar(50), cust_custstaddress ARRAY[varchar(100)],
   cust_custaddressln2 ARRAY[varchar(100)], cust_custcity ARRAY[varchar(50)], cust_custstate ARRAY[char(2)], cust_custzip ARRAY[int],
   cust_email varchar(50), cust_phone varchar(30))
   AS COPY FROM '   ' PARQUET;

The following examples create external tables from data in the local file system:

=> CREATE EXTERNAL TABLE ext1 (x integer) AS COPY FROM '/tmp/ext1.dat' DELIMITER ',';
=> CREATE EXTERNAL TABLE ext2 (x integer) AS COPY FROM '/tmp/ext2.dat.bz2' BZIP DELIMITER ',';
=> CREATE EXTERNAL TABLE ext3 (x integer, y integer) AS COPY (x as '5', y) FROM '/tmp/ext3.dat.bz2' BZIP DELIMITER ',';

To allow users without superuser access to use external tables with data on the local file system, S3, or GCS, create a location for 'user' usage and grant access to it. This example shows granting access to a user named Bob to any external table whose data is located under /tmp (including in subdirectories to any depth):

=> CREATE LOCATION '/tmp' ALL NODES USAGE 'user';
=> GRANT ALL ON LOCATION '/tmp' to Bob;

The following example shows CREATE EXTERNAL TABLE using a user-defined source:

=> CREATE SOURCE curl AS LANGUAGE 'C++' NAME 'CurlSourceFactory' LIBRARY curllib;
=> CREATE EXTERNAL TABLE curl_table1 as COPY SOURCE CurlSourceFactory;

See also

Creating external tables