Examples

For additional COPY examples, see the reference pages for specific parsers, including: DELIMITED (Parser), ORC (Parser), PARQUET (Parser), FJSONPARSER (Parser), and FAVROPARSER (Parser).

For additional COPY examples, see the reference pages for specific parsers, including: DELIMITED, ORC, PARQUET, FJSONPARSER, and FAVROPARSER.

Specifying string options

Use COPY with FORMAT, DELIMITER, NULL, and ENCLOSED BY options:

=> COPY public.customer_dimension (customer_since FORMAT 'YYYY')
   FROM STDIN
   DELIMITER ','
   NULL AS 'null'
   ENCLOSED BY '"';

Use COPY with DELIMITER and NULL options. This example sets and references a vsql variable for the input file:

=> \set input_file ../myCopyFromLocal/large_table.gzip
=> COPY store.store_dimension
   FROM :input_file
   DELIMITER '|'
   NULL ''
   RECORD TERMINATOR E'\f';

Including multiple source files

Create a table and then copy multiple source files to it:

=> CREATE TABLE sampletab (a int);
CREATE TABLE

=> COPY sampletab FROM '/home/dbadmin/one.dat', 'home/dbadmin/two.dat';
 Rows Loaded
-------------
           2
(1 row)

Use wildcards to indicate a group of files:

=> COPY myTable FROM 'webhdfs:///mydirectory/ofmanyfiles/*.dat';

Wildcards can include regular expressions:

=> COPY myTable FROM 'webhdfs:///mydirectory/*_[0-9]';

Specify multiple paths in a single COPY statement:

=> COPY myTable FROM 'webhdfs:///data/sales/01/*.dat', 'webhdfs:///data/sales/02/*.dat',
    'webhdfs:///data/sales/historical.dat';

Distributing a load

Load data that is shared across all nodes. Vertica distributes the load across all nodes, if possible:

=> COPY sampletab FROM '/data/file.dat' ON ANY NODE;

Load data from two files. Because the first load file does not specify nodes (or ON ANY NODE), the initiator performs the load. Loading the second file is distributed across all nodes:

=> COPY sampletab FROM '/data/file1.dat', '/data/file2.dat' ON ANY NODE;

Specify different nodes for each load file:

=> COPY sampletab FROM '/data/file1.dat' ON (v_vmart_node0001, v_vmart_node0002),
    '/data/file2.dat' ON (v_vmart_node0003, v_vmart_node0004);

Loading data from shared storage

To load data from shared storage, use URLs in the corresponding schemes:

HDFS: [[s]web]hdfs://[nameservice]/path
S3: s3://bucket/path
Google Cloud: gs://bucket/path
Azure: azb://account/container/path

Note

Loads from HDFS, S3, GCS, and Azure default to ON ANY NODE; you do not need to specify it.

Load a file stored in HDFS using the default name node or name service:

=> COPY t FROM 'webhdfs:///opt/data/file1.dat';

Load data from a particular HDFS name service (testNS). You specify a name service if your database is configured to read from more than one HDFS cluster:

=> COPY t FROM 'webhdfs://testNS/opt/data/file2.csv';

Load data from an S3 bucket:

=> COPY t FROM 's3://AWS_DataLake/*' ORC;

Partitioned data

Data files can be partitioned using the directory structure, such as:

path/created=2016-11-01/region=northeast/*
path/created=2016-11-01/region=central/*
path/created=2016-11-01/region=southeast/*
path/created=2016-11-01/...
path/created=2016-11-02/region=northeast/*
path/created=2016-11-02/region=central/*
path/created=2016-11-02/region=southeast/*
path/created=2016-11-02/...
path/created=2016-11-03/...
path/...

Load partition columns using the PARTITION COLUMNS option:

=> CREATE EXTERNAL TABLE records (id int, name varchar(50), created date, region varchar(50))
   AS COPY FROM 'webhdfs:///path/*/*/*'
   PARTITION COLUMNS created, region;

Using filler columns

In the following example, the table has columns for first name, last name, and full name, but the data being loaded contains columns for first, middle, and last names. The COPY statement reads all of the source data but only loads the source columns for first and last names. It constructs the data for the full name by concatenating each of the source data columns, including the middle name. The middle name is read as a FILLER column so it can be used in the concatenation, but is ignored otherwise. (There is no table column for middle name.)

=> CREATE TABLE names(first VARCHAR(20), last VARCHAR(20), full VARCHAR(60));
CREATE TABLE
=> COPY names(first,
              middle FILLER VARCHAR(20),
              last,
              full AS first||' '||middle||' '||last)
      FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> Marc|Gregory|Smith
>> Sue|Lucia|Temp
>> Jon|Pete|Hamilton
>> \.
=> SELECT * from names;
 first |   last   |        full
-------+----------+--------------------
 Jon   | Hamilton | Jon Pete Hamilton
 Marc  | Smith    | Marc Gregory Smith
 Sue   | Temp     | Sue Lucia Temp
(3 rows)

Loading data into a flex table

Create a Flex table and copy JSON data into it using FJSONPARSER:

=> CREATE FLEX TABLE darkdata();
CREATE TABLE
=> COPY tweets FROM '/myTest/Flexible/DATA/tweets_12.json' PARSER FJSONPARSER();
 Rows Loaded
-------------
          12
(1 row)

Using named pipes

COPY supports named pipes that follow the same naming conventions as file names on the given file system. Permissions are open, write, and close.

Create named pipe, pipe1, and set two vsql variables:

=> \! mkfifo  pipe1
=> \set dir `pwd`/
=> \set file '''':dir'pipe1'''

Copy an uncompressed file from the named pipe:

=> \! cat pf1.dat > pipe1 &
=> COPY large_tbl FROM :file delimiter '|';
=> SELECT * FROM large_tbl;
=> COMMIT;

Loading compressed data

Copy a GZIP file from a named pipe and uncompress it:

=> \! gzip pf1.dat
=> \! cat pf1.dat.gz > pipe1 &
=> COPY large_tbl FROM :file ON site01 GZIP delimiter '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \!gunzip pf1.dat.gz