Examples

For additional COPY examples, see the reference pages for specific parsers, including: DELIMITED, ORC, PARQUET, FJSONPARSER, and FAVROPARSER.

Specifying string options

Use COPY with FORMAT, DELIMITER, NULL, and ENCLOSED BY options:

=> COPY public.customer_dimension (customer_since FORMAT 'YYYY')
   FROM STDIN
   DELIMITER ','
   NULL AS 'null'
   ENCLOSED BY '"';

This example sets and references a vsql variable for the input file:

=> \set input_file ../myCopyFromLocal/large_table.gzip
=> COPY store.store_dimension
   FROM :input_file
   DELIMITER '|'
   NULL ''
   RECORD TERMINATOR E'\f';

Including multiple source files

Create a table and then copy multiple source files to it:

=> CREATE TABLE sampletab (a int);
CREATE TABLE

=> COPY sampletab FROM '/home/dbadmin/one.dat', '/home/dbadmin/two.dat';
 Rows Loaded
-------------
           2
(1 row)

Use wildcards to indicate a group of files:

=> COPY myTable FROM 'webhdfs:///mydirectory/ofmanyfiles/*.dat';

Wildcards can include regular expressions:

=> COPY myTable FROM 'webhdfs:///mydirectory/*_[0-9]';

Specify multiple paths in a single COPY statement:

=> COPY myTable FROM 'webhdfs:///data/sales/01/*.dat', 'webhdfs:///data/sales/02/*.dat',
    'webhdfs:///data/sales/historical.dat';

Distributing a load

Load data from a file that is available at the same path on all nodes. Vertica distributes the load across all nodes, if possible:

=> COPY sampletab FROM '/data/file.dat' ON ANY NODE;

Load data from two files. Because the first file does not specify nodes (or ON ANY NODE), the initiator node performs the load. The load of the second file is distributed across all nodes:

=> COPY sampletab FROM '/data/file1.dat', '/data/file2.dat' ON ANY NODE;

Specify different nodes for each load file:

=> COPY sampletab FROM '/data/file1.dat' ON (v_vmart_node0001, v_vmart_node0002),
    '/data/file2.dat' ON (v_vmart_node0003, v_vmart_node0004);

Loading data from shared storage

To load data from shared storage, use URLs in the corresponding schemes:

  • HDFS: [[s]web]hdfs://[nameservice]/path

  • S3: s3://bucket/path

  • Google Cloud: gs://bucket/path

  • Azure: azb://account/container/path

Load a file stored in HDFS using the default name node or name service:

=> COPY t FROM 'webhdfs:///opt/data/file1.dat';

Load data from a particular HDFS name service (testNS). You specify a name service if your database is configured to read from more than one HDFS cluster:

=> COPY t FROM 'webhdfs://testNS/opt/data/file2.csv';

Load data from an S3 bucket:

=> COPY t FROM 's3://AWS_DataLake/*' ORC;
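
The Google Cloud and Azure schemes work the same way. The following is a minimal sketch with hypothetical bucket, container, and file names:

=> COPY t FROM 'gs://DataLakeBucket/sales/*.dat';

=> COPY t FROM 'azb://myaccount/mycontainer/sales/*.dat';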

Partitioned data

Data files can be partitioned using the directory structure, such as:

path/created=2016-11-01/region=northeast/*
path/created=2016-11-01/region=central/*
path/created=2016-11-01/region=southeast/*
path/created=2016-11-01/...
path/created=2016-11-02/region=northeast/*
path/created=2016-11-02/region=central/*
path/created=2016-11-02/region=southeast/*
path/created=2016-11-02/...
path/created=2016-11-03/...
path/...

Load partition columns using the PARTITION COLUMNS option:

=> CREATE EXTERNAL TABLE records (id int, name varchar(50), created date, region varchar(50))
   AS COPY FROM 'webhdfs:///path/*/*/*'
   PARTITION COLUMNS created, region;
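
Once defined, partition columns can be used in queries like any other column. For illustration, a query restricted to one partition, with values taken from the directory layout above:

=> SELECT id, name FROM records
   WHERE created = '2016-11-01' AND region = 'northeast';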

Using filler columns

In the following example, the table has columns for first name, last name, and full name, but the data being loaded contains columns for first, middle, and last names. The COPY statement reads all of the source data but only loads the source columns for first and last names. It constructs the data for the full name by concatenating each of the source data columns, including the middle name. The middle name is read as a FILLER column so it can be used in the concatenation, but is ignored otherwise. (There is no table column for middle name.)

=> CREATE TABLE names(first VARCHAR(20), last VARCHAR(20), full VARCHAR(60));
CREATE TABLE
=> COPY names(first,
              middle FILLER VARCHAR(20),
              last,
              full AS first||' '||middle||' '||last)
      FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> Marc|Gregory|Smith
>> Sue|Lucia|Temp
>> Jon|Pete|Hamilton
>> \.
=> SELECT * from names;
 first |   last   |        full
-------+----------+--------------------
 Jon   | Hamilton | Jon Pete Hamilton
 Marc  | Smith    | Marc Gregory Smith
 Sue   | Temp     | Sue Lucia Temp
(3 rows)

In the following example, JSON data contains an unencrypted password. The JSON data and the table definition use the same name for the field. Instead of loading the value directly, the COPY statement uses a filler column and a (hypothetical) function to encrypt the value. Both the table column (users.password) and the filler column ("*FILLER*".password) are fully qualified:

=> CREATE TABLE users(user VARCHAR(32), password VARCHAR(32));

=> COPY users(user, password FILLER VARCHAR(32),
        users.password AS encrypt("*FILLER*".password))
   FROM ... PARSER FJSONPARSER();

Setting defaults for null values

When creating a table, you can specify a default value for a column. When loading data, Vertica uses the default if the column is not present in the load. However, a default does not override an explicit null value in the data.

To intercept null values and apply defaults, you can first stage the load with COPY NO COMMIT and then update the values:

=> CREATE TABLE sales (id INT, shipped TIMESTAMP(6) DEFAULT now());

=> COPY sales FROM ... NO COMMIT;

=> UPDATE sales SET shipped = DEFAULT WHERE shipped IS NULL AND epoch IS NULL;

=> COMMIT;

All Vertica tables have an implicit epoch column, which records when the data was committed. For data that is not yet committed, this value is null. You can use this column to identify the just-loaded data and update it before committing.
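
For example, before committing you could inspect the rows that the load just added. A hypothetical check against the sales table above:

=> SELECT * FROM sales WHERE epoch IS NULL;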

Loading data into a flex table

Create a Flex table and copy JSON data into it:

=> CREATE FLEX TABLE darkdata();
CREATE TABLE
=> COPY darkdata FROM '/myTest/Flexible/DATA/tweets_12.json' PARSER FJSONPARSER();
 Rows Loaded
-------------
          12
(1 row)
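
One way to inspect the loaded data is to build a view over the discovered keys. This sketch assumes the standard flex table helper functions are available in your database:

=> SELECT compute_flextable_keys_and_build_view('darkdata');
=> SELECT * FROM darkdata_view LIMIT 5;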

Using named pipes

COPY supports named pipes that follow the same naming conventions as file names on the given file system. The pipe's permissions must allow open, write, and close operations.

Create a named pipe and set two vsql variables:

=> \! mkfifo  pipe1
=> \set dir `pwd`/
=> \set file '''':dir'pipe1'''

Copy an uncompressed file from the named pipe:

=> \! cat pf1.dat > pipe1 &
=> COPY large_tbl FROM :file delimiter '|';
=> SELECT * FROM large_tbl;
=> COMMIT;

Loading compressed data

Compress the source file, write it to the named pipe, and have COPY uncompress it during the load:

=> \! gzip pf1.dat
=> \! cat pf1.dat.gz > pipe1 &
=> COPY large_tbl FROM :file ON site01 GZIP delimiter '|';
=> SELECT * FROM large_tbl;
=> COMMIT;
=> \! gunzip pf1.dat.gz