Examples
For additional COPY examples, see the reference pages for specific parsers, including: DELIMITED, ORC, PARQUET, FJSONPARSER, and FAVROPARSER.
Specifying string options
Use COPY with FORMAT, DELIMITER, NULL, and ENCLOSED BY options:
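For example (the table, column, and option values are illustrative):
=> COPY public.customer_dimension (customer_since FORMAT 'YYYY')
   FROM STDIN
   DELIMITER ','
   NULL AS 'null'
   ENCLOSED BY '"';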
This example sets and references a vsql variable for the input file:
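A sketch, using a placeholder table and path (the variable value carries its own quotes):
=> \set input_file '''/data/large_tbl.dat'''
=> COPY large_tbl FROM :input_file DELIMITER '|';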
Including multiple source files
Create a table and then copy multiple source files to it:
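For example, with placeholder paths:
=> CREATE TABLE sampletab (a INT);
=> COPY sampletab FROM '/home/dbadmin/one.dat', '/home/dbadmin/two.dat';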
Use wildcards to indicate a group of files:
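For instance (myTable and the directory are placeholders):
=> COPY myTable FROM 'webhdfs:///mydirectory/ofmanyfiles/*.dat';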
Wildcards can include regular expressions:
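Continuing with the same placeholder table:
=> COPY myTable FROM 'webhdfs:///mydirectory/*_[0-9]';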
Specify multiple paths in a single COPY statement:
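A possible statement, again with illustrative paths:
=> COPY myTable FROM 'webhdfs:///data/sales/01/*.dat',
       'webhdfs:///data/sales/02/*.dat',
       'webhdfs:///data/sales/historical.dat';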
Distributing a load
Load data that is shared across all nodes. Vertica distributes the load across all nodes, if possible:
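For example (the path is a placeholder):
=> COPY sampletab FROM '/data/file.dat' ON ANY NODE;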
Load data from two files. Because the first file does not specify nodes (or ON ANY NODE), it is loaded by the initiator node; the load of the second file is distributed across all nodes:
=> COPY sampletab FROM '/data/file1.dat', '/data/file2.dat' ON ANY NODE;
Specify different nodes for each load file:
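A sketch; the node names shown are illustrative:
=> COPY sampletab FROM '/data/file1.dat' ON (v_vmart_node0001, v_vmart_node0002),
       '/data/file2.dat' ON (v_vmart_node0003, v_vmart_node0004);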
Loading data from shared storage
To load data from shared storage, use URLs in the corresponding schemes:
- HDFS: [[s]web]hdfs://[nameservice]/path
- S3: s3://bucket/path
- Google Cloud: gs://bucket/path
- Azure: azb://account/container/path
Note
Loads from HDFS, S3, GCS, and Azure default to ON ANY NODE; you do not need to specify it.
Load a file stored in HDFS using the default name node or name service:
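For example (t and the path are placeholders):
=> COPY t FROM 'webhdfs:///opt/data/file1.dat';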
Load data from a particular HDFS name service (testNS). You specify a name service if your database is configured to read from more than one HDFS cluster:
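Along these lines, with testNS as the name service:
=> COPY t FROM 'webhdfs://testNS/opt/data/file2.csv';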
Load data from an S3 bucket:
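A sketch with a hypothetical bucket:
=> COPY t FROM 's3://datalake/sales/*.dat';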
Partitioned data
Data files can be partitioned using the directory structure, such as:
path/created=2016-11-01/region=northeast/*
path/created=2016-11-01/region=central/*
path/created=2016-11-01/region=southeast/*
path/created=2016-11-01/...
path/created=2016-11-02/region=northeast/*
path/created=2016-11-02/region=central/*
path/created=2016-11-02/region=southeast/*
path/created=2016-11-02/...
path/created=2016-11-03/...
path/...
Load partition columns using the PARTITION COLUMNS option:
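A sketch matching the layout above; the records table and its columns are assumed, and the wildcards descend through the created and region directories to the files:
=> CREATE EXTERNAL TABLE records (id INT, name VARCHAR(50), created DATE, region VARCHAR(50))
   AS COPY FROM 'webhdfs:///path/*/*/*'
   PARTITION COLUMNS created, region;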
Using filler columns
In the following example, the table has columns for first name, last name, and full name, but the data being loaded contains columns for first, middle, and last names. The COPY statement reads all of the source data but only loads the source columns for first and last names. It constructs the data for the full name by concatenating each of the source data columns, including the middle name. The middle name is read as a FILLER column so it can be used in the concatenation, but is ignored otherwise. (There is no table column for middle name.)
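One way this might look (table, columns, and sample rows are illustrative):
=> CREATE TABLE names(first VARCHAR(20), last VARCHAR(20), full VARCHAR(60));
=> COPY names(first,
       middle FILLER VARCHAR(20),   -- read from the source but not stored
       last,
       full AS first||' '||middle||' '||last)
   FROM STDIN;
>> Marc|Gregory|Smith
>> Sue|Lucia|Temp
>> \.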
In the following example, JSON data contains an unencrypted password. The JSON data and the table definition use the same name for the field. Instead of loading the value directly, the COPY statement uses a filler column and a (hypothetical) function to encrypt the value. Both the table column (users.password) and the filler column ("*FILLER*".password) are fully qualified:
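A sketch; pwd_encrypt stands in for the hypothetical encryption function, and the table and file names are assumed:
=> COPY users(
       username,
       password FILLER VARCHAR(200),                        -- raw value read from the JSON field
       users.password AS pwd_encrypt("*FILLER*".password))  -- encrypted value stored in the table column
   FROM '/data/users.json' PARSER FJSONPARSER();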
Setting defaults for null values
When creating a table, you can specify a default value for a column. When loading data, Vertica uses the default if the column value is not present. However, defaults do not override null values.
To intercept null values and apply defaults, you can first stage the load with COPY NO COMMIT and then update the values:
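A sketch of the pattern (table, column, and path are illustrative):
=> COPY t FROM '/data/t.dat' NO COMMIT;
=> UPDATE t SET a = DEFAULT WHERE a IS NULL AND epoch IS NULL;  -- epoch is null for uncommitted rows
=> COMMIT;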
All Vertica tables have an implicit epoch column, which records when the data was committed. For data that is not yet committed, this value is null. You can use this column to identify the just-loaded data and update it before committing.
Loading data into a flex table
Create a flex table and copy JSON data into it:
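For example (the table name and JSON file are placeholders):
=> CREATE FLEX TABLE darkdata();
=> COPY darkdata FROM '/tmp/tweets_12.json' PARSER FJSONPARSER();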
Using named pipes
COPY supports named pipes that follow the same naming conventions as file names on the given file system. Permissions are open, write, and close.
Create a named pipe and set two vsql variables:
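A sketch; the pipe name and directory are illustrative:
=> \! mkfifo /tmp/pipe1
=> \set dir '/tmp/'
=> \set file '''':dir'pipe1'''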
Copy an uncompressed file from the named pipe:
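Continuing the sketch above:
=> COPY large_tbl FROM :file DELIMITER '|';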
Loading compressed data
Copy a GZIP file from a named pipe and uncompress it:
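Continuing with the same pipe (large_tbl is still a placeholder):
=> COPY large_tbl FROM :file GZIP DELIMITER '|';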