CREATE DATA LOADER

CREATE DATA LOADER creates an automatic data loader that executes a COPY statement when new data files appear in a specified location. The loader records which files have already been successfully loaded and skips them.

CREATE DATA LOADER creates an automatic data loader that executes a COPY statement when new data files appear in a specified path. The loader records which files have already been successfully loaded and skips them. These events are recorded in the DATA_LOADER_EVENTS system table.

Use EXECUTE DATA LOADER to execute the data loader once. To execute it periodically, you can use a scheduled stored procedure. Executing the loader automatically commits the transaction.

The DATA_LOADERS system table shows all defined loaders. The DATA_LOADER_EVENTS system table records paths that were attempted and their outcomes. To prevent unbounded growth, records in DATA_LOADER_EVENTS are purged after a specified retention interval. If previously-loaded files are still in the source path after this purge, the data loader sees them as new files and loads them again.

Syntax

CREATE [ OR REPLACE ] DATA LOADER [ IF NOT EXISTS ]
   [schema.]name
   [ RETRY LIMIT { NONE | DEFAULT | limit } ]
   [ RETENTION INTERVAL monitoring-retention ]
   AS copy-statement

Arguments

OR REPLACE
If a data loader with the same name in the same schema exists, replace it. The original data loader's monitoring table is dropped.

This option cannot be used with IF NOT EXISTS.

IF NOT EXISTS

If an object with the same name exists, return without creating the object. If you do not use this directive and the object already exists, Vertica returns with an error message.

The IF NOT EXISTS clause is useful for SQL scripts where you might not know if the object already exists. The ON ERROR STOP directive can be helpful in scripts.

This option cannot be used with OR REPLACE.

schema
Schema containing the data loader. The default schema is public.
name
Name of the data loader.
RETRY LIMIT { NONE | DEFAULT | limit }
Maximum number of times to retry a failing file. Each time the data loader is executed, it attempts to load all files that have not yet been successfully loaded, up to this per-file limit. If set to DEFAULT, at load time the loader uses the value of the DataLoaderDefaultRetryLimit configuration parameter.

Default: DEFAULT

RETENTION INTERVAL monitoring-retention
How long to keep records in the events table. DATA_LOADER_EVENTS records events for all data loaders, but each data loader has its own retention interval.

Default: 14 days

AS copy-statement
The COPY statement that the loader executes. The FROM clause typically uses a glob.

Privileges

Non-superuser: CREATE privileges on the schema.

Restrictions

  • COPY NO COMMIT is not supported.
  • Data loaders are executed with COPY ABORT ON ERROR.
  • The COPY statement must specify file paths. You cannot use COPY FROM VERTICA.

File systems

The source path can be any shared file system or object store that all database nodes can access. To use an HDFS or Linux file system safely, you must prevent the loader from reading a partially-written file. One way to achieve this is to only execute the loader when files are not being written to the loader's path. Another way to achieve this is to write new files in a temporary location and move them to the loader's path only when they are complete.

Examples

=> CREATE DATA LOADER s.dl1 RETRY LIMIT NONE 
   AS COPY s.local FROM 's3://b/data/*.dat';

=> SELECT name, schemaname, copystmt, retrylimit FROM data_loaders;
 name | schemaname |              copystmt                 | retrylimit 
------+------------+---------------------------------------+------------
 dl1  | s          | COPY s.local FROM 's3://b/data/*.dat' |         -1
(1 row)

See Automatic load for an extended example.

See also