CREATE DATA LOADER

CREATE DATA LOADER creates an automatic data loader that executes a COPY statement when new data files appear in a specified location. The loader records which files have already been successfully loaded and skips them.

CREATE DATA LOADER creates an automatic data loader that executes a COPY statement when new data files appear in a specified path. The loader records which files have already been successfully loaded and skips them.

Use EXECUTE DATA LOADER to execute the data loader once. To execute it periodically, you can use a scheduled stored procedure. Executing the loader automatically commits the transaction.

Each data loader has an associated monitoring table that records paths that were attempted and their outcomes. The table follows the following naming scheme:

v_data_loader.schema_loader-name

Access to monitoring tables requires superuser privileges.

To prevent unbounded growth, records in the monitoring table are purged after a specified interval. If previously-loaded files are still in the source path after this purge, the data loader sees them as new files and loads them again.

The DATA_LOADERS system table shows all defined loaders.

Syntax

CREATE DATA LOADER [schema.]name
   [ RETRY LIMIT { NONE | DEFAULT | limit } ]
   [ RETENTION INTERVAL monitoring-retention ]
   AS copy-statement

Arguments

schema
Schema containing the data loader. The default schema is public.
name
Name of the data loader.
RETRY LIMIT { NONE | DEFAULT | limit }
Maximum number of times to retry a failing file. Each time the data loader is executed, it attempts to load all files that have not yet been successfully loaded, up to this per-file limit. If set to DEFAULT, at load time the loader uses the value of the DataLoaderDefaultRetryLimit configuration parameter.

Default: DEFAULT

RETENTION INTERVAL monitoring-retention
How long to keep records in the monitoring table.

Default: 14 days

copy-statement
The COPY statement that the loader executes. The FROM clause typically uses a glob.

Privileges

Non-superuser: CREATE privileges on the schema.

Restrictions

  • COPY NO COMMIT is not supported.
  • Data loaders are excuted with COPY ABORT ON ERROR.
  • The COPY statement must specify file paths. You cannot use COPY FROM VERTICA.
  • The loader's monitoring table requires superuser privileges to access.

File systems

The source path can be any shared file system or object store that all database nodes can access. To use an HDFS or Linux file system safely, you must prevent the loader from reading a partially-written file. One way to achieve this is to only execute the loader when files are not being written to the loader's path. Another way to achieve this is to write new files in a temporary location and move them to the loader's path only when they are complete.

Examples

=> CREATE DATA LOADER s.dl1 RETRY LIMIT NONE 
   AS COPY s.local FROM 's3://b/data/*.dat';

=> SELECT name, schemaname, copystmt, retrylimit FROM data_loaders;
 name | schemaname |              copystmt                 | retrylimit 
------+------------+---------------------------------------+------------
 dl1  | s          | COPY s.local FROM 's3://b/data/*.dat' |         -1
(1 row)

See Automatic load for an extended example.

See also