CREATE DATA LOADER
CREATE DATA LOADER creates an automatic data loader that executes a COPY statement when new data files appear in a specified path. The loader records which files have already been successfully loaded and skips them. These events are recorded in the DATA_LOADER_EVENTS system table.
Use EXECUTE DATA LOADER to execute the data loader once. To execute it periodically, you can use a scheduled stored procedure. Executing the loader automatically commits the transaction.
The DATA_LOADERS system table shows all defined loaders. The DATA_LOADER_EVENTS system table records paths that were attempted and their outcomes. To prevent unbounded growth, records in DATA_LOADER_EVENTS are purged after a specified retention interval. If previously-loaded files are still in the source path after this purge, the data loader sees them as new files and loads them again.
Syntax
CREATE [ OR REPLACE ] DATA LOADER [ IF NOT EXISTS ]
[schema.]name
[ RETRY LIMIT { NONE | DEFAULT | limit } ]
[ RETENTION INTERVAL monitoring-retention ]
AS copy-statement
Arguments
OR REPLACE
- If a data loader with the same name in the same schema exists, replace it. The original data loader's monitoring table is dropped.
This option cannot be used with
IF NOT EXISTS
. IF NOT EXISTS
If an object with the same name exists, return without creating the object. If you do not use this directive and the object already exists, Vertica returns with an error message.
The
IF NOT EXISTS
clause is useful for SQL scripts where you might not know if the object already exists. The ON ERROR STOP directive can be helpful in scripts.This option cannot be used with
OR REPLACE
.schema
- Schema containing the data loader. The default schema is
public
. name
- Name of the data loader.
RETRY LIMIT { NONE | DEFAULT |
limit
}
- Maximum number of times to retry a failing file. Each time the data loader is executed, it attempts to load all files that have not yet been successfully loaded, up to this per-file limit. If set to DEFAULT, at load time the loader uses the value of the DataLoaderDefaultRetryLimit configuration parameter.
Default:
DEFAULT
RETENTION INTERVAL
monitoring-retention
- How long to keep records in the events table. DATA_LOADER_EVENTS records events for all data loaders, but each data loader has its own retention interval.
Default:
14 days
AS
copy-statement
- The COPY statement that the loader executes. The FROM clause typically uses a glob.
Privileges
Non-superuser: CREATE privileges on the schema.
Restrictions
- COPY NO COMMIT is not supported.
- Data loaders are executed with COPY ABORT ON ERROR.
- The COPY statement must specify file paths. You cannot use COPY FROM VERTICA.
File systems
The source path can be any shared file system or object store that all database nodes can access. To use an HDFS or Linux file system safely, you must prevent the loader from reading a partially-written file. One way to achieve this is to only execute the loader when files are not being written to the loader's path. Another way to achieve this is to write new files in a temporary location and move them to the loader's path only when they are complete.
Examples
=> CREATE DATA LOADER s.dl1 RETRY LIMIT NONE
AS COPY s.local FROM 's3://b/data/*.dat';
=> SELECT name, schemaname, copystmt, retrylimit FROM data_loaders;
name | schemaname | copystmt | retrylimit
------+------------+---------------------------------------+------------
dl1 | s | COPY s.local FROM 's3://b/data/*.dat' | -1
(1 row)
See Automatic load for an extended example.