EXECUTE DATA LOADER
EXECUTE DATA LOADER executes a data loader created with CREATE DATA LOADER. It attempts to load all files in the specified location that have not already been loaded and that have not reached the retry limit. By default, executing the loader commits the transaction.
Data loaders are executed with COPY ABORT ON ERROR.
EXECUTE DATA LOADER records events in the DATA_LOADER_EVENTS system table.
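For example, you can query that table to review what a loader did during an execution. A minimal sketch; in practice you would narrow the projection and filter to the loader and time range of interest:
=> SELECT * FROM data_loader_events;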
EXECUTE DATA LOADER runs the data loader once. To execute it periodically, you can use a scheduled stored procedure. To suspend execution, call ALTER DATA LOADER with the DISABLE option.
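For example, to pause the sales_dl loader used in the examples below and later resume it (a sketch; the ENABLE option is not shown on this page and is assumed to be the counterpart of DISABLE in ALTER DATA LOADER):
=> ALTER DATA LOADER sales_dl DISABLE;
--- ...later:
=> ALTER DATA LOADER sales_dl ENABLE;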
Syntax
EXECUTE DATA LOADER [schema.]name
[ WITH FILES file [{ EXACT_PATH | EXPAND_GLOB }][,...] ]
[ FORCE RETRY ]
[ BATCH_SIZE { 'size{K|M|G|T}' | '0' | 'UNLIMITED' } ]
[ NO COMMIT ]
Arguments
schema
- Schema containing the data loader. The default schema is public.
name
- Name of the data loader.
WITH FILES file
- Load only the specified files from the data loader's COPY path. Files that do not also match the COPY path are ignored. Files that have already been loaded are not reloaded. Both of these events are recorded in the loader's associated monitoring table.
EXACT_PATH | EXPAND_GLOB
- How to match the named file:
  - EXACT_PATH (default): file names must match exactly. Any special characters in the file name are treated as literal parts of the name. For example, sales*.parquet specifies a single file whose name includes the * character.
  - EXPAND_GLOB: glob characters in file names are interpreted as globs. For example, sales*.parquet specifies one or more Parquet files beginning with sales.
FORCE RETRY
- Ignore the retry limit.
BATCH_SIZE { 'size{K|M|G|T}' | '0' | 'UNLIMITED' }
- When dividing a load for parallel execution, override the Vertica default and set the target total size of files to include in one batch. This value is a hint, and batches can be larger, depending on the sizes of individual files.
If this option is omitted, Vertica chooses a size based on the total data volume and the number of executor nodes. The maximum batch size that Vertica automatically chooses is 10GB on each node, but you can specify a larger value with this option.
To set a target size, specify an integer with a unit of measure, such as '100M' or '2G'. To disable batch loads and load files one at a time, specify '0'. To load all files in a single batch, specify 'UNLIMITED'.
NO COMMIT
- Do not commit the transaction after execution. Use this option when execution is part of an operation that must be atomic. SQS triggers use this option so that execution and queue management are part of the same transaction. (See the sketch after this list for FORCE RETRY, BATCH_SIZE, and NO COMMIT in use.)
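The following sketch shows these options in use, assuming the sales_dl loader from the examples below; because the syntax lists them as independent optional clauses, they can appear together in one statement. The staging_queue table is hypothetical, standing in for queue-management work that must commit atomically with the load:
--- retry previously failed files and target larger batches:
=> EXECUTE DATA LOADER sales_dl FORCE RETRY BATCH_SIZE '500M';
--- make execution atomic with related work:
=> BEGIN;
=> EXECUTE DATA LOADER sales_dl NO COMMIT;
=> DELETE FROM staging_queue WHERE processed; --- hypothetical cleanup
=> COMMIT;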
Privileges
Read privileges to the source location.
Non-superuser:
- Owner or EXECUTE privilege on the data loader.
- Write privilege on the table.
Examples
Execute a data loader once:
=> EXECUTE DATA LOADER sales_dl;
Rows Loaded
-------------
5
(1 row)
Schedule a recurring execution:
=> CREATE SCHEDULE daily_load USING CRON '0 13 * * *';
--- statement must be quoted:
=> CREATE PROCEDURE dl_runner() as $$ begin EXECUTE 'EXECUTE DATA LOADER sales_dl'; end; $$;
=> CREATE TRIGGER t_load ON SCHEDULE daily_load EXECUTE PROCEDURE dl_runner() AS DEFINER;
Load specific files with globbing:
--- no file named '09*.dat' so nothing loaded by default:
=> EXECUTE DATA LOADER sales_dl
WITH FILES 's3://MyBucket/sales/2023/09*.dat';
Rows Loaded
-------------
0
(1 row)
--- loads '09-01.dat', '09-02.dat', etc:
=> EXECUTE DATA LOADER sales_dl
WITH FILES 's3://MyBucket/sales/2023/09*.dat' EXPAND_GLOB;
Rows Loaded
-------------
20
(1 row)
--- loads one glob and one specific file:
=> EXECUTE DATA LOADER sales_dl
WITH FILES 's3://MyBucket/sales/2023/09*.dat' EXPAND_GLOB,
's3://MyBucket/sales/2023/specials.dat' EXACT_PATH;
Rows Loaded
-------------
32
(1 row)
See Automatic load for an extended example.