EXECUTE DATA LOADER

EXECUTE DATA LOADER runs a data loader to load new files from a specified path.

EXECUTE DATA LOADER executes a data loader created with CREATE DATA LOADER. It attempts to load all files in the specified location that have not already been loaded and that have not reached the retry limit. Executing the loader automatically commits the transaction.
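
For example, the following minimal sketch creates a loader and then runs it on demand; the sales table and the S3 path are illustrative assumptions, not required names:

-- sketch: target table and source path are illustrative
=> CREATE DATA LOADER sales_dl AS
   COPY sales FROM 's3://MyBucket/sales/2023/*.dat';

=> EXECUTE DATA LOADER sales_dl;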

Data loaders are executed with COPY ABORT ON ERROR.

EXECUTE DATA LOADER records events in the DATA_LOADER_EVENTS system table.
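
To review loader activity, you can query this system table directly. A minimal sketch follows; it selects all columns because the exact column set depends on your Vertica version:

-- events for all loaders; add a WHERE clause to filter as needed
=> SELECT * FROM data_loader_events;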

EXECUTE DATA LOADER runs the data loader once. To execute it periodically, you can use a scheduled stored procedure.

Syntax

EXECUTE DATA LOADER [schema.]name
   [ WITH FILES file [{ EXACT_PATH | EXPAND_GLOB }][,...] ]
   [ FORCE RETRY ]
   [ BATCH_SIZE { 'size{K|M|G|T}' | '0' | 'UNLIMITED' } ]

Arguments

schema
Schema containing the data loader. The default schema is public.
name
Name of the data loader.
WITH FILES file
Load only the specified files from the data loader's COPY path. Files that do not also match the COPY path are ignored, and files that have already been loaded are not reloaded. Both events are recorded in the loader's associated monitoring table.
EXACT_PATH | EXPAND_GLOB
How to match the named file:
  • EXACT_PATH (default): file names must match exactly. Any special characters in the file name are treated as literal parts of the name. For example, sales*.parquet specifies a single file whose name includes the * character.

  • EXPAND_GLOB: glob characters in file names are interpreted as globs. For example, sales*.parquet specifies all Parquet files whose names begin with sales.

FORCE RETRY
Ignore the retry limit.
BATCH_SIZE { 'size{K|M|G|T}' | '0' | 'UNLIMITED' }
When dividing a load for parallel execution, override the Vertica default and set the target total size of the files to include in one batch. This value is a hint; batches can be larger, depending on the sizes of individual files.

If this option is omitted, Vertica chooses a size based on the total data volume and the number of executor nodes. The maximum batch size that Vertica automatically chooses is 10GB on each node, but you can specify a larger value with this option.

To set a target size, specify an integer with a unit of measure, such as '100M' or '2G'. To disable batch loads and load files one at a time, specify '0'. To load all files in a single batch, specify 'UNLIMITED'.
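
For illustration, the following statements apply these options to the sales_dl loader used in the examples below; the 2G target is an arbitrary value chosen for this sketch:

-- retry files even if they have reached the retry limit
=> EXECUTE DATA LOADER sales_dl FORCE RETRY;

-- target roughly 2GB of files per batch
=> EXECUTE DATA LOADER sales_dl BATCH_SIZE '2G';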

Privileges

Read privileges to the source location.

Non-superuser:

  • Owner or EXECUTE privilege on the data loader.
  • Write privilege on the table.

Examples

Execute a data loader once:

=> EXECUTE DATA LOADER sales_dl;
 Rows Loaded 
-------------
           5
(1 row)

Schedule a recurring execution:

=> CREATE SCHEDULE daily_load USING CRON '0 13 * * *';

-- statement must be quoted:
=> CREATE PROCEDURE dl_runner() AS $$ BEGIN EXECUTE 'EXECUTE DATA LOADER sales_dl'; END; $$;

=> CREATE TRIGGER t_load ON SCHEDULE daily_load EXECUTE PROCEDURE dl_runner() AS DEFINER;

Load specific files with globbing:

-- no file named '09*.dat', so nothing is loaded by default:
=> EXECUTE DATA LOADER sales_dl 
   WITH FILES 's3://MyBucket/sales/2023/09*.dat';
 Rows Loaded 
-------------
           0
(1 row)

-- loads '09-01.dat', '09-02.dat', etc:
=> EXECUTE DATA LOADER sales_dl 
   WITH FILES 's3://MyBucket/sales/2023/09*.dat' EXPAND_GLOB;
 Rows Loaded 
-------------
          20
(1 row)

-- loads one glob and one specific file:
=> EXECUTE DATA LOADER sales_dl 
   WITH FILES 's3://MyBucket/sales/2023/09*.dat' EXPAND_GLOB, 
              's3://MyBucket/sales/2023/specials.dat' EXACT_PATH;
 Rows Loaded 
-------------
          32
(1 row)

See Automatic load for an extended example.
