Scheduler tool options
The vkconfig script's scheduler tool lets you configure schedulers that continuously loads data from Kafka into Vertica. Use the scheduler tool to create, update, or delete a scheduler, defined by config-schema
. If you do not specify a scheduler, commands apply to the default stream_config scheduler.
Syntax
vkconfig scheduler {--create | --read | --update | --drop} other_options...
--create
- Creates a new scheduler. Cannot be used with
--delete
,--read
, or--update
. --read
- Outputs the current setting of the scheduler in JSON format. Cannot be used with
--create
,--delete
, or--update
. --update
- Updates the settings of the scheduler. Cannot be used with
--create
,--delete
, or--read
. --drop
- Drops the scheduler's schema. Dropping its schema deletes the scheduler. After you drop the scheduler's schema, you cannot recover it.
--add-operator
user_name
- Grants a Vertica user account or role access to use and alter the scheduler. Requires the
--update
shared utility option. --auto-sync
{TRUE|FALSE}
- When TRUE, Vertica automatically synchronizes scheduler source information at the interval specified in
--config-refresh
.For details about what the scheduler synchronizes at each interval, see the "Validating Schedulers" and "Synchronizing Schedulers" sections in Automatically consume data from Kafka with a scheduler.
Default: TRUE
--config-refresh
HH:MM:SS
- The interval of time that the scheduler runs before synchronizing its settings and updating its cached metadata (such as changes made by using the
--update
option).Default: 00:05:00
--consumer-group-id
id_name
The name of the Kafka consumer group to which Vertica reports its progress consuming messages. Set this value to disable progress reports to a Kafka consumer group. For details, see Monitoring Vertica message consumption with consumer groups.
Default:
vertica_
database-name
--dump
When you use this option along with the
--read
option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with--read
.--eof-timeout-ms
number of milliseconds
- If a COPY command does not receive any messages within the eof-timeout-ms interval, Vertica responds by ending that COPY statement.
See Manually consume data from Kafka for more information.
Default: 1 second
--fix-config
- Repairs the configuration and re-creates any missing tables. Valid only with the
--update
shared configuration option. --frame-duration
HH:MM:SS
- The interval of time that all individual frames last with this scheduler. The scheduler must have enough time to run each microbatch (each of which execute a COPY statement). You can approximate the average available time per microbatch using the following equation:
TimePerMicrobatch=(FrameDuration*Parallelism)/Microbatches
This is just a rough estimate as there are many factors that impact the amount of time that each microbatch will be able to run.
The vkconfig utility warns you if the time allocated per microbatch is below 2 seconds. You usually should allocate more than two seconds per microbatch to allow the scheduler to load all of the data in the data stream.
Note
In versions of Vertica earlier than 10.0, the default frame duration was 10 seconds. In version 10.0, this default value was increased to 5 minutes in part to compensate for the removal of WOS. If you created your scheduler with the default frame duration in a version prior to 10.0, the frame duration is not updated to the new default value. In this case, consider adjusting the frame duration manually. See Choosing a frame duration for more information.Default: 00:05:00
--message_max_bytes
max_message_size
Specifies the maximum size, in bytes, of a Kafka protocol batch message.
Default: 25165824
--new-source-policy
{FAIR|START|END}
- Determines how Vertica allocates resources to the newly added source, one of the following:
-
FAIR: Takes the average length of time from the previous batches and schedules itself appropriately.
-
START: All new sources start at the beginning of the frame. The batch receives the minimal amount of time to run.
-
END: All new sources start at the end of the frame. The batch receives the maximum amount of time to run.
Default: FAIR
-
--operator
username
- Allows the dbadmin to grant privileges to a previously created Vertica user or role.
This option gives the specified user all privileges on the scheduler instance and EXECUTE privileges on the libkafka library and all its UDxs.
Granting operator privileges gives the user the right to read data off any source in any cluster that can be reached from the Vertica node.
The dbadmin must grant the user separate permission for them to have write privileges on the target tables.
Requires the
--create
shared utility option. Use the--add-operator
option to grant operate privileges after the scheduler has been created.To revoke privileges, use the
--remove-operator
option. --remove-operator
user_name
- Removes access to the scheduler from a Vertica user account. Requires the
--update
shared utility option. --resource-pool
pool_name
- The resource pool to be used by all queries executed by this scheduler. You must create this pool in advance.
Default: GENERAL pool
Note
The scheduler can use only one-fourth of GENERAL pool's PLANNEDCONCURRENCY. --upgrade
- Upgrades the existing scheduler and configuration schema to the current Vertica version. The upgraded version of the scheduler is not backwards compatible with earlier versions. To upgrade a scheduler to an alternate schema, use the
upgrade-to-schema
parameter. See Updating schedulers after Vertica upgrades for more information. --upgrade-to-schema
schema name
- Copies the scheduler's schema to a new schema specified by
schema name
and then upgrades it to be compatible with the current version of Vertica. Vertica does not alter the old schema. Requires the--upgrade
scheduler utility option. --validation-type
{ERROR|WARN|SKIP}
- Renamed from
--skip-validation
, specifies the level of validation performed on the scheduler. Invalid SQL syntax and other errors can cause invalid microbatches. Vertica supports the following validation types:-
ERROR: Cancel configuration or creation if validation fails.
-
WARN: Proceed with task if validation fails, but display a warning.
-
SKIP: Perform no validation.
For more information on validation, refer to Automatically consume data from Kafka with a scheduler.
Default: ERROR
-
See Common vkconfig script options for options that are available in all of the vkconfig tools.
Examples
These examples show how you can use the scheduler utility options.
Give a user, Jim, privileges on the StreamConfig scheduler. Specify that you are making edits to the stream_config scheduler with the --config-schema
option:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --config-schema stream_config --add-operator Jim
Edit the default stream_config scheduler so that every microbatch waits for data for one second before ending:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --eof-timeout-ms 1000
Upgrade the scheduler named iot_scheduler_8.1 to a new scheduler named iot_scheduler_9.0 that is compatible with the current version of Vertica:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --upgrade --config-schema iot_scheduler_8.1 \
--upgrade-to-schema iot_scheduler_9.0
Drop the schema scheduler219a:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --drop --config-schema scheduler219a --username dbadmin
Read the current setting of the options you can set using the scheduler tool for the scheduler defined in weblogs.conf.
$ vkconfig scheduler --read --conf weblog.conf
{"version":"v9.2.0", "frame_duration":"00:00:10", "resource_pool":"weblog_pool",
"config_refresh":"00:05:00", "new_source_policy":"FAIR",
"pushback_policy":"LINEAR", "pushback_max_count":5, "auto_sync":true,
"consumer_group_id":null}