Scheduler tool options

The vkconfig script's scheduler tool lets you configure schedulers that continuously loads data from Kafka into Vertica.

The vkconfig script's scheduler tool lets you configure schedulers that continuously loads data from Kafka into Vertica. Use the scheduler tool to create, update, or delete a scheduler, defined by config-schema. If you do not specify a scheduler, commands apply to the default stream_config scheduler.

Syntax

vkconfig scheduler {--create | --read | --update | --drop} other_options...

--create

Creates a new load spec, cannot be used with --delete, --read, or --update.

--read

Outputs the current setting of the scheduler in JSON format. Cannot be used with --create, --delete, or --update.

--update

Updates an existing Set Snippet Variable Value in Topic. Cannot be used with --create, --delete, or --read.

--drop

Drops the scheduler's schema. Dropping its schema deletes the scheduler. After you drop the scheduler's schema, you cannot recover it.

--add-operator user_name

Grants a Vertica user account or role access to use and alter the scheduler. Requires the --update shared utility option.

--auto-sync {TRUE|FALSE}

When TRUE, Vertica automatically synchronizes scheduler source information at the interval specified in --config-refresh.

For details about what the scheduler synchronizes at each interval, see the "Validating Schedulers" and "Synchronizing Schedulers" sections in Automatically consume data from Kafka with the scheduler.

Default: TRUE

--config-refresh HH:MM:SS

The interval of time that the scheduler runs before synchronizing its settings and updating its cached metadata (such as changes made by using the --update option).

Default: 00:05:00

--consumer-group-id id_name

The name of the Kafka consumer group to which Vertica reports its progress consuming messages. Set this value to disable progress reports to a Kafka consumer group. For details, see Monitoring Vertica message consumption with consumer groups.

Default: vertica_database-name

--dump

When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.

--eof-timeout-ms number of milliseconds

If a COPY command does not receive any messages within the eof-timeout-ms interval, Vertica responds by ending that COPY statement.

See Manually consume data from Kafka for more information.

Default: 1 second

--fix-config

Repairs the configuration and re-creates any missing tables. Valid only with the --update shared configuration option.

--frame-duration HH:MM:SS

The interval of time that all individual frames last with this scheduler. The scheduler must have enough time to run each microbatch (each of which execute a COPY statement). You can approximate the average available time per microbatch using the following equation:

TimePerMicrobatch=(FrameDuration*Parallelism)/Microbatches

This is just a rough estimate as there are many factors that impact the amount of time that each microbatch will be able to run.

The vkconfig utility warns you if the time allocated per microbatch is below 2 seconds. You usually should allocate more than two seconds per microbatch to allow the scheduler to load all of the data in the data stream.

Note

In versions of Vertica earlier than 10.0, the default frame duration was 10 seconds. In version 10.0, this default value was increased to 5 minutes in part to compensate for the removal of WOS. If you created your scheduler with the default frame duration in a version prior to 10.0, the frame duration is not updated to the new default value. In this case, consider adjusting the frame duration manually. See Choosing a frame duration for more information.

Default: 00:05:00

--message_max_bytes max_message_size

Specifies the maximum size, in bytes, of a Kafka protocol batch message.

Important

You may need to manually update this value if you created a scheduler using Vertica 9.1.0 or earlier. The meaning of Kafka's max.message.bytes setting changed between version 0.10 and 0.11. See Changes to the message.max.bytes setting in Kafka version 0.11 and later for more information.

**Default:**25165824

--new-source-policy {FAIR|START|END}

Determines how Vertica allocates resources to the newly added source, one of the following:

FAIR: Takes the average length of time from the previous batches and schedules itself appropriately.
START: All new sources start at the beginning of the frame. The batch receives the minimal amount of time to run.
END: All new sources start at the end of the frame. The batch receives the maximum amount of time to run.

Default: FAIR

--operator username

Allows the dbadmin to grant privileges to a previously created Vertica user or role.

This option gives the specified user all privileges on the scheduler instance and EXECUTE privileges on the libkafka library and all its UDxs.

Granting operator privileges gives the user the right to read data off any source in any cluster that can be reached from the Vertica node.

The dbadmin must grant the user separate permission for them to have write privileges on the target tables.

Requires the --create shared utility option. Use the --add-operator option to grant operate privileges after the scheduler has been created.

To revoke privileges, use the --remove-operator option.

--remove-operator user_name

Removes access to the scheduler from a Vertica user account. Requires the --update shared utility option.

--resource-pool pool_name

The resource pool to be used by all queries executed by this scheduler. You must create this pool in advance.

Default: GENERAL pool

Note

The scheduler can use only one-fourth of GENERAL pool's PLANNEDCONCURRENCY.

--upgrade

Upgrades the existing scheduler and configuration schema to the current Vertica version. The upgraded version of the scheduler is not backwards compatible with earlier versions. To upgrade a scheduler to an alternate schema, use the upgrade-to-schema parameter. See Updating schedulers after Vertica upgrades for more information.

--upgrade-to-schema schema name

Copies the scheduler's schema to a new schema specified by schema name and then upgrades it to be compatible with the current version of Vertica. Vertica does not alter the old schema. Requires the --upgrade scheduler utility option.

--validation-type {ERROR|WARN|SKIP}

Renamed from --skip-validation, specifies the level of validation performed on the scheduler. Invalid SQL syntax and other errors can cause invalid microbatches. Vertica supports the following validation types:

ERROR: Cancel configuration or creation if validation fails.
WARN: Proceed with task if validation fails, but display a warning.
SKIP: Perform no validation.

For more information on validation, refer to Automatically consume data from Kafka with the scheduler.

Default: ERROR

See Common vkconfig script options for options that are available in all of the vkconfig tools.

Examples

These examples show how you can use the scheduler utility options.

Give a user, Jim, privileges on the StreamConfig scheduler. Specify that you are making edits to the stream_config scheduler with the --config-schema option:

$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --config-schema stream_config --add-operator Jim

Edit the default stream_config scheduler so that every microbatch waits for data for one second before ending:

$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --eof-timeout-ms 1000

Upgrade the scheduler named iot_scheduler_8.1 to a new scheduler named iot_scheduler_9.0 that is compatible with the current version of Vertica:

$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --upgrade --config-schema iot_scheduler_8.1 \
                                           --upgrade-to-schema iot_scheduler_9.0

Drop the schema scheduler219a:

$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --drop --config-schema  scheduler219a --username dbadmin

Read the current setting of the options you can set using the scheduler tool for the scheduler defined in weblogs.conf.

$ vkconfig scheduler --read --conf weblog.conf
{"version":"v9.2.0", "frame_duration":"00:00:10", "resource_pool":"weblog_pool",
"config_refresh":"00:05:00", "new_source_policy":"FAIR",
"pushback_policy":"LINEAR", "pushback_max_count":5, "auto_sync":true,
"consumer_group_id":null}