Microbatch tool options

The vkconfig script's microbatch tool lets you configure a scheduler's microbatches.

Syntax

vkconfig microbatch {--create | --read | --update | --delete} \
         [--microbatch microbatch_name] [other_options...]
--create
Creates a new microbatch. Cannot be used with --delete, --read, or --update.
--read
Outputs the current settings of all microbatches defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.

You can limit the output to specific microbatches by using the --consumer-group-id, --enabled, --load-spec, --microbatch, --rejection-schema, --rejection-table, --target-schema, --target-table, and --target-columns options. The --enabled option only accepts a true or false value.

You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
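For example, the following hypothetical invocation limits the output to enabled microbatches whose names begin with web (the microbatch names and configuration file name are assumptions):

```shell
# List only enabled microbatches whose names start with "web".
# The LIKE wildcard % matches any sequence of characters.
$ vkconfig microbatch --read --enabled true --microbatch "web%" \
                      --conf weblog.conf
```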

--update
Updates the settings of microbatch_name. Cannot be used with --create, --delete, or --read.
--delete
Deletes the microbatch named microbatch_name. Cannot be used with --create, --read, or --update.
--microbatch microbatch_name
A unique, case-insensitive name for the microbatch. This option is required for --create, --update, and --delete.
--add-source-cluster cluster_name
The name of a cluster to assign to the microbatch you specify with the --microbatch option. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. You can only add sources from the same cluster to a single microbatch. Requires --add-source.
--add-source source_name
The name of a source to assign to this microbatch. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. Requires --add-source-cluster.
--cluster cluster_name
The name of the cluster to which the --offset option applies. Only required if the microbatch defines more than one cluster or the --source parameter is supplied. Requires the --offset option.
--consumer-group-id id_name

The name of the Kafka consumer group to which Vertica reports its progress consuming messages. Set this value to an empty string to disable progress reports to a Kafka consumer group. For details, see Monitoring Vertica message consumption with consumer groups.

Default: vertica_database-name

--dump

When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
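As a sketch, combining --dump with --read outputs the underlying query instead of the microbatch settings (the configuration file name is an assumption):

```shell
# Output the SQL query vkconfig would run, rather than the results.
$ vkconfig microbatch --read --dump --conf weblog.conf
```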

--enabled TRUE|FALSE
When TRUE, allows the microbatch to execute.
--load-spec loadspec_name
The load spec to use while processing this microbatch.
--max-parallelism max_num_loads
The maximum number of simultaneous COPY statements created for the microbatch. The scheduler dynamically splits a single microbatch with multiple partitions into max_num_loads COPY statements with fewer partitions.

--new-microbatch updated_name
The updated name for the microbatch. Requires the --update option.
--offset partition_offset[,...]
The offset of the message in the source where the microbatch starts its load. If you use this parameter, you must supply an offset value for each partition in the source or each partition you list in the --partition option.

You can use this option to skip some messages in the source or reload previously read messages.

See Special Starting Offset Values below for more information.
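For example, the following hypothetical command rewinds a microbatch whose single source has three partitions, so that the next load starts at messages 100, 150, and 200 (the microbatch name, offsets, and configuration file are assumptions):

```shell
# Supply one offset per partition in the source, in partition order.
$ vkconfig microbatch --update --microbatch weblog \
                      --offset 100,150,200 \
                      --conf weblog.conf
```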

--partition partition[,...]
One or more partitions to which the offsets given in the --offset option apply. If you supply this option, the offset values apply only to the partitions you specify. Requires the --offset option.
--rejection-schema schema_name
The existing Vertica schema that contains a table for storing rejected messages.
--rejection-table table_name
The existing Vertica table that stores rejected messages.
--remove-source-cluster cluster_name
The name of a cluster to remove from this microbatch. You can use this parameter once per command. Requires --remove-source.
--remove-source source_name
The name of a source to remove from this microbatch. You can use this parameter once per command. You can also use it with --update to remove multiple sources from a microbatch. Requires --remove-source-cluster.
--source source_name
The name of the source to which the offset in the --offset option applies. Required when the microbatch defines more than one source or the --cluster parameter is given. Requires the --offset option.
--target-columns column_expression
A column expression for the target table, where column_expression can be a comma-delimited list of columns or a complete expression.

See the COPY statement Parameters for a description of column expressions.

--target-schema schema_name
The existing Vertica target schema associated with this microbatch.
--target-table table_name
The name of a Vertica table corresponding to the target. This table must belong to the target schema.
--validation-type {ERROR|WARN|SKIP}
Controls the validation performed on a created or updated microbatch:
  • ERROR - Cancel configuration or creation if vkconfig cannot validate the microbatch. This is the default setting.

  • WARN - Proceed with task if validation fails, but display a warning.

  • SKIP - Perform no validation.

Renamed from --skip-validation.
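For instance, a sketch of creating a microbatch that proceeds with a warning if validation fails (all object names and the configuration file are assumptions):

```shell
# WARN proceeds on validation failure instead of canceling the task.
$ vkconfig microbatch --create --microbatch mbatch2 \
                      --target-schema public --target-table BatchTarget2 \
                      --load-spec Filterspec \
                      --add-source SourceFeed2 \
                      --add-source-cluster StreamCluster1 \
                      --validation-type WARN \
                      --conf myscheduler.conf
```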

See Common vkconfig script options for options that are available in all of the vkconfig tools.

Special starting offset values

The --offset option lets you start loading messages from a specific point in the topic's partition. It also accepts one of two special offset values:

  • -2 tells the scheduler to start loading at the earliest available message in the topic's partition. This value is useful when you want to load as many messages as you can from the Kafka topic's partition.

  • -3 tells the scheduler to start loading from the consumer group's saved offset. If the consumer group does not have a saved offset, it starts loading from the earliest available message in the topic partition. See Monitoring Vertica message consumption with consumer groups for more information.
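For example, a hypothetical command that reloads all available messages in each of a source's three partitions by starting every partition at the special -2 offset (names and configuration file are assumptions):

```shell
# -2 starts each listed partition at its earliest available message.
# Supply one offset value per partition listed in --partition.
$ vkconfig microbatch --update --microbatch weblog \
                      --partition 0,1,2 --offset -2,-2,-2 \
                      --conf weblog.conf
```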

Examples

This example creates the microbatch mbatch1, identifying its target schema, target table, load spec, and source:

$ /opt/vertica/packages/kafka/bin/vkconfig microbatch --create --microbatch mbatch1 \
                                                    --target-schema public \
                                                    --target-table BatchTarget \
                                                    --load-spec Filterspec \
                                                    --add-source SourceFeed \
                                                    --add-source-cluster StreamCluster1 \
                                                    --conf myscheduler.conf

This example lists the current settings for the microbatches in the scheduler defined in the weblog.conf configuration file:

$ vkconfig microbatch --read --conf weblog.conf
{"microbatch":"weblog", "target_columns":null, "rejection_schema":null,
"rejection_table":null, "enabled":true, "consumer_group_id":null,
"load_spec":"weblog_load", "filters":null, "parser":"KafkaJSONParser",
"parser_parameters":null, "load_method":"TRICKLE", "message_max_bytes":null,
"uds_kv_parameters":null, "target_schema":"public", "target_table":"web_hits",
"source":"web_hits", "partitions":1, "src_enabled":true, "cluster":"kafka_weblog",
"hosts":"kafka01.example.com:9092,kafka02.example.com:9092"}
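The following sketches show renaming a microbatch with --update and --new-microbatch, then removing it with --delete (the microbatch names and configuration file are assumptions):

```shell
# Rename the microbatch weblog to weblog_batch.
$ vkconfig microbatch --update --microbatch weblog \
                      --new-microbatch weblog_batch --conf weblog.conf

# Delete the renamed microbatch.
$ vkconfig microbatch --delete --microbatch weblog_batch --conf weblog.conf
```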