Microbatch tool options
The vkconfig script's microbatch tool lets you configure a scheduler's microbatches.
Syntax
vkconfig microbatch {--create | --read | --update | --delete} \
[--microbatch microbatch_name] [other_options...]
--create
- Creates a new microbatch. Cannot be used with --delete, --read, or --update.
--read
- Outputs the current settings of all microbatches defined in the scheduler, in JSON format. Cannot be used with --create, --delete, or --update.
You can limit the output to specific microbatches by using the --consumer-group-id, --enabled, --load-spec, --microbatch, --rejection-schema, --rejection-table, --target-schema, --target-table, and --target-columns options. The --enabled option only accepts a true or false value.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
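For example, the following --read call uses a LIKE wildcard to limit the output to microbatches whose names begin with web. The weblog.conf configuration file here is an assumed example:

```shell
$ vkconfig microbatch --read --microbatch "web%" --conf weblog.conf
```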
--update
- Updates the settings of microbatch_name. Cannot be used with --create, --delete, or --read.
--delete
- Deletes the microbatch named microbatch_name. Cannot be used with --create, --read, or --update.
--microbatch microbatch_name
- A unique, case-insensitive name for the microbatch. This option is required by --create, --update, and --delete.
--add-source-cluster cluster_name
- The name of a cluster to assign to the microbatch you specify with the --microbatch option. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. You can only add sources from the same cluster to a single microbatch. Requires --add-source.
--add-source source_name
- The name of a source to assign to this microbatch. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. Requires --add-source-cluster.
--cluster cluster_name
- The name of the cluster to which the --offset option applies. Required only if the microbatch defines more than one cluster or the --source parameter is supplied. Requires the --offset option.
--consumer-group-id id_name
- The name of the Kafka consumer group to which Vertica reports its progress consuming messages. Set this value to an empty string ('') to disable progress reports to a Kafka consumer group. For details, see Monitoring Vertica message consumption with consumer groups.
Default: vertica_database-name
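For example, to report progress under a different consumer group name, you could update an existing microbatch. The microbatch name mbatch1, the group name vertica-group, and the myscheduler.conf file are all illustrative:

```shell
$ vkconfig microbatch --update --microbatch mbatch1 \
    --consumer-group-id vertica-group \
    --conf myscheduler.conf
```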
--dump
- When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--enabled TRUE|FALSE
- When TRUE, allows the microbatch to execute.
--load-spec loadspec_name
- The load spec to use while processing this microbatch.
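For example, to temporarily stop a microbatch from executing without deleting it, set --enabled to FALSE with --update. The microbatch name mbatch1 and the myscheduler.conf file are illustrative:

```shell
$ vkconfig microbatch --update --microbatch mbatch1 \
    --enabled FALSE \
    --conf myscheduler.conf
```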
--max-parallelism max_num_loads
- The maximum number of simultaneous COPY statements created for the microbatch. The scheduler dynamically splits a single microbatch with multiple partitions into max_num_loads COPY statements with fewer partitions.
This option allows you to:
- Control the transaction size.
- Optimize your loads according to your scheduler's resource pool settings, such as PLANNEDCONCURRENCY.
--new-microbatch updated_name
- The updated name for the microbatch. Requires the --update option.
--offset partition_offset[,...]
- The offset of the message in the source where the microbatch starts its load. If you use this parameter, you must supply an offset value for each partition in the source, or for each partition you list in the --partition option.
You can use this option to skip some messages in the source or reload previously read messages.
See Special Starting Offset Values below for more information.
Important
You cannot set an offset for a microbatch while the scheduler is running. If you attempt to do so, the vkconfig utility returns an error. Use the shutdown utility to shut down the scheduler before setting an offset for a microbatch.
--partition partition[,...]
- One or more partitions to which the offsets given in the --offset option apply. If you supply this option, the offset values given in the --offset option apply to the partitions you specify. Requires the --offset option.
--rejection-schema schema_name
- The existing Vertica schema that contains a table for storing rejected messages.
--rejection-table table_name
- The existing Vertica table that stores rejected messages.
--remove-source-cluster cluster_name
- The name of a cluster to remove from this microbatch. You can use this parameter once per command. Requires --remove-source.
--remove-source source_name
- The name of a source to remove from this microbatch. You can use this parameter once per command. You can also use it with --update to remove multiple sources from a microbatch. Requires --remove-source-cluster.
--source source_name
- The name of the source to which the offset in the --offset option applies. Required when the microbatch defines more than one source or the --cluster parameter is given. Requires the --offset option.
--target-columns column_expression
- A column expression for the target table, where column_expression can be a comma-delimited list of columns or a complete expression.
See the COPY statement Parameters for a description of column expressions.
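For example, you could restrict a microbatch to load just two columns of its target table. The microbatch name mbatch1, the column names hit_time and user_id, and the myscheduler.conf file are all hypothetical:

```shell
$ vkconfig microbatch --update --microbatch mbatch1 \
    --target-columns "hit_time,user_id" \
    --conf myscheduler.conf
```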
--target-schema schema_name
- The existing Vertica target schema associated with this microbatch.
--target-table table_name
- The name of a Vertica table corresponding to the target. This table must belong to the target schema.
--validation-type {ERROR|WARN|SKIP}
- Controls the validation performed on a created or updated microbatch:
- ERROR - Cancel configuration or creation if vkconfig cannot validate the microbatch. This is the default setting.
- WARN - Proceed with the task if validation fails, but display a warning.
- SKIP - Perform no validation.
Renamed from --skip-validation.
See Common vkconfig script options for options that are available in all of the vkconfig tools.
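For example, to proceed with a warning instead of canceling creation when validation fails, you could create a microbatch with --validation-type WARN. All names below are illustrative:

```shell
$ vkconfig microbatch --create --microbatch mbatch2 \
    --target-schema public \
    --target-table FutureTarget \
    --load-spec Filterspec \
    --validation-type WARN \
    --conf myscheduler.conf
```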
Special starting offset values
The --offset option lets you start loading messages from a specific point in the topic's partition. It also accepts one of two special offset values:
- -2 tells the scheduler to start loading at the earliest available message in the topic's partition. This value is useful when you want to load as many messages as you can from the Kafka topic's partition.
- -3 tells the scheduler to start loading from the consumer group's saved offset. If the consumer group does not have a saved offset, it starts loading from the earliest available message in the topic partition. See Monitoring Vertica message consumption with consumer groups for more information.
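For example, after shutting down the scheduler, the following commands might reload a single-partition source from the earliest available message. The microbatch name mbatch1 and the myscheduler.conf file are illustrative:

```shell
$ vkconfig shutdown --conf myscheduler.conf
$ vkconfig microbatch --update --microbatch mbatch1 \
    --offset -2 \
    --conf myscheduler.conf
```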
Examples
This example shows how you can create a microbatch named mbatch1, identifying the target schema, target table, load spec, and source for the microbatch:
$ /opt/vertica/packages/kafka/bin/vkconfig microbatch --create --microbatch mbatch1 \
--target-schema public \
--target-table BatchTarget \
--load-spec Filterspec \
--add-source SourceFeed \
--add-source-cluster StreamCluster1 \
--conf myscheduler.conf
This example lists the current settings for the microbatches in the scheduler defined by the weblog.conf configuration file:
$ vkconfig microbatch --read --conf weblog.conf
{"microbatch":"weblog", "target_columns":null, "rejection_schema":null,
"rejection_table":null, "enabled":true, "consumer_group_id":null,
"load_spec":"weblog_load", "filters":null, "parser":"KafkaJSONParser",
"parser_parameters":null, "load_method":"TRICKLE", "message_max_bytes":null,
"uds_kv_parameters":null, "target_schema":"public", "target_table":"web_hits",
"source":"web_hits", "partitions":1, "src_enabled":true, "cluster":"kafka_weblog",
"hosts":"kafka01.example.com:9092,kafka02.example.com:9092"}
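If you need the underlying query rather than the data itself, you can combine --read with --dump, which prints the Vertica query vkconfig would run instead of executing it. This sketch reuses the weblog.conf file from the previous example:

```shell
$ vkconfig microbatch --read --dump --conf weblog.conf
```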