Microbatch tool options
The vkconfig script's microbatch tool lets you configure a scheduler's microbatches.
Syntax
vkconfig microbatch {--create | --read | --update | --delete} \
[--microbatch microbatch_name] [other_options...]
--create - Creates a new microbatch. Cannot be used with --delete, --read, or --update.
--read - Outputs the current settings of all microbatches defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
You can limit the output to specific microbatches by using the --consumer-group-id, --enabled, --load-spec, --microbatch, --rejection-schema, --rejection-table, --target-schema, --target-table, and --target-columns options. The --enabled option only accepts a true or false value.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update - Updates the settings of microbatch_name. Cannot be used with --create, --delete, or --read.
--delete - Deletes the microbatch named microbatch_name. Cannot be used with --create, --read, or --update.
--microbatch microbatch_name - A unique, case-insensitive name for the microbatch. This option is required for --create, --update, and --delete.
--add-source-cluster cluster_name - The name of a cluster to assign to the microbatch you specify with the --microbatch option. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. You can only add sources from the same cluster to a single microbatch. Requires --add-source.
--add-source source_name - The name of a source to assign to this microbatch. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. Requires --add-source-cluster.
--cluster cluster_name - The name of the cluster to which the --offset option applies. Only required if the microbatch defines more than one cluster or the --source parameter is supplied. Requires the --offset option.
--consumer-group-id id_name - The name of the Kafka consumer group to which OpenText™ Analytics Database reports its progress consuming messages. Set this value to disable progress reports to a Kafka consumer group. For details, see Monitoring OpenText Analytics Database message consumption with consumer groups.
Default: vertica_database-name
--dump - When you use this option along with the --read option, vkconfig outputs the OpenText™ Analytics Database query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within the database without having to go through vkconfig. This option has no effect if not used with --read.
--enabled TRUE|FALSE - When TRUE, allows the microbatch to execute.
--load-spec loadspec_name - The load spec to use while processing this microbatch.
--max-parallelism max_num_loads - The maximum number of simultaneous COPY statements created for the microbatch. The scheduler dynamically splits a single microbatch with multiple partitions into max_num_loads COPY statements with fewer partitions.
This option allows you to:
- Control the transaction size.
- Optimize your loads according to your scheduler's resource pool settings, such as PLANNEDCONCURRENCY.
--new-microbatch updated_name - The updated name for the microbatch. Requires the --update option.
--offset partition_offset[,...] - The offset of the message in the source where the microbatch starts its load. If you use this parameter, you must supply an offset value for each partition in the source or each partition you list in the --partition option. You can use this option to skip some messages in the source or reload previously read messages. See Special starting offset values below for more information.
Important
You cannot set an offset for a microbatch while the scheduler is running. If you attempt to do so, the vkconfig utility returns an error. Use the shutdown utility to shut the scheduler down before setting an offset for a microbatch.
--partition partition[,...] - One or more partitions to which the offsets given in the --offset option apply. If you supply this option, then the offset values given in the --offset option apply to the partitions you specify. Requires the --offset option.
--rejection-schema schema_name - The existing OpenText™ Analytics Database schema that contains a table for storing rejected messages.
--rejection-table table_name - The existing database table that stores rejected messages.
--remove-source-cluster cluster_name - The name of a cluster to remove from this microbatch. You can use this parameter once per command. Requires --remove-source.
--remove-source source_name - The name of a source to remove from this microbatch. You can use this parameter once per command. You can also use it with --update to remove multiple sources from a microbatch. Requires --remove-source-cluster.
--source source_name - The name of the source to which the offset in the --offset option applies. Required when the microbatch defines more than one source or the --cluster parameter is given. Requires the --offset option.
--target-columns column_expression - A column expression for the target table, where column_expression can be a comma-delimited list of columns or a complete expression. See the COPY statement Parameters for a description of column expressions.
--target-schema schema_name - The existing database target schema associated with this microbatch.
--target-table table_name - The name of a database table corresponding to the target. This table must belong to the target schema.
--validation-type {ERROR|WARN|SKIP} - Controls the validation performed on a created or updated microbatch:
- ERROR - Cancel configuration or creation if vkconfig cannot validate the microbatch. This is the default setting.
- WARN - Proceed with the task if validation fails, but display a warning.
- SKIP - Perform no validation.
Renamed from --skip-validation.
See Common vkconfig script options for options that are available in all of the vkconfig tools.
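For example, you can combine --read with a filtering option and a LIKE wildcard to list only the microbatches whose names match a pattern. In this sketch, the pattern web% and the configuration file myscheduler.conf are illustrative:

$ vkconfig microbatch --read --microbatch "web%" --conf myscheduler.conf

Adding --dump to the same command outputs the query vkconfig would use to retrieve those settings, rather than the settings themselves.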
Special starting offset values
The --offset option lets you start loading messages from a specific point in the topic's partition. It also accepts one of two special offset values:
- -2 tells the scheduler to start loading at the earliest available message in the topic's partition. This value is useful when you want to load as many messages as you can from the Kafka topic's partition.
- -3 tells the scheduler to start loading from the consumer group's saved offset. If the consumer group does not have a saved offset, it starts loading from the earliest available message in the topic partition. See Monitoring OpenText Analytics Database message consumption with consumer groups for more information.
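For example, the following sketch (the microbatch name mbatch1 and configuration file myscheduler.conf are illustrative) shuts the scheduler down, as required before changing offsets, and then uses the special -2 value to reload a single-partition source from its earliest available message:

$ vkconfig shutdown --conf myscheduler.conf
$ vkconfig microbatch --update --microbatch mbatch1 --offset -2 --conf myscheduler.conf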
Examples
This example shows how to create a microbatch named mbatch1. The microbatch identifies the target schema, target table, load spec, and source:
$ /opt/vertica/packages/kafka/bin/vkconfig microbatch --create --microbatch mbatch1 \
--target-schema public \
--target-table BatchTarget \
--load-spec Filterspec \
--add-source SourceFeed \
--add-source-cluster StreamCluster1 \
--conf myscheduler.conf
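This example renames the microbatch created above. The new name batchinput is illustrative; because --new-microbatch requires --update, the command uses the update operation:

$ /opt/vertica/packages/kafka/bin/vkconfig microbatch --update --microbatch mbatch1 \
--new-microbatch batchinput \
--conf myscheduler.conf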
This example demonstrates listing the current settings for the microbatches in the scheduler defined in the weblog.conf configuration file.
$ vkconfig microbatch --read --conf weblog.conf
{"microbatch":"weblog", "target_columns":null, "rejection_schema":null,
"rejection_table":null, "enabled":true, "consumer_group_id":null,
"load_spec":"weblog_load", "filters":null, "parser":"KafkaJSONParser",
"parser_parameters":null, "load_method":"TRICKLE", "message_max_bytes":null,
"uds_kv_parameters":null, "target_schema":"public", "target_table":"web_hits",
"source":"web_hits", "partitions":1, "src_enabled":true, "cluster":"kafka_weblog",
"hosts":"kafka01.example.com:9092,kafka02.example.com:9092"}