1 - Common vkconfig script options
These options are available across the different tools available in the vkconfig script.
--conf filename - A text file containing configuration options for the vkconfig script. See Configuration File Format below.
--config-schema schema_name - The name of the scheduler's Vertica schema. This value is the same as the name of the scheduler. You use this name to identify the scheduler during configuration.
Default: stream_config
--dbhost host_name - The host name or IP address of the Vertica node acting as the initiator node for the scheduler.
Default: localhost
--dbport port_number - The port to use to connect to a Vertica database.
Default: 5433
--enable-ssl - Enables the vkconfig script to use SSL to connect to Vertica or between Vertica and Kafka. See Configuring your scheduler for TLS connections for more information.
--help - Prints a help menu listing the available options with descriptions.
--jdbc-opt option=value[&option2=value2...] - One or more options to add to the standard JDBC URL that vkconfig uses to connect to Vertica. Cannot be combined with --jdbc-url.
--jdbc-url url - A complete JDBC URL that vkconfig uses instead of the standard JDBC URL to connect to Vertica.
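For example, this hypothetical invocation adds a load-balancing option to the generated JDBC URL (the option value and configuration file shown are assumptions for illustration):
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --read \
--jdbc-opt ConnectionLoadBalance=1 --conf myscheduler.conf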
--password password - The password for the database user.
--ssl-ca-alias alias_name - The alias of the root certificate authority in the truststore. When set, the scheduler loads only the certificate associated with the specified alias. When omitted, the scheduler loads all certificates in the truststore.
--ssl-key-alias alias_name - The alias of the key and certificate pair within the keystore. Must be set when Vertica uses SSL to connect to Kafka.
--ssl-key-password password - The password for the SSL key. Must be set when Vertica uses SSL to connect to Kafka.
Caution
Specifying this option on the command line can expose it to other users logged into the host. Always use a configuration file to set this option.
--username username - The Vertica database user used to alter the configuration of the scheduler. This user must have CREATE privileges on the scheduler's schema.
Default: Current user
--version - Displays the version number of the scheduler.
Configuration file format
You can use a configuration file to store common parameters you use in your calls to the vkconfig utility. The configuration file is a text file containing one option setting per line in the format:
option=value
You can also include comments in the option file by prefixing them with a hash mark (#).
#config.properties:
username=myuser
password=mypassword
dbhost=localhost
dbport=5433
You tell vkconfig to use the configuration file using the --conf option:
$ /opt/vertica/packages/kafka/bin/vkconfig source --update --conf config.properties
You can override any stored parameter from the command line:
$ /opt/vertica/packages/kafka/bin/vkconfig source --update --conf config.properties --dbhost otherVerticaHost
Examples
These examples show how you can use the shared utility options.
Display help for the scheduler utility:
$ vkconfig scheduler --help
This command configures a Scheduler, which can run and load data from configured
sources and clusters into Vertica tables. It provides options for changing the
'frame duration' (time given per set of batches to resolve), as well as the
dedicated Vertica resource pool the Scheduler will use while running.
Available Options:
PARAMETER           #ARGS   DESCRIPTION
conf                1       Allow the use of a properties file to associate
                            parameter keys and values. This file enables
                            command string reuse and cleaner command strings.
help                0       Outputs a help context for the given subutility.
version             0       Outputs the current version of the scheduler.
skip-validation     0       [Deprecated] Use --validation-type.
validation-type     1       Determine what happens when there are
                            configuration errors. Accepts: ERROR - errors
                            out, WARN - prints out a message and continues,
                            SKIP - skip running validations
dbhost              1       The Vertica database hostname that contains
                            metadata and configuration information. The
                            default value is 'localhost'.
dbport              1       The port at the hostname to connect to the
                            Vertica database. The default value is '5433'.
username            1       The user to connect to Vertica. The default
                            value is the current system user.
password            1       The password for the user connecting to Vertica.
                            The default value is empty.
jdbc-url            1       A JDBC URL that can override Vertica connection
                            parameters and provide additional JDBC options.
jdbc-opt            1       Options to add to the JDBC URL used to connect
                            to Vertica ('&'-separated key=value list).
                            Used with generated URL (i.e. not with
                            '--jdbc-url' set).
enable-ssl          1       Enable SSL between JDBC and Vertica and/or
                            Vertica and Kafka.
ssl-ca-alias        1       The alias of the root CA within the provided
                            truststore used when connecting between
                            Vertica and Kafka.
ssl-key-alias       1       The alias of the key and certificate pair
                            within the provided keystore used when
                            connecting between Vertica and Kafka.
ssl-key-password    1       The password for the key used when connecting
                            between Vertica and Kafka. Should be hidden
                            with file access (see --conf).
config-schema       1       The schema containing the configuration details
                            to be used, created or edited. This parameter
                            defines the scheduler. The default value is
                            'stream_config'.
create              0       Create a new instance of the supplied type.
read                0       Read an instance of the supplied type.
update              0       Update an instance of the supplied type.
delete              0       Delete an instance of the supplied type.
drop                0       Drops the specified configuration schema.
                            CAUTION: this command will completely delete
                            and remove all configuration and monitoring
                            data for the specified scheduler.
dump                0       Dump the config schema query string used to
                            answer this command in the output.
operator            1       Specifies a user designated as an operator for
                            the created configuration. Used with --create.
add-operator        1       Add a user designated as an operator for the
                            specified configuration. Used with --update.
remove-operator     1       Removes a user designated as an operator for
                            the specified configuration. Used with
                            --update.
upgrade             0       Upgrade the current scheduler configuration
                            schema to the current version of this
                            scheduler. WARNING: if upgrading between
                            EXCAVATOR and FRONTLOADER be aware that the
                            Scheduler is not backwards compatible. The
                            upgrade procedure will translate your kafka
                            model into the new stream model.
upgrade-to-schema   1       Used with upgrade: will upgrade the
                            configuration to a new given schema instead of
                            upgrading within the same schema.
fix-config          0       Attempts to fix the configuration (ex: dropped
                            tables) before doing any other updates. Used
                            with --update.
frame-duration      1       The duration of the Scheduler's frame, in
                            which every configured Microbatch runs. Default
                            is 300 seconds: '00:05:00'
resource-pool       1       The Vertica resource pool to run the Scheduler
                            on. Default is 'general'.
config-refresh      1       The interval of time between Scheduler
                            configuration refreshes. Default is 5 minutes:
                            '00:05'
new-source-policy   1       The policy for new Sources to be scheduled
                            during a frame. Options are: START, END, and
                            FAIR. Default is 'FAIR'.
pushback-policy     1
pushback-max-count  1
auto-sync           1       Automatically update configuration based on
                            metadata from the Kafka cluster
consumer-group-id   1       The Kafka consumer group id to report offsets
                            to.
eof-timeout-ms      1       [DEPRECATED] This option has no effect.
2 - Scheduler tool options
The vkconfig script's scheduler tool lets you configure schedulers that continuously load data from Kafka into Vertica. Use the scheduler tool to create, update, or delete a scheduler, defined by --config-schema. If you do not specify a scheduler, commands apply to the default stream_config scheduler.
Syntax
vkconfig scheduler {--create | --read | --update | --drop} other_options...
--create - Creates a new scheduler. Cannot be used with --delete, --read, or --update.
--read - Outputs the current setting of the scheduler in JSON format. Cannot be used with --create, --delete, or --update.
--update - Updates the settings of the scheduler. Cannot be used with --create, --delete, or --read.
--drop - Drops the scheduler's schema. Dropping its schema deletes the scheduler. After you drop the scheduler's schema, you cannot recover it.
--add-operator user_name - Grants a Vertica user account or role access to use and alter the scheduler. Requires the --update shared utility option.
--auto-sync {TRUE|FALSE} - When TRUE, Vertica automatically synchronizes scheduler source information at the interval specified in --config-refresh.
For details about what the scheduler synchronizes at each interval, see the "Validating Schedulers" and "Synchronizing Schedulers" sections in Automatically consume data from Kafka with a scheduler.
Default: TRUE
--config-refresh HH:MM:SS - The interval of time that the scheduler runs before synchronizing its settings and updating its cached metadata (such as changes made by using the --update option).
Default: 00:05:00
--consumer-group-id id_name - The name of the Kafka consumer group to which Vertica reports its progress consuming messages. Set this value to an empty string to disable progress reports to a Kafka consumer group. For details, see Monitoring Vertica message consumption with consumer groups.
Default: vertica_database-name
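For example, assuming a scheduler defined in a hypothetical myscheduler.conf file, the following sketch disables consumer group progress reports by passing an empty string:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update \
--consumer-group-id "" --conf myscheduler.conf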
--dump - When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--eof-timeout-ms milliseconds - If a COPY command does not receive any messages within the eof-timeout-ms interval, Vertica responds by ending that COPY statement. See Manually consume data from Kafka for more information.
Default: 1 second
--fix-config - Repairs the configuration and re-creates any missing tables. Valid only with the --update shared configuration option.
--frame-duration HH:MM:SS - The interval of time that all individual frames last with this scheduler. The scheduler must have enough time to run each microbatch (each of which executes a COPY statement). You can approximate the average available time per microbatch using the following equation:
TimePerMicrobatch=(FrameDuration*Parallelism)/Microbatches
This is just a rough estimate, as many factors affect how long each microbatch is able to run.
The vkconfig utility warns you if the time allocated per microbatch is below 2 seconds. You should usually allocate more than two seconds per microbatch to allow the scheduler to load all of the data in the data stream.
Note
In versions of Vertica earlier than 10.0, the default frame duration was 10 seconds. In version 10.0, this default value was increased to 5 minutes, in part to compensate for the removal of WOS. If you created your scheduler with the default frame duration in a version prior to 10.0, the frame duration is not updated to the new default value. In this case, consider adjusting the frame duration manually. See Choosing a frame duration for more information.
Default: 00:05:00
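For example, with the default five-minute frame duration, a parallelism of 1, and three microbatches, each microbatch gets roughly (300 * 1) / 3 = 100 seconds per frame. These values are illustrative; substitute your own scheduler's settings.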
--message_max_bytes max_message_size - Specifies the maximum size, in bytes, of a Kafka protocol batch message.
Default: 25165824
--new-source-policy {FAIR|START|END} - Determines how Vertica allocates resources to a newly added source, one of the following:
- FAIR: Takes the average length of time from the previous batches and schedules itself appropriately.
- START: All new sources start at the beginning of the frame. The batch receives the minimal amount of time to run.
- END: All new sources start at the end of the frame. The batch receives the maximum amount of time to run.
Default: FAIR
--operator username - Allows the dbadmin to grant privileges to a previously created Vertica user or role.
This option gives the specified user all privileges on the scheduler instance and EXECUTE privileges on the libkafka library and all its UDxs.
Granting operator privileges gives the user the right to read data off any source in any cluster that can be reached from the Vertica node.
The dbadmin must grant the user separate permission for them to have write privileges on the target tables.
Requires the --create shared utility option. Use the --add-operator option to grant operator privileges after the scheduler has been created. To revoke privileges, use the --remove-operator option.
--remove-operator user_name - Removes access to the scheduler from a Vertica user account. Requires the --update shared utility option.
--resource-pool pool_name - The resource pool to be used by all queries executed by this scheduler. You must create this pool in advance.
Default: GENERAL pool
--upgrade - Upgrades the existing scheduler and configuration schema to the current Vertica version. The upgraded version of the scheduler is not backward compatible with earlier versions. To upgrade a scheduler to an alternate schema, use the --upgrade-to-schema parameter. See Updating schedulers after Vertica upgrades for more information.
--upgrade-to-schema schema_name - Copies the scheduler's schema to a new schema specified by schema_name and then upgrades it to be compatible with the current version of Vertica. Vertica does not alter the old schema. Requires the --upgrade scheduler utility option.
--validation-type {ERROR|WARN|SKIP} - Renamed from --skip-validation, specifies the level of validation performed on the scheduler. Invalid SQL syntax and other errors can cause invalid microbatches. Vertica supports the following validation types:
- ERROR: Cancel configuration or creation if validation fails.
- WARN: Proceed with task if validation fails, but display a warning.
- SKIP: Perform no validation.
For more information on validation, refer to Automatically consume data from Kafka with a scheduler.
Default: ERROR
See Common vkconfig script options for options that are available in all of the vkconfig tools.
Examples
These examples show how you can use the scheduler utility options.
Give the user Jim privileges on the stream_config scheduler. Specify that you are editing the stream_config scheduler with the --config-schema option:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --config-schema stream_config --add-operator Jim
Edit the default stream_config scheduler so that every microbatch waits for data for one second before ending:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --eof-timeout-ms 1000
Upgrade the scheduler named iot_scheduler_8.1 to a new scheduler named iot_scheduler_9.0 that is compatible with the current version of Vertica:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --upgrade --config-schema iot_scheduler_8.1 \
--upgrade-to-schema iot_scheduler_9.0
Drop the schema scheduler219a:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --drop --config-schema scheduler219a --username dbadmin
Read the current settings of the options you can set using the scheduler tool for the scheduler defined in weblog.conf:
$ vkconfig scheduler --read --conf weblog.conf
{"version":"v9.2.0", "frame_duration":"00:00:10", "resource_pool":"weblog_pool",
"config_refresh":"00:05:00", "new_source_policy":"FAIR",
"pushback_policy":"LINEAR", "pushback_max_count":5, "auto_sync":true,
"consumer_group_id":null}
4 - Source tool options
Use the vkconfig script's source tool to create, update, or delete a source.
Syntax
vkconfig source {--create | --read | --update | --delete} \
--source source_name [other_options...]
--create - Creates a new source. Cannot be used with --delete, --read, or --update.
--read - Outputs the current setting of the sources defined in the scheduler. The output is in JSON format. Cannot be used with --create, --delete, or --update.
By default, this option outputs all of the sources defined in the scheduler. You can limit the output by using the --cluster, --enabled, --partitions, and --source options. The output will only contain sources that match the values in these options. The --enabled option can only have a true or false value. The --source option is case-sensitive.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
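For example, the following sketch (assuming a scheduler defined in a weblog.conf file) limits the output to enabled sources whose names start with web:
$ vkconfig source --read --enabled true --source "web%" --conf weblog.conf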
--update - Updates the settings of source_name. Cannot be used with --create, --delete, or --read.
--delete - Deletes the source named source_name. Cannot be used with --create, --read, or --update.
--source source_name - Identifies the source to create or alter in the scheduler's configuration. This option is case-sensitive. You can use any name you like for a new source. Most people use the name of the Kafka topic the scheduler loads its data from. This option is required for --create, --update, and --delete.
--cluster cluster_name - Identifies the cluster containing the source that you want to create or edit. You must have already defined this cluster in the scheduler.
--dump - When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--enabled TRUE|FALSE - When TRUE, the source is available for use.
--new-cluster cluster_name - Changes the cluster this source belongs to. All sources referencing the old cluster now target this cluster.
Requires: --update and --source options
--new-source source_name - Updates the name of an existing source to the name specified by this parameter.
Requires: --update shared utility option
--partitions count - Sets the number of partitions in the source.
Default: the number of partitions defined in the cluster.
Requires: --create and --source options
You must keep this value consistent with the number of partitions in the Kafka topic.
Renamed from --num-partitions.
--validation-type {ERROR|WARN|SKIP} - Controls the validation performed on a created or updated source:
- ERROR: Cancel configuration or creation if vkconfig cannot validate the source. This is the default setting.
- WARN: Proceed with task if validation fails, but display a warning.
- SKIP: Perform no validation.
Renamed from --skip-validation.
See Common vkconfig script options for options that are available in all of the vkconfig tools.
Examples
The following examples show how you can create or update SourceFeed.
Create the source SourceFeed and assign it to the cluster StreamCluster1 in the scheduler defined by the myscheduler.conf config file:
$ /opt/vertica/packages/kafka/bin/vkconfig source --create --source SourceFeed \
--cluster StreamCluster1 --partitions 3 \
--conf myscheduler.conf
Update the existing source SourceFeed to use the existing cluster StreamCluster2 in the scheduler defined by the myscheduler.conf config file:
$ /opt/vertica/packages/kafka/bin/vkconfig source --update --source SourceFeed \
--new-cluster StreamCluster2 \
--conf myscheduler.conf
The following example reads the sources defined in the scheduler specified by the weblog.conf file.
$ vkconfig source --read --conf weblog.conf
{"source":"web_hits", "partitions":1, "src_enabled":true,
"cluster":"kafka_weblog",
"hosts":"kafka01.example.com:9092,kafka02.example.com:9092"}
6 - Load spec tool options
The vkconfig script's load spec tool lets you provide parameters for a COPY statement that loads streaming data.
Syntax
vkconfig load-spec {--create | --read | --update | --delete} \
[--load-spec spec-name] [other-options...]
--create - Creates a new load spec. Cannot be used with --delete, --read, or --update.
--read - Outputs the current settings of the load specs defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
By default, this option outputs all load specs defined in the scheduler. You can limit the output by supplying a single value or a comma-separated list of values to these options:
- --load-spec
- --filters
- --uds-kv-parameters
- --parser
- --message-max-bytes
- --parser-parameters
The vkconfig script only outputs the configuration of load specs that match the values you supply.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update - Updates the settings of spec-name. Cannot be used with --create, --delete, or --read.
--delete - Deletes the load spec named spec-name. Cannot be used with --create, --read, or --update.
--load-spec spec-name - A unique name for the load spec to operate on. This option is required for --create, --update, and --delete.
--dump - When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--filters "
filter-name
"
- A Vertica FILTER chain containing all of the UDFilters to use in the COPY statement. For more information on filters, refer to Parsing custom formats.
--message-max-bytes max-size - Specifies the maximum size, in bytes, of a Kafka protocol batch message.
Default: 25165824
--new-load-spec new-name - A new, unique name for an existing load spec. Requires the --update parameter.
--parser-parameters "key=value[,...]" - A list of parameters to provide to the parser specified in the --parser parameter. When you use a Vertica native parser, the scheduler passes these parameters to the COPY statement, where they are in turn passed to the parser.
--parser parser-name - Identifies a Vertica UDParser to use with a specified target. This parser is used within the COPY statement that the scheduler runs to load data. If you are using a Vertica native parser, the values supplied to the --parser-parameters option are passed through to the COPY statement.
Default: KafkaParser
--uds-kv-parameters key=value[,...] - A comma-separated list of key=value pairs for the user-defined source.
--validation-type {ERROR|WARN|SKIP} - Specifies the validation performed on a created or updated load spec, one of the following:
- ERROR: Cancel configuration or creation if vkconfig cannot validate the load spec. This is the default setting.
- WARN: Proceed with task if validation fails, but display a warning.
- SKIP: Perform no validation.
Renamed from --skip-validation.
See Common vkconfig script options for options that are available in all of the vkconfig tools.
Examples
These examples show how you can use the Load Spec utility options.
Create load spec Streamspec1:
$ /opt/vertica/packages/kafka/bin/vkconfig load-spec --create --load-spec Streamspec1 --conf myscheduler.conf
Rename load spec Streamspec1 to Streamspec2:
$ /opt/vertica/packages/kafka/bin/vkconfig load-spec --update --load-spec Streamspec1 \
--new-load-spec Streamspec2 \
--conf myscheduler.conf
Update load spec Filterspec to use the KafkaInsertLengths filter and a custom decryption filter:
$ /opt/vertica/packages/kafka/bin/vkconfig load-spec --update --load-spec Filterspec \
--filters "KafkaInsertLengths() DecryptFilter(parameter=Key)" \
--conf myscheduler.conf
Read the current settings for load spec streamspec1:
$ vkconfig load-spec --read --load-spec streamspec1 --conf weblog.conf
{"load_spec":"streamspec1", "filters":null, "parser":"KafkaParser",
"parser_parameters":null, "load_method":"TRICKLE", "message_max_bytes":null,
"uds_kv_parameters":null}
7 - Microbatch tool options
The vkconfig script's microbatch tool lets you configure a scheduler's microbatches.
Syntax
vkconfig microbatch {--create | --read | --update | --delete} \
[--microbatch microbatch_name] [other_options...]
--create - Creates a new microbatch. Cannot be used with --delete, --read, or --update.
--read - Outputs the current settings of all microbatches defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
You can limit the output to specific microbatches by using the --consumer-group-id, --enabled, --load-spec, --microbatch, --rejection-schema, --rejection-table, --target-schema, --target-table, and --target-columns options. The --enabled option only accepts a true or false value.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update - Updates the settings of microbatch_name. Cannot be used with --create, --delete, or --read.
--delete - Deletes the microbatch named microbatch_name. Cannot be used with --create, --read, or --update.
--microbatch microbatch_name - A unique, case-insensitive name for the microbatch. This option is required for --create, --update, and --delete.
--add-source-cluster cluster_name - The name of a cluster to assign to the microbatch you specify with the --microbatch option. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. You can only add sources from the same cluster to a single microbatch. Requires --add-source.
--add-source source_name - The name of a source to assign to this microbatch. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. Requires --add-source-cluster.
--cluster cluster_name - The name of the cluster to which the --offset option applies. Only required if the microbatch defines more than one cluster or the --source parameter is supplied. Requires the --offset option.
--consumer-group-id id_name - The name of the Kafka consumer group to which Vertica reports its progress consuming messages. Set this value to an empty string to disable progress reports to a Kafka consumer group. For details, see Monitoring Vertica message consumption with consumer groups.
Default: vertica_database-name
--dump - When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--enabled TRUE|FALSE - When TRUE, allows the microbatch to execute.
--load-spec loadspec_name - The load spec to use while processing this microbatch.
--max-parallelism max_num_loads - The maximum number of simultaneous COPY statements created for the microbatch. The scheduler dynamically splits a single microbatch with multiple partitions into max_num_loads COPY statements with fewer partitions.
--new-microbatch updated_name - The updated name for the microbatch. Requires the --update option.
--offset partition_offset[,...] - The offset of the message in the source where the microbatch starts its load. If you use this parameter, you must supply an offset value for each partition in the source or for each partition you list in the --partition option.
You can use this option to skip some messages in the source or reload previously read messages.
See Special Starting Offset Values below for more information.
Important
You cannot set an offset for a microbatch while the scheduler is running. If you attempt to do so, the vkconfig utility returns an error. Use the shutdown utility to shut the scheduler down before setting an offset for a microbatch.
--partition partition[,...] - One or more partitions to which the offsets given in the --offset option apply. If you supply this option, then the offset values given in the --offset option apply to the partitions you specify. Requires the --offset option.
--rejection-schema schema_name - The existing Vertica schema that contains a table for storing rejected messages.
--rejection-table table_name - The existing Vertica table that stores rejected messages.
--remove-source-cluster cluster_name - The name of a cluster to remove from this microbatch. You can use this parameter once per command. Requires --remove-source.
--remove-source source_name - The name of a source to remove from this microbatch. You can use this parameter once per command. You can also use it with --update to remove multiple sources from a microbatch. Requires --remove-source-cluster.
--source source_name - The name of the source to which the offset in the --offset option applies. Required when the microbatch defines more than one source or the --cluster parameter is given. Requires the --offset option.
--target-columns column_expression - A column expression for the target table, where column_expression can be a comma-delimited list of columns or a complete expression. See the COPY statement Parameters for a description of column expressions.
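For example, a hypothetical update that loads only three columns of the target table (the microbatch, column names, and configuration file are assumptions for illustration):
$ /opt/vertica/packages/kafka/bin/vkconfig microbatch --update --microbatch mbatch1 \
--target-columns "hit_time, url, user_agent" --conf myscheduler.conf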
--target-schema schema_name - The existing Vertica target schema associated with this microbatch.
--target-table table_name - The name of a Vertica table corresponding to the target. This table must belong to the target schema.
--validation-type {ERROR|WARN|SKIP} - Controls the validation performed on a created or updated microbatch:
- ERROR: Cancel configuration or creation if vkconfig cannot validate the microbatch. This is the default setting.
- WARN: Proceed with task if validation fails, but display a warning.
- SKIP: Perform no validation.
Renamed from --skip-validation.
See Common vkconfig script options for options that are available in all of the vkconfig tools.
Special starting offset values
The start_offset portion of the stream parameter lets you start loading messages from a specific point in the topic's partition. It also accepts one of two special offset values:
- -2 tells the scheduler to start loading at the earliest available message in the topic's partition. This value is useful when you want to load as many messages as you can from the Kafka topic's partition.
- -3 tells the scheduler to start loading from the consumer group's saved offset. If the consumer group does not have a saved offset, it starts loading from the earliest available message in the topic partition. See Monitoring Vertica message consumption with consumer groups for more information.
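For example, after shutting down the scheduler, the following sketch (the microbatch name and configuration file are hypothetical, and the source is assumed to have a single partition; otherwise supply one offset per partition) restarts loading from the earliest available message:
$ /opt/vertica/packages/kafka/bin/vkconfig microbatch --update --microbatch mbatch1 \
--offset -2 --conf myscheduler.conf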
Examples
This example shows how you can create the microbatch mbatch1. This microbatch identifies the target schema, target table, load spec, and source for the microbatch:
$ /opt/vertica/packages/kafka/bin/vkconfig microbatch --create --microbatch mbatch1 \
--target-schema public \
--target-table BatchTarget \
--load-spec Filterspec \
--add-source SourceFeed \
--add-source-cluster StreamCluster1 \
--conf myscheduler.conf
This example demonstrates listing the current settings for the microbatches in the scheduler defined in the weblog.conf configuration file.
$ vkconfig microbatch --read --conf weblog.conf
{"microbatch":"weblog", "target_columns":null, "rejection_schema":null,
"rejection_table":null, "enabled":true, "consumer_group_id":null,
"load_spec":"weblog_load", "filters":null, "parser":"KafkaJSONParser",
"parser_parameters":null, "load_method":"TRICKLE", "message_max_bytes":null,
"uds_kv_parameters":null, "target_schema":"public", "target_table":"web_hits",
"source":"web_hits", "partitions":1, "src_enabled":true, "cluster":"kafka_weblog",
"hosts":"kafka01.example.com:9092,kafka02.example.com:9092"}
10 - Statistics tool options
The statistics tool lets you access the history of microbatches that your scheduler has run. This tool outputs the log of the microbatches in JSON format to the standard output. You can use its options to filter the list of microbatches to get just the microbatches that interest you.
Note
The statistics tool can sometimes produce confusing output if you have altered the scheduler configuration over time. For example, suppose microbatch-a targets a table. Later, you change the scheduler's configuration so that microbatch-b targets the table. Afterwards, you run the statistics tool and filter the microbatch log based on the target table. The log output then shows entries from both microbatch-a and microbatch-b.
Syntax
vkconfig statistics [options]
--cluster "
cluster
"[,"
cluster2
"...]
- Only return microbatches that retrieved data from a cluster whose name matches one in the list you supply.
--dump - Instead of returning microbatch data, return the SQL query that vkconfig would execute to extract the data from the scheduler tables. You can use this option if you want to use a Vertica client application to get the microbatch log instead of using vkconfig's JSON output.
--from-timestamp "
timestamp
"
- Only return microbatches that began after
timestamp
. The timestamp value is in yyyy-[m]m-[d]d hh:mm:ss format.
Cannot be used in conjunction with --last
.
--last number - Returns the number most recent microbatches that meet all other filters. Cannot be used in conjunction with --from-timestamp or --to-timestamp.
--microbatch "
name
"[,"
name2
"...]
- Only return microbatches whose name matches one of the names in the comma-separated list.
--partition partition#[,partition#2...] - Only return microbatches that accessed data from the topic partition that matches one of the values in the partition list.
--source "
source
"[,"
source2
"...]
- Only return microbatches that accessed data from a source whose name matches one of the names in the list you supply to this argument.
--target-schema "schema"[,"schema2"...] - Only return microbatches that wrote data to the Vertica schemas whose name matches one of the names in the target schema list argument.
--target-table "table"[,"table2"...] - Only return microbatches that wrote data to Vertica tables whose name matches one of the names in the target table list argument.
--to-timestamp "
timestamp
"
- Only return microbatches that began before
timestamp
. The timestamp value is in yyyy-[m]m-[d]d hh:mm:ss format.
Cannot be used in conjunction with --last
.
See Common vkconfig script options for options that are available in all of the vkconfig tools.
Usage considerations
- You can use LIKE wildcards in the values you supply to the --cluster, --microbatch, --source, --target-schema, and --target-table arguments. This feature lets you match partial strings in the microbatch data. See LIKE for more information about using wildcards.
- The string comparisons for the --cluster, --microbatch, --source, --target-schema, and --target-table arguments are case-insensitive.
- The date and time values you supply to the --from-timestamp and --to-timestamp arguments use the java.sql.timestamp format for parsing. This parsing can accept values that you might consider invalid and expect it to reject. For example, if you supply a timestamp of 01-01-2018 24:99:99, the Java timestamp parser silently converts the date to 2018-01-02 01:40:39 instead of returning an error.
Examples
This example gets the last microbatch that the scheduler defined in the weblog.conf file ran:
$ /opt/vertica/packages/kafka/bin/vkconfig statistics --last 1 --conf weblog.conf
{"microbatch":"weblog", "target_schema":"public", "target_table":"web_hits",
"source_name":"web_hits", "source_cluster":"kafka_weblog", "source_partition":0,
"start_offset":80000, "end_offset":79999, "end_reason":"END_OF_STREAM",
"end_reason_message":null, "partition_bytes":0, "partition_messages":0,
"timeslice":"00:00:09.793000", "batch_start":"2018-11-06 09:42:00.176747",
"batch_end":"2018-11-06 09:42:00.437787", "source_duration":"00:00:00.214314",
"consecutive_error_count":null, "transaction_id":45035996274513069,
"frame_start":"2018-11-06 09:41:59.949", "frame_end":null}
If your scheduler is reading from more than one partition, the --last 1 option lists the last microbatch from each partition:
$ /opt/vertica/packages/kafka/bin/vkconfig statistics --last 1 --conf iot.conf
{"microbatch":"iotlog", "target_schema":"public", "target_table":"iot_data",
"source_name":"iot_data", "source_cluster":"kafka_iot", "source_partition":0,
"start_offset":-2, "end_offset":-2, "end_reason":"DEADLINE",
"end_reason_message":null, "partition_bytes":0, "partition_messages":0,
"timeslice":"00:00:09.842000", "batch_start":"2018-11-06 12:52:49.387567",
"batch_end":"2018-11-06 12:52:59.400219", "source_duration":"00:00:09.950127",
"consecutive_error_count":null, "transaction_id":45035996274537015,
"frame_start":"2018-11-06 12:52:49.213", "frame_end":null}
{"microbatch":"iotlog", "target_schema":"public", "target_table":"iot_data",
"source_name":"iot_data", "source_cluster":"kafka_iot", "source_partition":1,
"start_offset":1604, "end_offset":1653, "end_reason":"END_OF_STREAM",
"end_reason_message":null, "partition_bytes":4387, "partition_messages":50,
"timeslice":"00:00:09.842000", "batch_start":"2018-11-06 12:52:49.387567",
"batch_end":"2018-11-06 12:52:59.400219", "source_duration":"00:00:00.220329",
"consecutive_error_count":null, "transaction_id":45035996274537015,
"frame_start":"2018-11-06 12:52:49.213", "frame_end":null}
{"microbatch":"iotlog", "target_schema":"public", "target_table":"iot_data",
"source_name":"iot_data", "source_cluster":"kafka_iot", "source_partition":2,
"start_offset":1603, "end_offset":1652, "end_reason":"END_OF_STREAM",
"end_reason_message":null, "partition_bytes":4383, "partition_messages":50,
"timeslice":"00:00:09.842000", "batch_start":"2018-11-06 12:52:49.387567",
"batch_end":"2018-11-06 12:52:59.400219", "source_duration":"00:00:00.318997",
"consecutive_error_count":null, "transaction_id":45035996274537015,
"frame_start":"2018-11-06 12:52:49.213", "frame_end":null}
{"microbatch":"iotlog", "target_schema":"public", "target_table":"iot_data",
"source_name":"iot_data", "source_cluster":"kafka_iot", "source_partition":3,
"start_offset":1604, "end_offset":1653, "end_reason":"END_OF_STREAM",
"end_reason_message":null, "partition_bytes":4375, "partition_messages":50,
"timeslice":"00:00:09.842000", "batch_start":"2018-11-06 12:52:49.387567",
"batch_end":"2018-11-06 12:52:59.400219", "source_duration":"00:00:00.219543",
"consecutive_error_count":null, "transaction_id":45035996274537015,
"frame_start":"2018-11-06 12:52:49.213", "frame_end":null}
You can use the --partition argument to get just the partitions you want:
$ /opt/vertica/packages/kafka/bin/vkconfig statistics --last 1 --partition 2 --conf iot.conf
{"microbatch":"iotlog", "target_schema":"public", "target_table":"iot_data",
"source_name":"iot_data", "source_cluster":"kafka_iot", "source_partition":2,
"start_offset":1603, "end_offset":1652, "end_reason":"END_OF_STREAM",
"end_reason_message":null, "partition_bytes":4383, "partition_messages":50,
"timeslice":"00:00:09.842000", "batch_start":"2018-11-06 12:52:49.387567",
"batch_end":"2018-11-06 12:52:59.400219", "source_duration":"00:00:00.318997",
"consecutive_error_count":null, "transaction_id":45035996274537015,
"frame_start":"2018-11-06 12:52:49.213", "frame_end":null}
If your scheduler reads from more than one source, the --last 1 option outputs the last microbatch from each source:
$ /opt/vertica/packages/kafka/bin/vkconfig statistics --last 1 --conf weblog.conf
{"microbatch":"weberrors", "target_schema":"public", "target_table":"web_errors",
"source_name":"web_errors", "source_cluster":"kafka_weblog",
"source_partition":0, "start_offset":10000, "end_offset":9999,
"end_reason":"END_OF_STREAM", "end_reason_message":null,
"partition_bytes":0, "partition_messages":0, "timeslice":"00:00:04.909000",
"batch_start":"2018-11-06 10:58:02.632624",
"batch_end":"2018-11-06 10:58:03.058663", "source_duration":"00:00:00.220618",
"consecutive_error_count":null, "transaction_id":45035996274523991,
"frame_start":"2018-11-06 10:58:02.394", "frame_end":null}
{"microbatch":"weblog", "target_schema":"public", "target_table":"web_hits",
"source_name":"web_hits", "source_cluster":"kafka_weblog", "source_partition":0,
"start_offset":80000, "end_offset":79999, "end_reason":"END_OF_STREAM",
"end_reason_message":null, "partition_bytes":0, "partition_messages":0,
"timeslice":"00:00:09.128000", "batch_start":"2018-11-06 10:58:03.322852",
"batch_end":"2018-11-06 10:58:03.63047", "source_duration":"00:00:00.226493",
"consecutive_error_count":null, "transaction_id":45035996274524004,
"frame_start":"2018-11-06 10:58:02.394", "frame_end":null}
You can use wildcards to enable partial matches. This example demonstrates getting the last microbatch for all microbatches whose names end with "log":
$ /opt/vertica/packages/kafka/bin/vkconfig statistics --microbatch "%log" \
--last 1 --conf weblog.conf
{"microbatch":"weblog", "target_schema":"public", "target_table":"web_hits",
"source_name":"web_hits", "source_cluster":"kafka_weblog", "source_partition":0,
"start_offset":80000, "end_offset":79999, "end_reason":"END_OF_STREAM",
"end_reason_message":null, "partition_bytes":0, "partition_messages":0,
"timeslice":"00:00:04.874000", "batch_start":"2018-11-06 11:37:16.17198",
"batch_end":"2018-11-06 11:37:16.460844", "source_duration":"00:00:00.213129",
"consecutive_error_count":null, "transaction_id":45035996274529932,
"frame_start":"2018-11-06 11:37:15.877", "frame_end":null}
To get microbatches from a specific period of time, use the --from-timestamp and --to-timestamp arguments. This example gets the microbatches that read from partition #1 between 12:52:30 and 12:53:00 on 2018-11-06 for the scheduler defined in iot.conf.
$ /opt/vertica/packages/kafka/bin/vkconfig statistics --partition 1 \
--from-timestamp "2018-11-06 12:52:30" \
--to-timestamp "2018-11-06 12:53:00" --conf iot.conf
{"microbatch":"iotlog", "target_schema":"public", "target_table":"iot_data",
"source_name":"iot_data", "source_cluster":"kafka_iot", "source_partition":1,
"start_offset":1604, "end_offset":1653, "end_reason":"END_OF_STREAM",
"end_reason_message":null, "partition_bytes":4387, "partition_messages":50,
"timeslice":"00:00:09.842000", "batch_start":"2018-11-06 12:52:49.387567",
"batch_end":"2018-11-06 12:52:59.400219", "source_duration":"00:00:00.220329",
"consecutive_error_count":null, "transaction_id":45035996274537015,
"frame_start":"2018-11-06 12:52:49.213", "frame_end":null}
{"microbatch":"iotlog", "target_schema":"public", "target_table":"iot_data",
"source_name":"iot_data", "source_cluster":"kafka_iot", "source_partition":1,
"start_offset":1554, "end_offset":1603, "end_reason":"END_OF_STREAM",
"end_reason_message":null, "partition_bytes":4371, "partition_messages":50,
"timeslice":"00:00:09.788000", "batch_start":"2018-11-06 12:52:38.930428",
"batch_end":"2018-11-06 12:52:48.932604", "source_duration":"00:00:00.231709",
"consecutive_error_count":null, "transaction_id":45035996274536981,
"frame_start":"2018-11-06 12:52:38.685", "frame_end":null}
This example demonstrates using the --dump argument to get the SQL statement vkconfig executed to retrieve the output from the previous example:
$ /opt/vertica/packages/kafka/bin/vkconfig statistics --dump --partition 1 \
--from-timestamp "2018-11-06 12:52:30" \
--to-timestamp "2018-11-06 12:53:00" --conf iot.conf
SELECT microbatch, target_schema, target_table, source_name, source_cluster,
source_partition, start_offset, end_offset, end_reason, end_reason_message,
partition_bytes, partition_messages, timeslice, batch_start, batch_end,
last_batch_duration AS source_duration, consecutive_error_count, transaction_id,
frame_start, frame_end FROM "iot_sched".stream_microbatch_history WHERE
(source_partition = '1') AND (frame_start >= '2018-11-06 12:52:30.0') AND
(frame_start < '2018-11-06 12:53:00.0') ORDER BY frame_start DESC, microbatch,
source_cluster, source_name, source_partition;