Vertica includes the vkconfig script that lets you configure your schedulers. This script contains multiple tools that set groups of options in the scheduler, as well as start it and shut it down. You supply the tool you want to use as the first argument in your call to the vkconfig script.
The topics in this section explain each of the tools available in the vkconfig script as well as their options. You can use the options in the Common vkconfig script options topic with any of the utilities. Utility-specific options appear in their respective tables.
1 - Common vkconfig script options
These options are available across the different tools available in the vkconfig script.
--conf filename
A text file containing configuration options for the vkconfig script. See Configuration File Format below.
--config-schema schema_name
The name of the scheduler's Vertica schema. This value is the same as the name of the scheduler. You use this name to identify the scheduler during configuration.
Default:
stream_config
--dbhost host_name
The host name or IP address of the Vertica node acting as the initiator node for the scheduler.
--help
Prints a help menu listing the available options with a description of each.
--jdbc-opt option=value[&option2=value2...]
One or more options to add to the standard JDBC URL that vkconfig uses to connect to Vertica. Cannot be combined with --jdbc-url.
--jdbc-url url
A complete JDBC URL that vkconfig uses instead of the standard JDBC URL string to connect to Vertica.
--password password
Password for the database user.
--ssl-ca-alias alias_name
The alias of the root certificate authority in the truststore. When set, the scheduler loads only the certificate associated with the specified alias. When omitted, the scheduler loads all certificates in the truststore.
--ssl-key-alias alias_name
The alias of the key and certificate pairs within the keystore. Must be set when Vertica uses SSL to connect to Kafka.
--ssl-key-password password
The password for the SSL key. Must be set when Vertica uses SSL to connect to Kafka.
Caution
Specifying this option on the command line can expose it to other users logged into the host. Always use a configuration file to set this option.
--username username
The Vertica database user used to alter the configuration of the scheduler. This user must have create privileges on the scheduler's schema.
Default:
Current user
--version
Displays the version number of the scheduler.
Configuration file format
You can use a configuration file to store common parameters you use in your calls to the vkconfig utility. The configuration file is a text file containing one option setting per line in the format:
option=value
You can also include comments in the option file by prefixing them with a hash mark (#).
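For example, a configuration file might look like the following sketch (the file name and all values are hypothetical):

```shell
# Write a hypothetical configuration file; every value is an example only.
cat > weblog.conf <<'EOF'
# Connection options shared by repeated vkconfig calls.
username=dbadmin
password=mypassword
dbhost=10.20.100.100
dbport=5433
config-schema=stream_config
EOF
```

You can then pass the file to any vkconfig tool with --conf weblog.conf rather than repeating each option on the command line.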
These examples show how you can use the shared utility options.
Display help for the scheduler utility:
$ vkconfig scheduler --help
This command configures a Scheduler, which can run and load data from configured
sources and clusters into Vertica tables. It provides options for changing the
'frame duration' (time given per set of batches to resolve), as well as the
dedicated Vertica resource pool the Scheduler will use while running.
Available Options:
PARAMETER #ARGS DESCRIPTION
conf 1 Allow the use of a properties file to associate
parameter keys and values. This file enables
command string reuse and cleaner command strings.
help 0 Outputs a help context for the given subutility.
version 0 Outputs the current Version of the scheduler.
skip-validation 0 [Deprecated] Use --validation-type.
validation-type 1 Determine what happens when there are
configuration errors. Accepts: ERROR - errors
out, WARN - prints out a message and continues,
SKIP - skip running validations
dbhost 1 The Vertica database hostname that contains
metadata and configuration information. The
default value is 'localhost'.
dbport 1 The port at the hostname to connect to the
Vertica database. The default value is '5433'.
username 1 The user to connect to Vertica. The default
value is the current system user.
password 1 The password for the user connecting to Vertica.
The default value is empty.
jdbc-url 1 A JDBC URL that can override Vertica connection
parameters and provide additional JDBC options.
jdbc-opt 1 Options to add to the JDBC URL used to connect
to Vertica ('&'-separated key=value list).
Used with generated URL (i.e. not with
'--jdbc-url' set).
enable-ssl 1 Enable SSL between JDBC and Vertica and/or
Vertica and Kafka.
ssl-ca-alias 1 The alias of the root CA within the provided
truststore used when connecting between
Vertica and Kafka.
ssl-key-alias 1 The alias of the key and certificate pair
within the provided keystore used when
connecting between Vertica and Kafka.
ssl-key-password 1 The password for the key used when connecting
between Vertica and Kafka. Should be hidden
with file access (see --conf).
config-schema 1 The schema containing the configuration details
to be used, created or edited. This parameter
defines the scheduler. The default value is
'stream_config'.
create 0 Create a new instance of the supplied type.
read 0 Read an instance of the supplied type.
update 0 Update an instance of the supplied type.
delete 0 Delete an instance of the supplied type.
drop 0 Drops the specified configuration schema.
CAUTION: this command will completely delete
and remove all configuration and monitoring
data for the specified scheduler.
dump 0 Dump the config schema query string used to
answer this command in the output.
operator 1 Specifies a user designated as an operator for
the created configuration. Used with --create.
add-operator 1 Add a user designated as an operator for the
specified configuration. Used with --update.
remove-operator 1 Removes a user designated as an operator for
the specified configuration. Used with
--update.
upgrade 0 Upgrade the current scheduler configuration
schema to the current version of this
scheduler. WARNING: if upgrading between
EXCAVATOR and FRONTLOADER be aware that the
Scheduler is not backwards compatible. The
upgrade procedure will translate your kafka
model into the new stream model.
upgrade-to-schema 1 Used with upgrade: will upgrade the
configuration to a new given schema instead of
upgrading within the same schema.
fix-config 0 Attempts to fix the configuration (ex: dropped
tables) before doing any other updates. Used
with --update.
frame-duration 1 The duration of the Scheduler's frame, in
which every configured Microbatch runs. Default
is 300 seconds: '00:05:00'
resource-pool 1 The Vertica resource pool to run the Scheduler
on. Default is 'general'.
config-refresh 1 The interval of time between Scheduler
configuration refreshes. Default is 5 minutes:
'00:05'
new-source-policy 1 The policy for new Sources to be scheduled
during a frame. Options are: START, END, and
FAIR. Default is 'FAIR'.
pushback-policy 1
pushback-max-count 1
auto-sync 1 Automatically update configuration based on
metadata from the Kafka cluster
consumer-group-id 1 The Kafka consumer group id to report offsets
to.
eof-timeout-ms 1 [DEPRECATED] This option has no effect.
2 - Scheduler tool options
The vkconfig script's scheduler tool lets you configure schedulers that continuously load data from Kafka into Vertica. Use the scheduler tool to create, update, or delete a scheduler, defined by config-schema. If you do not specify a scheduler, commands apply to the default stream_config scheduler.
--config-refresh HH:MM:SS
The interval of time that the scheduler runs before synchronizing its settings and updating its cached metadata (such as changes made by using the --update option).
Default: 00:05:00
--consumer-group-id id_name
The name of the Kafka consumer group to which Vertica reports its progress consuming messages. Set this value to an empty string ('') to disable progress reports to a Kafka consumer group. For details, see Monitoring Vertica message consumption with consumer groups.
Default: vertica_database-name
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--eof-timeout-ms number_of_milliseconds
If a COPY command does not receive any messages within the eof-timeout-ms interval, Vertica responds by ending that COPY statement.
--fix-config
Repairs the configuration and re-creates any missing tables. Valid only with the --update shared configuration option.
--frame-duration HH:MM:SS
The interval of time that all individual frames last with this scheduler. The scheduler must have enough time to run each microbatch (each of which executes a COPY statement). You can approximate the average available time per microbatch by dividing the frame duration by the number of microbatches:
time per microbatch = frame duration / number of microbatches
This is just a rough estimate, as there are many factors that impact the amount of time that each microbatch is able to run.
The vkconfig utility warns you if the time allocated per microbatch is below 2 seconds. You usually should allocate more than two seconds per microbatch to allow the scheduler to load all of the data in the data stream.
Note
In versions of Vertica earlier than 10.0, the default frame duration was 10 seconds. In version 10.0, this default value was increased to 5 minutes in part to compensate for the removal of WOS. If you created your scheduler with the default frame duration in a version prior to 10.0, the frame duration is not updated to the new default value. In this case, consider adjusting the frame duration manually. See Choosing a frame duration for more information.
Default: 00:05:00
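As a sketch of the estimate described above (the frame length and microbatch count are hypothetical), dividing the frame duration by the number of microbatches gives the average time available to each:

```shell
# Rough estimate only: a 300-second (00:05:00) frame shared by 100 microbatches.
FRAME_SECONDS=300
MICROBATCH_COUNT=100
echo $((FRAME_SECONDS / MICROBATCH_COUNT))   # average seconds per microbatch: 3
```

Three seconds per microbatch is above the two-second threshold that triggers a vkconfig warning, but leaves little headroom if the number of microbatches grows.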
--message_max_bytes max_message_size
Specifies the maximum size, in bytes, of a Kafka protocol batch message.
Default: 25165824
--new-source-policy {FAIR|START|END}
Determines how Vertica allocates resources to the newly added source, one of the following:
FAIR: Takes the average length of time from the previous batches and schedules itself appropriately.
START: All new sources start at the beginning of the frame. The batch receives the minimal amount of time to run.
END: All new sources start at the end of the frame. The batch receives the maximum amount of time to run.
Default: FAIR
--operator username
Allows the dbadmin to grant privileges to a previously created Vertica user or role.
This option gives the specified user all privileges on the scheduler instance and EXECUTE privileges on the libkafka library and all its UDxs.
Granting operator privileges gives the user the right to read data off any source in any cluster that can be reached from the Vertica node.
The dbadmin must grant the user separate permission for them to have write privileges on the target tables.
Requires the --create shared utility option. Use the --add-operator option to grant operator privileges after the scheduler has been created.
To revoke privileges, use the --remove-operator option.
--remove-operator user_name
Removes access to the scheduler from a Vertica user account. Requires the --update shared utility option.
--resource-pool pool_name
The resource pool to be used by all queries executed by this scheduler. You must create this pool in advance.
The scheduler can use only one-fourth of the GENERAL pool's PLANNEDCONCURRENCY.
--upgrade
Upgrades the existing scheduler and configuration schema to the current Vertica version. The upgraded version of the scheduler is not backwards compatible with earlier versions. To upgrade a scheduler to an alternate schema, use the upgrade-to-schema parameter. See Updating schedulers after Vertica upgrades for more information.
--upgrade-to-schema schema_name
Copies the scheduler's schema to a new schema specified by schema name and then upgrades it to be compatible with the current version of Vertica. Vertica does not alter the old schema. Requires the --upgrade scheduler utility option.
--validation-type {ERROR|WARN|SKIP}
Renamed from --skip-validation, specifies the level of validation performed on the scheduler. Invalid SQL syntax and other errors can cause invalid microbatches. Vertica supports the following validation types:
ERROR: Cancel configuration or creation if validation fails.
WARN: Proceed with task if validation fails, but display a warning.
SKIP: Perform no validation.
These examples show how you can use the scheduler utility options.
Give the user Jim privileges on the stream_config scheduler. Specify that you are making edits to the stream_config scheduler with the --config-schema option:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --config-schema stream_config --add-operator Jim
Edit the default stream_config scheduler so that every microbatch waits for data for one second before ending:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --eof-timeout-ms 1000
3 - Cluster tool options
--create
Creates a new cluster. Cannot be used with --delete, --read, or --update.
--read
Outputs the settings of all clusters defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
You can limit the output to specific clusters by supplying one or more cluster names in the --cluster option. You can also limit the output to clusters that contain one or more specific hosts using the --hosts option. Use commas to separate multiple values.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update
Updates the settings of cluster_name. Cannot be used with --create, --delete, or --read.
--delete
Deletes the cluster cluster_name. Cannot be used with --create, --read, or --update.
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--cluster cluster_name
A unique, case-insensitive name for the cluster to operate on. This option is required for --create, --update, and --delete.
--hosts b1:port[,b2:port...]
Identifies the broker hosts that you want to add, edit, or remove from a Kafka cluster. To identify multiple hosts, use a comma delimiter.
--kafka_conf 'kafka_configuration_setting'
A JSON-formatted object of option/value pairs to pass directly to the rdkafka library. This is the library Vertica uses to communicate with Kafka. You can use this parameter to directly set configuration options that are not available through the Vertica integration with Kafka. See Directly setting Kafka library options for details.
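Because the value must be a well-formed JSON object, it can help to validate it before passing it to vkconfig. A sketch (the property name shown is an rdkafka configuration option; this snippet only checks the JSON, it does not call vkconfig):

```shell
# Validate a hypothetical --kafka_conf value before use.
KAFKA_CONF='{"api.version.request": "false"}'
echo "$KAFKA_CONF" | python3 -m json.tool >/dev/null && echo "valid JSON"
```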
--kafka_conf_secret 'kafka_configuration_setting'
Conceals sensitive configuration data that you must pass directly to the rdkafka library, such as passwords. This parameter accepts settings in the same format as kafka_conf.
Values passed to this parameter are not logged or stored in system tables.
--new-cluster cluster_name
The updated name for the cluster. Requires the --update shared utility option.
--validation-type {ERROR|WARN|SKIP}
Specifies the level of validation performed on a created or updated cluster:
ERROR - Cancel configuration or creation if vkconfig cannot validate that the cluster exists. This is the default setting.
WARN - Proceed with task if validation fails, but display a warning.
SKIP - Perform no validation.
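A sketch of defining a cluster (the cluster name, broker addresses, and install path are all hypothetical; the command is built as a string here for review rather than run against a live database):

```shell
# Hypothetical: register two Kafka brokers under the name kafka_weblog.
cluster_cmd='/opt/vertica/packages/kafka/bin/vkconfig cluster --create \
  --cluster kafka_weblog --hosts 10.0.0.3:9092,10.0.0.4:9092'
printf '%s\n' "$cluster_cmd"
```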
4 - Source tool options
--create
Creates a new source. Cannot be used with --delete, --read, or --update.
--read
Outputs the current setting of the sources defined in the scheduler. The output is in JSON format. Cannot be used with --create, --delete, or --update.
By default this option outputs all of the sources defined in the scheduler. You can limit the output by using the --cluster, --enabled, --partitions, and --source options. The output will only contain sources that match the values in these options. The --enabled option can only have a true or false value. The --source option is case-sensitive.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update
Updates the settings of source_name. Cannot be used with --create, --delete, or --read.
--delete
Deletes the source named source_name. Cannot be used with --create, --read, or --update.
--source source_name
Identifies the source to create or alter in the scheduler's configuration. This option is case-sensitive. You can use any name you like for a new source. Most people use the name of the Kafka topic the scheduler loads its data from. This option is required for --create, --update, and --delete.
--cluster cluster_name
Identifies the cluster containing the source that you want to create or edit. You must have already defined this cluster in the scheduler.
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--enabled TRUE|FALSE
When TRUE, the source is available for use.
--new-cluster cluster_name
Changes the cluster this source belongs to.
All sources referencing the old cluster now target this cluster.
Requires: --update and --source options
--new-source source_name
Updates the name of an existing source to the name specified by this parameter.
Requires: --update shared utility option
--partitionscount
Sets the number of partitions in the source.
Default:
The number of partitions defined in the cluster.
Requires: --create and --source options
You must keep this consistent with the number of partitions in the Kafka topic.
Renamed from --num-partitions.
--validation-type {ERROR|WARN|SKIP}
Controls the validation performed on a created or updated source:
ERROR - Cancel configuration or creation if vkconfig cannot validate the source. This is the default setting.
WARN - Proceed with task if validation fails, but display a warning.
SKIP - Perform no validation.
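A sketch of defining a source (the topic, cluster, and partition values are hypothetical, and the command is shown as a string rather than executed):

```shell
# Hypothetical: register the Kafka topic web_hits, which has one partition.
source_cmd='/opt/vertica/packages/kafka/bin/vkconfig source --create \
  --cluster kafka_weblog --source web_hits --partitions 1'
printf '%s\n' "$source_cmd"
```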
5 - Target tool options
--create
Adds a new target table for the scheduler. Cannot be used with --delete, --read, or --update.
--read
Outputs the targets defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
By default this option outputs all of the targets defined in the configuration schema. You can limit the output to specific targets by using the --target-schema and --target-table options. The vkconfig script only outputs targets that match the values you set in these options.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update
Updates the settings for the targeted table. Use with the --new-target-schema and --new-target-table options. Cannot be used with --create, --delete, or --read.
--delete
Removes the scheduler's association with the target table. Cannot be used with --create, --read, or --update.
--target-table table
The existing Vertica table for the scheduler to target. This option is required for --create, --update, and --delete.
--target-schema schema
The existing Vertica schema containing the target table. This option is required for --create, --update, and --delete.
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--new-target-schema schema_name
Changes the schema containing the target table to another existing schema.
Requires: --update option
--new-target-table table_name
Changes the Vertica target table associated with this schema to another existing table.
Requires: --update option
--validation-type {ERROR|WARN|SKIP}
Controls validation performed on a created or updated target:
ERROR - Cancel configuration or creation if vkconfig cannot validate that the table exists. This is the default setting.
WARN - Creates or updates the target if validation fails, but displays a warning.
SKIP - Perform no validation.
Renamed from --skip-validation.
Important
Avoid having columns with primary key restrictions in your target table. The scheduler stops loading data if it encounters a row that has a value which violates this restriction. If you must have a primary key restricted column, try to filter out any redundant values for that column in the streamed data before it is loaded by the scheduler.
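A sketch of registering a target table (the schema and table names are hypothetical; the table must already exist in Vertica):

```shell
# Hypothetical: point the scheduler at the existing table public.web_hits_table.
target_cmd='/opt/vertica/packages/kafka/bin/vkconfig target --create \
  --target-schema public --target-table web_hits_table'
printf '%s\n' "$target_cmd"
```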
6 - Load spec tool options
--create
Creates a new load spec. Cannot be used with --delete, --read, or --update.
--read
Outputs the current settings of the load specs defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
By default, this option outputs all load specs defined in the scheduler. You can limit the output by supplying a single value or a comma-separated list of values to these options:
--load-spec
--filters
--uds-kv-parameters
--parser
--message-max-bytes
--parser-parameters
The vkconfig script only outputs the configuration of load specs that match the values you supply.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update
Updates the settings of spec-name. Cannot be used with --create, --delete, or --read.
--delete
Deletes the load spec named spec-name. Cannot be used with --create, --read, or --update.
--load-spec spec-name
A unique name for the load spec to operate on. This option is required for --create, --update, and --delete.
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--filters "filter-name"
A Vertica FILTER chain containing all of the UDFilters to use in the COPY statement. For more information on filters, refer to Parsing custom formats.
--message-max-bytes max-size
Specifies the maximum size, in bytes, of a Kafka protocol batch message.
Default: 25165824
--new-load-spec new-name
A new, unique name for an existing load spec. Requires the --update parameter.
--parser-parameters "key=value[,...]"
A list of parameters to provide to the parser specified in the --parser parameter. When you use a Vertica native parser, the scheduler passes these parameters to the COPY statement where they are in turn passed to the parser.
--parser parser-name
Identifies a Vertica UDParser to use with a specified target. This parser is used within the COPY statement that the scheduler runs to load data. If you are using a Vertica native parser, the values supplied to the --parser-parameters option are passed through to the COPY statement.
Default: KafkaParser
--uds-kv-parameters key=value[,...]
A comma-separated list of key/value pairs for the user-defined source.
--validation-type {ERROR|WARN|SKIP}
Specifies the validation performed on a created or updated load spec, to one of the following:
ERROR: Cancel configuration or creation if vkconfig cannot validate the load spec. This is the default setting.
WARN: Proceed with task if validation fails, but display a warning.
SKIP: Perform no validation.
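A sketch of creating a load spec (the spec name is hypothetical; KafkaJSONParser is one of the parsers shipped with the Vertica Kafka integration):

```shell
# Hypothetical: a load spec that parses each message as JSON.
spec_cmd='/opt/vertica/packages/kafka/bin/vkconfig load-spec --create \
  --load-spec weblog_spec --parser KafkaJSONParser'
printf '%s\n' "$spec_cmd"
```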
7 - Microbatch tool options
--create
Creates a new microbatch. Cannot be used with --delete, --read, or --update.
--read
Outputs the current settings of all microbatches defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
You can limit the output to specific microbatches by using the --consumer-group-id, --enabled, --load-spec, --microbatch, --rejection-schema, --rejection-table, --target-schema, --target-table, and --target-columns options. The --enabled option only accepts a true or false value.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update
Updates the settings of microbatch_name. Cannot be used with --create, --delete, or --read.
--delete
Deletes the microbatch named microbatch_name. Cannot be used with --create, --read, or --update.
--microbatch microbatch_name
A unique, case-insensitive name for the microbatch. This option is required for --create, --update, and --delete.
--add-source-cluster cluster_name
The name of a cluster to assign to the microbatch you specify with the --microbatch option. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. You can only add sources from the same cluster to a single microbatch. Requires --add-source.
--add-source source_name
The name of a source to assign to this microbatch. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. Requires --add-source-cluster.
--cluster cluster_name
The name of the cluster to which the --offset option applies. Only required if the microbatch defines more than one cluster or the --source parameter is supplied. Requires the --offset option.
--consumer-group-id id_name
The name of the Kafka consumer group to which Vertica reports its progress consuming messages. Set this value to an empty string ('') to disable progress reports to a Kafka consumer group. For details, see Monitoring Vertica message consumption with consumer groups.
Default: vertica_database-name
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--enabled TRUE|FALSE
When TRUE, allows the microbatch to execute.
--load-spec loadspec_name
The load spec to use while processing this microbatch.
--max-parallelism max_num_loads
The maximum number of simultaneous COPY statements created for the microbatch. The scheduler dynamically splits a single microbatch with multiple partitions into max_num_loads COPY statements with fewer partitions.
--new-microbatch microbatch_name
The updated name for the microbatch. Requires the --update option.
--offset partition_offset[,...]
The offset of the message in the source where the microbatch starts its load. If you use this parameter, you must supply an offset value for each partition in the source or each partition you list in the --partition option.
You can use this option to skip some messages in the source or reload previously read messages.
You cannot set an offset for a microbatch while the scheduler is running. If you attempt to do so, the vkconfig utility returns an error. Use the shutdown utility to shut the scheduler down before setting an offset for a microbatch.
--partition partition[,...]
One or more partitions to which the offsets given in the --offset option apply. If you supply this option, the offset values given in the --offset option apply to the partitions you specify. Requires the --offset option.
--rejection-schema schema_name
The existing Vertica schema that contains a table for storing rejected messages.
--rejection-table table_name
The existing Vertica table that stores rejected messages.
--remove-source-cluster cluster_name
The name of a cluster to remove from this microbatch. You can use this parameter once per command. Requires --remove-source.
--remove-source source_name
The name of a source to remove from this microbatch. You can use this parameter once per command. You can also use it with --update to remove multiple sources from a microbatch. Requires --remove-source-cluster.
--source source_name
The name of the source to which the offset in the --offset option applies. Required when the microbatch defines more than one source or the --cluster parameter is given. Requires the --offset option.
--target-columns column_expression
A column expression for the target table, where column_expression can be a comma-delimited list of columns or a complete expression.
See the COPY statement Parameters for a description of column expressions.
--target-schema schema_name
The existing Vertica target schema associated with this microbatch.
--target-table table_name
The name of a Vertica table corresponding to the target. This table must belong to the target schema.
--validation-type {ERROR|WARN|SKIP}
Controls the validation performed on a created or updated microbatch:
ERROR - Cancel configuration or creation if vkconfig cannot validate the microbatch. This is the default setting.
WARN - Proceed with task if validation fails, but display a warning.
SKIP - Perform no validation.
The start_offset portion of the stream parameter lets you start loading messages from a specific point in the topic's partition. It also accepts two special offset values:
-2 tells the scheduler to start loading at the earliest available message in the topic's partition. This value is useful when you want to load as many messages as you can from the Kafka topic's partition.
-3 tells the scheduler to start loading from the consumer group's saved offset. If the consumer group does not have a saved offset, it starts loading from the earliest available message in the topic partition. See Monitoring Vertica message consumption with consumer groups for more information.
Examples
This example shows how you can create the microbatch, mbatch1. This microbatch identifies the schema, target table, load spec, and source for the microbatch:
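A sketch of such a command (all object names are hypothetical, and the command is built as a string for review rather than run against a live database):

```shell
# Hypothetical: tie together a target table, load spec, and source.
microbatch_cmd='/opt/vertica/packages/kafka/bin/vkconfig microbatch --create \
  --microbatch mbatch1 --target-schema public --target-table web_hits_table \
  --load-spec weblog_spec --add-source web_hits --add-source-cluster kafka_weblog'
printf '%s\n' "$microbatch_cmd"
```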
8 - Launch tool options
Use the vkconfig script's launch tool to assign a name to a scheduler instance.
Syntax
vkconfig launch [options...]
--enable-ssl {true|false}
(Optional) Enables SSL authentication between Kafka and Vertica. For more information, refer to TLS/SSL encryption with Kafka.
--ssl-ca-alias alias
The user-defined alias of the root certifying authority you are using to authenticate communication between Vertica and Kafka. This parameter is used only when SSL is enabled.
--ssl-key-alias alias
The user-defined alias of the key/certificate pair you are using to authenticate communication between Vertica and Kafka. This parameter is used only when SSL is enabled.
--ssl-key-password password
The password used to create your SSL key. This parameter is used only when SSL is enabled.
--instance-name name
(Optional) Allows you to name the process running the scheduler. You can use this name when viewing the scheduler_history table to find which instance is currently running.
--refresh-interval hours
(Optional) The time interval at which the connection between Vertica and Kafka is refreshed (24 hours by default).
--kafka_conf 'kafka_configuration_setting'
A JSON-formatted object of option/value pairs to pass directly to the rdkafka library. This is the library Vertica uses to communicate with Kafka. You can use this parameter to directly set configuration options that are not available through the Vertica integration with Kafka. See Directly setting Kafka library options for details.
--kafka_conf_secret 'kafka_configuration_setting'
Conceals sensitive configuration data that you must pass directly to the rdkafka library, such as passwords. This parameter accepts settings in the same format as kafka_conf.
Values passed to this parameter are not logged or stored in system tables.
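The launch tool is typically started in the background so the scheduler keeps running after the shell session ends. A sketch (the configuration file name and install path are hypothetical; the command is shown as a string rather than executed):

```shell
# Hypothetical: launch in the background, reading options from weblog.conf.
launch_cmd='nohup /opt/vertica/packages/kafka/bin/vkconfig launch \
  --conf weblog.conf >/dev/null 2>&1 &'
printf '%s\n' "$launch_cmd"
```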
9 - Shutdown tool options
Use the vkconfig script's shutdown tool to terminate one or all Vertica schedulers running on a host. Always run this command before restarting a scheduler to ensure the scheduler has shut down correctly.
Use the --conf or --config-schema option to specify a scheduler to shut down. The following command terminates the scheduler that was launched with the same --conf myscheduler.conf option:
$ /opt/vertica/packages/kafka/bin/vkconfig shutdown --conf myscheduler.conf
10 - Statistics tool options
The statistics tool lets you access the history of microbatches that your scheduler has run. This tool outputs the log of the microbatches in JSON format to standard output. You can use its options to filter the list of microbatches to get just the microbatches that interest you.
Note
The statistics tool can produce confusing output if you have altered the scheduler configuration over time. For example, suppose microbatch-a targets a table. Later, you change the scheduler's configuration so that microbatch-b targets the same table. If you then run the statistics tool and filter the microbatch log by target table, the output shows entries from both microbatch-a and microbatch-b.
Syntax
vkconfig statistics [options]
--cluster "cluster"[,"cluster2"...]
Only return microbatches that retrieved data from a cluster whose name matches one in the list you supply.
--dump
Instead of returning microbatch data, return the SQL query that vkconfig would execute to extract the data from the scheduler tables. You can use this option if you want to use a Vertica client application to get the microbatch log instead of using vkconfig's JSON output.
--from-timestamp "timestamp"
Only return microbatches that began after timestamp. The timestamp value is in yyyy-[m]m-[d]d hh:mm:ss format.
Cannot be used in conjunction with --last.
--last number
Returns the number most recent microbatches that meet all other filters. Cannot be used in conjunction with --from-timestamp or --to-timestamp.
--microbatch "name"[,"name2"...]
Only return microbatches whose name matches one of the names in the comma-separated list.
--partition partition#[,partition#2...]
Only return microbatches that accessed data from the topic partition that matches one of the values in the partition list.
--source "source"[,"source2"...]
Only return microbatches that accessed data from a source whose name matches one of the names in the list you supply to this argument.
--target-schema "schema"[,"schema2"...]
Only return microbatches that wrote data to the Vertica schemas whose name matches one of the names in the target schema list argument.
--target-table "table"[,"table2"...]
Only return microbatches that wrote data to Vertica tables whose names match one of the names in the target table list argument.
--to-timestamp "timestamp"
Only return microbatches that began before timestamp. The timestamp value is in yyyy-[m]m-[d]d hh:mm:ss format.
You can use LIKE wildcards in the values you supply to the --cluster, --microbatch, --source, --target-schema, and --target-table arguments. This feature lets you match partial strings in the microbatch data. See LIKE for more information about using wildcards.
The string comparisons for the --cluster, --microbatch, --source, --target-schema, and --target-table arguments are case-insensitive.
The date and time values you supply to the --from-timestamp and --to-timestamp arguments are parsed using the java.sql.Timestamp format. This parser can accept values that you might expect it to reject as invalid. For example, if you supply a timestamp of 01-01-2018 24:99:99, the parser silently converts the date to 2018-01-02 01:40:39 instead of returning an error.
Examples
This example gets the last microbatch that the scheduler defined in the weblog.conf file ran:
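A call along these lines retrieves it (the script path follows the dump example later in this section):

```shell
# Return only the single most recent microbatch for this scheduler
/opt/vertica/packages/kafka/bin/vkconfig statistics --last 1 --conf weblog.conf
```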
You can use wildcards to enable partial matches. This example demonstrates getting the last microbatch for all microbatches whose names end with "log":
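Such a call might look like this, using the LIKE wildcard % to match any prefix:

```shell
# %log matches every microbatch whose name ends in "log"
/opt/vertica/packages/kafka/bin/vkconfig statistics --microbatch "%log" \
    --last 1 --conf weblog.conf
```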
To get microbatches from a specific period of time, use the --from-timestamp and --to-timestamp arguments. This example gets the microbatches that read from partition #2 between 12:52:30 and 12:53:00 on 2018-11-06 for the scheduler defined in iot.conf.
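A sketch of that query, combining the partition and timestamp filters described above:

```shell
# Filter by partition number and a 30-second time window
/opt/vertica/packages/kafka/bin/vkconfig statistics --partition 2 \
    --from-timestamp "2018-11-06 12:52:30" \
    --to-timestamp "2018-11-06 12:53:00" --conf iot.conf
```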
This example demonstrates using the --dump argument to get the SQL statement vkconfig executed to retrieve the output from the previous example:
$ /opt/vertica/packages/kafka/bin/vkconfig statistics --dump --partition 2 \
    --from-timestamp "2018-11-06 12:52:30" \
    --to-timestamp "2018-11-06 12:53:00" --conf iot.conf
SELECT microbatch, target_schema, target_table, source_name, source_cluster,
source_partition, start_offset, end_offset, end_reason, end_reason_message,
partition_bytes, partition_messages, timeslice, batch_start, batch_end,
last_batch_duration AS source_duration, consecutive_error_count, transaction_id,
frame_start, frame_end FROM "iot_sched".stream_microbatch_history WHERE
(source_partition = '2') AND (frame_start >= '2018-11-06 12:52:30.0') AND
(frame_start < '2018-11-06 12:53:00.0') ORDER BY frame_start DESC, microbatch,
source_cluster, source_name, source_partition;
11 - Sync tool options
The sync utility immediately updates all source definitions by querying the Kafka cluster's brokers defined by the source.
The sync utility immediately updates all source definitions by querying the Kafka cluster's brokers defined by the source. By default, it updates all of the sources defined in the target schema. To update just specific sources, use the --source and --cluster options to specify which sources to update.
Syntax
vkconfig sync [options...]
--source source_name
The name of the source to sync. This source must already exist in the target schema.
--cluster cluster_name
Identifies the cluster containing the source that you want to sync. You must have already defined this cluster in the scheduler.
--kafka_conf 'kafka_configuration_setting'
A JSON-formatted object of option/value pairs to pass directly to the rdkafka library. This is the library Vertica uses to communicate with Kafka. You can use this parameter to directly set configuration options that are not available through the Vertica integration with Kafka. See Directly setting Kafka library options for details.
--kafka_conf_secret 'kafka_configuration_setting'
Conceals sensitive configuration data that you must pass directly to the rdkafka library, such as passwords. This parameter accepts settings in the same format as kafka_conf.
Values passed to this parameter are not logged or stored in system tables.
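A sketch of a targeted sync, assuming a source named web_hits (the source name and configuration file are illustrative):

```shell
# Sync only the web_hits source rather than every source in the target schema
/opt/vertica/packages/kafka/bin/vkconfig sync --source web_hits --conf myscheduler.conf
```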