Vertica includes the vkconfig script that lets you configure your schedulers. This script contains multiple tools that set groups of options in the scheduler, as well as start it and shut it down. You supply the tool you want to use as the first argument in your call to the vkconfig script.
The topics in this section explain each of the tools available in the vkconfig script as well as their options. You can use the options in the Common vkconfig script options topic with any of the utilities. Utility-specific options appear in their respective tables.
1 - Common vkconfig script options
These options are available across the different tools available in the vkconfig script.
--conf filename
A text file containing configuration options for the vkconfig script. See Configuration File Format below.
--config-schema schema_name
The name of the scheduler's Vertica schema. This value is the same as the name of the scheduler. You use this name to identify the scheduler during configuration.
Default:
stream_config
--dbhost host_name
The host name or IP address of the Vertica node acting as the initiator node for the scheduler.
--help
Prints a help menu listing the available options with a description of each.
--jdbc-opt option=value[&option2=value2...]
One or more options to add to the standard JDBC URL that vkconfig uses to connect to Vertica. Cannot be combined with --jdbc-url.
--jdbc-url url
A complete JDBC URL that vkconfig uses instead of the standard JDBC URL string to connect to Vertica.
--password password
Password for the database user.
--ssl-ca-alias alias_name
The alias of the root certificate authority in the truststore. When set, the scheduler loads only the certificate associated with the specified alias. When omitted, the scheduler loads all certificates in the truststore.
--ssl-key-alias alias_name
The alias of the key and certificate pairs within the keystore. Must be set when Vertica uses SSL to connect to Kafka.
--ssl-key-password password
The password for the SSL key. Must be set when Vertica uses SSL to connect to Kafka.
Caution
Specifying this option on the command line can expose it to other users logged into the host. Always use a configuration file to set this option.
--username username
The Vertica database user used to alter the configuration of the scheduler. This user must have create privileges on the scheduler's schema.
Default:
Current user
--version
Displays the version number of the scheduler.
Configuration file format
You can use a configuration file to store common parameters you use in your calls to the vkconfig utility. The configuration file is a text file containing one option setting per line in the format:
option=value
You can also include comments in the option file by prefixing them with a hash mark (#).
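For example, a configuration file might look like the following sketch (the file name and all values are hypothetical):

```shell
# Write a hypothetical configuration file; every value is an example only.
cat > weblog.conf <<'EOF'
# Connection options shared by repeated vkconfig calls.
username=dbadmin
password=mypassword
dbhost=10.20.100.100
dbport=5433
config-schema=stream_config
EOF
```

You can then pass the file to any vkconfig tool with --conf weblog.conf rather than repeating each option on the command line.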
These examples show how you can use the shared utility options.
Display help for the scheduler utility:
$ vkconfig scheduler --help
This command configures a Scheduler, which can run and load data from configured
sources and clusters into Vertica tables. It provides options for changing the
'frame duration' (time given per set of batches to resolve), as well as the
dedicated Vertica resource pool the Scheduler will use while running.
Available Options:
PARAMETER #ARGS DESCRIPTION
conf 1 Allow the use of a properties file to associate
parameter keys and values. This file enables
command string reuse and cleaner command strings.
help 0 Outputs a help context for the given subutility.
version 0 Outputs the current Version of the scheduler.
skip-validation 0 [Deprecated] Use --validation-type.
validation-type 1 Determine what happens when there are
configuration errors. Accepts: ERROR - errors
out, WARN - prints out a message and continues,
SKIP - skip running validations
dbhost 1 The Vertica database hostname that contains
metadata and configuration information. The
default value is 'localhost'.
dbport 1 The port at the hostname to connect to the
Vertica database. The default value is '5433'.
username 1 The user to connect to Vertica. The default
value is the current system user.
password 1 The password for the user connecting to Vertica.
The default value is empty.
jdbc-url 1 A JDBC URL that can override Vertica connection
parameters and provide additional JDBC options.
jdbc-opt 1 Options to add to the JDBC URL used to connect
to Vertica ('&'-separated key=value list).
Used with generated URL (i.e. not with
'--jdbc-url' set).
enable-ssl 1 Enable SSL between JDBC and Vertica and/or
Vertica and Kafka.
ssl-ca-alias 1 The alias of the root CA within the provided
truststore used when connecting between
Vertica and Kafka.
ssl-key-alias 1 The alias of the key and certificate pair
within the provided keystore used when
connecting between Vertica and Kafka.
ssl-key-password 1 The password for the key used when connecting
between Vertica and Kafka. Should be hidden
with file access (see --conf).
config-schema 1 The schema containing the configuration details
to be used, created or edited. This parameter
defines the scheduler. The default value is
'stream_config'.
create 0 Create a new instance of the supplied type.
read 0 Read an instance of the supplied type.
update 0 Update an instance of the supplied type.
delete 0 Delete an instance of the supplied type.
drop 0 Drops the specified configuration schema.
CAUTION: this command will completely delete
and remove all configuration and monitoring
data for the specified scheduler.
dump 0 Dump the config schema query string used to
answer this command in the output.
operator 1 Specifies a user designated as an operator for
the created configuration. Used with --create.
add-operator 1 Add a user designated as an operator for the
specified configuration. Used with --update.
remove-operator 1 Removes a user designated as an operator for
the specified configuration. Used with
--update.
upgrade 0 Upgrade the current scheduler configuration
schema to the current version of this
scheduler. WARNING: if upgrading between
EXCAVATOR and FRONTLOADER be aware that the
Scheduler is not backwards compatible. The
upgrade procedure will translate your kafka
model into the new stream model.
upgrade-to-schema 1 Used with upgrade: will upgrade the
configuration to a new given schema instead of
upgrading within the same schema.
fix-config 0 Attempts to fix the configuration (ex: dropped
tables) before doing any other updates. Used
with --update.
frame-duration 1 The duration of the Scheduler's frame, in
which every configured Microbatch runs. Default
is 300 seconds: '00:05:00'
resource-pool 1 The Vertica resource pool to run the Scheduler
on. Default is 'general'.
config-refresh 1 The interval of time between Scheduler
configuration refreshes. Default is 5 minutes:
'00:05'
new-source-policy 1 The policy for new Sources to be scheduled
during a frame. Options are: START, END, and
FAIR. Default is 'FAIR'.
pushback-policy 1
pushback-max-count 1
auto-sync 1 Automatically update configuration based on
metadata from the Kafka cluster
consumer-group-id 1 The Kafka consumer group id to report offsets
to.
eof-timeout-ms 1 [DEPRECATED] This option has no effect.
2 - Scheduler tool options
The vkconfig script's scheduler tool lets you configure schedulers that continuously load data from Kafka into Vertica. Use the scheduler tool to create, update, or delete a scheduler, defined by config-schema. If you do not specify a scheduler, commands apply to the default stream_config scheduler.
--config-refresh HH:MM:SS
The interval of time that the scheduler runs before synchronizing its settings and updating its cached metadata (such as changes made by using the --update option).
Default: 00:05:00
--consumer-group-id id_name
The name of the Kafka consumer group to which Vertica reports its progress consuming messages. Set this value to an empty string ('') to disable progress reports to a Kafka consumer group. For details, see Monitoring Vertica message consumption with consumer groups.
Default: vertica_database-name
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--eof-timeout-ms number_of_milliseconds
If a COPY command does not receive any messages within the eof-timeout-ms interval, Vertica responds by ending that COPY statement.
--fix-config
Repairs the configuration and re-creates any missing tables. Valid only with the --update shared configuration option.
--frame-duration HH:MM:SS
The interval of time that all individual frames last with this scheduler. The scheduler must have enough time to run each microbatch (each of which executes a COPY statement). You can approximate the average available time per microbatch by dividing the frame duration by the number of microbatches:
time per microbatch = frame duration / number of microbatches
This is just a rough estimate, as there are many factors that impact the amount of time that each microbatch is able to run.
The vkconfig utility warns you if the time allocated per microbatch is below 2 seconds. You usually should allocate more than two seconds per microbatch to allow the scheduler to load all of the data in the data stream.
Note
In versions of Vertica earlier than 10.0, the default frame duration was 10 seconds. In version 10.0, this default value was increased to 5 minutes in part to compensate for the removal of WOS. If you created your scheduler with the default frame duration in a version prior to 10.0, the frame duration is not updated to the new default value. In this case, consider adjusting the frame duration manually. See Choosing a frame duration for more information.
Default: 00:05:00
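As a sketch of the estimate described above (the frame length and microbatch count are hypothetical), dividing the frame duration by the number of microbatches gives the average time available to each:

```shell
# Rough estimate only: a 300-second (00:05:00) frame shared by 100 microbatches.
FRAME_SECONDS=300
MICROBATCH_COUNT=100
echo $((FRAME_SECONDS / MICROBATCH_COUNT))   # average seconds per microbatch: 3
```

Three seconds per microbatch is above the two-second threshold that triggers a vkconfig warning, but leaves little headroom if the number of microbatches grows.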
--message_max_bytes max_message_size
Specifies the maximum size, in bytes, of a Kafka protocol batch message.
Default: 25165824
--new-source-policy {FAIR|START|END}
Determines how Vertica allocates resources to the newly added source, one of the following:
FAIR: Takes the average length of time from the previous batches and schedules itself appropriately.
START: All new sources start at the beginning of the frame. The batch receives the minimal amount of time to run.
END: All new sources start at the end of the frame. The batch receives the maximum amount of time to run.
Default: FAIR
--operator username
Allows the dbadmin to grant privileges to a previously created Vertica user or role.
This option gives the specified user all privileges on the scheduler instance and EXECUTE privileges on the libkafka library and all its UDxs.
Granting operator privileges gives the user the right to read data off any source in any cluster that can be reached from the Vertica node.
The dbadmin must grant the user separate permission for them to have write privileges on the target tables.
Requires the --create shared utility option. Use the --add-operator option to grant operator privileges after the scheduler has been created.
To revoke privileges, use the --remove-operator option.
--remove-operator user_name
Removes access to the scheduler from a Vertica user account. Requires the --update shared utility option.
--resource-pool pool_name
The resource pool to be used by all queries executed by this scheduler. You must create this pool in advance.
The scheduler can use only one-fourth of the GENERAL pool's PLANNEDCONCURRENCY.
--upgrade
Upgrades the existing scheduler and configuration schema to the current Vertica version. The upgraded version of the scheduler is not backwards compatible with earlier versions. To upgrade a scheduler to an alternate schema, use the upgrade-to-schema parameter. See Updating schedulers after Vertica upgrades for more information.
--upgrade-to-schema schema_name
Copies the scheduler's schema to a new schema specified by schema name and then upgrades it to be compatible with the current version of Vertica. Vertica does not alter the old schema. Requires the --upgrade scheduler utility option.
--validation-type {ERROR|WARN|SKIP}
Renamed from --skip-validation, specifies the level of validation performed on the scheduler. Invalid SQL syntax and other errors can cause invalid microbatches. Vertica supports the following validation types:
ERROR: Cancel configuration or creation if validation fails.
WARN: Proceed with task if validation fails, but display a warning.
SKIP: Perform no validation.
These examples show how you can use the scheduler utility options.
Give the user Jim privileges on the stream_config scheduler. Specify that you are making edits to the stream_config scheduler with the --config-schema option:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --config-schema stream_config --add-operator Jim
Edit the default stream_config scheduler so that every microbatch waits for data for one second before ending:
$ /opt/vertica/packages/kafka/bin/vkconfig scheduler --update --eof-timeout-ms 1000
3 - Cluster tool options
--create
Creates a new cluster. Cannot be used with --delete, --read, or --update.
--read
Outputs the settings of all clusters defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
You can limit the output to specific clusters by supplying one or more cluster names in the --cluster option. You can also limit the output to clusters that contain one or more specific hosts using the --hosts option. Use commas to separate multiple values.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update
Updates the settings of cluster_name. Cannot be used with --create, --delete, or --read.
--delete
Deletes the cluster cluster_name. Cannot be used with --create, --read, or --update.
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--cluster cluster_name
A unique, case-insensitive name for the cluster to operate on. This option is required for --create, --update, and --delete.
--hosts b1:port[,b2:port...]
Identifies the broker hosts that you want to add, edit, or remove from a Kafka cluster. To identify multiple hosts, use a comma delimiter.
--kafka_conf 'kafka_configuration_setting'
A JSON-formatted object of option/value pairs to pass directly to the rdkafka library. This is the library Vertica uses to communicate with Kafka. You can use this parameter to directly set configuration options that are not available through the Vertica integration with Kafka. See Directly setting Kafka library options for details.
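Because the value must be a well-formed JSON object, it can help to validate it before passing it to vkconfig. A sketch (the property name shown is an rdkafka configuration option; this snippet only checks the JSON, it does not call vkconfig):

```shell
# Validate a hypothetical --kafka_conf value before use.
KAFKA_CONF='{"api.version.request": "false"}'
echo "$KAFKA_CONF" | python3 -m json.tool >/dev/null && echo "valid JSON"
```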
--kafka_conf_secret 'kafka_configuration_setting'
Conceals sensitive configuration data that you must pass directly to the rdkafka library, such as passwords. This parameter accepts settings in the same format as kafka_conf.
Values passed to this parameter are not logged or stored in system tables.
--new-cluster cluster_name
The updated name for the cluster. Requires the --update shared utility option.
--validation-type {ERROR|WARN|SKIP}
Specifies the level of validation performed on a created or updated cluster:
ERROR - Cancel configuration or creation if vkconfig cannot validate that the cluster exists. This is the default setting.
WARN - Proceed with task if validation fails, but display a warning.
SKIP - Perform no validation.
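A sketch of defining a cluster (the cluster name, broker addresses, and install path are all hypothetical; the command is built as a string here for review rather than run against a live database):

```shell
# Hypothetical: register two Kafka brokers under the name kafka_weblog.
cluster_cmd='/opt/vertica/packages/kafka/bin/vkconfig cluster --create \
  --cluster kafka_weblog --hosts 10.0.0.3:9092,10.0.0.4:9092'
printf '%s\n' "$cluster_cmd"
```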
4 - Source tool options
--create
Creates a new source. Cannot be used with --delete, --read, or --update.
--read
Outputs the current setting of the sources defined in the scheduler. The output is in JSON format. Cannot be used with --create, --delete, or --update.
By default this option outputs all of the sources defined in the scheduler. You can limit the output by using the --cluster, --enabled, --partitions, and --source options. The output will only contain sources that match the values in these options. The --enabled option can only have a true or false value. The --source option is case-sensitive.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update
Updates the settings of source_name. Cannot be used with --create, --delete, or --read.
--delete
Deletes the source named source_name. Cannot be used with --create, --read, or --update.
--source source_name
Identifies the source to create or alter in the scheduler's configuration. This option is case-sensitive. You can use any name you like for a new source. Most people use the name of the Kafka topic the scheduler loads its data from. This option is required for --create, --update, and --delete.
--cluster cluster_name
Identifies the cluster containing the source that you want to create or edit. You must have already defined this cluster in the scheduler.
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--enabled TRUE|FALSE
When TRUE, the source is available for use.
--new-cluster cluster_name
Changes the cluster this source belongs to.
All sources referencing the old cluster now target this cluster.
Requires: --update and --source options
--new-source source_name
Updates the name of an existing source to the name specified by this parameter.
Requires: --update shared utility option
--partitionscount
Sets the number of partitions in the source.
Default:
The number of partitions defined in the cluster.
Requires: --create and --source options
You must keep this consistent with the number of partitions in the Kafka topic.
Renamed from --num-partitions.
--validation-type {ERROR|WARN|SKIP}
Controls the validation performed on a created or updated source:
ERROR - Cancel configuration or creation if vkconfig cannot validate the source. This is the default setting.
WARN - Proceed with task if validation fails, but display a warning.
SKIP - Perform no validation.
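A sketch of defining a source (the topic, cluster, and partition values are hypothetical, and the command is shown as a string rather than executed):

```shell
# Hypothetical: register the Kafka topic web_hits, which has one partition.
source_cmd='/opt/vertica/packages/kafka/bin/vkconfig source --create \
  --cluster kafka_weblog --source web_hits --partitions 1'
printf '%s\n' "$source_cmd"
```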
5 - Target tool options
--create
Adds a new target table for the scheduler. Cannot be used with --delete, --read, or --update.
--read
Outputs the targets defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
By default this option outputs all of the targets defined in the configuration schema. You can limit the output to specific targets by using the --target-schema and --target-table options. The vkconfig script only outputs targets that match the values you set in these options.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update
Updates the settings for the targeted table. Use with the --new-target-schema and --new-target-table options. Cannot be used with --create, --delete, or --read.
--delete
Removes the scheduler's association with the target table. Cannot be used with --create, --read, or --update.
--target-table table
The existing Vertica table for the scheduler to target. This option is required for --create, --update, and --delete.
--target-schema schema
The existing Vertica schema containing the target table. This option is required for --create, --update, and --delete.
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--new-target-schema schema_name
Changes the schema containing the target table to another existing schema.
Requires: --update option
--new-target-table table_name
Changes the Vertica target table associated with this schema to another existing table.
Requires: --update option
--validation-type {ERROR|WARN|SKIP}
Controls validation performed on a created or updated target:
ERROR - Cancel configuration or creation if vkconfig cannot validate that the table exists. This is the default setting.
WARN - Creates or updates the target if validation fails, but displays a warning.
SKIP - Perform no validation.
Renamed from --skip-validation.
Important
Avoid having columns with primary key restrictions in your target table. The scheduler stops loading data if it encounters a row that has a value which violates this restriction. If you must have a primary key restricted column, try to filter out any redundant values for that column in the streamed data before it is loaded by the scheduler.
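A sketch of registering a target table (the schema and table names are hypothetical; the table must already exist in Vertica):

```shell
# Hypothetical: point the scheduler at the existing table public.web_hits_table.
target_cmd='/opt/vertica/packages/kafka/bin/vkconfig target --create \
  --target-schema public --target-table web_hits_table'
printf '%s\n' "$target_cmd"
```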
6 - Load spec tool options
--create
Creates a new load spec. Cannot be used with --delete, --read, or --update.
--read
Outputs the current settings of the load specs defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
By default, this option outputs all load specs defined in the scheduler. You can limit the output by supplying a single value or a comma-separated list of values to these options:
--load-spec
--filters
--uds-kv-parameters
--parser
--message-max-bytes
--parser-parameters
The vkconfig script only outputs the configuration of load specs that match the values you supply.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update
Updates the settings of spec-name. Cannot be used with --create, --delete, or --read.
--delete
Deletes the load spec named spec-name. Cannot be used with --create, --read, or --update.
--load-spec spec-name
A unique name for the load spec to operate on. This option is required for --create, --update, and --delete.
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--filters "filter-name"
A Vertica FILTER chain containing all of the UDFilters to use in the COPY statement. For more information on filters, refer to Parsing custom formats.
--message-max-bytes max-size
Specifies the maximum size, in bytes, of a Kafka protocol batch message.
Default: 25165824
--new-load-spec new-name
A new, unique name for an existing load spec. Requires the --update parameter.
--parser-parameters "key=value[,...]"
A list of parameters to provide to the parser specified in the --parser parameter. When you use a Vertica native parser, the scheduler passes these parameters to the COPY statement where they are in turn passed to the parser.
--parser parser-name
Identifies a Vertica UDParser to use with a specified target. This parser is used within the COPY statement that the scheduler runs to load data. If you are using a Vertica native parser, the values supplied to the --parser-parameters option are passed through to the COPY statement.
Default: KafkaParser
--uds-kv-parameters key=value[,...]
A comma-separated list of key/value pairs for the user-defined source.
--validation-type {ERROR|WARN|SKIP}
Specifies the validation performed on a created or updated load spec, to one of the following:
ERROR: Cancel configuration or creation if vkconfig cannot validate the load spec. This is the default setting.
WARN: Proceed with task if validation fails, but display a warning.
SKIP: Perform no validation.
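A sketch of creating a load spec (the spec name is hypothetical; KafkaJSONParser is one of the parsers shipped with the Vertica Kafka integration):

```shell
# Hypothetical: a load spec that parses each message as JSON.
spec_cmd='/opt/vertica/packages/kafka/bin/vkconfig load-spec --create \
  --load-spec weblog_spec --parser KafkaJSONParser'
printf '%s\n' "$spec_cmd"
```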
7 - Microbatch tool options
--create
Creates a new microbatch. Cannot be used with --delete, --read, or --update.
--read
Outputs the current settings of all microbatches defined in the scheduler. This output is in JSON format. Cannot be used with --create, --delete, or --update.
You can limit the output to specific microbatches by using the --consumer-group-id, --enabled, --load-spec, --microbatch, --rejection-schema, --rejection-table, --target-schema, --target-table, and --target-columns options. The --enabled option only accepts a true or false value.
You can use LIKE wildcards in these options. See LIKE for more information about using wildcards.
--update
Updates the settings of microbatch_name. Cannot be used with --create, --delete, or --read.
--delete
Deletes the microbatch named microbatch_name. Cannot be used with --create, --read, or --update.
--microbatch microbatch_name
A unique, case-insensitive name for the microbatch. This option is required for --create, --update, and --delete.
--add-source-cluster cluster_name
The name of a cluster to assign to the microbatch you specify with the --microbatch option. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. You can only add sources from the same cluster to a single microbatch. Requires --add-source.
--add-source source_name
The name of a source to assign to this microbatch. You can use this parameter once per command. You can also use it with --update to add sources to a microbatch. Requires --add-source-cluster.
--cluster cluster_name
The name of the cluster to which the --offset option applies. Only required if the microbatch defines more than one cluster or the --source parameter is supplied. Requires the --offset option.
--consumer-group-id id_name
The name of the Kafka consumer group to which Vertica reports its progress consuming messages. Set this value to an empty string ('') to disable progress reports to a Kafka consumer group. For details, see Monitoring Vertica message consumption with consumer groups.
Default: vertica_database-name
--dump
When you use this option along with the --read option, vkconfig outputs the Vertica query it would use to retrieve the data, rather than outputting the data itself. This option is useful if you want to access the data from within Vertica without having to go through vkconfig. This option has no effect if not used with --read.
--enabled TRUE|FALSE
When TRUE, allows the microbatch to execute.
--load-spec loadspec_name
The load spec to use while processing this microbatch.
--max-parallelism max_num_loads
The maximum number of simultaneous COPY statements created for the microbatch. The scheduler dynamically splits a single microbatch with multiple partitions into max_num_loads COPY statements with fewer partitions.
--new-microbatch microbatch_name
The updated name for the microbatch. Requires the --update option.
--offset partition_offset[,...]
The offset of the message in the source where the microbatch starts its load. If you use this parameter, you must supply an offset value for each partition in the source or each partition you list in the --partition option.
You can use this option to skip some messages in the source or reload previously read messages.
You cannot set an offset for a microbatch while the scheduler is running. If you attempt to do so, the vkconfig utility returns an error. Use the shutdown utility to shut the scheduler down before setting an offset for a microbatch.
--partition partition[,...]
One or more partitions to which the offsets given in the --offset option apply. If you supply this option, the offset values given in the --offset option apply to the partitions you specify. Requires the --offset option.
--rejection-schema schema_name
The existing Vertica schema that contains a table for storing rejected messages.
--rejection-table table_name
The existing Vertica table that stores rejected messages.
--remove-source-cluster cluster_name
The name of a cluster to remove from this microbatch. You can use this parameter once per command. Requires --remove-source.
--remove-source source_name
The name of a source to remove from this microbatch. You can use this parameter once per command. You can also use it with --update to remove multiple sources from a microbatch. Requires --remove-source-cluster.
--source source_name
The name of the source to which the offset in the --offset option applies. Required when the microbatch defines more than one source or the --cluster parameter is given. Requires the --offset option.
--target-columns column_expression
A column expression for the target table, where column_expression can be a comma-delimited list of columns or a complete expression.
See the COPY statement Parameters for a description of column expressions.
--target-schema schema_name
The existing Vertica target schema associated with this microbatch.
--target-table table_name
The name of a Vertica table corresponding to the target. This table must belong to the target schema.
--validation-type {ERROR|WARN|SKIP}
Controls the validation performed on a created or updated microbatch:
ERROR - Cancel configuration or creation if vkconfig cannot validate the microbatch. This is the default setting.
WARN - Proceed with task if validation fails, but display a warning.
SKIP - Perform no validation.
The start_offset portion of the stream parameter lets you start loading messages from a specific point in the topic's partition. It also accepts two special offset values:
-2 tells the scheduler to start loading at the earliest available message in the topic's partition. This value is useful when you want to load as many messages as you can from the Kafka topic's partition.
-3 tells the scheduler to start loading from the consumer group's saved offset. If the consumer group does not have a saved offset, it starts loading from the earliest available message in the topic partition. See Monitoring Vertica message consumption with consumer groups for more information.
Examples
This example shows how you can create the microbatch, mbatch1. This microbatch identifies the schema, target table, load spec, and source for the microbatch:
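A sketch of such a command (all object names are hypothetical, and the command is built as a string for review rather than run against a live database):

```shell
# Hypothetical: tie together a target table, load spec, and source.
microbatch_cmd='/opt/vertica/packages/kafka/bin/vkconfig microbatch --create \
  --microbatch mbatch1 --target-schema public --target-table web_hits_table \
  --load-spec weblog_spec --add-source web_hits --add-source-cluster kafka_weblog'
printf '%s\n' "$microbatch_cmd"
```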
8 - Launch tool options
Use the vkconfig script's launch tool to assign a name to a scheduler instance.
Syntax
vkconfig launch [options...]
--enable-ssl {true|false}
(Optional) Enables SSL authentication between Kafka and Vertica. For more information, refer to TLS/SSL encryption with Kafka.
--ssl-ca-alias alias
The user-defined alias of the root certifying authority you are using to authenticate communication between Vertica and Kafka. This parameter is used only when SSL is enabled.
--ssl-key-alias alias
The user-defined alias of the key/certificate pair you are using to authenticate communication between Vertica and Kafka. This parameter is used only when SSL is enabled.
--ssl-key-password password
The password used to create your SSL key. This parameter is used only when SSL is enabled.
--instance-name name
(Optional) Allows you to name the process running the scheduler. You can use this name when viewing the scheduler_history table to find which instance is currently running.
--refresh-interval hours
(Optional) The time interval at which the connection between Vertica and Kafka is refreshed (24 hours by default).
--kafka_conf 'kafka_configuration_setting'
A JSON-formatted object of option/value pairs to pass directly to the rdkafka library. This is the library Vertica uses to communicate with Kafka. You can use this parameter to directly set configuration options that are not available through the Vertica integration with Kafka. See Directly setting Kafka library options for details.
--kafka_conf_secret 'kafka_configuration_setting'
Conceals sensitive configuration data that you must pass directly to the rdkafka library, such as passwords. This parameter accepts settings in the same format as kafka_conf.
Values passed to this parameter are not logged or stored in system tables.
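The launch tool is typically started in the background so the scheduler keeps running after the shell session ends. A sketch (the configuration file name and install path are hypothetical; the command is shown as a string rather than executed):

```shell
# Hypothetical: launch in the background, reading options from weblog.conf.
launch_cmd='nohup /opt/vertica/packages/kafka/bin/vkconfig launch \
  --conf weblog.conf >/dev/null 2>&1 &'
printf '%s\n' "$launch_cmd"
```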
9 - Shutdown tool options
Use the vkconfig script's shutdown tool to terminate one or all Vertica schedulers running on a host. Always run this command before restarting a scheduler to ensure the scheduler has shut down correctly.
Use the --conf or --config-schema option to specify a scheduler to shut down. The following command terminates the scheduler that was launched with the same --conf myscheduler.conf option:
$ /opt/vertica/packages/kafka/bin/vkconfig shutdown --conf myscheduler.conf
10 - Statistics tool options
The statistics tool lets you access the history of microbatches that your scheduler has run. This tool outputs the log of the microbatches in JSON format to standard output. You can use its options to filter the list of microbatches to get just the microbatches that interest you.
Note
The statistics tool can produce confusing output if you have altered the scheduler configuration over time. For example, suppose microbatch-a targets a table. Later, you change the scheduler's configuration so that microbatch-b targets the same table. If you then run the statistics tool and filter the microbatch log by target table, the output shows entries from both microbatch-a and microbatch-b.
Syntax
vkconfig statistics [options]
--cluster "cluster"[,"cluster2"...]
Only return microbatches that retrieved data from a cluster whose name matches one in the list you supply.
--dump
Instead of returning microbatch data, return the SQL query that vkconfig would execute to extract the data from the scheduler tables. You can use this option if you want to use a Vertica client application to get the microbatch log instead of using vkconfig's JSON output.
--from-timestamp "timestamp"
Only return microbatches that began after timestamp. The timestamp value is in yyyy-[m]m-[d]d hh:mm:ss format.
Cannot be used in conjunction with --last.
--last number
Returns the number most recent microbatches that meet all other filters. Cannot be used in conjunction with --from-timestamp or --to-timestamp.
--microbatch "name"[,"name2"...]
Only return microbatches whose name matches one of the names in the comma-separated list.
--partition partition#[,partition#2...]
Only return microbatches that accessed data from the topic partition that matches one of the values in the partition list.
--source "source"[,"source2"...]
Only return microbatches that accessed data from a source whose name matches one of the names in the list you supply to this argument.
--target-schema "schema"[,"schema2"...]
Only return microbatches that wrote data to the Vertica schemas whose name matches one of the names in the target schema list argument.
--target-table "table"[,"table2"...]
Only return microbatches that wrote data to Vertica tables whose names match one of the names in the target table list argument.
--to-timestamp "timestamp"
Only return microbatches that began before timestamp. The timestamp value is in yyyy-[m]m-[d]d hh:mm:ss format.
You can use LIKE wildcards in the values you supply to the --cluster, --microbatch, --source, --target-schema, and --target-table arguments. This feature lets you match partial strings in the microbatch data. See LIKE for more information about using wildcards.
The string comparisons for the --cluster, --microbatch, --source, --target-schema, and --target-table arguments are case-insensitive.
The date and time values you supply to the --from-timestamp and --to-timestamp arguments are parsed using the java.sql.Timestamp format. This parser can accept values that you might expect it to reject as invalid. For example, if you supply a timestamp of 01-01-2018 24:99:99, the parser silently converts the date to 2018-01-02 01:40:39 instead of returning an error.
Examples
This example gets the last microbatch that the scheduler defined in the weblog.conf file ran:
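A call along these lines retrieves it (the script path follows the dump example later in this section):

```shell
# Return only the single most recent microbatch for this scheduler
/opt/vertica/packages/kafka/bin/vkconfig statistics --last 1 --conf weblog.conf
```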
You can use wildcards to enable partial matches. This example demonstrates getting the last microbatch for all microbatches whose names end with "log":
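Such a call might look like this, using the LIKE wildcard % to match any prefix:

```shell
# %log matches every microbatch whose name ends in "log"
/opt/vertica/packages/kafka/bin/vkconfig statistics --microbatch "%log" \
    --last 1 --conf weblog.conf
```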
To get microbatches from a specific period of time, use the --from-timestamp and --to-timestamp arguments. This example gets the microbatches that read from partition #2 between 12:52:30 and 12:53:00 on 2018-11-06 for the scheduler defined in iot.conf.
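A sketch of that query, combining the partition and timestamp filters described above:

```shell
# Filter by partition number and a 30-second time window
/opt/vertica/packages/kafka/bin/vkconfig statistics --partition 2 \
    --from-timestamp "2018-11-06 12:52:30" \
    --to-timestamp "2018-11-06 12:53:00" --conf iot.conf
```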
This example demonstrates using the --dump argument to get the SQL statement vkconfig executed to retrieve the output from the previous example:
$ /opt/vertica/packages/kafka/bin/vkconfig statistics --dump --partition 2 \
    --from-timestamp "2018-11-06 12:52:30" \
    --to-timestamp "2018-11-06 12:53:00" --conf iot.conf
SELECT microbatch, target_schema, target_table, source_name, source_cluster,
source_partition, start_offset, end_offset, end_reason, end_reason_message,
partition_bytes, partition_messages, timeslice, batch_start, batch_end,
last_batch_duration AS source_duration, consecutive_error_count, transaction_id,
frame_start, frame_end FROM "iot_sched".stream_microbatch_history WHERE
(source_partition = '2') AND (frame_start >= '2018-11-06 12:52:30.0') AND
(frame_start < '2018-11-06 12:53:00.0') ORDER BY frame_start DESC, microbatch,
source_cluster, source_name, source_partition;
11 - Sync tool options
The sync utility immediately updates all source definitions by querying the Kafka cluster's brokers defined by the source.
The sync utility immediately updates all source definitions by querying the Kafka cluster's brokers defined by the source. By default, it updates all of the sources defined in the target schema. To update just specific sources, use the --source and --cluster options to specify which sources to update.
Syntax
vkconfig sync [options...]
--source source_name
The name of the source to sync. This source must already exist in the target schema.
--cluster cluster_name
Identifies the cluster containing the source that you want to sync. You must have already defined this cluster in the scheduler.
--kafka_conf 'kafka_configuration_setting'
A JSON-formatted object of option/value pairs to pass directly to the rdkafka library. This is the library Vertica uses to communicate with Kafka. You can use this parameter to directly set configuration options that are not available through the Vertica integration with Kafka. See Directly setting Kafka library options for details.
--kafka_conf_secret 'kafka_configuration_setting'
Conceals sensitive configuration data that you must pass directly to the rdkafka library, such as passwords. This parameter accepts settings in the same format as kafka_conf.
Values passed to this parameter are not logged or stored in system tables.
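A sketch of a targeted sync, assuming a source named web_hits (the source name and configuration file are illustrative):

```shell
# Sync only the web_hits source rather than every source in the target schema
/opt/vertica/packages/kafka/bin/vkconfig sync --source web_hits --conf myscheduler.conf
```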