Getting configuration and statistics information from vkconfig

The vkconfig tool has two features that help you examine your scheduler's configuration and monitor your data load:

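The vkconfig tools that configure your scheduler (scheduler, cluster, source, target, load-spec, and microbatch) accept a --read argument that makes them output their current settings in the scheduler.
The vkconfig statistics tool lets you get statistics on your microbatches. You can filter the microbatch records by date and time range, cluster, partition, and other criteria.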

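Both of these features output their data in JSON format. You can process the configuration and statistics data with third-party tools that consume JSON, or with scripts that you write yourself.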
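As a minimal sketch of such a script, the following Python code parses text that contains one or more JSON objects, one after another, into a list of dictionaries. It assumes that the output has been captured as a string and that objects are separated only by whitespace or newlines. The function name parse_vkconfig_output and the sample text are illustrative and are not part of vkconfig.

```python
import json

def parse_vkconfig_output(text):
    """Parse text holding zero or more JSON objects into a list of dicts."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return records

# Illustrative sample: three objects, one per line.
sample = '''{"target_schema":"public", "target_table":"health_data"}
{"target_schema":"public", "target_table":"iot_data"}
{"target_schema":"public", "target_table":"web_hits"}'''

tables = [record["target_table"] for record in parse_vkconfig_output(sample)]
print(tables)  # ['health_data', 'iot_data', 'web_hits']
```

Because the parser tracks where each object ends instead of splitting on newlines, it also accepts objects that are wrapped across several lines.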

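You can also get the data that these vkconfig options provide by querying the configuration tables in the scheduler's schema. However, you may find these options easier to use, because they do not require you to connect to the Vertica database.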

Getting configuration information

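Pass the --read option to a vkconfig configuration tool to get the current values of the options that the tool can set. The output is in JSON format. The following example gets the configuration information from the scheduler and cluster tools for the scheduler defined in the weblog.conf configuration file: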

$ vkconfig scheduler --read --conf weblog.conf
{"version":"v9.2.0", "frame_duration":"00:00:10", "resource_pool":"weblog_pool",
 "config_refresh":"00:05:00", "new_source_policy":"FAIR",
 "pushback_policy":"LINEAR", "pushback_max_count":5, "auto_sync":true,
 "consumer_group_id":null}

$ vkconfig cluster --read --conf weblog.conf
{"cluster":"kafka_weblog", "hosts":"kafka01.example.com:9092,kafka02.example.com:9092"}
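The same calls can be made from a script. The following Python sketch builds the argument list for a --read call and runs it with the standard subprocess module. It assumes that vkconfig is on the PATH of the user running the script. The helper names build_read_command and read_config are hypothetical, and the optional filters are simply appended as "--name value" pairs.

```python
import subprocess

def build_read_command(tool, conf_file, filters=None):
    """Return the argv list for: vkconfig <tool> --read --conf <conf_file>."""
    cmd = ["vkconfig", tool, "--read", "--conf", conf_file]
    # Optional filter arguments are appended as "--name value" pairs.
    for name, value in (filters or {}).items():
        cmd += ["--" + name, str(value)]
    return cmd

def read_config(tool, conf_file, filters=None):
    """Run vkconfig (assumed to be on PATH) and return its standard output."""
    result = subprocess.run(
        build_read_command(tool, conf_file, filters),
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout

print(build_read_command("scheduler", "weblog.conf"))
# ['vkconfig', 'scheduler', '--read', '--conf', 'weblog.conf']
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a filter value contains characters such as % or spaces.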

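The --read option lists all of the values that the tool has created in the scheduler schema. For example, if you have defined multiple targets in the scheduler, the --read option lists all of them.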

$ vkconfig target --read --conf weblog.conf
{"target_schema":"public", "target_table":"health_data"}
{"target_schema":"public", "target_table":"iot_data"}
{"target_schema":"public", "target_table":"web_hits"}

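You can filter the output of the --read option with the other arguments that the vkconfig tools accept. For example, in the cluster tool, you can use the --hosts argument to limit the output to just the clusters that contain a specific host. These arguments support LIKE-predicate wildcards, so you can match partial values. For details about using wildcards, see LIKE predicate.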

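The following example demonstrates how to filter the output of the cluster tool's --read option with the --hosts argument. The first call shows the unfiltered output. The second call filters the output to show only the clusters whose hosts value starts with "kafka":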

$ vkconfig cluster --read --conf weblog.conf
{"cluster":"some_cluster", "hosts":"host01.example.com"}
{"cluster":"iot_cluster",
 "hosts":"kafka-iot01.example.com:9092,kafka-iot02.example.com:9092"}
{"cluster":"weblog",
 "hosts":"web01.example.com:9092,web02.example.com:9092"}
{"cluster":"streamcluster1",
 "hosts":"kafka-a-01.example.com:9092,kafka-a-02.example.com:9092"}
{"cluster":"test_cluster",
 "hosts":"test01.example.com:9092,test02.example.com:9092"}

$ vkconfig cluster --read --conf weblog.conf --hosts kafka%
{"cluster":"iot_cluster",
 "hosts":"kafka-iot01.example.com:9092,kafka-iot02.example.com:9092"}
{"cluster":"streamcluster1",
 "hosts":"kafka-a-01.example.com:9092,kafka-a-02.example.com:9092"}

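For more information, see Cluster tool options, Load spec tool options, Microbatch tool options, Scheduler tool options, Target tool options, and Source tool options.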
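To illustrate how a LIKE-style pattern such as kafka% selects values, the following Python sketch translates a pattern into a regular expression and applies it to a set of sample cluster definitions. It is only an approximation for illustration: it assumes that the pattern is matched against the entire hosts string, it handles only the % and _ wildcards, and it ignores escape characters. The function names like_to_regex and like_match are hypothetical. The filtering that vkconfig performs happens in the tool itself, not in code like this.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE pattern: % is any run of characters, _ is one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Escape everything else so it matches literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def like_match(pattern, value):
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

# Sample cluster names mapped to their hosts strings.
clusters = {
    "some_cluster": "host01.example.com",
    "iot_cluster": "kafka-iot01.example.com:9092,kafka-iot02.example.com:9092",
    "weblog": "web01.example.com:9092,web02.example.com:9092",
    "streamcluster1": "kafka-a-01.example.com:9092,kafka-a-02.example.com:9092",
    "test_cluster": "test01.example.com:9092,test02.example.com:9092",
}

matching = [name for name, hosts in clusters.items() if like_match("kafka%", hosts)]
print(matching)  # ['iot_cluster', 'streamcluster1']
```

Under these assumptions, a pattern such as %kafka% would also match a cluster whose second or later host starts with "kafka", whereas kafka% matches only when the hosts string itself begins with it.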

Getting streaming data load statistics

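The statistics tool of the vkconfig script lets you view the history of your scheduler's microbatches. You can filter the results using any combination of the following criteria: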

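The name of the microbatch
The Kafka cluster that was the source of the data load
The name of the topic
The partition within the topic
The Vertica schema and table targeted by the data load
A date and time range
The latest microbatches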

For all of the options available in this tool, see Statistics tool options.

The following example gets the last two microbatches that the scheduler ran:

$ vkconfig statistics --last 2 --conf weblog.conf
{"microbatch":"weblog", "target_schema":"public", "target_table":"web_hits",
 "source_name":"web_hits", "source_cluster":"kafka_weblog", "source_partition":0,
 "start_offset":73300, "end_offset":73399, "end_reason":"END_OF_STREAM",
 "end_reason_message":null, "partition_bytes":19588, "partition_messages":100,
 "timeslice":"00:00:09.807000", "batch_start":"2018-11-02 13:22:07.825295",
 "batch_end":"2018-11-02 13:22:08.135299", "source_duration":"00:00:00.219619",
 "consecutive_error_count":null, "transaction_id":45035996273976123,
 "frame_start":"2018-11-02 13:22:07.601", "frame_end":null}
{"microbatch":"weblog", "target_schema":"public", "target_table":"web_hits",
 "source_name":"web_hits", "source_cluster":"kafka_weblog", "source_partition":0,
 "start_offset":73200, "end_offset":73299, "end_reason":"END_OF_STREAM",
 "end_reason_message":null, "partition_bytes":19781, "partition_messages":100,
 "timeslice":"00:00:09.561000", "batch_start":"2018-11-02 13:21:58.044698",
 "batch_end":"2018-11-02 13:21:58.335431", "source_duration":"00:00:00.214868",
 "consecutive_error_count":null, "transaction_id":45035996273976095,
 "frame_start":"2018-11-02 13:21:57.561", "frame_end":null}

The following example gets the microbatches from the source named web_hits that ran between 13:21:00 and 13:21:20 on November 2, 2018:

$ vkconfig statistics --source "web_hits" --from-timestamp \
           "2018-11-02 13:21:00" --to-timestamp "2018-11-02 13:21:20"  \
           --conf weblog.conf
{"microbatch":"weblog", "target_schema":"public", "target_table":"web_hits",
 "source_name":"web_hits", "source_cluster":"kafka_weblog", "source_partition":0,
 "start_offset":72800, "end_offset":72899, "end_reason":"END_OF_STREAM",
 "end_reason_message":null, "partition_bytes":19989, "partition_messages":100,
 "timeslice":"00:00:09.778000", "batch_start":"2018-11-02 13:21:17.581606",
 "batch_end":"2018-11-02 13:21:18.850705", "source_duration":"00:00:01.215751",
 "consecutive_error_count":null, "transaction_id":45035996273975997,
 "frame_start":"2018-11-02 13:21:17.34", "frame_end":null}
{"microbatch":"weblog", "target_schema":"public", "target_table":"web_hits",
 "source_name":"web_hits", "source_cluster":"kafka_weblog", "source_partition":0,
 "start_offset":72700, "end_offset":72799, "end_reason":"END_OF_STREAM",
 "end_reason_message":null, "partition_bytes":19640, "partition_messages":100,
 "timeslice":"00:00:09.857000", "batch_start":"2018-11-02 13:21:07.470834",
 "batch_end":"2018-11-02 13:21:08.737255", "source_duration":"00:00:01.218932",
 "consecutive_error_count":null, "transaction_id":45035996273975978,
 "frame_start":"2018-11-02 13:21:07.309", "frame_end":null}

For more examples of using this tool, see Statistics tool options.
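Once the statistics records have been parsed into dictionaries, they can be aggregated with a short script. The following Python sketch totals the messages, bytes, and elapsed batch time for a list of records. It assumes that batch_start and batch_end always use the "YYYY-MM-DD HH:MM:SS.ffffff" layout seen in the output above. The function names batch_duration and summarize are hypothetical, and the records are trimmed to the fields the sketch uses.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, based on the sample output.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def batch_duration(record):
    """Return the time between batch_start and batch_end as a timedelta."""
    start = datetime.strptime(record["batch_start"], TIMESTAMP_FORMAT)
    end = datetime.strptime(record["batch_end"], TIMESTAMP_FORMAT)
    return end - start

def summarize(records):
    """Total the message count, byte count, and batch time over all records."""
    total_time = sum((batch_duration(r) for r in records), timedelta())
    return {
        "batches": len(records),
        "messages": sum(r["partition_messages"] for r in records),
        "bytes": sum(r["partition_bytes"] for r in records),
        "batch_seconds": total_time.total_seconds(),
    }

# Two records trimmed to the fields used here.
records = [
    {"microbatch": "weblog", "partition_bytes": 19588, "partition_messages": 100,
     "batch_start": "2018-11-02 13:22:07.825295",
     "batch_end": "2018-11-02 13:22:08.135299"},
    {"microbatch": "weblog", "partition_bytes": 19781, "partition_messages": 100,
     "batch_start": "2018-11-02 13:21:58.044698",
     "batch_end": "2018-11-02 13:21:58.335431"},
]

print(summarize(records))
```

Summing the durations as timedelta values and converting to seconds only at the end avoids accumulating floating-point rounding error across many records.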