Data streaming schema tables
Every time you create a scheduler (--create), Vertica creates a schema for that scheduler with the name you specify or the default stream_config. Each schema has the following tables:
Caution
Vertica recommends that you do not alter these tables except in consultation with support.
1 - stream_clusters
This table lists clusters and hosts. You change settings in this table using the vkconfig cluster tool. See Cluster tool options for more information.
| Column | Data Type | Description |
|---|---|---|
| id | INTEGER | The identification number assigned to the cluster. |
| cluster | VARCHAR | The name of the cluster. |
| hosts | VARCHAR | A comma-separated list of hosts associated with the cluster. |
Examples
This example shows a cluster and its associated hosts.
=> SELECT * FROM stream_config.stream_clusters;
id | cluster | hosts
---------+----------------+-----------------------------------
2250001 | streamcluster1 | 10.10.10.10:9092,10.10.10.11:9092
(1 row)
2 - stream_events
This table logs microbatches and other important events from the scheduler in an internal log table.
This table was renamed from kafka_config.kafka_events.
| Column | Data Type | Description |
|---|---|---|
| event_time | TIMESTAMP | The time the event was logged. |
| log_level | VARCHAR | The type of event that was logged. Valid values: TRACE, DEBUG, FATAL, ERROR, WARN, INFO. Default: INFO |
| frame_start | TIMESTAMP | The time when the frame executed. |
| frame_end | TIMESTAMP | The time when the frame completed. |
| microbatch | INTEGER | The identification number of the associated microbatch. |
| message | VARCHAR | A description of the event. |
| exception | VARCHAR | If this log is in the form of a stack trace, this column lists the exception. |
Examples
This example shows typical rows from the stream_events table.
=> SELECT * FROM stream_config.stream_events;
-[ RECORD 1 ]-+-------------
event_time | 2016-07-17 13:28:35.548-04
log_level | INFO
frame_start |
frame_end |
microbatch |
message | New leader registered for schema stream_config. New ID: 0, new Host: 10.20.30.40
exception |
-[ RECORD 2 ]-+-------------
event_time | 2016-07-17 13:28:45.643-04
log_level | INFO
frame_start | 2015-07-17 12:28:45.633
frame_end | 2015-07-17 13:28:50.701-04
microbatch |
message | Generated tuples: test3|2|-2,test3|1|-2,test3|0|-2
exception |
-[ RECORD 3 ]-+----------------
event_time | 2016-07-17 14:28:50.701-04
log_level | INFO
frame_start | 2016-07-17 13:28:45.633
frame_end | 2016-07-17 14:28:50.701-04
microbatch |
message | Total rows inserted: 0
exception |
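Because the scheduler logs problems to this table, filtering on log_level is a quick way to surface trouble. A minimal sketch, assuming the default stream_config schema:

```sql
-- Show the most recent warnings and errors logged by the scheduler.
SELECT event_time, log_level, message, exception
FROM stream_config.stream_events
WHERE log_level IN ('WARN', 'ERROR', 'FATAL')
ORDER BY event_time DESC
LIMIT 20;
```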
3 - stream_load_specs
This table describes user-created load specs. You change the entries in this table using the vkconfig utility's load spec tool.
| Column | Data Type | Description |
|---|---|---|
| id | INTEGER | The identification number assigned to the load spec. |
| load_spec | VARCHAR | The name of the load spec. |
| filters | VARCHAR | A comma-separated list of UDFilters for the scheduler to include in the COPY statement it uses to load data from Kafka. |
| parser | VARCHAR | A Vertica UDParser to use with a specified target. If you are using a Vertica native parser, parser parameters serve as COPY statement parameters. |
| parser_parameters | VARCHAR | A list of parameters to provide to the parser. |
| load_method | VARCHAR | The COPY load method to use for all loads with this scheduler. Deprecated: in Vertica 10.0, load methods are no longer used due to the removal of the WOS. The value shown in this column has no effect. |
| message_max_bytes | INTEGER | The maximum size, in bytes, of a message. |
| uds_kv_parameters | VARCHAR | A list of parameters that are supplied to the KafkaSource statement. If the value in this column is in the format key=value, the scheduler passes it to the COPY statement's KafkaSource call. |
Examples
This example shows the load specs that you can use with a Vertica instance.
SELECT * FROM stream_config.stream_load_specs;
-[ RECORD 1 ]-----+------------
id | 1
load_spec | loadspec2
filters |
parser | KafkaParser
parser_parameters |
load_method | direct
message_max_bytes | 1048576
uds_kv_parameters |
-[ RECORD 2 ]-----+------------
id | 750001
load_spec | streamspec1
filters |
parser | KafkaParser
parser_parameters |
load_method | TRICKLE
message_max_bytes | 1048576
uds_kv_parameters |
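Because stream_microbatches stores only the load spec's ID, joining the two tables shows which microbatches use each load spec. A sketch, assuming the default stream_config schema:

```sql
-- List each microbatch with the name of the load spec it uses.
SELECT mb.microbatch, ls.load_spec, ls.parser
FROM stream_config.stream_microbatches mb
JOIN stream_config.stream_load_specs ls ON ls.id = mb.load_spec;
```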
4 - stream_lock
This table is locked by the scheduler. This lock prevents multiple schedulers from running at the same time. The scheduler that locks this table updates it with its own information.
Important
Do not use this table in a serializable transaction that locks this table. Locking this table can interfere with the operation of the scheduler.
| Column | Data Type | Description |
|---|---|---|
| scheduler_id | INTEGER | A unique ID for the scheduler instance that is currently running. |
| update_time | TIMESTAMP | The time the scheduler took ownership of writing to the schema. |
| process_info | VARCHAR | Information about the scheduler process. Currently unused. |
Example
=> SELECT * FROM weblog_sched.stream_lock;
scheduler_id | update_time | process_info
--------------+-------------------------+--------------
2 | 2018-11-08 10:12:36.033 |
(1 row)
5 - stream_microbatch_history
This table contains a history of every microbatch executed within this scheduler configuration.
| Column | Data Type | Description |
|---|---|---|
| source_name | VARCHAR | The name of the source. |
| source_cluster | VARCHAR | The name of the source cluster. The clusters are defined in stream_clusters. |
| source_partition | INTEGER | The number of the data streaming partition. |
| start_offset | INTEGER | The starting offset of the microbatch. |
| end_offset | INTEGER | The ending offset of the microbatch. |
| end_reason | VARCHAR | An explanation for why the batch ended. Valid end reasons: DEADLINE (the batch ran out of time); END_OFFSET (the load reached the ending offset specified in the KafkaSource; this reason is never used by the scheduler, as it does not specify an end offset); END_OF_STREAM (there are no messages available to the scheduler, or the eof_timeout has been reached); NETWORK_ERROR (the scheduler could not connect to Kafka); RESET_OFFSET (the start offset was changed using the --update and --offset parameters to the KafkaSource; this state does not occur during normal scheduler operations); SOURCE_ISSUE (the Kafka service returned an error); UNKNOWN (the batch ended for an unknown reason). |
| end_reason_message | VARCHAR | If the end reason is a network or source issue, this column contains a brief description of the issue. |
| partition_bytes | INTEGER | The number of bytes transferred from a source partition to a Vertica target table. |
| partition_messages | INTEGER | The number of messages transferred from a source partition to a Vertica target table. |
| microbatch_id | INTEGER | The Vertica transaction id for the batch session. |
| microbatch | VARCHAR | The name of the microbatch. |
| target_schema | VARCHAR | The name of the target schema. |
| target_table | VARCHAR | The name of the target table. |
| timeslice | INTERVAL | The amount of time spent in the KafkaSource operator. |
| batch_start | TIMESTAMP | The time the batch executed. |
| batch_end | TIMESTAMP | The time the batch completed. |
| last_batch_duration | INTERVAL | The length of time required to run the complete COPY statement. |
| last_batch_parallelism | INTEGER | The number of parallel COPY statements generated to process the microbatch during the last frame. |
| microbatch_sub_id | INTEGER | The identifier for the COPY statement that processed the microbatch. |
| consecutive_error_count | INTEGER | (Currently not used.) The number of times a microbatch has encountered an error on an attempt to load. This value increases over multiple attempts. |
| transaction_id | INTEGER | The identifier for the transaction within the session. |
| frame_start | TIMESTAMP | The time the frame started. A frame can contain multiple microbatches. |
| frame_end | TIMESTAMP | The time the frame completed. |
Examples
This example shows typical rows from the stream_microbatch_history table.
=> SELECT * FROM stream_config.stream_microbatch_history;
-[ RECORD 1 ]--+---------------------------
source_name | streamsource1
source_cluster | kafka-1
source_partition | 0
start_offset | 196
end_offset | 196
end_reason | END_OF_STREAM
partition_bytes | 0
partition_messages | 0
microbatch_id | 1
microbatch | mb_0
target_schema | public
target_table | kafka_flex_0
timeslice | 00:00:09.892
batch_start | 2016-07-28 11:31:25.854221
batch_end | 2016-07-28 11:31:26.357942
last_batch_duration | 00:00:00.379826
last_batch_parallelism | 1
microbatch_sub_id | 0
consecutive_error_count |
transaction_id | 45035996275130064
frame_start | 2016-07-28 11:31:25.751
frame_end |
end_reason_message |
-[ RECORD 2 ]--+---------------------------
source_name | streamsource1
source_cluster | kafka-1
source_partition | 1
start_offset | 197
end_offset | 197
end_reason | NETWORK_ISSUE
partition_bytes | 0
partition_messages | 0
microbatch_id | 1
microbatch | mb_0
target_schema | public
target_table | kafka_flex_0
timeslice | 00:00:09.897
batch_start | 2016-07-28 11:31:45.84898
batch_end | 2016-07-28 11:31:46.253367
last_batch_duration | 000:00:00.377796
last_batch_parallelism | 1
microbatch_sub_id | 0
consecutive_error_count |
transaction_id | 45035996275130109
frame_start | 2016-07-28 11:31:45.751
frame_end |
end_reason_message | Local: All brokers are down
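Since this table records bytes and messages per batch, aggregating it gives a rough view of load throughput. A sketch, assuming the default stream_config schema:

```sql
-- Summarize throughput per microbatch over the last day.
SELECT microbatch,
       SUM(partition_messages) AS total_messages,
       SUM(partition_bytes)    AS total_bytes
FROM stream_config.stream_microbatch_history
WHERE batch_start > CURRENT_TIMESTAMP - INTERVAL '1 day'
GROUP BY microbatch;
```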
6 - stream_microbatch_source_map
This table maps microbatches to their associated sources.
| Column | Data Type | Description |
|---|---|---|
| microbatch | INTEGER | The identification number of the microbatch. |
| source | INTEGER | The identification number of the associated source. |
Examples
This example shows typical rows from the stream_microbatch_source_map table.
SELECT * FROM stream_config.stream_microbatch_source_map;
microbatch | source
-----------+--------
1 | 4
3 | 2
(2 rows)
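The IDs in this table reference stream_microbatches and stream_sources, so a join resolves them to names. A sketch, assuming the default stream_config schema:

```sql
-- Resolve microbatch and source IDs to their names.
SELECT mb.microbatch, s.source
FROM stream_config.stream_microbatch_source_map msm
JOIN stream_config.stream_microbatches mb ON mb.id = msm.microbatch
JOIN stream_config.stream_sources s ON s.id = msm.source;
```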
7 - stream_microbatches
This table contains configuration data related to microbatches.
| Column | Data Type | Description |
|---|---|---|
| id | INTEGER | The identification number of the microbatch. |
| microbatch | VARCHAR | The name of the microbatch. |
| target | INTEGER | The identification number of the target associated with the microbatch. |
| load_spec | INTEGER | The identification number of the load spec associated with the microbatch. |
| target_columns | VARCHAR | The table columns associated with the microbatch. |
| rejection_schema | VARCHAR | The schema that contains the rejection table. |
| rejection_table | VARCHAR | The table where Vertica stores messages that are rejected by the database. |
| max_parallelism | INTEGER | The number of parallel COPY statements the scheduler uses to process the microbatch. |
| enabled | BOOLEAN | When TRUE, the microbatch is enabled for use. |
| consumer_group_id | VARCHAR | The name of the Kafka consumer group to report loading progress to. This value is NULL if the microbatch reports its progress to the default consumer group for the scheduler. See Monitoring Vertica message consumption with consumer groups for more information. |
Examples
This example shows a row from a typical stream_microbatches table.
=> select * from weblog_sched.stream_microbatches;
-[ RECORD 1 ]-----+----------
id | 750001
microbatch | weberrors
target | 750001
load_spec | 2250001
target_columns |
rejection_schema |
rejection_table |
max_parallelism | 1
enabled | t
consumer_group_id |
-[ RECORD 2 ]-----+----------
id | 1
microbatch | weblog
target | 1
load_spec | 1
target_columns |
rejection_schema |
rejection_table |
max_parallelism | 1
enabled | t
consumer_group_id | weblog_group
8 - stream_scheduler
This table contains metadata related to a single scheduler.
This table was renamed from kafka_config.kafka_scheduler. The eof_timeout_ms column it formerly contained has been removed.
| Column | Data Type | Description |
|---|---|---|
| version | VARCHAR | The version of the scheduler. |
| frame_duration | INTERVAL | The length of time of the frame. The default is 00:00:10. |
| resource_pool | VARCHAR | The resource pool associated with this scheduler. |
| config_refresh | INTERVAL | The interval of time that the scheduler runs before applying any changes to its metadata, such as changes made using the --update option. For more information, refer to --config-refresh in Scheduler tool options. |
| new_source_policy | VARCHAR | When during the frame the scheduler runs newly added sources. Set this value with --new-source-policy in Source tool options. Valid values: FAIR (takes the average length of time from the previous batches and schedules itself appropriately); START (runs all new sources at the beginning of the frame, giving them the minimal amount of time to run); END (runs all new sources starting at the end of the frame, giving them the maximum amount of time to run). Default: FAIR |
| pushback_policy | VARCHAR | (Currently not used.) How Vertica handles delays for microbatches that continually fail. Default: LINEAR |
| pushback_max_count | INTEGER | (Currently not used.) The maximum number of times a microbatch can fail before Vertica terminates it. |
| auto_sync | BOOLEAN | When TRUE, the scheduler automatically synchronizes source information with host clusters. For more information, refer to Automatically consume data from Kafka with a scheduler. Default: TRUE |
| consumer_group_id | VARCHAR | The name of the Kafka consumer group to which the scheduler reports its progress in consuming messages. This value is NULL if the scheduler reports to the default consumer group named vertica-database_name. See Monitoring Vertica message consumption with consumer groups for more information. |
Examples
This example shows a typical row in the stream_scheduler table.
=> SELECT * FROM weblog_sched.stream_scheduler;
-[ RECORD 1 ]------+-----------------------
version | v9.2.1
frame_duration | 00:05:00
resource_pool | weblog_pool
config_refresh | 00:05
new_source_policy | FAIR
pushback_policy | LINEAR
pushback_max_count | 5
auto_sync | t
consumer_group_id | vertica-consumer-group
9 - stream_scheduler_history
This table shows the history of launched scheduler instances.
This table was renamed from kafka_config.kafka_scheduler_history.
| Column | Data Type | Description |
|---|---|---|
| elected_leader_time | TIMESTAMP | The time when this instance began scheduling operations. |
| host | VARCHAR | The host name of the machine running the scheduler instance. |
| launcher | VARCHAR | The name of the currently active scheduler instance. Default: NULL |
| scheduler_id | INTEGER | The identification number of the scheduler. |
| version | VARCHAR | The version of the scheduler. |
Examples
This example shows typical rows from the stream_scheduler_history table.
SELECT * FROM stream_config.stream_scheduler_history;
elected_leader_time | host | launcher | scheduler_id | version
-------------------------+--------------+-------------------+--------------+---------
2016-07-26 13:19:42.692 | 10.20.100.62 | | 0 | v8.0.0
2016-07-26 13:54:37.715 | 10.20.100.62 | | 1 | v8.0.0
2016-07-26 13:56:06.785 | 10.20.100.62 | | 2 | v8.0.0
2016-07-26 13:56:56.033 | 10.20.100.62 | SchedulerInstance | 3 | v8.0.0
2016-07-26 15:51:20.513 | 10.20.100.62 | SchedulerInstance | 4 | v8.0.0
2016-07-26 15:51:35.111 | 10.20.100.62 | SchedulerInstance | 5 | v8.0.0
(6 rows)
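To find which instance most recently took over scheduling, sort by elected_leader_time. A sketch, assuming the default stream_config schema:

```sql
-- Identify the scheduler instance that most recently became leader.
SELECT elected_leader_time, host, launcher, scheduler_id
FROM stream_config.stream_scheduler_history
ORDER BY elected_leader_time DESC
LIMIT 1;
```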
10 - stream_sources
This table contains metadata related to data streaming sources.
This table was formerly named kafka_config.kafka_scheduler.
| Column | Data Type | Description |
|---|---|---|
| id | INTEGER | The identification number of the source. |
| source | VARCHAR | The name of the source. |
| cluster | INTEGER | The identification number of the cluster associated with the source. |
| partitions | INTEGER | The number of partitions in the source. |
| enabled | BOOLEAN | When TRUE, the source is enabled for use. |
Examples
This example shows a typical row from the stream_sources table.
select * from stream_config.stream_sources;
-[ RECORD 1 ]--------------
id | 1
source | SourceFeed1
cluster | 1
partitions | 1
enabled | t
-[ RECORD 2 ]--------------
id | 250001
source | SourceFeed2
cluster | 1
partitions | 1
enabled | t
11 - stream_targets
This table contains the metadata for all Vertica target tables.
The table was formerly named kafka_config.kafka_targets.
| Column | Data Type | Description |
|---|---|---|
| id | INTEGER | The identification number of the target table. |
| target_schema | VARCHAR | The name of the schema for the target table. |
| target_table | VARCHAR | The name of the target table. |
Examples
This example shows typical rows from the stream_targets table.
=> SELECT * FROM stream_config.stream_targets;
-[ RECORD 1 ]-----+---------------------
id | 1
target_schema | public
target_table | stream_flex1
-[ RECORD 2 ]-----+---------------------
id | 2
target_schema | public
target_table | stream_flex2
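stream_microbatches references this table by ID, so joining the two shows where each microbatch loads its data. A sketch, assuming the default stream_config schema:

```sql
-- Show the target table for each configured microbatch.
SELECT mb.microbatch, t.target_schema, t.target_table
FROM stream_config.stream_microbatches mb
JOIN stream_config.stream_targets t ON t.id = mb.target;
```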