Every time you create a scheduler (--create), Vertica creates a schema for that scheduler with the name you specify or the default stream_config. Each schema has the following tables:
Vertica recommends that you do not alter these tables except in consultation with support.
1 - stream_clusters
This table lists clusters and hosts. You change settings in this table using the vkconfig cluster tool. See Cluster tool options for more information.
Column
Data Type
Description
id
INTEGER
The identification number assigned to the cluster.
cluster
VARCHAR
The name of the cluster.
hosts
VARCHAR
A comma-separated list of hosts associated with the cluster.
Examples
This example shows a cluster and its associated hosts.
=> SELECT * FROM stream_config.stream_clusters;
id | cluster | hosts
---------+----------------+-----------------------------------
2250001 | streamcluster1 | 10.10.10.10:9092,10.10.10.11:9092
(1 row)
2 - stream_events
This table logs microbatches and other important events from the scheduler in an internal log table.
This table was renamed from kafka_config.kafka_events.
Column
Data Type
Description
event_time
TIMESTAMP
The time the event was logged.
log_level
VARCHAR
The type of event that was logged.
Valid Values:
TRACE
DEBUG
FATAL
ERROR
WARN
INFO
Default: INFO
frame_start
TIMESTAMP
The time when the frame executed.
frame_end
TIMESTAMP
The time when the frame completed.
microbatch
INTEGER
The identification number of the associated microbatch.
message
VARCHAR
A description of the event.
exception
VARCHAR
If this log is in the form of a stack trace, this column lists the exception.
Examples
This example shows typical rows from the stream_events table.
=> SELECT * FROM stream_config.stream_events;
-[ RECORD 1 ]-+-------------
event_time | 2016-07-17 13:28:35.548-04
log_level | INFO
frame_start |
frame_end |
microbatch |
message | New leader registered for schema stream_config. New ID: 0, new Host: 10.20.30.40
exception |
-[ RECORD 2 ]-+-------------
event_time | 2016-07-17 13:28:45.643-04
log_level | INFO
frame_start | 2015-07-17 12:28:45.633
frame_end | 2015-07-17 13:28:50.701-04
microbatch |
message | Generated tuples: test3|2|-2,test3|1|-2,test3|0|-2
exception |
-[ RECORD 3 ]-+----------------
event_time | 2016-07-17 14:28:50.701-04
log_level | INFO
frame_start | 2016-07-17 13:28:45.633
frame_end | 2016-07-17 14:28:50.701-04
microbatch |
message | Total rows inserted: 0
exception |
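Because the scheduler logs problems to this table, you can query it when troubleshooting a load. The following is a minimal sketch, assuming the scheduler schema is named stream_config:

```sql
-- List the most recent warnings and errors logged by the scheduler,
-- using the log_level values documented above.
SELECT event_time, log_level, message, exception
    FROM stream_config.stream_events
    WHERE log_level IN ('WARN', 'ERROR', 'FATAL')
    ORDER BY event_time DESC
    LIMIT 10;
```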
3 - stream_load_specs
This table describes user-created load specs. You change the entries in this table using the vkconfig utility's load spec tool.
Column
Data Type
Description
id
INTEGER
The identification number assigned to the load spec.
load_spec
VARCHAR
The name of the load spec.
filters
VARCHAR
A comma-separated list of UDFilters for the scheduler to include in the COPY statement it uses to load data from Kafka.
parser
VARCHAR
A Vertica UDParser to use with the specified target. If you are using a Vertica native parser, the parser parameters serve as COPY statement parameters.
parser_parameters
VARCHAR
A list of parameters to provide to the parser.
load_method
VARCHAR
The COPY load method to use for all loads with this scheduler.
Deprecated
In Vertica 10.0, load methods are no longer used due to the removal of the WOS. The value shown in this column has no effect.
message_max_bytes
INTEGER
The maximum size, in bytes, of a message.
uds_kv_parameters
VARCHAR
A list of parameters that are supplied to the KafkaSource statement. If the value in this column is in the format key=value, the scheduler passes it to the COPY statement's KafkaSource call.
Examples
This example shows the load specs defined in a Vertica instance.
SELECT * FROM stream_config.stream_load_specs;
-[ RECORD 1 ]-----+------------
id | 1
load_spec | loadspec2
filters |
parser | KafkaParser
parser_parameters |
load_method | direct
message_max_bytes | 1048576
uds_kv_parameters |
-[ RECORD 2 ]-----+------------
id | 750001
load_spec | streamspec1
filters |
parser | KafkaParser
parser_parameters |
load_method | TRICKLE
message_max_bytes | 1048576
uds_kv_parameters |
4 - stream_lock
This table is locked by the scheduler. This lock prevents multiple schedulers from running at the same time. The scheduler that locks this table updates it with its own information.
Important
Do not lock this table in a serializable transaction. Locking this table can interfere with the operation of the scheduler.
Column
Data Type
Description
scheduler_id
INTEGER
A unique ID for the scheduler instance that is currently running.
update_time
TIMESTAMP
The time the scheduler took ownership of writing to the schema.
process_info
VARCHAR
Information about the scheduler process. Currently unused.
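A plain query of this table shows which scheduler instance currently owns the schema. A minimal sketch, assuming the scheduler schema is named stream_config:

```sql
-- Check which scheduler instance currently holds the lock for this schema.
SELECT scheduler_id, update_time
    FROM stream_config.stream_lock;
```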
5 - stream_microbatch_history
This table contains a history of every microbatch executed within this scheduler configuration.
Column
Data Type
Description
source_name
VARCHAR
The name of the source.
source_cluster
VARCHAR
The name of the source cluster. The clusters are defined in stream_clusters.
source_partition
INTEGER
The number of the data streaming partition.
start_offset
INTEGER
The starting offset of the microbatch.
end_offset
INTEGER
The ending offset of the microbatch.
end_reason
VARCHAR
An explanation of why the batch ended. The following are valid end reasons:
DEADLINE - The batch ran out of time.
END_OFFSET - The load reached the ending offset specified in the KafkaSource. This reason is never used by the scheduler, as it does not specify an end offset.
END_OF_STREAM - There are no messages available to the scheduler or the eof_timeout has been reached.
NETWORK_ERROR - The scheduler could not connect to Kafka.
RESET_OFFSET - The start offset was changed using the --update and --offset parameters to the KafkaSource. This state does not occur during normal scheduler operations.
SOURCE_ISSUE - The Kafka service returned an error.
UNKNOWN - The batch ended for an unknown reason.
end_reason_message
VARCHAR
If the end reason is a network or source issue, this column contains a brief description of the issue.
partition_bytes
INTEGER
The number of bytes transferred from a source partition to a Vertica target table.
partition_messages
INTEGER
The number of messages transferred from a source partition to a Vertica target table.
microbatch_id
INTEGER
The Vertica transaction ID for the batch session.
microbatch
VARCHAR
The name of the microbatch.
target_schema
VARCHAR
The name of the target schema.
target_table
VARCHAR
The name of the target table.
timeslice
INTERVAL
The amount of time spent in the KafkaSource operator.
batch_start
TIMESTAMP
The time the batch executed.
batch_end
TIMESTAMP
The time the batch completed.
last_batch_duration
INTERVAL
The length of time required to run the complete COPY statement.
last_batch_parallelism
INTEGER
The number of parallel COPY statements generated to process the microbatch during the last frame.
microbatch_sub_id
INTEGER
The identifier for the COPY statement that processed the microbatch.
consecutive_error_count
INTEGER
(Currently not used.) The number of times a microbatch has encountered an error on an attempt to load. This value increases over multiple attempts.
transaction_id
INTEGER
The identifier for the transaction within the session.
frame_start
TIMESTAMP
The time the frame started. A frame can contain multiple microbatches.
frame_end
TIMESTAMP
The time the frame completed.
Examples
This example shows typical rows from the stream_microbatch_history table.
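A query along the following lines summarizes the volume loaded per microbatch. This is a sketch that assumes the scheduler schema is named stream_config:

```sql
-- Total bytes and messages loaded per microbatch and source,
-- most recently completed frames first.
SELECT microbatch, source_name,
       SUM(partition_bytes)    AS total_bytes,
       SUM(partition_messages) AS total_messages
    FROM stream_config.stream_microbatch_history
    GROUP BY microbatch, source_name
    ORDER BY MAX(frame_end) DESC;
```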
7 - stream_microbatches
This table contains configuration data related to microbatches.
Column
Data Type
Description
id
INTEGER
The identification number of the microbatch.
microbatch
VARCHAR
The name of the microbatch.
target
INTEGER
The identification number of the target associated with the microbatch.
load_spec
INTEGER
The identification number of the load spec associated with the microbatch.
target_columns
VARCHAR
The table columns associated with the microbatch.
rejection_schema
VARCHAR
The schema that contains the rejection table.
rejection_table
VARCHAR
The table where Vertica stores messages that are rejected by the database.
max_parallelism
INTEGER
The number of parallel COPY statements the scheduler uses to process the microbatch.
enabled
BOOLEAN
When TRUE, the microbatch is enabled for use.
consumer_group_id
VARCHAR
The name of the Kafka consumer group to report loading progress to. This value is NULL if the microbatch reports its progress to the default consumer group for the scheduler. See Monitoring Vertica message consumption with consumer groups for more information.
Examples
This example shows a row from a typical stream_microbatches table.
=> select * from weblog_sched.stream_microbatches;
-[ RECORD 1 ]-----+----------
id | 750001
microbatch | weberrors
target | 750001
load_spec | 2250001
target_columns |
rejection_schema |
rejection_table |
max_parallelism | 1
enabled | t
consumer_group_id |
-[ RECORD 2 ]-----+----------
id | 1
microbatch | weblog
target | 1
load_spec | 1
target_columns |
rejection_schema |
rejection_table |
max_parallelism | 1
enabled | t
consumer_group_id | weblog_group
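Because the target and load_spec columns hold identification numbers, you can join this table to stream_targets and stream_load_specs to see a microbatch's full configuration. A sketch, assuming the scheduler schema is named stream_config:

```sql
-- Show each microbatch with its target table and load spec by joining
-- on the id columns described above.
SELECT mb.microbatch, t.target_schema, t.target_table, ls.load_spec
    FROM stream_config.stream_microbatches mb
    JOIN stream_config.stream_targets t ON mb.target = t.id
    JOIN stream_config.stream_load_specs ls ON mb.load_spec = ls.id;
```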
8 - stream_scheduler
This table contains metadata related to a single scheduler.
This table was renamed from kafka_config.kafka_scheduler. The eof_timeout_ms column that it previously contained has been removed.
Column
Data Type
Description
version
VARCHAR
The version of the scheduler.
frame_duration
INTERVAL
The length of time of the frame. The default is 00:00:10.
resource_pool
VARCHAR
The resource pool associated with this scheduler.
config_refresh
INTERVAL
The interval of time that the scheduler runs before applying any changes to its metadata, such as changes made using the --update option.
consumer_group_id
VARCHAR
The name of the Kafka consumer group to which the scheduler reports its progress in consuming messages. This value is NULL if the scheduler reports to the default consumer group named vertica-database_name. See Monitoring Vertica message consumption with consumer groups for more information.
Examples
This example shows a typical row in the stream_scheduler table.
=> SELECT * FROM weblog_sched.stream_scheduler;
-[ RECORD 1 ]------+-----------------------
version | v9.2.1
frame_duration | 00:05:00
resource_pool | weblog_pool
config_refresh | 00:05
new_source_policy | FAIR
pushback_policy | LINEAR
pushback_max_count | 5
auto_sync | t
consumer_group_id | vertica-consumer-group
9 - stream_scheduler_history
This table shows the history of launched scheduler instances.
This table was renamed from kafka_config.kafka_scheduler_history.
Column
Data Type
Description
elected_leader_time
TIMESTAMP
The time when this instance began scheduling operations.
host
VARCHAR
The host name of the machine running the scheduler instance.
launcher
VARCHAR
The name of the currently active scheduler instance.
Default: NULL
scheduler_id
INTEGER
The identification number of the scheduler.
version
VARCHAR
The version of the scheduler.
Examples
This example shows typical rows from the stream_scheduler_history table.
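A query such as the following sketch lists the most recently elected scheduler instances, assuming the scheduler schema is named stream_config:

```sql
-- Show the most recent scheduler instances, newest election first.
SELECT elected_leader_time, host, version
    FROM stream_config.stream_scheduler_history
    ORDER BY elected_leader_time DESC
    LIMIT 5;
```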
10 - stream_sources
This table contains metadata related to data streaming sources.
This table was formerly named kafka_config.kafka_sources.
Column
Data Type
Description
id
INTEGER
The identification number of the source.
source
VARCHAR
The name of the source.
cluster
INTEGER
The identification number of the cluster associated with the source.
partitions
INTEGER
The number of partitions in the source.
enabled
BOOLEAN
When TRUE, the source is enabled for use.
Examples
This example shows a typical row from the stream_sources table.
select * from stream_config.stream_sources;
-[ RECORD 1 ]--------------
id | 1
source | SourceFeed1
cluster | 1
partitions | 1
enabled | t
-[ RECORD 2 ]--------------
id | 250001
source | SourceFeed2
cluster | 1
partitions | 1
enabled | t
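Because the cluster column holds the identification number of a row in stream_clusters, you can join the two tables to list each source together with its cluster's hosts. A sketch, assuming the scheduler schema is named stream_config:

```sql
-- List each enabled source with the name and hosts of its cluster.
SELECT s.source, s.partitions, c.cluster, c.hosts
    FROM stream_config.stream_sources s
    JOIN stream_config.stream_clusters c ON s.cluster = c.id
    WHERE s.enabled;
```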
11 - stream_targets
This table contains the metadata for all Vertica target tables.
The table was formerly named kafka_config.kafka_targets.
Column
Data Type
Description
id
INTEGER
The identification number of the target table.
target_schema
VARCHAR
The name of the schema for the target table.
target_table
VARCHAR
The name of the target table.
Examples
This example shows typical rows from the stream_targets table.
=> SELECT * FROM stream_config.stream_targets;
-[ RECORD 1 ]-----+---------------------
id | 1
target_schema | public
target_table | stream_flex1
-[ RECORD 2 ]-----+---------------------
id | 2
target_schema | public
target_table | stream_flex2