Data streaming schema tables

Every time you create a scheduler (--create), Vertica creates a schema for that scheduler with the name you specify or the default stream_config. Each schema has the following tables:

1 - stream_clusters

This table lists clusters and hosts. You change settings in this table using the vkconfig cluster tool. See Cluster tool options for more information.

Column Data Type Description
id INTEGER The identification number assigned to the cluster.
cluster VARCHAR The name of the cluster.
hosts VARCHAR A comma-separated list of hosts associated with the cluster.

Examples

This example shows a cluster and its associated hosts.

=> SELECT * FROM stream_config.stream_clusters;

    id    |    cluster     |               hosts
 ---------+----------------+-----------------------------------
  2250001 | streamcluster1 | 10.10.10.10:9092,10.10.10.11:9092
(1 row)

2 - stream_events

This table logs microbatches and other important events from the scheduler in an internal log table.

This table was renamed from kafka_config.kafka_events.

Column Data Type Description
event_time TIMESTAMP The time the event was logged.
log_level VARCHAR

The type of event that was logged.

Valid Values:

  • TRACE

  • DEBUG

  • FATAL

  • ERROR

  • WARN

  • INFO

Default: INFO

frame_start TIMESTAMP The time when the frame started executing.
frame_end TIMESTAMP The time when the frame completed.
microbatch INTEGER The identification number of the associated microbatch.
message VARCHAR A description of the event.
exception VARCHAR If this log is in the form of a stack trace, this column lists the exception.

Examples

This example shows typical rows from the stream_events table.

=> SELECT * FROM stream_config.stream_events;
-[ RECORD 1 ]-+-------------
event_time    | 2016-07-17 13:28:35.548-04
log_level     | INFO
frame_start   |
frame_end     |
microbatch    |
message       | New leader registered for schema stream_config. New ID: 0, new Host: 10.20.30.40
exception     |
-[ RECORD 2 ]-+-------------
event_time    | 2016-07-17 13:28:45.643-04
log_level     | INFO
frame_start   | 2016-07-17 12:28:45.633
frame_end     | 2016-07-17 13:28:50.701-04
microbatch    |
message       | Generated tuples: test3|2|-2,test3|1|-2,test3|0|-2
exception     |
-[ RECORD 3 ]-+----------------
event_time    | 2016-07-17 14:28:50.701-04
log_level     | INFO
frame_start   | 2016-07-17 13:28:45.633
frame_end     | 2016-07-17 14:28:50.701-04
microbatch    |
message       | Total rows inserted: 0
exception     |
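
When troubleshooting, you can filter this table by log level. For example, a query such as the following (assuming the default stream_config schema) returns only ERROR and FATAL events, most recent first:

=> SELECT event_time, message, exception
   FROM stream_config.stream_events
   WHERE log_level IN ('ERROR', 'FATAL')
   ORDER BY event_time DESC;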

3 - stream_load_specs

This table describes user-created load specs. You change the entries in this table using the vkconfig utility's load spec tool.

Column Data Type Description
id INTEGER The identification number assigned to the load spec.
load_spec VARCHAR The name of the load spec.
filters VARCHAR A comma-separated list of UDFilters for the scheduler to include in the COPY statement it uses to load data from Kafka.
parser VARCHAR A Vertica UDParser to use with a specified target. If you are using a Vertica native parser, parser parameters serve as COPY statement parameters.
parser_parameters VARCHAR A list of parameters to provide to the parser.
load_method VARCHAR

The COPY load method to use for all loads with this scheduler.

message_max_bytes INTEGER The maximum size, in bytes, of a message.
uds_kv_parameters VARCHAR A list of parameters that are supplied to the KafkaSource statement. If the value in this column is in the format key=value, the scheduler passes it to the COPY statement's KafkaSource call.

Examples

This example shows the load specs that you can use with a Vertica instance.


=> SELECT * FROM stream_config.stream_load_specs;
-[ RECORD 1 ]-----+------------
id                | 1
load_spec         | loadspec2
filters           |
parser            | KafkaParser
parser_parameters |
load_method       | direct
message_max_bytes | 1048576
uds_kv_parameters |
-[ RECORD 2 ]-----+------------
id                | 750001
load_spec         | streamspec1
filters           |
parser            | KafkaParser
parser_parameters |
load_method       | TRICKLE
message_max_bytes | 1048576
uds_kv_parameters |

4 - stream_lock

This table is locked by the scheduler. This lock prevents multiple schedulers from running at the same time. The scheduler that locks this table updates it with its own information.

Column Data Type Description
scheduler_id INTEGER A unique ID for the scheduler instance that is currently running.
update_time TIMESTAMP The time the scheduler took ownership of writing to the schema.
process_info VARCHAR Information about the scheduler process. Currently unused.

Example

=> SELECT * FROM weblog_sched.stream_lock;
 scheduler_id |       update_time       | process_info
--------------+-------------------------+--------------
            2 | 2018-11-08 10:12:36.033 |
(1 row)
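
To check whether the lock is held by a live scheduler, you can compare update_time to the current time. For example, a query such as the following (assuming a scheduler schema named weblog_sched) shows how long ago the lock was last updated:

=> SELECT scheduler_id, update_time,
          CURRENT_TIMESTAMP - update_time AS time_since_update
   FROM weblog_sched.stream_lock;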

5 - stream_microbatch_history

This table contains a history of every microbatch executed within this scheduler configuration.

Column Data Type Description
source_name VARCHAR The name of the source.
source_cluster VARCHAR The name of the source cluster. The clusters are defined in stream_clusters.
source_partition INTEGER The number of the data streaming partition.
start_offset INTEGER The starting offset of the microbatch.
end_offset INTEGER The ending offset of the microbatch.
end_reason VARCHAR

An explanation for why the batch ended. The following are valid end reasons:

  • DEADLINE - The batch ran out of time.

  • END_OFFSET - The load reached the ending offset specified in the KafkaSource. This reason is never used by the scheduler, as it does not specify an end offset.

  • END_OF_STREAM - There are no messages available to the scheduler or the eof_timeout has been reached.

  • NETWORK_ISSUE - The scheduler could not connect to Kafka.

  • RESET_OFFSET - The start offset was changed using the --update and --offset parameters to the KafkaSource. This state does not occur during normal scheduler operations.

  • SOURCE_ISSUE - The Kafka service returned an error.

  • UNKNOWN - The batch ended for an unknown reason.

end_reason_message VARCHAR If the end reason is a network or source issue, this column contains a brief description of the issue.
partition_bytes INTEGER The number of bytes transferred from a source partition to a Vertica target table.
partition_messages INTEGER The number of messages transferred from a source partition to a Vertica target table.
microbatch_id INTEGER The Vertica transaction id for the batch session.
microbatch VARCHAR The name of the microbatch.
target_schema VARCHAR The name of the target schema.
target_table VARCHAR The name of the target table.
timeslice INTERVAL The amount of time spent in the KafkaSource operator.
batch_start TIMESTAMP The time the batch executed.
batch_end TIMESTAMP The time the batch completed.
last_batch_duration INTERVAL The length of time required to run the complete COPY statement.
last_batch_parallelism INTEGER The number of parallel COPY statements generated to process the microbatch during the last frame.
microbatch_sub_id INTEGER The identifier for the COPY statement that processed the microbatch.
consecutive_error_count INTEGER (Currently not used.) The number of times a microbatch has encountered an error on an attempt to load. This value increases over multiple attempts.
transaction_id INTEGER The identifier for the transaction within the session.
frame_start TIMESTAMP The time the frame started. A frame can contain multiple microbatches.
frame_end TIMESTAMP The time the frame completed.

Examples

This example shows typical rows from the stream_microbatch_history table.


=> SELECT * FROM stream_config.stream_microbatch_history;

-[ RECORD 1 ]--+---------------------------
source_name             | streamsource1
source_cluster          | kafka-1
source_partition        | 0
start_offset            | 196
end_offset              | 196
end_reason              | END_OF_STREAM
partition_bytes         | 0
partition_messages      | 0
microbatch_id           | 1
microbatch              | mb_0
target_schema           | public
target_table            | kafka_flex_0
timeslice               | 00:00:09.892
batch_start             | 2016-07-28 11:31:25.854221
batch_end               | 2016-07-28 11:31:26.357942
last_batch_duration     | 00:00:00.379826
last_batch_parallelism  | 1
microbatch_sub_id       | 0
consecutive_error_count |
transaction_id          | 45035996275130064
frame_start             | 2016-07-28 11:31:25.751
frame_end               |
end_reason_message      |

-[ RECORD 2 ]--+---------------------------
source_name             | streamsource1
source_cluster          | kafka-1
source_partition        | 1
start_offset            | 197
end_offset              | 197
end_reason              | NETWORK_ISSUE
partition_bytes         | 0
partition_messages      | 0
microbatch_id           | 1
microbatch              | mb_0
target_schema           | public
target_table            | kafka_flex_0
timeslice               | 00:00:09.897
batch_start             | 2016-07-28 11:31:45.84898
batch_end               | 2016-07-28 11:31:46.253367
last_batch_duration     | 00:00:00.377796
last_batch_parallelism  | 1
microbatch_sub_id       | 0
consecutive_error_count |
transaction_id          | 45035996275130109
frame_start             | 2016-07-28 11:31:45.751
frame_end               |
end_reason_message      | Local: All brokers are down
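
To summarize load activity, you can aggregate this table. For example, a query such as the following (assuming the default stream_config schema) totals the bytes and messages loaded per microbatch and source:

=> SELECT microbatch, source_name,
          SUM(partition_bytes) AS total_bytes,
          SUM(partition_messages) AS total_messages
   FROM stream_config.stream_microbatch_history
   GROUP BY microbatch, source_name
   ORDER BY total_bytes DESC;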

6 - stream_microbatch_source_map

This table maps microbatches to their associated sources.

Column Data Type Description
microbatch INTEGER The identification number of the microbatch.
source INTEGER The identification number of the associated source.

Examples

This example shows typical rows from the stream_microbatch_source_map table.

=> SELECT * FROM stream_config.stream_microbatch_source_map;
microbatch | source
-----------+--------
         1 |      4
         3 |      2
(2 rows)
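
Because this table stores only identification numbers, you can join it to stream_microbatches and stream_sources to see the corresponding names. For example (assuming the default stream_config schema):

=> SELECT mb.microbatch, s.source
   FROM stream_config.stream_microbatch_source_map map
   JOIN stream_config.stream_microbatches mb ON map.microbatch = mb.id
   JOIN stream_config.stream_sources s ON map.source = s.id;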

7 - stream_microbatches

This table contains configuration data related to microbatches.

Column Data Type Description
id INTEGER The identification number of the microbatch.
microbatch VARCHAR The name of the microbatch.
target INTEGER The identification number of the target associated with the microbatch.
load_spec INTEGER The identification number of the load spec associated with the microbatch.
target_columns VARCHAR The table columns associated with the microbatch.
rejection_schema VARCHAR The schema that contains the rejection table.
rejection_table VARCHAR The table where Vertica stores messages that are rejected by the database.
max_parallelism INTEGER The number of parallel COPY statements the scheduler uses to process the microbatch.
enabled BOOLEAN When TRUE, the microbatch is enabled for use.
consumer_group_id VARCHAR The name of the Kafka consumer group to report loading progress to. This value is NULL if the microbatch reports its progress to the default consumer group for the scheduler. See Monitoring Vertica message consumption with consumer groups for more information.

Examples

This example shows a row from a typical stream_microbatches table.

=> SELECT * FROM weblog_sched.stream_microbatches;
-[ RECORD 1 ]-----+----------
id                | 750001
microbatch        | weberrors
target            | 750001
load_spec         | 2250001
target_columns    |
rejection_schema  |
rejection_table   |
max_parallelism   | 1
enabled           | t
consumer_group_id |
-[ RECORD 2 ]-----+----------
id                | 1
microbatch        | weblog
target            | 1
load_spec         | 1
target_columns    |
rejection_schema  |
rejection_table   |
max_parallelism   | 1
enabled           | t
consumer_group_id | weblog_group

8 - stream_scheduler

This table contains metadata related to a single scheduler.

This table was renamed from kafka_config.kafka_scheduler. The eof_timeout_ms column it previously contained has been removed.

Column Data Type Description
version VARCHAR The version of the scheduler.
frame_duration INTERVAL The length of time of the frame. The default is 00:00:10.
resource_pool VARCHAR The resource pool associated with this scheduler.
config_refresh INTERVAL

The interval of time that the scheduler runs before applying any changes to its metadata, such as changes made using the --update option.

For more information, refer to --config-refresh in Scheduler tool options.

new_source_policy VARCHAR

When during the frame new sources first run. Set this value with the --new-source-policy option in Source tool options.

Valid Values:

  • FAIR: Takes the average length of time from the previous batches and schedules itself appropriately.

  • START: Runs all new sources at the beginning of the frame. In this case, Vertica gives these sources the minimal amount of time to run.

  • END: Runs all new sources starting at the end of the frame. In this case, Vertica gives these sources the maximum amount of time to run.

Default:

FAIR

pushback_policy VARCHAR

(Not currently used.) How Vertica handles delays for microbatches that continually fail.

Valid Values:

  • FLAT

  • LINEAR

  • EXPONENTIAL

Default:

LINEAR

pushback_max_count INTEGER (Currently not used.) The maximum number of times a microbatch can fail before Vertica terminates it.
auto_sync BOOLEAN

When TRUE, the scheduler automatically synchronizes source information with host clusters. For more information, refer to Automatically consume data from Kafka with a scheduler.

Default:

TRUE

consumer_group_id VARCHAR The name of the Kafka consumer group to which the scheduler reports its progress in consuming messages. This value is NULL if the scheduler reports to the default consumer group named vertica-database_name. See Monitoring Vertica message consumption with consumer groups for more information.

Examples

This example shows a typical row in the stream_scheduler table.

=> SELECT * FROM weblog_sched.stream_scheduler;
-[ RECORD 1 ]------+-----------------------
version            | v9.2.1
frame_duration     | 00:05:00
resource_pool      | weblog_pool
config_refresh     | 00:05
new_source_policy  | FAIR
pushback_policy    | LINEAR
pushback_max_count | 5
auto_sync          | t
consumer_group_id  | vertica-consumer-group

9 - stream_scheduler_history

This table shows the history of launched scheduler instances.

This table was renamed from kafka_config.kafka_scheduler_history.

Column Data Type Description
elected_leader_time TIMESTAMP The time when this instance began scheduling operations.
host VARCHAR The host name of the machine running the scheduler instance.
launcher VARCHAR

The name of the currently active scheduler instance.

Default: NULL

scheduler_id INTEGER The identification number of the scheduler.
version VARCHAR The version of the scheduler.

Examples

This example shows typical rows from the stream_scheduler_history table.

=> SELECT * FROM stream_config.stream_scheduler_history;
   elected_leader_time   |     host     |     launcher      | scheduler_id | version
-------------------------+--------------+-------------------+--------------+---------
 2016-07-26 13:19:42.692 | 10.20.100.62 |                   |            0 | v8.0.0
 2016-07-26 13:54:37.715 | 10.20.100.62 |                   |            1 | v8.0.0
 2016-07-26 13:56:06.785 | 10.20.100.62 |                   |            2 | v8.0.0
 2016-07-26 13:56:56.033 | 10.20.100.62 | SchedulerInstance |            3 | v8.0.0
 2016-07-26 15:51:20.513 | 10.20.100.62 | SchedulerInstance |            4 | v8.0.0
 2016-07-26 15:51:35.111 | 10.20.100.62 | SchedulerInstance |            5 | v8.0.0
(6 rows)
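
To find the currently elected scheduler instance, you can select the most recent row. For example (assuming the default stream_config schema):

=> SELECT elected_leader_time, host, launcher, scheduler_id, version
   FROM stream_config.stream_scheduler_history
   ORDER BY elected_leader_time DESC
   LIMIT 1;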

10 - stream_sources

This table contains metadata related to data streaming sources.

This table was formerly named kafka_config.kafka_sources.

Column Data Type Description
id INTEGER The identification number of the source.
source VARCHAR The name of the source.
cluster INTEGER The identification number of the cluster associated with the source.
partitions INTEGER The number of partitions in the source.
enabled BOOLEAN When TRUE, the source is enabled for use.

Examples

This example shows a typical row from the stream_sources table.

=> SELECT * FROM stream_config.stream_sources;
-[ RECORD 1 ]--------------
id         | 1
source     | SourceFeed1
cluster    | 1
partitions | 1
enabled    | t
-[ RECORD 2 ]--------------
id         | 250001
source     | SourceFeed2
cluster    | 1
partitions | 1
enabled    | t

11 - stream_targets

This table contains the metadata for all Vertica target tables.

The table was formerly named kafka_config.kafka_targets.

Column Data Type Description
id INTEGER The identification number of the target table.
target_schema VARCHAR The name of the schema for the target table.
target_table VARCHAR The name of the target table.

Examples

This example shows typical rows from the stream_targets table.

=> SELECT * FROM stream_config.stream_targets;
-[ RECORD 1 ]-----+---------------------
id                | 1
target_schema     | public
target_table      | stream_flex1
-[ RECORD 2 ]-----+---------------------
id                | 2
target_schema     | public
target_table      | stream_flex2
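
To see which microbatch loads each target table, you can join this table to stream_microbatches. For example (assuming the default stream_config schema):

=> SELECT mb.microbatch, t.target_schema, t.target_table
   FROM stream_config.stream_targets t
   JOIN stream_config.stream_microbatches mb ON mb.target = t.id;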