Apache Kafka integration

Vertica provides a high-performance mechanism for integrating with Apache Kafka, an open-source distributed real-time streaming platform.

Because Vertica can both consume data from and produce data for Kafka, you can use Vertica as part of an automated analytics workflow: Vertica can retrieve data from Kafka, perform analytics on the data, and then send the results back to Kafka for consumption by other applications.

Prerequisites

Architecture overview

The Vertica and Kafka integration provides the following features:

  • A UDx library containing functions that load and parse data from Kafka topics into Vertica

  • A job scheduler that uses the UDx library to continuously consume data from Kafka with exactly-once semantics

  • Push-based notifiers that send data collector messages from Vertica to Kafka

  • A KafkaExport function that sends Vertica data to Kafka

Vertica as a Kafka consumer

A Kafka consumer reads messages written to Kafka by other data streams. Because Vertica can read messages from Kafka, you can store and analyze data from any application that sends data to Kafka without configuring each individual application to connect to Vertica. Vertica provides tools to load data from Kafka either automatically or manually.

Manual loads

Manually load a finite amount of data from Kafka by directly executing a COPY statement. This is useful if you want to analyze, test, or perform additional processing on a set of messages.

For more information, see Consuming data from Kafka.
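As a rough illustration of a manual load, the Python sketch below assembles a COPY statement that reads one partition of a topic and stops when it reaches the end of the stream. It assumes the KafkaSource load function and KafkaJSONParser from the Kafka UDx library; the table name, topic name, and broker address are hypothetical placeholders, and the helper build_manual_copy is not part of Vertica.

```python
def build_manual_copy(table, topic, partition, start_offset, brokers,
                      parser="KafkaJSONParser"):
    """Return a COPY statement that loads a single Kafka partition.

    The stream value uses the 'topic|partition|offset' form; an offset of -2
    asks for the earliest available message. Values are interpolated without
    escaping, so this helper is for illustration only.
    """
    stream = f"{topic}|{partition}|{start_offset}"
    return (
        f"COPY {table}\n"
        f"  SOURCE KafkaSource(stream='{stream}',\n"
        f"                     brokers='{brokers}',\n"
        f"                     stop_on_eof=true)\n"
        f"  PARSER {parser}();"
    )


# Hypothetical table, topic, and broker names.
statement = build_manual_copy(
    table="public.web_events",
    topic="web_events",
    partition=0,
    start_offset=-2,
    brokers="kafka01.example.com:9092",
)
print(statement)
```

Setting stop_on_eof=true is what makes the load finite: the statement returns once it has read every message currently in the partition, rather than waiting for new ones.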

Automatic loads

Automatically load data from Kafka with a job scheduler. A scheduler constantly loads data and ensures that each Kafka message is loaded exactly once.

You must install Java 8 on each Vertica node that runs the scheduler. For more information, see Automatically consume data from Kafka with the scheduler.
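This page does not describe how the scheduler achieves its exactly-once guarantee. One common approach, assumed here purely for illustration, is to commit each batch of rows together with the offset of the next unread message, so that a failed batch leaves both unchanged and can be retried safely. The sketch below simulates that idea in plain Python; the Target class and run_microbatch function are hypothetical and do not reflect the scheduler's actual code.

```python
class Target:
    """Stand-in for a database that commits rows and the next offset together."""

    def __init__(self):
        self.rows = []
        self.next_offset = 0

    def commit_batch(self, rows, next_offset):
        # Both updates happen in one step, mimicking a single transaction.
        self.rows = self.rows + rows
        self.next_offset = next_offset


def run_microbatch(log, target, batch_size, fail_before_commit=False):
    """Load up to batch_size messages, starting at the last committed offset."""
    start = target.next_offset
    batch = log[start:start + batch_size]
    if not batch:
        return 0
    if fail_before_commit:
        # Nothing has been committed yet, so the offset does not advance.
        raise RuntimeError("simulated crash before commit")
    target.commit_batch(batch, start + len(batch))
    return len(batch)


log = [f"msg-{i}" for i in range(5)]   # stand-in for one Kafka partition
target = Target()

run_microbatch(log, target, 2)          # first batch commits normally
try:
    run_microbatch(log, target, 2, fail_before_commit=True)
except RuntimeError:
    pass                                # the retry resumes from the same offset

while run_microbatch(log, target, 2):   # keep loading until the log is drained
    pass

print(target.rows)
```

Because the offset only advances when the rows are committed, the batch that failed is read again on the next attempt and every message ends up in the target exactly once.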

Vertica as a Kafka producer

A Kafka producer sends data to Kafka, which is then available to Kafka consumers for processing. You can send the following types of data to Kafka:

  • Vertica analytics results. Use KafkaExport to export Vertica tables and queries.

  • Health and performance data from Data Collector tables. Create push-based notifiers to send this data for consumption by third-party monitoring tools.

  • Ad hoc messages. Use NOTIFY to signal that tasks such as stored procedures are complete.

For more information, see Producing data for Kafka.
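To make two of the producer options above more concrete, the Python sketch below assembles a KafkaExport query and a NOTIFY call as SQL strings. It assumes that KafkaExport takes partition, key, and value arguments plus brokers and topic parameters, and that NOTIFY takes a message, a notifier name, and a target topic. The table, topic, broker, and notifier names are hypothetical, the notifier is assumed to exist already, and both helper functions are illustrative rather than part of Vertica.

```python
def kafka_export_query(source, value_expr, brokers, topic,
                       key_expr="NULL", partition_expr="NULL"):
    """Return a query that sends one Kafka message per row of `source`.

    Passing NULL for the partition and key leaves partition assignment to
    Kafka's defaults. Values are interpolated without escaping, so this
    helper is for illustration only.
    """
    return (
        f"SELECT KafkaExport({partition_expr}, {key_expr}, {value_expr}\n"
        f"         USING PARAMETERS brokers='{brokers}', topic='{topic}')\n"
        f"       OVER (PARTITION BEST)\n"
        f"FROM {source};"
    )


def notify_query(message, notifier, topic):
    """Return a NOTIFY call that sends an ad hoc message through a notifier."""
    return f"SELECT NOTIFY('{message}', '{notifier}', '{topic}');"


# Hypothetical object names throughout.
export_sql = kafka_export_query(
    source="public.daily_totals",
    value_expr="total::VARCHAR",
    brokers="kafka01.example.com:9092",
    topic="vertica_results",
)
notify_sql = notify_query("ETL done", "my_notifier", "db_activity")

print(export_sql)
print(notify_sql)
```

The export query runs as an ordinary SELECT, so the FROM clause can name a table or a subquery, which matches the "tables and queries" wording in the list above.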