Apache Kafka integration

OpenText™ Analytics Database provides a high-performance mechanism for integrating with Apache Kafka, an open-source distributed real-time streaming platform. Because the database can both consume data from and produce data for Kafka, you can use the database as part of an automated analytics workflow: the database can retrieve data from Kafka, perform analytics on the data, and then send the results back to Kafka for consumption by other applications.

Architecture overview

The database and Kafka integration provides the following features:

  • A UDx library containing functions that load and parse data from Kafka topics into the database

  • A job scheduler that uses the UDx library to continuously consume data from Kafka with exactly-once semantics

  • Push-based notifiers that send Data Collector messages from the database to Kafka

  • The KafkaExport function, which sends data from the database to Kafka

OpenText™ Analytics Database as a Kafka consumer

A Kafka consumer reads messages written to Kafka by other data streams. Because the database can read messages from Kafka, you can store and analyze data from any application that sends data to Kafka without configuring each individual application to connect to the database. The database provides tools to automatically or manually consume data loads from Kafka.

Manual loads

Manually load a finite amount of data from Kafka by directly executing a COPY statement. This is useful if you want to analyze, test, or perform additional processing on a set of messages.
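As a sketch, a manual load might look like the following. The table, broker address, and topic names here are placeholders; the example assumes JSON-formatted messages and reads partition 0 of the topic from its earliest offset, stopping when it reaches the end of the stream:

```sql
=> COPY public.web_hits
     SOURCE KafkaSource(stream='web_hits|0|-2',
                        brokers='kafka01.example.com:9092',
                        stop_on_eof=true)
     PARSER KafkaJSONParser();
```

Because stop_on_eof is set, the COPY statement loads the messages currently in the topic and then returns, which suits one-off analysis or testing.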

For more information, see Consuming data from Kafka.

Automatic loads

Automatically load data from Kafka with a job scheduler. A scheduler constantly loads data and ensures that each Kafka message is loaded exactly once.

You must install Java 8 on each database node that runs the scheduler. For more information, see Automatically consume data from Kafka with a scheduler.
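You configure and launch a scheduler with the vkconfig command-line tool. The following sketch shows the general sequence of steps; the configuration schema, cluster, source, and table names are all placeholder values:

```shell
# Create the scheduler's configuration schema
vkconfig scheduler --create --config-schema kafka_sched

# Register the Kafka cluster and the topic (source) to read from
vkconfig cluster --create --config-schema kafka_sched \
         --cluster kafka_cluster --hosts kafka01.example.com:9092
vkconfig source --create --config-schema kafka_sched \
         --cluster kafka_cluster --source web_hits --partitions 1

# Define the target table and how to parse the messages
vkconfig target --create --config-schema kafka_sched \
         --target-schema public --target-table web_hits
vkconfig load-spec --create --config-schema kafka_sched \
         --load-spec hits_spec --parser KafkaJSONParser

# Tie everything together in a microbatch, then start the scheduler
vkconfig microbatch --create --config-schema kafka_sched \
         --microbatch hits_batch --target-schema public --target-table web_hits \
         --load-spec hits_spec --add-source web_hits --add-source-cluster kafka_cluster
vkconfig launch --config-schema kafka_sched &
```

Once launched, the scheduler runs continuously, loading each microbatch and recording offsets so that every Kafka message is loaded exactly once.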

OpenText™ Analytics Database as a Kafka producer

A Kafka producer sends data to Kafka, which is then available to Kafka consumers for processing. You can send the following types of data to Kafka:

  • OpenText™ Analytics Database analytics results. Use KafkaExport to export database tables and queries.

  • Health and performance data from Data Collector tables. Create push-based notifiers to send this data for consumption by third-party monitoring tools.

  • Ad hoc messages. Use NOTIFY to signal that tasks such as stored procedures are complete.
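As a sketch, exporting query results and sending an ad hoc message might look like the following. The broker address, topic names, table name, and notifier name are placeholders; the example assumes the source table has partition, key, and message columns to send:

```sql
=> -- Export rows to the 'results' topic; rows that Kafka rejects are returned
=> SELECT KafkaExport(partition, key, message
                      USING PARAMETERS brokers='kafka01.example.com:9092',
                                       topic='results')
   OVER (PARTITION BEST) FROM analytics_results;

=> -- Send an ad hoc message through a Kafka notifier
=> CREATE NOTIFIER kafka_notifier ACTION 'kafka://kafka01.example.com:9092'
   MAXMEMORYSIZE '10M';
=> SELECT NOTIFY('batch job complete', 'kafka_notifier', 'status_topic');
```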

For more information, see Producing data for Kafka.