

Installation and Prerequisites
Prerequisites - Kafka

Apache Kafka version 2.3 or later and a Kafka Connect cluster.
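
If you are unsure which version you are running, the standard Kafka CLI tools can report it. A quick check, assuming the stock scripts are on your PATH:

# Print the version of the local Kafka installation (available since Kafka 2.0).
$ kafka-topics.sh --version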

What is Kafka Connect?

Kafka Connect is a free, open-source component of Apache Kafka® for continuously importing and exporting data as event streams, integrating Kafka with your existing systems. A Kafka Connect cluster is highly scalable and fault-tolerant, ensuring continuous operation.

Read about Kafka Connect
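
A quick way to confirm that a Connect worker is up is its REST API, which listens on port 8083 by default (the host and port below are assumptions; adjust them to your deployment):

# The root endpoint returns the worker version, commit, and Kafka cluster ID as JSON.
$ curl http://localhost:8083/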

Prerequisites - EMS

The following items are required before you get started:

  • Create the destination data pool

  • Generate the AppKey

  • Create the destination Vertica table (optional; if the table does not already exist, its schema is inferred from the first message)

Steps to follow in EMS

The following steps must be taken to use the Celonis Kafka Connector with EMS.

  1. Log in to Admin & Settings

  2. Create an AppKey (in Admin & Settings → Applications)

  3. Create a data pool (in Data → Data Integration)

  4. Allow the AppKey access to the data pool (… next to the data pool → Permissions)

  5. Grab the data pool ID (shown in the browser’s URL bar when you open the data pool); you will need it for the connector configuration below

  6. Create a Vertica table to serve as the EMS destination table (its DDL should match the Kafka topic schema). If the table already exists, no creation is attempted.
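
Before moving on, it can help to collect the values produced by these steps, since the connector configuration below needs them. A sketch with illustrative placeholder names:

# Values gathered from the steps above (all placeholders are illustrative):
$ export EMS_BASE_URL="https://<team>.<realm>.celonis.cloud"   # your EMS team URL
$ export EMS_POOL_ID="<data-pool-id>"                          # from the data pool URL
$ export EMS_APP_KEY="<app-key>"                               # from Admin & Settings → Applications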

Manual Installation
  • Download the connector ZIP file. If you're running Kafka version 2.5 or lower, use the Scala 2.12 archive; otherwise, use the Scala 2.13 one.

  • Extract the ZIP file and copy its contents to the desired location. For example, you can create a directory named /share/kafka/plugins and copy the connector plugin contents there.

  • Add this location to the plugin path in your Connect worker properties file. Kafka Connect finds plugins using its plugin path, a comma-separated list of directories defined in the worker configuration. This might already be set up, in which case there is nothing to do. For example:

plugin.path=/usr/local/share/kafka/plugins

  • Start the Kafka Connect workers with that configuration. Connect will discover all connectors defined within those plugins.

  • Repeat these steps for each machine where Connect is running; each connector must be available to every worker. (A consolidated sketch follows this list.)
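
Putting the bullets above together, a minimal manual installation might look like the following. This is a sketch under assumptions: the paths, the archive name, and the worker properties filename are illustrative, so substitute the release you actually downloaded and your own worker configuration.

# Create a plugin directory and unpack the connector into it.
$ mkdir -p /usr/local/share/kafka/plugins
$ unzip kafka-ems-sink-connector-*.zip -d /usr/local/share/kafka/plugins

# Confirm the Connect worker configuration includes the directory in plugin.path,
# then restart the worker so it rescans the plugin path.
$ grep plugin.path connect-worker.properties
plugin.path=/usr/local/share/kafka/plugins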

Managed Services

If you are using Kafka or Kafka Connect from a managed service provider, follow the instructions for your service:

AWS MSK Connect

To create the connector, follow the steps provided here. They require installing a custom plugin; for that, follow this link and use the connector release JAR.
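
As a rough sketch of the MSK Connect side (the bucket, key, and plugin name are placeholders; the linked AWS documentation is authoritative):

# Upload the release JAR to S3, then register it as an MSK Connect custom plugin.
$ aws s3 cp ems-sink-connector.jar s3://<bucket>/plugins/ems-sink-connector.jar
$ aws kafkaconnect create-custom-plugin \
    --name celonis-ems-sink \
    --content-type JAR \
    --location '{"s3Location":{"bucketArn":"arn:aws:s3:::<bucket>","fileKey":"plugins/ems-sink-connector.jar"}}'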

Confluent

The connector is installed via Confluent Hub. Follow the instructions here to enable the Kafka Connect sink.
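
With a self-managed Confluent Platform installation, Confluent Hub components are installed with the confluent-hub CLI. The general pattern is shown below; the component coordinates are placeholders, so take the exact ones from the linked Hub page:

$ confluent-hub install <owner>/<component-name>:latest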

Azure Event Hubs

If you're running Event Hubs, you can leverage Kafka Connect and the EMS Sink plugin to load data into the EMS platform. The instructions to enable Kafka Connect for Event Hubs can be found here. Once the installation is done, follow the manual installation steps above to enable the connector before creating an instance of it.

Configure the connector

The reference configuration for the connector is available here.

Here is a sample configuration for the connector:

name=kafka2ems
connector.class=com.celonis.kafka.connect.ems.sink.EmsSinkConnector
tasks.max=1
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
topics=payments 
connect.ems.endpoint=https://***.***.celonis.cloud/continuous-batch-processing/api/v1/***/items
connect.ems.target.table=payments
connect.ems.connection.id=****
connect.ems.commit.size.bytes=10000000
connect.ems.commit.records=100000
connect.ems.commit.interval.ms=30000
connect.ems.tmp.dir=/tmp/ems
connect.ems.authorization.key="AppKey ***"
connect.ems.error.policy=RETRY
connect.ems.max.retries=20
connect.ems.retry.interval=60000
connect.ems.parquet.write.flush.records=1000
connect.ems.debug.keep.parquet.files=false

Further examples are available here.
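
If you run Connect in distributed mode, the same settings are submitted as JSON through the Connect REST API rather than a properties file. A sketch, where the worker address, endpoint, and key are placeholders:

$ curl -X POST http://localhost:8083/connectors \
    -H "Content-Type: application/json" \
    -d '{
      "name": "kafka2ems",
      "config": {
        "connector.class": "com.celonis.kafka.connect.ems.sink.EmsSinkConnector",
        "tasks.max": "1",
        "topics": "payments",
        "connect.ems.endpoint": "https://<team>.<realm>.celonis.cloud/continuous-batch-processing/api/v1/<pool-id>/items",
        "connect.ems.target.table": "payments",
        "connect.ems.authorization.key": "AppKey <app-key>"
      }
    }'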

Verify the installation
Run the Connector

To start Kafka Connect in standalone mode with the above setup, provided that Kafka Connect is already installed, run:

$ connect-standalone connect-avro-standalone.properties ems-plugin.properties
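
Once the worker is up, you can confirm that the connector and its task are running through the Connect REST API (the default port 8083 is an assumption):

# A healthy response shows "state": "RUNNING" for the connector and each task.
$ curl http://localhost:8083/connectors/kafka2ems/status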

Verify data ingestion into EMS using the Transformation SQL Editor

The following is required to verify the data ingestion:

  1. Go to Data → Data Integration and open your data pool

  2. Create a new data job

  3. Inside the data job, create a new transformation and save it

  4. Within the “Transformation Editor”, type SELECT * FROM <table_name> (e.g. SELECT * FROM payments for the sample configuration above)

  5. Highlight the statement and execute it

Upgrade from the previous version

The following procedure upgrades the connector plugin; it requires a short outage:

  1. Download the new connector plugin.

  2. Stop all Kafka Connect workers.

  3. Remove the old connector plugin from the plugin path or classpath.

  4. Install the new connector plugin using the connector plugin installation instructions.

  5. Start up the workers.

  6. Start up the connector (if using distributed mode; see the sketch below).
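
In distributed mode, step 6 can be performed through the Connect REST API. A sketch, with the worker address as a placeholder:

# Restart the connector once the workers are back up.
$ curl -X POST http://localhost:8083/connectors/kafka2ems/restart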