Celonis Product Documentation

High availability and load balancing setup for on-prem JDBC extractor
Value proposition

High Availability and Load Balancing is a setup option for the on-premise JDBC (Database) Extractor. It allows you to run multiple on-premise extractor instances that connect to the same database at the same time. It provides the following benefits:

  • Automatically handle planned and unplanned downtime of your on-premise extractors.

  • Increased reliability and scalability through load balancing between the extractors and an automatic failover mechanism.

  • Expanded capacity for running more extractions concurrently, so that you can handle growth in the extracted data scope.

How to set it up

Each extractor instance is set up in the same way as a standalone extractor. The minimum extractor version is 2.71.

In the application-local.yml configuration file of each extractor instance, locate the following sections:

uplink:
    enabled: true
    url: https://[team].[cluster].celonis.cloud/uplink/api/public/uplink
    clientId: id
    clientSecret: secret
    extractorID: extractorid

connector:
    send-ping:
        enabled: true

  • url must be identical across all extractor instances.

  • clientId must be identical across all extractor instances.

  • clientSecret must be identical across all extractor instances.

  • extractorID identifies an individual extractor instance, so every instance needs its own ID. If it is not assigned explicitly, the application auto-generates an ID for each new extractor. The ID helps you find out which extractor a request was routed to.

  • connector: under send-ping, set enabled to true.
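
The rules above can be checked mechanically before a rollout. The following is a minimal sketch, not a Celonis tool: it assumes each instance's application-local.yml has already been parsed into a Python dict (the parsing step is not shown), and the function name check_uplink_configs is hypothetical. Duplicate extractorID values are flagged on the assumption that the ID is meant to tell instances apart; an unset ID is accepted because the application auto-generates one.

```python
from collections import Counter

# Settings that must be identical on every extractor instance.
SHARED_KEYS = ("url", "clientId", "clientSecret")


def check_uplink_configs(configs):
    """Return a list of problems found across the instances' settings.

    `configs` holds one dict per extractor instance, with "uplink" and
    "connector" keys mirroring the YAML sections shown above.
    """
    problems = []
    for key in SHARED_KEYS:
        values = {cfg["uplink"].get(key) for cfg in configs}
        if len(values) > 1:
            problems.append(f"{key} differs between instances")
    # Assumption: explicitly assigned IDs should not be reused.
    ids = Counter(cfg["uplink"].get("extractorID") for cfg in configs)
    for extractor_id, count in ids.items():
        if extractor_id is not None and count > 1:
            problems.append(f"extractorID {extractor_id!r} is used {count} times")
    for index, cfg in enumerate(configs):
        ping = cfg.get("connector", {}).get("send-ping", {}).get("enabled")
        if ping is not True:
            problems.append(f"instance {index}: send-ping is not enabled")
    return problems
```

An empty list means the checked settings are consistent across all instances.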

Solution overview

With this solution, we provide the capabilities for an active-active load balancing and failover architecture for our on-premise JDBC Extractor.

Keep in mind that:

  • two or more extractors may be installed on separate servers, with each one being an identical and independent extractor instance.

  • automatic failover is in place. Extractor instances that are inactive will be removed from the load balancing and added back once they become active again.

  • each request from Celonis Platform to the extractor is load balanced between the active extractor instances.

[Figure: High availability and load balancing setup for the on-prem JDBC extractor]

Failover behavior
During non-extraction requests

Example: you open an extraction configuration in the Celonis Platform UI, and the schema of a table is requested from the extractor. The extractor instance goes inactive while servicing the request.

If the request to the extractor times out, it is automatically retried by resending the request to the next active extractor.

If this request also times out, the process repeats for a duration up to the timeout applied to the original UI request.
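
The retry behavior described above can be modeled as a loop bounded by the original request's timeout. This is an illustrative sketch only, not the Celonis Platform implementation: the function request_with_failover, the send callable, and the simple rotation through instances are all assumptions made for the example.

```python
import itertools
import time


def request_with_failover(instances, send, overall_timeout, clock=time.monotonic):
    """Try instances in turn until one answers or the overall timeout elapses.

    `send(instance)` is a hypothetical callable that returns a response or
    raises TimeoutError. `overall_timeout` stands in for the timeout of the
    original UI request, in seconds.
    """
    deadline = clock() + overall_timeout
    # Assumption: the "next active extractor" is picked by simple rotation.
    for instance in itertools.cycle(instances):
        if clock() >= deadline:
            break
        try:
            return send(instance)
        except TimeoutError:
            continue  # resend the same request to the next instance
    raise TimeoutError("no extractor instance answered within the request timeout")
```

The injectable clock parameter exists only so the loop can be exercised without waiting in real time.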

While extractions are running

If a Data Job extraction is running and the extractor instance running it goes inactive:

  1. The Celonis Platform determines that the extractor instance running the extraction is inactive.

  2. The Celonis Platform automatically restarts the extraction by sending a new “start extraction” request.

  3. The next available active extractor instance receives and starts running the extraction again.

Load balancing
“Active” and “inactive”

An extractor instance is automatically considered inactive if the Celonis Platform does not receive a regular “heartbeat” notification from it within a timeout period.
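
As a rough illustration of this rule, the sketch below classifies instances by the age of their last heartbeat. It is an assumption-laden model, not the platform's code: the function name active_instances and the dict of timestamps are hypothetical, the 30-minute value is the one given in the FAQ on this page, and treating a heartbeat exactly at the limit as still active is an arbitrary choice.

```python
from datetime import datetime, timedelta

# Timeout value as stated in the FAQ on this page.
HEARTBEAT_TIMEOUT = timedelta(minutes=30)


def active_instances(last_heartbeats, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the IDs whose most recent heartbeat is within `timeout` of `now`.

    `last_heartbeats` maps an extractor ID to the datetime of its latest
    heartbeat. Instances that fall out of this list are skipped by the load
    balancer and reappear once a fresh heartbeat is recorded.
    """
    return sorted(
        instance_id
        for instance_id, seen in last_heartbeats.items()
        if now - seen <= timeout
    )
```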

Scaling

Scaling is achieved by manually increasing the hardware resources available to each extractor instance, or manually installing additional extractor instances.

Extraction load balancing

Each extraction is started with a single request that contains the batch of tables (the Data Job Extraction) to extract. Because these extraction requests are load balanced, the batches of tables are distributed across the active extractor instances.

A single table is always extracted by one extractor instance, so additional extractor instances do not speed up the extraction of an individual table. Instead, they expand the capacity for running more extractions concurrently from a single Celonis Platform Data Connection.
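
The sketch below shows what extraction-level distribution means in practice. It is a simplified model under stated assumptions: the routing policy of the Celonis Platform is not documented here, so round-robin is used purely for illustration, and the function name assign_extractions and the sample extraction names are hypothetical.

```python
from itertools import cycle


def assign_extractions(extractions, active_instances):
    """Map each extraction (a batch of tables) to exactly one active instance.

    `extractions` maps an extraction name to its list of tables. All tables
    of one extraction go to the same instance; only whole extractions are
    spread across instances. Round-robin is an assumption of this sketch.
    """
    if not active_instances:
        raise ValueError("no active extractor instance available")
    targets = cycle(active_instances)
    return {name: next(targets) for name in extractions}
```

With three extractions and two active instances, one instance receives two whole extractions and the other receives one; no extraction is split table by table.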

Frequently asked questions about load balancing setup

The number of extractor instances is not limited in any way.

You can add new instances at any time. They will be taken into account for an existing Data Connection automatically.

There is no direct communication between the individual extractor instances. Communication happens solely through the Load Balancer in the Celonis Platform.

An extractor instance is automatically considered inactive if Celonis Platform does not receive a regular “heartbeat” notification from the extractor instance within a timeout period of 30 minutes.

The distribution of the requests happens at an extraction level. One extraction can contain multiple tables and is always routed to the same extractor instance.