High availability and load balancing setup for on-prem JDBC extractor


Value proposition

High availability and load balancing is a setup option for the on-premise JDBC (database) extractor. It lets you run multiple on-premise extractor instances that connect to the same database at the same time. It offers the following benefits:

Automatic handling of planned and unplanned downtime of your on-premise extractors.
Increased reliability and scalability through load balancing between the extractors and an automatic failover mechanism.
Expanded capacity for running more extractions concurrently, which helps when the scope of the extracted data grows.

How to set it up

Each extractor instance is set up in the same way as a standalone extractor. The minimum extractor version is 2.71.

For the installation of an individual instance, see “How do I set up an on-premise extractor?”.

In the application-local.yml configuration file, look at this section:

uplink:
    enabled: true
    url: https://[team].[cluster].celonis.cloud/uplink/api/public/uplink
    clientId: id
    clientSecret: secret
    extractorID: extractorid

connector:
    send-ping:
        enabled: true

url must be identical across all extractor instances.
clientId must be identical across all extractor instances.
clientSecret must be identical across all extractor instances.
extractorID identifies an individual extractor instance, so each instance needs its own ID. If it is not assigned explicitly, the application auto-generates an ID for each new extractor. The ID helps you find out which extractor a request was routed to.
connector: send-ping: enabled must be set to true.
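The rules above can be checked automatically before a new instance is rolled out. The following is a minimal sketch and not part of the extractor: it assumes the uplink section of each instance's application-local.yml has already been loaded into a Python dict (for example with a YAML parser), and the function name check_uplink_configs is hypothetical.

```python
from typing import Dict, List

# Keys that must be identical on every extractor instance (see the list above).
SHARED_KEYS = ("url", "clientId", "clientSecret")


def check_uplink_configs(configs: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means the configs are consistent."""
    problems: List[str] = []
    if not configs:
        return problems
    for key in SHARED_KEYS:
        values = {cfg.get(key) for cfg in configs}
        if len(values) > 1:
            problems.append(f"{key} differs across instances")
    # extractorID may be omitted (it is then auto-generated), but explicitly
    # assigned IDs should not collide, or a request cannot be traced to one instance.
    explicit_ids = [cfg["extractorID"] for cfg in configs if cfg.get("extractorID")]
    if len(explicit_ids) != len(set(explicit_ids)):
        problems.append("extractorID is duplicated across instances")
    return problems
```

Calling the function with one dict per instance returns an empty list when the shared keys match and the explicit IDs are distinct.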

Solution overview

This solution provides an active-active load balancing and failover architecture for the on-premise JDBC extractor.

Keep in mind that:

Two or more extractors may be installed on separate servers, with each one being an identical and independent extractor instance.
Automatic failover is in place. Extractor instances that are inactive are removed from the load balancing and added back once they become active again.
Each request from Celonis Platform to the extractor is load balanced between the active extractor instances.

[Figure: high availability and load balancing setup for the on-prem JDBC extractor]

Failover behavior

During non-extraction requests

Consider this scenario: you open an extraction configuration in the Celonis Platform UI, and the schema of a table is requested from the extractor. The extractor instance goes inactive while servicing the request.

If the request to the extractor times out, it is automatically retried by resending the request to the next active extractor.

If this request also times out, the process repeats for a duration up to the timeout applied to the original UI request.
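The retry behavior described above can be sketched as a small loop. This is an illustration under stated assumptions, not the platform's implementation: the send callable, the instance names, and the two timeout parameters are hypothetical, time is tracked as a simple counter rather than a real clock, and cycling through the instances in order is only one possible way of choosing "the next active extractor".

```python
import itertools
from typing import Callable, Sequence


def request_with_failover(
    instances: Sequence[str],
    send: Callable[[str], str],
    overall_timeout_s: float,
    per_attempt_timeout_s: float,
) -> str:
    """Try active instances in turn until one answers or the overall budget is spent."""
    if not instances:
        raise RuntimeError("no active extractor instances")
    elapsed = 0.0
    for instance in itertools.cycle(instances):
        # Stop once another attempt would exceed the timeout of the original UI request.
        if elapsed + per_attempt_timeout_s > overall_timeout_s:
            break
        try:
            return send(instance)
        except TimeoutError:
            # Charge the failed attempt to the budget and move on to the next instance.
            elapsed += per_attempt_timeout_s
    raise TimeoutError("request timed out on every attempt")
```

With an overall budget of 30 seconds and 10 seconds per attempt, the sketch makes at most three attempts before giving up.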

While extractions are running

If a Data Job extraction is running and the extractor instance running it goes inactive:

The Celonis Platform determines that the extractor instance running the extraction is inactive.
The Celonis Platform automatically restarts the extraction by sending a new “start extraction” request.
The next available active extractor instance receives the request and starts running the extraction again.
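The three steps above amount to a health check followed by a reassignment. The sketch below models that decision only; the function names are hypothetical, the instance states are passed in as a plain dict, and picking the first active instance is an assumption, since this page does not state how the next instance is chosen.

```python
from typing import Dict, Optional


def pick_active(instances: Dict[str, bool]) -> Optional[str]:
    """Return the first instance currently marked active, or None if there is none."""
    for name, active in instances.items():
        if active:
            return name
    return None


def restart_if_needed(running_on: str, instances: Dict[str, bool]) -> Optional[str]:
    """Return the instance the extraction should run on after a health check.

    If the current instance is still active, nothing changes. Otherwise the
    extraction is restarted on the next active instance (None if all are inactive).
    """
    if instances.get(running_on, False):
        return running_on
    return pick_active(instances)
```

For example, restart_if_needed("ext-1", {"ext-1": False, "ext-2": True}) moves the extraction to "ext-2".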

Load balancing

“Active” and “inactive”

An extractor instance is automatically considered inactive if the Celonis Platform does not receive a regular “heartbeat” notification from it within a timeout period of 30 minutes.
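As a rough illustration of the heartbeat rule, the check below compares the time of the last heartbeat with the current time. The 30-minute value comes from the FAQ on this page; the function name and the treatment of the exact boundary (a heartbeat exactly 30 minutes old still counts as active) are assumptions.

```python
from datetime import datetime, timedelta

# Timeout period stated in the FAQ on this page.
HEARTBEAT_TIMEOUT = timedelta(minutes=30)


def is_active(
    last_heartbeat: datetime,
    now: datetime,
    timeout: timedelta = HEARTBEAT_TIMEOUT,
) -> bool:
    """An instance counts as active while its last heartbeat is within the timeout."""
    return now - last_heartbeat <= timeout
```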

Scaling

Scaling is achieved by manually increasing the hardware resources available to each extractor instance, or manually installing additional extractor instances.

Extraction load balancing

Extractions are started with a single request, which contains the batch of tables (the Data Job Extraction) to extract. Because these extraction requests are load balanced, the batches of tables are distributed across the active extractor instances.

A single table is always extracted by one extractor instance, so additional extractor instances do not speed up the extraction of an individual table. Instead, they expand capacity for running more extractions concurrently from a single Celonis Platform Data Connection.
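The granularity described above can be made concrete with a short sketch: each extraction, with all of its tables, goes to exactly one instance, and tables within an extraction are never split. Round-robin assignment is used purely for illustration, because this page does not name the balancing strategy; the function, extraction, and table names are hypothetical.

```python
from typing import Dict, List, Sequence


def distribute_extractions(
    extractions: Dict[str, List[str]],
    active_instances: Sequence[str],
) -> Dict[str, str]:
    """Assign each extraction (a batch of tables) to exactly one active instance."""
    if not active_instances:
        raise ValueError("no active extractor instances")
    assignment: Dict[str, str] = {}
    # Iterate over extractions, not tables: a batch is never split across instances.
    for index, name in enumerate(extractions):
        assignment[name] = active_instances[index % len(active_instances)]
    return assignment
```

With three extractions and two active instances, two extractions share one instance while the third runs on the other, which is where the extra concurrent capacity comes from.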

Frequently asked questions about load balancing setup

Is there a limit to the number of extractors connected to the same uplink broker?

No, the number of extractors is not limited in any way.

When can I add new extractor instances to my setup?

You can add new instances at any time. They will be taken into account for an existing Data Connection automatically.

How do the extractor instances communicate with each other?

There is no direct communication between the individual extractor instances. Communication happens solely through the load balancer in Celonis Platform.

When is an extractor instance set as inactive?

An extractor instance is automatically considered inactive if Celonis Platform does not receive a regular “heartbeat” notification from the extractor instance within a timeout period of 30 minutes.

On which level are the extractions distributed?

The distribution of the requests happens at an extraction level. One extraction can contain multiple tables and is always routed to the same extractor instance.
