Celonis Product Documentation

Pipeline End-to-End Overview

The new on-prem client

We are deprecating the uplink-based on-premises Extractor and replacing it with the on-prem client. For more information, see On-prem client replacing Extractor.

End-to-End Flow

Celonis has developed a dedicated SAP Extractor to ensure a continuous data pipeline between Celonis Platform Cloud and the source SAP system. After the pipeline components are set up and the connection is established, users can schedule extraction jobs and fetch the data from SAP tables into Celonis Platform.

These are the components involved in the flow:

Celonis Platform is where the user defines the extraction job: which tables to extract, which filters to apply, the extraction schedule, and so on.
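As an illustration, such a job definition can be modeled as a small structure. The field names and validation rule below are hypothetical, chosen only to mirror the description above; they are not the actual Celonis Platform API schema:

```python
# Illustrative extraction job definition (hypothetical field names,
# not the real Celonis Platform schema).
extraction_job = {
    "tables": ["EKKO", "EKPO"],                  # SAP tables to extract
    "columns": {"EKKO": ["EBELN", "BUKRS", "AEDAT"]},
    "filters": {"EKKO": "AEDAT >= '20240101'"},  # delta filter
    "schedule": "0 2 * * *",                     # daily at 02:00 (cron syntax)
}

def validate_job(job: dict) -> bool:
    """Sanity check: column and filter definitions may only reference
    tables that the job actually extracts."""
    tables = set(job["tables"])
    return (set(job.get("columns", {})) <= tables
            and set(job.get("filters", {})) <= tables)
```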

Extractor is the intermediary between Celonis Platform and the SAP system. It polls Celonis Platform for job requests, fetches them, and submits the execution information to the Celonis RFC Module in SAP. Once SAP has produced the data, the Extractor fetches it and sends it back to Celonis Platform. The Extractor is installed on a dedicated server in the customer network.
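The Extractor's poll-and-submit role can be sketched as a simple loop. This is a minimal illustration, not the actual implementation: `fetch_pending` and `submit_to_rfc` are hypothetical callables standing in for the Celonis Platform queue API and the SAP RFC call.

```python
import time

def poll_for_jobs(fetch_pending, submit_to_rfc, interval_s=5.0, max_polls=None):
    """Outbound-only polling loop: the extractor initiates every call,
    so no inbound connection into the customer network is required.

    fetch_pending  -- hypothetical callable returning pending job requests
    submit_to_rfc  -- hypothetical callable forwarding a job to the RFC module
    """
    submitted = []
    polls = 0
    while max_polls is None or polls < max_polls:
        for job in fetch_pending():
            submitted.append(submit_to_rfc(job))
        polls += 1
        time.sleep(interval_s)
    return submitted
```

Because the loop always calls outward, it matches the security rationale described in the steps below: no inbound network calls are needed.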

Celonis RFC Module is responsible for extracting data from the SAP database. It receives the job metadata from the Extractor (table, columns, filters, etc.) and generates a background job in SAP. The background job extracts the data and writes it to CSV files in a directory monitored by the Extractor. The module is imported into SAP as a package containing 12 functions.
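The chunked spooling behavior can be illustrated with a short sketch. The file-naming scheme and function signature are assumptions for illustration; only the chunking idea (a bounded number of rows per CSV file, 50k by default per the steps below) comes from the documentation:

```python
import csv

def write_chunks(rows, header, path_for_part, chunk_rows=50_000):
    """Spool rows into numbered CSV files of at most `chunk_rows` rows each,
    mimicking how the RFC module writes extraction results to the monitored
    directory. `path_for_part` maps a part number to a file path; the
    naming scheme is illustrative, not the real module's."""
    paths = []
    part, buf = 0, []

    def flush():
        nonlocal part, buf
        path = path_for_part(part)
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)   # each chunk file carries the header
            writer.writerows(buf)
        paths.append(path)
        part, buf = part + 1, []

    for row in rows:
        buf.append(row)
        if len(buf) == chunk_rows:
            flush()
    if buf:                           # write the final, partial chunk
        flush()
    return paths
```

Writing in fixed-size chunks keeps memory usage bounded and lets the Extractor start fetching completed files while later chunks are still being produced.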

The diagram below shows the components involved and their interactions; the steps that follow explain each stage in more detail.

  1. Celonis Platform sends the extraction request.

  2. The extraction request is published to a message queue, which the OPC polls in real time. This indirect connection was chosen for security reasons: it avoids inbound network calls into the customer network.

    The OPC passes the request to the RFC module, which reads the data from the database according to the column and filter definitions.

  3. The RFC module writes the data in chunks to CSV files on the network shared drive (by default, 50,000 rows per file). The data can optionally be pseudonymized before being written.

  4. The cloud Extractor polls the network shared drive and sends fetch requests to the RFC module.

  5. The RFC module receives the fetch request and reads the data from the network shared drive.

  6. OPC uploads the fetched files to the Celonis Platform.

  7. The cloud Extractor converts the CSV files to Parquet.

  8. The Extractor pushes the Parquet files to the Celonis Datalake.
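Step 3 mentions that data can be pseudonymized before being written to files. A minimal sketch of one common approach, deterministic hashing, is shown below; the salt handling and digest choice are assumptions for illustration, not Celonis's actual pseudonymization scheme:

```python
import hashlib

def pseudonymize(value: str, salt: str = "") -> str:
    """Replace a sensitive value with a deterministic SHA-256 digest.
    The same input always maps to the same token, so joins across tables
    still work, while the original value is not directly recoverable.
    Salt and digest choice are illustrative, not Celonis's actual scheme."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

# Mask only the user-name column of a sample row; field names are
# examples of typical SAP columns, not a prescribed schema.
row = {"EBELN": "4500000123", "ERNAM": "JDOE"}
masked = {k: (pseudonymize(v) if k == "ERNAM" else v) for k, v in row.items()}
```

Determinism is the key property here: because identical inputs yield identical tokens, the pseudonymized data still supports the table joins that process mining queries depend on.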