Celonis Product Documentation

Pipeline End-to-End Overview
End-to-End Flow

Celonis has developed a dedicated SAP Extractor to ensure a continuous data pipeline between EMS Cloud and the source SAP system. After the pipeline components are set up and the connection is established, users can schedule extraction jobs and fetch the data from SAP tables into EMS.

The following components are involved in the flow:

Celonis Execution Management System (EMS) is where the user defines the extraction job, i.e., which tables to extract, which filters to apply, the extraction schedule, etc.

Extractor is the middleman between EMS and the SAP system. Its role is to poll and fetch job requests from Celonis EMS and then submit the execution information to the Celonis RFC Module in SAP. Once the data has been extracted in SAP, the Extractor fetches it and sends it back to EMS. It is installed on a dedicated server in the customer network.

Celonis RFC Module is responsible for extracting data from the SAP database. It receives the job metadata from the Extractor (i.e., which table, which columns, which filters, etc.) and then generates a background job in SAP. The job extracts the data and writes it to CSV files in a directory that is monitored by the Extractor. This package is imported into SAP and contains 12 functions.
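The job metadata handed between these components can be pictured as a simple structure. Note that the field names below are illustrative assumptions for this sketch, not the actual EMS or RFC Module interface:

```python
# Illustrative sketch of an extraction job definition as it might flow
# from EMS through the Extractor to the RFC Module.
# All field names here are assumptions for illustration only.

job = {
    "table": "EKKO",                         # SAP table to extract
    "columns": ["EBELN", "BUKRS", "AEDAT"],  # columns to read
    "filters": ["AEDAT >= '20230101'"],      # row-level filter conditions
    "chunk_size": 50000,                     # rows per CSV file (default)
    "schedule": "0 2 * * *",                 # cron-style extraction schedule
}

def describe(job):
    """Render a one-line summary of the job, e.g. for Extractor logging."""
    return f"{job['table']}: {len(job['columns'])} columns, {len(job['filters'])} filter(s)"
```

A filter is applied server-side by the generated SAP background job, so only matching rows ever leave the database.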

The diagram below shows the components involved and their interaction; the next sections explain each component in more detail.

  1. The user triggers an extraction in Celonis Cloud

  2. The extraction request is published to a message queue, which is polled in real time by the on-premise Extractor. (Note: this indirect connection was implemented for security reasons, to avoid inbound network calls)

  3. Extractor "translates" the extraction requests, and sends it to the RFC Module in SAP

  4. The RFC Module reads the data from the database according to the column and filter definitions

  5. The RFC Module writes the data to CSV files on the shared network drive in chunks (by default, 50k rows per file). The data is pseudonymized at this point, before being written to the files.

  6. The Extractor continuously polls the folder and fetches the files as they arrive

  7. The Extractor converts the CSV files to Parquet

  8. The Extractor pushes the Parquet files to Celonis Cloud using the Data Push API.
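The chunking behavior in step 5 can be sketched as follows. This is a minimal illustration of the 50k-row file split, not the actual ABAP implementation, and the helper name is an assumption:

```python
def chunk_rows(rows, chunk_size=50000):
    """Split extracted rows into fixed-size chunks, mirroring how the
    RFC Module writes one CSV file per chunk (50k rows by default)."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

# With 120,000 extracted rows and the default chunk size, the RFC Module
# would produce three files: 50k, 50k, and 20k rows.
chunks = chunk_rows(list(range(120_000)))
```

Chunking keeps individual files small, so the Extractor can begin converting and uploading the first files while SAP is still writing later ones.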

The Extractor service

This component is responsible for the communication between Data Integration and the RFC Module. It is a Java application installed on the customer's premises. It polls EMS for extraction requests and then makes RFC calls to the RFC Module to trigger the extraction process. Once the files arrive, they are uploaded to EMS.
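A simplified view of one iteration of the Extractor's loop is sketched below. The real component is a Java application; the four callables here are hypothetical stand-ins for the EMS polling, RFC, file-system, and Data Push API calls:

```python
def to_parquet(csv_file):
    """Placeholder conversion; the real Extractor converts CSV content
    to Parquet, not just the file name."""
    return csv_file.replace(".csv", ".parquet")

def poll_once(fetch_job, call_rfc, collect_files, upload):
    """One iteration of the Extractor loop: poll EMS for a job, trigger
    the RFC Module, then pick up and upload result files.
    All four arguments are hypothetical stand-ins for the real calls."""
    job = fetch_job()        # poll the EMS message queue (outbound only)
    if job is None:
        return False         # nothing to do this round
    call_rfc(job)            # submit the job to the RFC Module in SAP
    for csv_file in collect_files(job):   # files appear as SAP writes chunks
        upload(to_parquet(csv_file))      # convert to Parquet, push to EMS
    return True
```

Because the Extractor only makes outbound calls (polling EMS, calling into SAP, uploading results), no inbound connection into the customer network is needed.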