Celonis Product Documentation

Connect with SAP for data extraction

The SAP connection offers several options for configuring extractions. These options are available in the data connection settings.

Before you begin:

To extract data from SAP to the Celonis Platform, make sure you've already installed the extraction client and the RFC module, and set up the technical user with the relevant permissions in your SAP instance. If this sounds like a lot, check Connecting to SAP.

Creating a data connection between SAP and the Celonis Platform

You can now create a data connection between SAP and the Celonis Platform from your data pool diagram:

  1. Click Data Connections.

  2. Click Add Data Connection and select Connect to Data Source.

  3. Select the source type based on this table:



                                                        |                         | SAP ECC 5.0      | SAP ECC 4.6C
    Supported version                                   | minimum SAP ECC 6 EHP 4 | only SAP ECC 5.0 | only SAP ECC 4.6C / 4.7
    Connection type (when creating new Data Connection) |                         |                  | SAP 4.6C. Available based on request.
    Required RFC module                                 |                         |                  | Celonis_RFC_Data Extraction_ECC4.6C

    Unsupported features:

    • "Buffer chunks in memory for validation"

    • Changelog Extractions

    • Joins within the Extraction

    • Connection via middleware (SAP PI/PO and Message Server)

    • Advanced settings

  4. Configure the following connection details:

    • Name: The name assigned to this connection.

    • Host: Hostname of the system.

    • System number: A two-digit code, e.g. 00.

    • Client: A three-digit code, e.g. 100.

    • User: The technical user that was set up in your SAP instance (see "Before you begin").

    • Compression type: Choose "Native Compression" if supported (recommended), otherwise GZIP, SAPCAR or uncompressed (not recommended).

    • Maximum parallel table extractions: Enter the number of tables that can be extracted in parallel.

  5. Click Test Connection and correct any issues highlighted.

    If you receive an error, check your connection details, then verify that your user is not locked and the SAP system is running. If a connection can be established, you will be redirected back to the connection overview and you will see a notification that the connection has been established.

  6. Click Save.
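
The format rules from step 4 (a two-digit system number, a three-digit client) can be checked before you enter the values in the form. The following is a minimal sketch, not part of the Celonis Platform: the function name, the dictionary keys, and the example host and user are hypothetical and only illustrate the rules stated above.

```python
import re


def validate_connection_details(details):
    """Return a list of problems found in the connection details (empty if none)."""
    problems = []
    # Name, host and user are free text, but they must not be blank.
    for field in ("name", "host", "user"):
        if not str(details.get(field, "")).strip():
            problems.append(f"{field} must not be empty")
    # System number: a two-digit code, e.g. 00.
    if not re.fullmatch(r"[0-9]{2}", str(details.get("system_number", ""))):
        problems.append("system_number must be a two-digit code, e.g. 00")
    # Client: a three-digit code, e.g. 100.
    if not re.fullmatch(r"[0-9]{3}", str(details.get("client", ""))):
        problems.append("client must be a three-digit code, e.g. 100")
    return problems


# Hypothetical example values, for illustration only.
example = {
    "name": "SAP ECC",
    "host": "sap.example.com",
    "system_number": "00",
    "client": "100",
    "user": "CELONIS_USER",
}
print(validate_connection_details(example))  # prints []
```

Keeping the system number and client as strings preserves leading zeros such as "00", which would be lost if the values were stored as integers.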

In most cases, the extractor connects to the SAP system directly. However, sometimes middleware mediates all connections between external services and SAP.


The PI/PO middleware option enables the connection via SAP PI/PO. Once it is selected, the "PI/PO Adapter" dropdown becomes available with two options: RFC or SOAP. Depending on the selected adapter, use either the standard SAP Extractor or the Generic SOAP PI/PO Extractor.


RFC: Select this adapter when the PI/PO uses RFC Adapters to connect to SAP. The standard on-premise SAP Extractor can be used with this option.

The following fields should be defined:

  • Gateway Host: The host of the PI/PO system to which the extractor should connect.

  • Gateway Port: The port of the PI/PO system to which the extractor should connect.

  • Program ID: The program ID of the Celonis program in PI/PO.


SOAP: Select this adapter when the PI/PO uses SOAP Adapters to connect to SAP. In this scenario, use the Generic SOAP PI/PO Extractor rather than the standard SAP Extractor. See Generic SOAP PI/PO Extractor for SAP.


If SOAP Adapters are used, the customer should also generate WSDL files and place them in a folder, preferably in the same directory as the Celonis Extractor.

The following fields become available:

  • Use TLS: Select this if you want to connect to the WSDL endpoints via https.

  • WSDL Files Directory: Enter the directory where the WSDL files have been copied (see the info above).

  • User: The PI/PO user for the authentication.

  • Password: The PI/PO user password.

Message Server

Enables connecting to an SAP server through Logon Groups (SAP Load Balancing). With this approach, the connection is established to a Message Server, which is mapped to specific application servers. See Advanced SAP connection configuration for more information.

  • Use Change Logs: Enables real-time extraction via Change Logs.

  • Include change type/timestamp in extracted data: Extends each table with a column about the change type (insert/update) and the respective change date.

  • Extract in SAP foreground process when 1 chunk or fewer: Small amounts of records are extracted via "direct call" to bypass the background job queues. This speeds up the extraction times.

  • Chunk size: The number of entries that are contained in one chunk (default: 50,000).

    It's possible to turn chunking off entirely by adding the statement chunked: false to the file application-local.yml in the on-premise Extractor package.

  • Number of rows to store in memory: Number of rows from the joined table to store in memory (default: 10,000). This number can be lowered in case of memory issues.

  • SAP Job Prefix: Defines the naming convention of SAP background jobs (default: "CEL_EX_").

  • Run on any SAP Server: If activated, the server on which the SAP background job runs is not specified, and SAP decides which server runs it. By default, the current application server is selected.

  • Buffer chunks in memory for validation (reduce the Chunk size when enabled): This option should only be enabled when there are issues with corrupt files as it slows down the extraction process.

    • Number of retries in case validation fails (default: 100)

    • Retry interval (seconds) (default: 30)

  • Extract Change Log data of the specified client only: When you enable this setting, the real-time extractions will only extract the data for the client that is defined in the connection. When you disable it, data from all clients is extracted.

    SAP systems are usually multi-client environments, where different clients are writing to the same database and tables. However, the real-time extension triggers are client independent, meaning that they capture changes by all clients and log them in the same Change Log table. This may create complications if two separate clients want to extract from the same system, and you can use this setting if you need to avoid that type of issue.

    It's also possible to enable or disable this setting by adding the statement clientDependent: true (to enable) or clientDependent: false (to disable) to the file application-local.yml in the on-premise Extractor package.

  • Disable auto deletion of old files in Z-CELONIS_TARGET: Disables the automated clean up of the "leftover" files from the Z_CELONIS_TARGET folder. This option should be selected if advised by your Celonis team.

  • Use non-chunked Change Log: When using the Change Logs for real time extraction, the data is read in a chunked manner by default to avoid deadlocks in the database. Select this option to override that setting and allow the data to be read without chunks. This option should be selected if advised by your Celonis team.

  • Use non-chunked Change Log cleanup: When cleaning up the data from the Change Logs, the data is cleaned up in chunks by default to avoid deadlocks in the database. Select this option to override that setting and allow the data cleanup to be done without chunks. This option should be selected if advised by your Celonis team.

  • Enable cold data extraction: Check this box to archive old data and free up working memory. The table is partitioned based on the age of the data, so the aged data is moved to the persistent memory and is not available unless it is explicitly invoked. Selecting this option makes the aged data available for extraction along with the current data.

  • Use SNC (SAP Secure Network Communications): Enables data encryption between the RFC module and the extractor via SNC. See Preparations in the SAP system for more information.
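
The defaults listed above for chunking and validation retries translate into simple arithmetic. The sketch below is illustrative only and is not extractor code: the function names are hypothetical, it assumes that "1 chunk or fewer" means the rows fit into a single chunk, and it assumes that one full retry interval is waited before every retry.

```python
import math

DEFAULT_CHUNK_SIZE = 50_000     # default of the "Chunk size" setting
DEFAULT_RETRIES = 100           # default number of retries if validation fails
DEFAULT_RETRY_INTERVAL_S = 30   # default retry interval in seconds


def chunk_count(row_count, chunk_size=DEFAULT_CHUNK_SIZE):
    """Number of chunks needed to hold row_count rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return math.ceil(row_count / chunk_size)


def fits_foreground_rule(row_count, chunk_size=DEFAULT_CHUNK_SIZE):
    """True if the rows fit into 1 chunk or fewer (assumed reading of the setting)."""
    return chunk_count(row_count, chunk_size) <= 1


def max_validation_wait_seconds(retries=DEFAULT_RETRIES,
                                interval_s=DEFAULT_RETRY_INTERVAL_S):
    """Upper bound on waiting time, assuming one interval per retry."""
    return retries * interval_s


print(chunk_count(120_000))           # 3
print(fits_foreground_rule(50_000))   # True
print(fits_foreground_rule(50_001))   # False
print(max_validation_wait_seconds())  # 3000
```

Under these assumptions, a table of 120,000 rows is read in three chunks with the default chunk size, and the default retry settings allow up to 3,000 seconds (50 minutes) of waiting when validation keeps failing.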

Executing transformations in parallel rather than sequentially, where they can potentially block the execution of other transformations, can accelerate data job executions and improve the predictability of their duration.

To achieve that, split a data job into several data jobs. You can start by assigning groups of transformations from one data job to separate data jobs (n transformations : 1 data job) and end by isolating single statements of a transformation in separate data jobs (1 transformation : n data jobs).

Successively increasing the granularity like that and tracking data job execution performance while doing so allows for detecting and further isolating problematic transformations/statements step by step.

Note that this approach requires careful consideration of potential interdependencies between extractions/transformations: it can be necessary to schedule the execution of data jobs generated in that splitting process before others to maintain some sequence.

Due to that complication and compromised maintainability/transparency of your scripts as a consequence of creating parallel transformations, they should only be set up in problematic cases and are not the standard approach.
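
The scheduling concern described above (independent data jobs run in parallel, dependent ones wait for their predecessors) can be sketched conceptually as follows. This is not how the Celonis Platform schedules data jobs and it does not use any Celonis API: the job names, the run_job callback, and the run_data_jobs function are hypothetical and only illustrate the ordering logic.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_data_jobs(dependencies, run_job, max_workers=4):
    """Run jobs in parallel while respecting their dependencies.

    dependencies maps each job name to the set of jobs that must finish first.
    Returns the job names in the order in which they completed.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}  # future -> job name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Start every job whose predecessors have all completed.
            for job in sorter.get_ready():
                running[pool.submit(run_job, job)] = job
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any error from the job
                job = running.pop(future)
                finished.append(job)
                sorter.done(job)  # unblocks jobs that depend on this one
    return finished


# Hypothetical split of one data job into four smaller ones.
dependencies = {
    "extract": set(),
    "transform_orders": {"extract"},
    "transform_invoices": {"extract"},
    "load_data_model": {"transform_orders", "transform_invoices"},
}
order = run_data_jobs(dependencies, run_job=lambda job: None)
print(order[0], order[-1])  # extract load_data_model
```

In this example the two transformation jobs can run side by side, while the first and last jobs keep their fixed position in the sequence. The more jobs a split produces, the larger this dependency map becomes, which is the maintainability cost mentioned above.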