Step 3 - Create SAP connection in Celonis EMS
The SAP ECC connection offers several (advanced) options for configuring extractions. These options are accessible via the data connection settings.
Configuring a data connection to SAP
To set up a connection to your SAP system, the following steps are required:
Create a new connection and choose the correct type based on the following table:
| | S/4HANA | SAP ECC | SAP ECC 5.0 | SAP ECC 4.6C |
| --- | --- | --- | --- | --- |
| Supported Version | All | minimum SAP ECC 6 EHP 4 | only SAP ECC 5.0 | only SAP ECC 4.6c / 4.7 |
| Connection type (when creating new Data Connection) | SAP | SAP | SAP | SAP 4.6C (Note: available on request) |
| Required RFC module | Celonis_RFC_Data_Extraction | Celonis_RFC_Data_Extraction | Celonis_RFC_Data_Extraction_ECC5 | Celonis_RFC_Data_Extraction_ECC4.6C |
| Unsupported features | - | - | "Buffer chunks in memory for validation" | Changelog Extractions; Joins within the Extraction; connection via middleware (SAP PI/PO and Message Server); advanced settings |
Specify the connection details of your SAP system on the following page:
Name: The name that you would like this connection to have
Host: Hostname of the system
System number: a two-digit code, e.g. 00
Client: a three-digit code, e.g. 100
User: the user that was created in Step 1.D
Password: the password of the user
Compression type: choose Native Compression if supported (recommended), otherwise GZIP, SAPCAR or uncompressed (not recommended)
Maximum parallel table extractions: Enter the number of tables that can be extracted in parallel. See Parallel transformations.
Click Save.
If you receive an error here, check your connection details and verify that your user is not locked and the SAP system is running. If the connection can be established, you will be redirected back to the connection overview and see a notification that the connection has been established.
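For illustration, a completed connection form for a typical ECC system might look as follows; the hostname, user, and parallelism value are placeholders, not recommendations or defaults:

```yaml
# Hypothetical example values for the SAP data connection form
Name: SAP ECC Production            # free-text label for this connection
Host: sapecc01.example.com          # placeholder hostname
System number: "00"                 # two-digit code
Client: "100"                       # three-digit code
User: CELONIS_EXTRACT               # placeholder; the user created in Step 1.D
Compression type: Native Compression      # recommended when supported
Maximum parallel table extractions: 4     # placeholder; see Parallel transformations
```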
Middleware options
In most cases, the extractor connects to the SAP system directly. However, sometimes a middleware layer mediates all connections between external services and SAP.
SAP PI/PO
This option enables the connection via PI/PO (more information). When it is selected, the "PI/PO Adapter" dropdown with two options becomes available. Depending on the selected adapter, either the standard or the generic SOAP extractor should be used.
RFC
Select this option when the PI/PO uses RFC Adapters to connect to SAP. The standard on-premise SAP Extractor can be used with this option.
The following fields should be defined.
Gateway Host - the host of the PI/PO system to which the extractor should connect
Gateway Port - the port of the PI/PO system to which the extractor should connect
Program ID - the program ID of the Celonis program in PI/PO
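As an illustration, the RFC middleware fields might be filled in as follows. All values are placeholders; the gateway port follows the common SAP convention of 33<instance number>:

```yaml
# Hypothetical example values for a PI/PO RFC connection
Gateway Host: pipo-prod.example.com   # placeholder PI/PO host
Gateway Port: "3300"                  # sapgw00; 33<NN> for instance number NN
Program ID: CELONIS_EXTRACTOR         # placeholder; must match the program ID registered in PI/PO
```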
SOAP
Select this option when the PI/PO uses SOAP Adapters to connect to SAP. The Generic SOAP PI/PO Extractor, rather than the standard SAP Extractor, should be used for this scenario.
Note
If SOAP Adapters are used, the customer must also generate WSDL files, which should then be placed in a folder, preferably in the same directory as the Celonis Extractor.
The following fields become available.
Use TLS - select this if you want to connect to the WSDL endpoints via HTTPS
WSDL Files Directory - enter the directory to which the WSDL files have been copied (see the note above)
User - the PI/PO user for the authentication
Password - the PI/PO user password
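For example, assuming the WSDL files have been copied to a directory next to the extractor, the SOAP fields might look like this (all values are placeholders):

```yaml
# Hypothetical example values for a PI/PO SOAP connection
Use TLS: true                                      # connect to the WSDL endpoints via HTTPS
WSDL Files Directory: /opt/celonis/extractor/wsdl  # placeholder path
User: CELONIS_PIPO                                 # placeholder PI/PO user
```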
Message Server
Enables connecting to an SAP server through Logon Groups (SAP Load Balancing). With this approach, a connection is established to a Message Server, which is mapped to specific application servers (more information).
Change log settings
Use Change Logs - enables the Real-Time Extraction via Change Logs (more information)
Include change type/timestamp in extracted data - extends each table with a column about the change type (insert/update) and the respective change date.
Extract in SAP foreground process when 1 chunk or fewer - small numbers of records are extracted via a "direct call" to bypass the background job queues. This speeds up extraction times.
Advanced SAP data connection settings
Chunk size - The number of entries contained in one chunk (default: 50,000). It's possible to turn chunking off entirely by adding the statement chunked: false to the file application-local.yml in the on-premise Extractor package (see the sketch after this list).
Number of rows to store in memory - This number can be lowered in case of memory issues (default: 10,000).
Number of rows from the joined table to store in memory (default: 10,000).
SAP Job Prefix - defines the naming convention of SAP background jobs (default: "CEL_EX_") (more information).
Run on any SAP Server - If activated, the server on which the SAP background job runs is not specified; SAP then decides which server to use. By default, the current application server is selected.
Buffer chunks in memory for validation - This option should only be enabled when there are issues with corrupt files as it slows down the extraction process.
Number of retries in case validation fails (default: 100)
Retry interval (seconds) (default: 30)
Extract Change Log data of the specified client only - When you enable this setting, the real-time extractions will only extract the data for the client that is defined in the connection. When you disable it, data from all clients is extracted.
SAP systems are usually multi-client environments, where different clients are writing to the same database and tables. However, the real-time extension triggers are client independent, meaning that they capture changes by all clients and log them in the same Change Log table. This may create complications if two separate clients want to extract from the same system, and you can use this setting if you need to avoid that type of issue.
It's also possible to enable or disable this setting by adding the statement clientDependent: true (to enable) or clientDependent: false (to disable) to the file application-local.yml in the on-premise Extractor package (see the sketch after this list).
Use SNC (SAP Secure Network Communications) - enables data encryption between the RFC module and the extractor via SNC (more information)
Parallel transformations
Executing transformations in parallel rather than sequentially, where one transformation can block the execution of others, can accelerate data job executions and make their duration more predictable.
To achieve that, split a data job into several data jobs. That can start with assigning groups of transformations from one data job to separate ones (n transformations : 1 data job) and end with isolating single statements of a transformation in separate data jobs (1 transformation : n data jobs). For example, a data job with twelve transformations could first be split into three data jobs of four transformations each; if one of those still blocks the others, its slowest transformation can be isolated into a data job of its own.
Successively increasing the granularity like that and tracking data job execution performance while doing so allows for detecting and further isolating problematic transformations/statements step by step.
Note that this approach requires careful consideration of potential interdependencies between extractions/transformations: it can be necessary to schedule the execution of data jobs generated in that splitting process before others to maintain some sequence.
Due to that complication, and because parallel transformations compromise the maintainability and transparency of your scripts, they should only be set up in problematic cases; they are not the standard approach.