Skip to main content

Celonis Product Documentation

Extractions

Extractions can be created using the New Extraction button in the Data Job view. Within the Extraction, you can add tables by clicking the Add button above the table list. This will display all the available tables that the Data Connection provides.

You can also search for specific tables. The search for most connections is case-insensitive (exception: Database (JDBC)) and you can use percentage signs (%) as a wildcard, such as EK% matches all tables which start with EK.

Table Configuration

After you add a table to an Extraction, you can change multiple settings depending on the data connection type. These are the most common options:

  • General options

    • Column subset: Specify which columns should be extracted by clicking Configure next to the column count.

    • Pseudonymized columns: In the same manner as the column subsets you can specify if and which columns should be pseudonymized during the extraction.

    • Join: You can add one or multiple join partners to the table. Each join partner can either be joined through the primary keys of the tables or through a custom join path. In order for the primary key join to work, the primary keys of the table to be joined need to be included in the primary keys of the base table. You can also add a filter for each joined table.

    • Primary key columns: You can override the primary keys of the source system to be used during delta loading.

  • Time Filter Options

    • Creation date filter: A filter on a data column that will be used in both full and delta loads. This filter will be combined with the "Filter Statement" under "Additional Filters" with an AND condition. So both conditions have to be met.

    • Change date filter: A filter on a date column that automatically looks for the maximum date in the existing table and sets a filter to only extract data newer than this maximum date. This filter will be combined with the "Delta Filter Statement" under "Additional Filters" with an AND condition.

  • Additional filters (please check the operator section below for details)

    • Filter: Specify in SQL syntax what rows will be extracted, e.g. COLUMN1 > 5.

    • Delta Filter: Specify in SQL syntax what additional filters should be applied when the job is executed a delta load. This filter statement is combined with the normal filter with the logical AND operator.

Changes to the table configuration are indicated by a "pencil" icon next to the table name. The changes can be applied by clicking Save above the table list. Adding and removing tables is independent of this and it is automatically saved.

Available filter operators

The following operators are available:

  • >

  • <

  • =

  • IN + list of strings in parentheses

  • NOT IN + list of strings in parentheses

  • only for SAP ECC and JDBC data connections with an extractor version mid-May 2019 or later):

    • LIKE + string with percentage sign as wildcard

    • NOT LIKE + string with percentage sign as wildcard

Pseudonymization

If you select to pseudonymize a column in the extraction configuration, column values as displayed in the source system will be replaced with hashed values.

In contrast to anonymization, pseudonymization will replace the same values as displayed in the source system with the same hashed values, which enables consistent utilization of pseudonymized values in analyses. It is not possible to reconstruct the original values.

Where? Pseudonymization happens during data extraction. Celonis requests source system data, then pseudonymizes that data upon receipt and eventually converts it into parquet format before providing it in the Celonis Platform.

How? Users can flexibly configure to use SHA-1, SHA-256, or SHA-512 for hashing (a fixed salt is used) in Data Connection > Advanced Settings.

Note

We are using a hardcoded salt which cannot be modified by the user. It is not available due to data security reasons. The pseudonymization algorithm (sha-1, sha-256, sha-512) can be modified by the Extractor Connection.

The correction function to use is SELECT SHA1 (CONCAT('SO_ANO', TABLE.COLUMN)).

Override the primary key columns

You can explicitly specify which columns should be used as primary key columns during a delta load. This is only necessary if the source system does not provide this information on its own. This could be the case if you extract views from a database, for example. Source systems like Salesforce, SAP or Oracle EBS provide this information so it is not necessary to use this feature.

Please note that overriding the primary key columns does not change how data is stored in the database. It is only used for delta loads and not saved.

Extraction Settings
Option to ignore metadata changes on table level

It can occur that the metadata changes for source tables, i.e. columns are added or removed. In order to preserve data integrity such changes in the source system table will cause the delta load to fail. A full load of the data would preserve the data integrity and at the same time allow columns to be added or removed.

There might be some use cases however, where tables change constantly or NULL values are not a big data integrity risk when analyzing the data. In those cases, you might simply want to nullify non-existent values. This is what the new option provides.

Note

The default remains to reject metadata changes. Moreover, only added and removed columns will be handled. Data type changes still have to be resolved manually or with a full load.

Existing data in the Celonis Platform

column_a (primary key)

column_b

a

1

b

2

c

3

Newly pushed data

column_a

column_c

a

7

b

8

d

11

As you can see, one new entry is added (d). However, column_b does not exist and column_c is new.

Merged data set with nullification

column_a

column_b

column_c

a

1

7

b

2

8

c

3

NULL

d

NULL

11

As you can see, the existing value "c" did not receive an update and therefore has a NULL value in column_c which did not exist before. The new entry with primary key "d" is NULL in column_b because it does not exist in the new data.

Select the Ignore metadata changes option in the Extraction Settings.

35555502.png
Default Option A: Standard (Recommended)

The delta load will be canceled when metadata in the source system changes. Metadata changes will not be ignored. A full reload or manual changes of existing tables is necessary to clear the conflict.

Option B: Ignore metadata changes

The delta load will include metadata changes and run through with a warning. Only delta-loaded rows will include data for the newly added columns (see above). Rows that are not part of the delta load will have the newly added columns nullified. This can lead to inconsistency in your data. A full load is recommended for data consistency.

Debug mode
35555498.png

The debug mode can be enabled via the Extraction Settings by a Team Admin. It will stay active for 3 days and is then automatically disabled.

Once enabled detailed log information for the data extraction job will be displayed in the execution logs. This allows for more transparency and easier troubleshooting.