Skip to main content

Celonis Product Documentation

Extractions

Extractions can be created using the "New Extraction" button in the data job view.

Within the extraction you can add tables by clicking on the "Add" button above the table list. This will display all the available tables that the data connection provides.

You can also search for specific tables. The search for most connections is case-insensitive (exception: Database (JDBC)) and you can use percentage signs (%) as a wildcard, e.g. EK% matches all tables which start with EK.

If the extractor is new enough, you also have the option to bulk import tables using the "Import table list" option.

Table configuration

After you have added a table, you can change multiple settings depending on the data connection types . These are the most common options:

  • General options

    • Column subset: Specify which columns should be extracted by clicking on "Configure" next to the column count

    • Pseudonymized columns: In the same manner as the column subsets you can specify if and which columns should be pseudonymized during the extraction.

    • Join: You can add one or multiple join partners to the table. Each join partner can either be joined through the primary keys of the tables or through a custom join path. In order for the primary key join to work, the primary keys of the table to be joined need to be included in the primary keys of the base table. You can also add a filter for each joined table.

    • Primary key columns: You can override the primary keys of the source system to be used during delta loading.

  • Time Filter Options

    • Creation date filter: A filter on a data column which will be used in both full and delta loads. This filter will be combined with the "Filter Statement" under "Additional Filters" with an AND condition. So both conditions have to be met.

    • Change date filter: A filter on a date column which automatically looks for the maximum date in the existing table and sets a filter to only extract data newer than this maximum date. This filter will be combined with the "Delta Filter Statement" under "Additional Filters" with an AND condition.

  • Additional filters (please check the operator section below for details)

    • Filter: Specify in SQL syntax what rows will be extracted, e.g. COLUMN1 > 5

    • Delta Filter: Specify in SQL syntax what additional filters should be applied when the job is executed a delta load. This filter statement is combined with the normal filter with the logical AND operator

Changes to the table configuration are indicated by a "pencil" icon next to the table name. The changes can be applied by clicking on "Save" above the table list. Adding and removing tables is independent of this and it is automatically saved.

Available filter operators

The following operators are available:

  • >

  • <

  • =

  • IN + list of strings in parentheses

  • NOT IN + list of strings in parentheses

  • only for SAP ECC and JDBC data connections with an extractor version mid-May 2019 or later):

    • LIKE + string with percentage sign as wildcard

    • NOT LIKE + string with percentage sign as wildcard

Pseudonymization

If you select to pseudonymize a column in the extraction configuration, column values as displayed in the source system will be replaced with hashed values.

In contrast to anonymization, pseudonymization will replace the same values as displayed in the source system with the same hashed values, which enables consistent utilization of pseudonymized values in analyses.

It is not possible to reconstruct the original values.

Where? Pseudonymization happens during data extraction. Celonis requests source system data, then pseudonymizes that data upon receipt and eventually converts it into parquet format before providing it in the EMS.

How? Users can flexibly configure to use SHA-1, SHA-256, or SHA-512 for hashing (a fixed salt is used) in the Data Connection >> Advanced Settings.

Override the primary key columns

You can explicitly specify which columns should be used as primary key columns during a delta load. This is only necessary if the source system does not provide this information on its own. This could be the case if you extract views from a database, for example. Source systems like Salesforce, SAP or Oracle EBS provide this information so it is not necessary to use this feature.

Please note that overriding the primary key columns does not change how data is stored in the database. It is only used for delta loads and not saved.

Extraction Settings
Option to ignore metadata changes on table level

It can occur that the metadata changes for source tables, i.e. columns are added or removed. In order to preserve data integrity such changes in the source system table will cause the delta load to fail. A full load of the data would preserve the data integrity and at the same time allow columns to be added or removed.

There might be some use cases however, where tables change constantly or NULL values are not a big data integrity risk when analyzing the data. In those cases, you might simply want to nullify non-existent values. This is what the new option provides.

Note

The default remains to reject metadata changes. Moreover, only added and removed columns will be handled. Data type changes still have to be resolved manually or with a full load.

Existing data in the EMS

column_a (primary key)

column_b

a

1

b

2

c

3

Newly pushed data

column_a

column_c

a

7

b

8

d

11

As you can see, one new entry is added (d). However, column_b does not exist and column_c is new.

Merged data set with nullification

column_a

column_b

column_c

a

1

7

b

2

8

c

3

NULL

d

NULL

11

As you can see, the existing value c did not receive an update and therefore has a NULL value in column_c which did not exist before. The new entry with primary key d is NULL in column_b because it does not exist in the new data.

Select the option to ignore metadata changes in theExtraction Settings.

35555502.png
Default Option A: Standard (Recommended)

The delta load will be cancelled, when meta data in source system changes. Metadata changes will not be ignored. A full reload or manual changes of existing tables is necessary to clear the conflict

Option B: Ignore metadata changes

The delta laod will include meta data changes and run through with a warning. Only delta loaded rows will include data for the newly added columns (see above). Rows which are not part of the delta load will have the newly added columns nullified. This can lead to inconsistency in your data. A full load is recommended for data consistency.

Debug mode
35555498.png

The debug mode can be enabled via the Extraction Settings by a Team Admin. It will stay active for 3 days and automatically be disabled.

Once enabled detailed log information for the data extraction job will be displayed in the execution logs.

This allows for more transparency and easier troubleshooting.