Celonis Product Documentation

Extracting substrings applying pseudonymisation


This feature is available only for selected teams. Reach out to the support team if you need it to be activated for your team.

In older versions of the RFC Module, you can apply anonymisation only to a whole column. This usually works fine, because anonymisation is applied to the username, which normally has its own column in the table.

However, in some scenarios the username is part of a composite column, where it is concatenated with other values. If you anonymise the whole column, you can no longer isolate the username from it.

To support this use case, we have added the concept of calculated columns (supported in RFC Version 1.8.4 and Extractor 2020-07-15). This allows you to apply the SUBSTRING function to a column and then anonymise the returned value. For example, if the database holds a value such as "XXXXX_username_YYYYY", applying the substring function isolates the value username, which is then anonymised before it is extracted to the cloud.
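To illustrate why anonymising only the substring matters, here is a minimal Python sketch. It is not the extractor's implementation: the `pseudonymise` helper, the choice of SHA-256, and the second sample value are assumptions made purely for illustration.

```python
import hashlib


def pseudonymise(value: str) -> str:
    """Hypothetical stand-in for the extractor's anonymisation (SHA-256 here)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Two composite values that contain the same username (sample data).
composite = "XXXXX_username_YYYYY"
other = "ZZZZZ_username_WWWWW"

# Anonymising the whole column: the same user yields two unrelated tokens.
whole_a = pseudonymise(composite)
whole_b = pseudonymise(other)

# Extracting the substring first: 8 characters starting at 1-based index 7.
start, length = 7, 8
part_a = pseudonymise(composite[start - 1 : start - 1 + length])
part_b = pseudonymise(other[start - 1 : start - 1 + length])

print(whole_a == whole_b)  # False
print(part_a == part_b)    # True
```

With the substring isolated before anonymisation, records belonging to the same user still map to the same token, which is what makes the anonymised value usable downstream.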

To set this up, follow these steps:

  • In Data Integration, navigate to the Data Job and then to the table where the column is stored. Click the "Configure calculated column" button.

  • In the pop-up, add a new calculated column and define its name and formula. Currently, only the formula ANON_SUBSTRING is supported.

    It accepts three parameters:

    • Column name - the source column that the function is applied to

    • Starting character - the index of the first character to include. The first character of the value has index 1

    • Length - the number of characters to include

For example, for the value 'ABCD', a starting character of 2 and a length of 2 select 'BC', so ANON_SUBSTRING(ABCD,2,2) anonymises 'BC'.

In the screenshot below, the 1st character of the field USR02.ANAME is anonymised.

  • Make sure that the source column is also included in the extraction and marked for anonymisation. This is a technical prerequisite, so include the column even if you do not need it.

  • Run the extraction as usual. The calculated column is then available in the table like any standard column, for example USR02.COL_SUBST.
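The parameter semantics described in the steps above (a 1-based starting character followed by a length) can be sketched in Python as follows. The helper `substring_1_based` is a hypothetical name used only for illustration; it mirrors the documented selection rule, not the actual ANON_SUBSTRING implementation, and it does not perform the anonymisation step.

```python
def substring_1_based(value: str, start: int, length: int) -> str:
    """Return `length` characters of `value`, starting at 1-based index `start`."""
    if start < 1 or length < 0:
        raise ValueError("start must be >= 1 and length must be >= 0")
    # Python slices are 0-based, so shift the starting index by one.
    return value[start - 1 : start - 1 + length]


print(substring_1_based("ABCD", 2, 2))  # BC
print(substring_1_based("ABCD", 1, 1))  # A
```

The second call corresponds to selecting only the first character of a field, as in the USR02.ANAME case mentioned above.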