Celonis Product Documentation

Delta transformations in data jobs

For a stable and optimized data pipeline, we recommend executing delta transformations in your data jobs. Delta transformations process only the new or updated records in your data, rather than reprocessing the full data set each time. When configuring delta transformations, you need to set the primary keys for each table so that a MERGE can be performed.

For more information about setting primary keys, see: Executing delta data model loads.

To configure delta transformations in the data model, use the following information:

To be able to process the delta transformation, you need to isolate the delta. This is done by pushing the extracted delta into a separate staging table. For example, if you are extracting BKPF, the new delta is pushed not only to BKPF but also to a staging table called _CELONIS_TMP_BKPF_NEW_DATA. This enables you to access the delta records and use them for transformations.

After the extraction is complete, transformations should access the delta from the staging table. This is best performed using a placeholder: in the SQL, instead of referencing BKPF, the SELECT is written against <$BKPF>. Depending on the execution type (delta or full), this placeholder is then dynamically translated into a table name.

  • Delta executions: The placeholder is replaced by _CELONIS_TMP_BKPF_NEW_DATA, so the query becomes SELECT * FROM _CELONIS_TMP_BKPF_NEW_DATA.

  • Full executions: The placeholder is replaced by BKPF, so the query becomes SELECT * FROM BKPF.
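As a sketch (using the BKPF example above, with a hypothetical document-type filter for illustration), the same script works unchanged in both modes because only the placeholder resolution differs:

```sql
-- Written once, against the placeholder:
SELECT BKPF.MANDT, BKPF.BUKRS, BKPF.BELNR, BKPF.GJAHR
FROM <$BKPF> AS BKPF
WHERE BKPF.BLART = 'RE';  -- hypothetical filter, for illustration only

-- Delta execution resolves to: FROM _CELONIS_TMP_BKPF_NEW_DATA AS BKPF
-- Full execution resolves to:  FROM BKPF AS BKPF
```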

The delta should be isolated only for the tables that act as a source for transformations, i.e. tables against which a SELECT statement is written. So, if you're extracting four tables but only BKPF acts as a source for a SELECT, then you need to capture the delta for this table only.

To reduce the manual steps, this step, i.e. identifying which tables to generate a staging table for, has been automated. Before running the extraction, the SQL parser scans all the SQL scripts in the given data job. Whenever it spots a placeholder, e.g. <$BKPF>, it marks that table as eligible for a staging table. If you have 10 tables in the extraction but only one SELECT statement against a placeholder, i.e. <$BKPF>, then only one staging table will be generated: _CELONIS_TMP_BKPF_NEW_DATA.

A diagram displaying this process is below:

delta_transformation_diagram.png

After a successful run, the staging table is cleaned up. This ensures that during the next execution the same records aren't reprocessed. However, it is important to note that not all records are cleaned up: for referential integrity, some are retained and processed twice. This addresses dependencies between tables, i.e. when there is a JOIN between them.

In case a record in a dependent table arrives late, it is beneficial to process this twice so that the JOIN works properly. One example is VBAK and VBAP. When building the case table based on VBAP, information is fetched from VBAK via a join. A VBAP record is extracted, but its header record might be missing due to timing issues. In this case, the VBAP record is reprocessed during the next delta run to ensure that VBAK has already been extracted.
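A minimal sketch of such a dependency, assuming the case table is built from VBAP with header attributes joined from VBAK (the selected columns are illustrative):

```sql
-- Delta records from VBAP (via the placeholder) joined to the VBAK table.
-- If the VBAK header arrived late, this JOIN finds no match on the first
-- run; reprocessing the VBAP record in the next delta run repairs it.
SELECT
    VBAP.MANDT,
    VBAP.VBELN,
    VBAP.POSNR,
    VBAK.ERDAT,   -- header creation date, fetched via the JOIN
    VBAK.VKORG    -- sales organization from the header
FROM <$VBAP> AS VBAP
LEFT JOIN VBAK
    ON  VBAK.MANDT = VBAP.MANDT
    AND VBAK.VBELN = VBAP.VBELN;
```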

Delta transformations require that the transformed data is persisted and that the incoming data is merged rather than dropped and recreated. As such, the SQL scripts you write should implement these requirements.

For example, if you are doing some transformations based on BSID, then you should create a target table which will store those calculations.

CREATE TABLE IF NOT EXISTS BSID_CALCULATED(
MANDT varchar(12),
BUKRS varchar(16),
BELNR varchar(40),
GJAHR varchar(16),
BUZEI varchar(12),
WRBTR_CONVERTED float,
SKFBT_CONVERTED float,
WSKTO_CONVERTED float,
BUKRS_TEXT varchar(100),
ZTERM_MD varchar(16),
ZTAGG_MD varchar(8),
ZDART_MD varchar(4),
ZFAEL_MD varchar(8),
ZMONA_MD varchar(8),
ZTAG1_MD varchar(12),
ZPRZ1_MD float,
ZTAG2_MD varchar(12),
ZPRZ2_MD float,
ZTAG3_MD varchar(12));

In this example, a calculation table has been created for BSID. Only the transformed fields and the primary keys are stored there, rather than the whole BSID table, to reduce the impact on the APC. The primary keys are then used to unify BSID and the transformed values in a view.
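One way to implement that unifying view, assuming the column list above (the view name and the selected calculated fields are illustrative):

```sql
-- Join the raw BSID table to the persisted calculation table on the
-- primary keys, so consumers see original and transformed fields together.
CREATE VIEW BSID_ENRICHED AS
SELECT
    BSID.*,
    CALC.WRBTR_CONVERTED,
    CALC.SKFBT_CONVERTED,
    CALC.WSKTO_CONVERTED,
    CALC.BUKRS_TEXT
FROM BSID
LEFT JOIN BSID_CALCULATED AS CALC
    ON  CALC.MANDT = BSID.MANDT
    AND CALC.BUKRS = BSID.BUKRS
    AND CALC.BELNR = BSID.BELNR
    AND CALC.GJAHR = BSID.GJAHR
    AND CALC.BUZEI = BSID.BUZEI;
```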

As a next step, the incoming delta data should be merged into this persisted table. One way to merge is to simply remove the matching records from the transformed table and then insert them again. Here is an example of how to implement the deletion for the table BSID.

DELETE FROM "BSID_CALCULATED"
WHERE EXISTS (
SELECT 1 FROM <$BSID> AS NEW_DATA
WHERE BSID_CALCULATED.MANDT = NEW_DATA.MANDT
AND BSID_CALCULATED.BUKRS = NEW_DATA.BUKRS
AND BSID_CALCULATED.BELNR = NEW_DATA.BELNR
AND BSID_CALCULATED.GJAHR = NEW_DATA.GJAHR
AND BSID_CALCULATED.BUZEI = NEW_DATA.BUZEI);

Here, all the records from the BSID staging table (via the placeholder) are selected, and the matching records are removed from the target table, i.e. BSID_CALCULATED.

These calculations are then inserted into BSID_CALCULATED.

INSERT INTO "BSID_CALCULATED"
SELECT BSID.MANDT ,BSID.BUKRS ,BSID.BELNR ,BSID.GJAHR ,BSID.BUZEI ……… some query ……… FROM <$BSID>
……… some more joins and filters ………
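As an illustration only (the fixed conversion rate and the T001/T052 joins below are hypothetical assumptions, not the document's elided query), such an INSERT might look like:

```sql
INSERT INTO "BSID_CALCULATED"
SELECT
    BSID.MANDT,
    BSID.BUKRS,
    BSID.BELNR,
    BSID.GJAHR,
    BSID.BUZEI,
    BSID.WRBTR * 1.1 AS WRBTR_CONVERTED,  -- hypothetical conversion rate
    BSID.SKFBT * 1.1 AS SKFBT_CONVERTED,
    BSID.WSKTO * 1.1 AS WSKTO_CONVERTED,
    T001.BUTXT AS BUKRS_TEXT,             -- company code text, joined from T001
    T052.ZTERM AS ZTERM_MD,               -- payment terms master data from T052
    T052.ZTAGG AS ZTAGG_MD,
    T052.ZDART AS ZDART_MD,
    T052.ZFAEL AS ZFAEL_MD,
    T052.ZMONA AS ZMONA_MD,
    T052.ZTAG1 AS ZTAG1_MD,
    T052.ZPRZ1 AS ZPRZ1_MD,
    T052.ZTAG2 AS ZTAG2_MD,
    T052.ZTAG3 AS ZTAG3_MD
FROM <$BSID> AS BSID
LEFT JOIN T001
    ON  T001.MANDT = BSID.MANDT
    AND T001.BUKRS = BSID.BUKRS
LEFT JOIN T052
    ON  T052.MANDT = BSID.MANDT
    AND T052.ZTERM = BSID.ZTERM;
```

Because the SELECT reads from the <$BSID> placeholder, the same script inserts only the delta records during a delta execution and the full table during a full execution.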

To use delta transformations, you need to include the extraction and transformation tasks in the same data job. This ensures that the extracted delta data is made available only to the transformations that run inside the same data job as the extraction.

By doing this, you ensure that:

  • You automatically identify which table should be extracted to a staging table. The SQL parser scans all the scripts in the data job to accomplish this.

  • You clean up all the staging tables once the data job has successfully run.