Troubleshooting data extraction and pre-processing

Here's how to troubleshoot object-centric process mining issues during the stage when you're extracting your source system data and preparing it to be used to create objects and events. The main documentation for this stage is at Quickstart: Extract and transform your data into objects and events.

Troubleshooting object-centric process mining lists all the troubleshooting topics to help you create object-centric data models.

I have already extracted data into an existing Data Pool that is not the OCPM Data Pool

If you were already a Celonis user before object-centric process mining was available, you most likely have data extractions already set up for your existing use cases. You can make the data in an existing Data Pool available in the OCPM Data Pool as views, by exporting the Data Connection from the source Data Pool, then importing it to the OCPM Data Pool. See: Sharing data between data pools.

You might want to maintain case-centric process mining and object-centric process mining in parallel for the same data (for example, during a migration and cut-over phase). If you do, we advise using a single Data Pool containing all your Data Connections and extractions, which serves multiple Data Pools containing your transformations and Data Models or perspectives, including the OCPM Data Pool and your case-centric Data Pools. Export the data from the extraction Data Pool and import it to the Data Pools where transformations are carried out, including the OCPM Data Pool. If you’re not already doing this, the move to object-centric process mining is a good time to consider setting this up. It reduces data duplication, so it reduces Analytics Processing Capacity (APC) usage and load on the source systems, and it reduces the complexity of any subsequent steps considerably.

To import data from an existing Data Pool to the OCPM Data Pool, follow these steps:

In your existing Data Pool, click on the context menu for the Data Connection (the three vertical dots), and select Share Data with other Data Pools.
In the Share Data with other Data Pools window, select the OCPM Data Pool as the target, and share all the tables in the Data Connection.
Go to the OCPM Data Pool, create a new Data Connection, and select Import data from another Data Pool. Select the Data Connection you have just shared and click Synchronize. The views are automatically created in the OCPM Data Pool.
Tip
You can’t export a Data Connection from the OCPM Data Pool and share it with another Data Pool, so this procedure doesn’t work the other way around.

Because you are importing the data to the OCPM Data Pool as views:

Changes to the data in the tables that you already shared from your existing Data Pool (for example, new records being extracted) are automatically reflected in the views in the OCPM Data Pool.
You’ll need to synchronize the import again if the structure of the shared tables changes, or if you add or remove tables from the Data Connection in the existing Data Pool. To pick up the changes, select the Data Connection in the OCPM Data Pool and click Synchronize.
Any SQL statements you run against the imported views in the OCPM Data Pool do not impact the tables in the existing Data Pool.
Your APC isn’t impacted by the views, which are essentially just stored SQL queries.

I want to pre-process my raw data before transforming it into objects and events

You might want to pre-process the data from your source system, for example to filter it, convert one data type to another, or rename, modify, or add columns. However, if you are planning to use the Celonis object types and event types from the processes in our catalog, which are supplied with transformations, we recommend that you provide tables with the standard table names used in the source system. If you use alternative table names for pre-processed data (for example, if you create a temporary table called TMP_BSEG to supply data in place of the BSEG table), you’ll need to put in additional effort to create and maintain custom overwrites for the supplied transformations for objects and events.

To pre-process the data while keeping the standard table names, we recommend that you create a new pre-processing scope in the OCPM Data Pool. Here’s an overview of what you’ll need to do:

Set up a dummy Data Connection in the OCPM Data Pool to create a pre-processing scope there.
Create transformations in a data job in the global scope, which has access to both the source data and the pre-processing scope. In the transformations, create new views in the pre-processing scope with the same names as each table or view in the source data.
Carry out your pre-processing in the SQL code that you use to create the new views.
Deploy the Celonis-supplied transformations for objects and events to the pre-processing scope, so that they operate on your pre-processed data.

Using views for pre-processing means that the data remains dynamic (because the views always return the latest data in the source table), so the pre-processing data job takes very little time to run, and it won’t impact your APC. A disadvantage is that depending on the complexity of the pre-processing, the use of views can be detrimental to the performance of the transformations running on the pre-processed data. If you experience performance issues in the OCPM data job (ocpm-data-job) with this setup, try using tables instead of views in the pre-processing scope (and don’t forget to use ANALYZE STATISTICS for the created tables).

To set up the dummy Data Connection, follow these steps:

In Data Integration or from the Objects & Events dashboard, navigate to the OCPM Data Pool.
Select Data Connections.
Click Add Data Connection.
Select Connect to Data Source. The following steps work for SAP or Database - select whichever matches your context best.
Select an uplink (the status doesn’t matter here, as the uplink doesn’t have to be live) and click Next.
Give your connection a name - we’ve used “Pre-Processing”.
Insert dummy values for the remaining mandatory fields, as we’ve done in this example.
Click Save.
When you see the connection test error message ("Connection test failed, please check the Data Connector's configuration...") ,click Save Anyway. This isn’t a live connection.
Check the connection overview for the OCPM Data Pool, and verify that the dummy Data Connection is present.

To create the transformations that pre-process the data, follow these steps:

Create a new data job in the global scope of the OCPM Data Pool. You need to use the global scope because it is the only place with access to both the source data and the pre-processing scope.

For every view that you need in the pre-processing scope, create a transformation in the pre-processing data job containing SQL code based on the following template:

DROP VIEW IF EXISTS <%=DATASOURCE:PREPROCESSING_SCOPE%>."TABLE";
CREATE OR REPLACE VIEW <%=DATASOURCE:PREPROCESSING_SCOPE%>."TABLE" AS (	
    SELECT *
    FROM <%=DATASOURCE:SOURCE_SCOPE%>."TABLE"
);

Give your views the same names as the tables in the source data. For example:

DROP VIEW IF EXISTS <%=DATASOURCE:PRE-PROCESSING%>."CDHDR";
CREATE OR REPLACE VIEW <%=DATASOURCE:PRE-PROCESSING%>."CDHDR" AS (
    SELECT *
    FROM <%=DATASOURCE:SAP_ECC%>."CDHDR"
);

Add any pre-processing that you want to do for the data in each table, in the SELECT statement for the corresponding view, so that the view in the pre-processing scope will contain the pre-processed data. Here are some examples of pre-processing for common troubleshooting situations. You might also want to combine the different templates below.
- Rename a table to the name expected by the Celonis transformations (see The table names or column data types in my extracted data are different to the SAP standard (for example, because I am extracting it from a Data Lake)).
```
DROP VIEW IF EXISTS <%=DATASOURCE:PREPROCESSING_SCOPE%>."NEW_NAME";
CREATE OR REPLACE VIEW <%=DATASOURCE:PREPROCESSING_SCOPE%>."NEW_NAME" AS (
    SELECT *
    FROM <%=DATASOURCE:SOURCE_SCOPE%>."OLD_NAME"
);
```
- Convert a column data type to the type expected by the Celonis transformations, using CAST (see The table names or column data types in my extracted data are different to the SAP standard (for example, because I am extracting it from a Data Lake)).
```
DROP VIEW IF EXISTS <%=DATASOURCE:PREPROCESSING_SCOPE%>."BKPF";
CREATE OR REPLACE VIEW <%=DATASOURCE:PREPROCESSING_SCOPE%>."BKPF" AS (
    SELECT
        MANDT,
        CAST(BUKRS AS VARCHAR),
        BELNR,
        GJAHR,
        ...
    FROM <%=DATASOURCE:SOURCE_SCOPE%>."BKPF"
);
```
- Replace NULL values for a primary key (here, the accounting document number BKPF-BELNR) with an empty string, using the COALESCE() function. Don’t use ISNULL or IFNULL, as these functions are not supported for object-centric transformations. (See I have NULL values in ID attributes.)
```
DROP VIEW IF EXISTS <%=DATASOURCE:PREPROCESSING_SCOPE%>."BKPF";
CREATE OR REPLACE VIEW <%=DATASOURCE:PREPROCESSING_SCOPE%>."BKPF" AS (
    SELECT
        MANDT,
        BUKRS,
        COALESCE(BELNR,''),
        GJAHR,
        ...
    FROM <%=DATASOURCE:SOURCE_SCOPE%>."BKPF"
);
```
- Include only data starting with the year 2023 in a view (see I only want to create objects and events for a limited data scope, for example only one year of data).
```
DROP VIEW IF EXISTS <%=DATASOURCE:PREPROCESSING_SCOPE%>."BKPF";
CREATE OR REPLACE VIEW <%=DATASOURCE:PREPROCESSING_SCOPE%>."BKPF" AS (
    SELECT *
    FROM <%=DATASOURCE:SOURCE_SCOPE%>."BKPF"
    WHERE GJAHR >= '2023'
);
```
When you’ve created all the transformations, run the pre-processing data job to create the views in the pre-processing scope in the OCPM Data Pool, and verify that they contain the required changes.
In the OCPM Data Pool, schedule the pre-processing data job to run before the OCPM data job (ocpm-data-job).

To deploy the Celonis-supplied transformations for objects and events to the pre-processing scope, follow the steps in Quickstart: Extract and transform your data into objects and events, but in Stage 2 (Enable the processes you want), select your dummy Data Connection for the pre-processing scope, instead of the Data Connection for your source system. Then the Celonis-supplied transformations for the objects and events in the process will be deployed to the pre-processing scope and operate on your pre-processed data.

I only want to create objects and events for a limited data scope, for example only one year of data

You might want to restrict the data scope to a limited subset, for example one year of data, for performance or other reasons. Normally, you would limit the data scope as part of the table extraction. However, you might have reasons to continue extracting a table in full, for example in order to not impact existing use cases. In that case, you can limit the data scope as part of a data pre-processing stage, following the method described in I want to pre-process my raw data before transforming it into objects and events. There’s an example in that section showing you how to use only data from 2023.

If you don’t need the full data for other use cases, you can just limit the data scope by filtering it in the extraction job directly into the OCPM Data Pool. Use the filter settings described in Creating extraction tasks using the visual editor.

I am extracting data from multiple source systems (for example, separate SAP instances)

If your data is extracted from multiple source systems (for example, separate instances for SAP Spain, SAP Germany, and SAP France), you can use the catalog to deploy the transformations for a process multiple times, one for each source system. When you run the OCPM data job (ocpm-data-job), the data from all of the systems is merged into one integrated table for each of the object types and event types used in the process. This also works across different source system types, so you can combine data from (for example) SAP ECC and Oracle EBS.

First you’ll need to get all the tables for each source system into the OCPM Data Pool. For each source system, you can choose to set up the Data Connection directly in the OCPM Data Pool and extract the source data. Alternatively, you can export (share) the Data Connection from another Data Pool and then import it into the OCPM Data Pool, as described in I have already extracted data into an existing Data Pool that is not the OCPM Data Pool.

When you’ve set up or imported a Data Connection for each source system into the OCPM Data Pool, go to the catalog and install the required processes for each Data Connection in turn. The OCPM data job (ocpm-data-job) combines the data processing for each source system into a single transformation for each of the object types and event types, which produces a single table containing the objects or events from all of the source data.

I have not extracted all the tables that are used in the Celonis transformations

The Celonis-supplied transformations from the catalog require a specific set of tables and columns to be extracted over the Data Connection. If you are reusing an existing dataset from another Data Pool that was extracted for case-centric processing, your data likely will not contain all the required tables and columns.

If your use case only needs a subset of the objects and events in a process, you don’t have to extract the tables relating to the objects and events you don’t need. When you publish the OCPM data job (ocpm-data-job) to production, we automatically create a transformation called transformation_initialize_source_tables as the first transformation of the data job. This transformation creates empty placeholder tables to substitute for any tables that aren’t in your dataset, but are required by the Celonis-supplied transformations. So in this situation the transformations won’t fail. Note that transformation_initialize_source_tables won’t add any missing tables that are required by custom transformations.

Because the placeholder tables contain no records, the objects and events that would normally be created from them won’t be created. To see which object types and event types are impacted, check the perspective load in Data Integration by clicking on the load details of the corresponding Data Model. This will show you a list of the loaded tables for objects, events, and relationships together with the number of loaded rows. If the number of rows is zero, there were no records to create the objects, events, or relationships from.

In the Loaded Data Model window, expand "Data load progress" to see the list of loaded tables, with a table name and the number of loaded rows.

The transformation_initialize_source_tables transformation doesn’t handle missing columns in your dataset. If you have extracted a table that’s required by the Celonis-supplied transformations, but it’s missing some columns, the transformation for that table will fail with an error message telling you the first column that’s missing. Check your extractions and add in the missing columns.

The table names or column data types in my extracted data are different to the SAP standard (for example, because I am extracting it from a Data Lake)

The Celonis-supplied transformations from the processes in the catalog require the standard table names and column data types used in the source system. If your tables have had a prefix added to the table name (for example SAP_VBAK) or use an alternative table name, or if you have columns with nonstandard data types, there are several possible solutions:

Rename the tables or change the column data types during extraction. This is the easiest solution, but you can only do it if you aren't using the tables and columns for any other use case that requires the existing setup.
- To rename a table, in the extraction configuration, select the checkbox Rename target table for any tables that need renaming, as shown in this example. Then carry out a full load to re-extract the table with the new name.
- To change a column's data type, go to the extraction configuration where the table containing the column is defined, and click on the column configuration. Adjust the data types and click Confirm. Save the updated settings, then carry out a full load to re-extract the column with the new data type.
  Note
  Extractors don’t support all types of conversions. For example, you couldn’t convert a VARCHAR into a DATE. Also, not all extractors allow data type conversions at all. If you can’t do the conversion you want with the extractor you’re using, you’ll need to try one of the other solutions.
Rename the tables or change the column data types during a pre-processing stage. If you need to pre-process the data for other reasons as well, this is a convenient solution, and it works for shared tables. I want to pre-process my raw data before transforming it into objects and events explains how to set up pre-processing, and shows you how to rename a table, or use CAST to perform a data type conversion, while you’re creating the views.
Create partial overwrites or full overwrites to replace the Celonis-supplied transformations. You can create a partial overwrite to change the definition of an individual column, or a full overwrite to change the name of a table. Creating custom transformations explains how to do this.
Important
Creating a full overwrite of the transformation for a Celonis object type or event type means that the transformation won’t be automatically updated with future improvements to the Celonis catalog. For example, if a new property is added to a Celonis object type to support a new app for a business process, the new property won’t get populated in an object-centric data model where the Celonis-supplied transformation for the object type has been fully overwritten, and the app won’t work. If you do use a full overwrite, be prepared to manually monitor and implement the future updates to the Celonis object type or event type. It’s better to make changes during extraction or as a pre-processing stage if you can.

I have NULL values in ID attributes

In the Celonis-supplied transformations from the processes in the catalog, primary keys are enforced. For example, in the Material object type, for SAP source system data, the concatenation of the columns "MARA"."MANDT" and "MARA"."MATNR" is defined as part of the ID for the object, and the ID is defined as the primary key for the object. The defined primary keys are equivalent to the primary keys used by the source system - for example, for MARA the columns MANDT and MATNR are defined as primary keys in SAP.

Depending on the configuration of your source system, primary key columns can be empty. In that case, they are extracted into Celonis with NULL values. However, the OCPM data job (ocpm-data-job) fails if a column that is part of the primary key definition contains NULL values. In this situation, when you run the OCPM data job, you’ll see an error message similar to “ERROR: Cannot set a NOT NULL column (ID) to a NULL value in INSERT/UPDATE statement”. There are several possible solutions:

Remove records with NULL values for the primary key during the table extraction, using the filter settings described in Creating extraction tasks using the visual editor. Whether you can use this solution depends on the context - you might need to retain the records with NULL values in the key columns because they are relevant for the analytical use case you are trying to implement. In this case, to stop the transformations from failing, you’ll need to replace the NULL values with empty strings or some other placeholder of your choice.
Replace the NULL values during a pre-processing stage. If you need to pre-process the data for other reasons as well, this is a convenient solution, and it works for shared tables. I want to pre-process my raw data before transforming it into objects and events explains how to set this up, and shows you how to use the COALESCE() function to replace NULL values for a primary key with an empty string. Don’t use ISNULL or IFNULL, as these functions are not supported for object-centric transformations.
Create a full overwrite of the Celonis-supplied transformation, and change the primary key definition for the object type or event type. You can overwrite IDs, attributes, joins, and filters with a full overwrite, and it replaces the Celonis-supplied transformation entirely in the OCPM data job (ocpm-data-job). If you use a transformation overwrite to make changes to the primary key, you'll need to take a lot of care to ensure that the resulting configuration is logically correct and technically workable - it's easy to end up with unintended consequences.
Important
Also, creating a full overwrite of the transformation for a Celonis object type or event type means that the transformation won’t be automatically updated with future improvements to the Celonis catalog. For example, if a new property is added to a Celonis object type to support a new app for a business process, the new property won’t get populated in an object-centric data model where the Celonis-supplied transformation for the object type has been fully overwritten, and the app won’t work. If you do use a full overwrite, be prepared to manually monitor and implement the future updates to the Celonis object type or event type. It’s better to make changes during extraction or as a pre-processing stage if you can.