Skip to main content

Celonis Product Documentation

Before getting started


The Duplicate Checker App is purpose-built for SAP ECC and works out-of-the-box with the Celonis SAP ECC Accounts Payable Connector and a standard, full cloud EMS setup. Any infrastructure, source system or data model deviations require app customizations and are not covered by the standard app.

First and foremost, it is necessary to have an advanced understanding of the following concepts to ensure a smooth installation. If not, please complete the trainings on these topics visiting the Celonis Academy:

  • Studio (i.e. concepts like variables or extensions)

  • Views

  • Knowledge Model

  • Skills

  • Data Integration

  • Data Models

Please create a Service Desk ticket to ensure that the following backend flags are enabled for a given team:

  • Studio - Apps (Knowledge Models, Business Views, Skills)

  • Celonis: Machine Learning Sensor

  • Celonis: Celonis Smart Sensor

  • Celonis: Create Task

  • Celonis: Update Augmented Attribute

In order to run the ML Sensor and to yield results from the algorithm, the data model should contain at a minimum the following information:

  • Invoice value (integer/floating-point format)

  • Document date (date format)

  • Invoice reference (string format)

  • Vendor name (string format)

  • Event log or activity table

The app expects the following tables:

  • BSEG (Document line table - Case table)

  • BKPF (Document header table)

  • LFA1 (Vendor table)

  • _CEL_AP_ACTIVITIES (Activity table) with activities (“Create Credit Memo”, “Reverse Invoice”)


Apart from invoices (31 posting key), it is recommended to also include credit memos (21 posting key) in the case table to be able to identify invoice-unrelated credit memos. This reduces the number of potential false duplicates.

The use case

Detecting duplicate invoices in the Accounts Payable process prevents unnecessary payments to vendors. For duplicate invoices already paid, there is the possibility to reclaim previous payments from vendors. The goal is to identify duplicates as early as possible before they are being paid to be able to set a payment block or reverse the invoice. Today, this results in a lot of manual effort.

The focus of the Duplicate Checker App is facilitating this process by detecting duplicates that have slipped through the ERP controls. The app achieves this by using a Celonis fuzzy matching algorithm that expands beyond traditional exact matches computed by ERP systems. After having analyzed the invoice data, the algorithm groups invoices together that are suspected to be duplicates based on different matching patterns. These patterns are based on the following four dimensions: Invoice value, document date, invoice reference and vendor name.

Once the groups have been computed, the app allows users to review the potential duplicates, provide feedback whether it is a duplicate or not, and then take action accordingly.

Components and data flow

The Duplicate Checker app works across three layers in the EMS: Definition layer, execution layer and data layer. It is essential to understand how the data flows across the different layers and the purpose of each component in order to be able to refine and debug the application.

Definition layer: This layer stores the business logic and common understanding of the application in the form of filters, KPIs, attributes, records, etc.

Execution layer: The execution layer is where the work of the app is done or visualized. It always uses the definition layer to retrieve the right data from the data layer.

Data layer: The data layer is where the data is stored and ready to be used at the other layers.

The following graph depicts the interaction between the different layers and components:

  1. The ML Sensor gets the definitions from the knowledge model (e.g. filters and attribute definitions)

  2. The ML Sensor queries the data from the data model (given that the data model has been loaded)

  3. The algorithm computes groups of potential duplicate invoices based on different matching patterns, and then writes back the results into a table (name: “DUPLICATE_INVOICES + package-key + knowledge-model-key”) in the Data Pool. This is done via an upsert based on the primary key of the document, meaning that if the table exists, already existing rows are updated and non-existing rows added. If the table does not exist, it gets created from scratch.

  4. A dedicated Data Job, with the name “GROUPS_DUPLICATE_INVOICES + package-key + knowledge-model-key”), is automatically created in the Data Pool of the data model to create a table (view) with group level information of the “DUPLCIATE_INVOICES” entries, with the name “DUPLICATE_GROUP_INVOICES”. During the first run, the two tables are added and subsequently loaded to the data model.

  5. The Studio View takes the definitions from the knowledge model

  6. And queries the data from the data model based on these definitions. After the next data extraction, the cycle starts again.

As seen from the steps above, the data flow is always linear. For example, the ML Sensor cannot work correctly if the document date is not defined in the knowledge model or if the definition in the knowledge model does not return the correct data from the data model. It is therefore recommended to always check the input component in order to make the following component work.

While the app itself starts with the ML Sensor, it is recommended to to always start any debugging at the data layer and work up through the definition layer over to the execution layer. For instance: Is there an invoice line, invoice header and vendor table in the data model?

The algorithm

The algorithm is at the heart of the Duplicate Checker App. It groups invoices together that are suspected to be duplicates based on different matching patterns.

At a high level, the algorithm works as follows:

  • Every time new invoices are loaded into the connected data model, the algorithm compares them against the invoice backlog. Important here is that only new (unchecked) documents are checked against other new (unchecked) documents as well as against old (already checked) documents. Old (already checked) documents are not checked against old (already checked) documents.

  • The comparison is performed on the four invoice fields as defined in the knowledge model. By default these are: Vendor Name "INVOICE_VENDOR_NAME", Invoice Value "INVOICE_VALUE", Invoice Reference "INVOICE_REFERENCE" and Document Date "INVOICE_DATE".

  • Fuzzy matching logic flags and groups invoices with a high-match probability based on different patterns.

  • The algorithm excludes reversed documents from the result set based on the activities in the data model via the "INVOICE_REVERSAL_FLAG".

  • Filters defined in the ML Sensor reduce the scope of documents checked by the algorithm.

  • Filters defined in the Action View can exclude certain groups from being displayed, reducing the number of potential false positives.