Celonis Product Documentation

Default duplicate invoice patterns

The Duplicate Invoice Checker's algorithm groups invoices that might be duplicates based on different matching patterns. In the standard setup, where the Machine Learning (ML) Sensor fully powers the algorithm, the default logic checks four fields: Invoice Reference, Document Date, Invoice Value, and Vendor Name. You can also customize the algorithm logic by defining custom matching patterns. These customizations are made in the knowledge model and extend the configuration in the ML Sensor (see Custom patterns).

Standard Patterns

By default, the algorithm is standardized across all customers and fully powered by the Machine Learning Sensor (ML Sensor). In the ML Sensor, you define the four standard columns to be checked as well as the filters that determine which documents are checked. The standard search patterns (also called matching patterns) are then applied.

Exact match

The fields “Invoice Reference”, “Document Date”, “Invoice Value”, and “Vendor Name” match exactly in both of the invoices being compared.

Similar vendor

The fields “Invoice Reference”, “Document Date”, and “Invoice Value” match exactly in both of the invoices being compared. The field “Vendor Name” is a fuzzy match. Here's how the fuzzy match is determined:

  1. Remove everything except uppercase letters, lowercase letters, and digits. This strips special characters and white spaces. For example, "AcmeValue" and "Acme Value" might match.

  2. Remove all company keywords like "Corp," "LLC," etc. For example, "Celonis SE" and "Celonis GmbH" might match.

  3. Compare the cleaned names using the given string similarity metric and the given threshold. If the metric exceeds the threshold, the similarity is set to 1 and the strings are considered an approximate match.
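The three steps above can be sketched in Python. This is a minimal illustration, not the ML Sensor's implementation: the keyword list, the lowercasing, the threshold of 0.9, and the use of `difflib.SequenceMatcher` as the similarity metric are all assumptions made for the example. Company keywords are dropped token by token before the white space is collapsed, which is one simple way to combine steps 1 and 2.

```python
import re
from difflib import SequenceMatcher

# Hypothetical keyword list; the list actually used is not given in this document.
COMPANY_KEYWORDS = {"corp", "llc", "inc", "ltd", "gmbh", "se"}


def normalize_vendor(name: str) -> str:
    """Strip non-alphanumeric characters and drop company keywords."""
    # Clean each whitespace-separated token (assumption: case is ignored).
    tokens = (re.sub(r"[^A-Za-z0-9]", "", t).lower() for t in name.split())
    # Drop empty tokens and company keywords, then join without spaces.
    return "".join(t for t in tokens if t and t not in COMPANY_KEYWORDS)


def vendor_similarity(a: str, b: str, threshold: float = 0.9) -> int:
    """Return 1 if the cleaned names exceed the threshold, else 0."""
    # SequenceMatcher.ratio() stands in for the configured similarity metric.
    ratio = SequenceMatcher(None, normalize_vendor(a), normalize_vendor(b)).ratio()
    return 1 if ratio > threshold else 0
```

With these assumptions, `vendor_similarity("Celonis SE", "Celonis GmbH")` yields 1 because both names reduce to the same string after cleaning.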

Example 

similar_vendor_pattern.png

Similar reference

The fields “Document Date”, “Invoice Value”, and “Vendor Name” match exactly in both of the invoices being compared. The field “Invoice Reference” is a fuzzy match. Here's how the fuzzy match is determined:

  1. Remove everything except uppercase letters, lowercase letters, and digits. This strips special characters and white spaces. For example, "A R-AMC1234" and "AR AMC\ 1234" might be a match.

  2. Check whether the invoice references are exactly equal with the exception of 0-3 extra characters in one of the two records. For example, "AR-AMC1234" and "AMC1234" might be a match.

  3. Check for common scanning errors, such as the letter "B" in place of the number "8". For example, "AR-AMC1238" and "AR-AMC123B" might be a match.

  4. Check for transposed characters. For example, "AR-AMC1234" and "AR-MAC1234" might be a match.
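The four checks above could look roughly like the following Python sketch. It is illustrative only: the confusion table (only B/8 is named in the text), the uppercasing, the reading of "extra characters" as characters that may appear anywhere in the longer reference, and the restriction to adjacent transpositions are assumptions. The function also returns `True` for references that are identical after cleaning, so telling an exact match from a fuzzy one is left to the caller.

```python
import re

# Hypothetical scanning confusions; only "B" vs "8" is mentioned in the text.
SCAN_CONFUSIONS = {"B": "8", "O": "0", "I": "1", "S": "5"}


def normalize_ref(ref: str) -> str:
    """Step 1: keep letters and digits only (assumption: case is ignored)."""
    return re.sub(r"[^A-Za-z0-9]", "", ref).upper()


def differs_by_extra_chars(a: str, b: str, max_extra: int = 3) -> bool:
    """Step 2: equal apart from 0-3 extra characters in one reference."""
    short, long_ = sorted((a, b), key=len)
    if len(long_) - len(short) > max_extra:
        return False
    remaining = iter(long_)
    # True if every character of `short` appears in order within `long_`.
    return all(ch in remaining for ch in short)


def canonical_scan(ref: str) -> str:
    """Step 3: map commonly confused characters to one representative."""
    return "".join(SCAN_CONFUSIONS.get(ch, ch) for ch in ref)


def is_adjacent_swap(a: str, b: str) -> bool:
    """Step 4: the strings differ only by two neighbouring characters swapped."""
    if len(a) != len(b):
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (len(diff) == 2 and diff[1] == diff[0] + 1
            and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])


def similar_reference(a: str, b: str) -> bool:
    a, b = normalize_ref(a), normalize_ref(b)
    return (differs_by_extra_chars(a, b)
            or canonical_scan(a) == canonical_scan(b)
            or is_adjacent_swap(a, b))
```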

Example 

similar_reference_pattern.png

Similar date

The fields “Invoice Reference”, “Invoice Value”, and “Vendor Name” match exactly in both of the invoices being compared. The field “Document Date” is a fuzzy match. Here's how the fuzzy match is determined:

  1. Check whether the dates are the same except that the month and day are swapped. For example, "2020-01-02" and "2020-02-01" might match.

  2. Check whether the dates are the same except that the month differs. For example, "2020-07-02" and "2020-06-02" might match.

  3. Check whether the distance between the dates is less than 7 days. This threshold is based on experience with customers. For example, invoices dated "2020-07-02" and "2020-07-08" might be duplicates if all the other fields match.
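As a rough Python sketch of the three date checks above: the function name and signature are hypothetical, and the 7-day limit is taken from the text. Identical dates also pass the distance check, so separating exact from fuzzy matches is left to the caller.

```python
from datetime import date


def similar_date(a: date, b: date, max_days: int = 7) -> bool:
    """Return True if two document dates look like variants of each other."""
    same_year = a.year == b.year
    # Check 1: month and day are swapped (2020-01-02 vs 2020-02-01).
    day_month_swapped = same_year and a.month == b.day and a.day == b.month
    # Check 2: only the month differs (2020-07-02 vs 2020-06-02).
    month_changed = same_year and a.day == b.day and a.month != b.month
    # Check 3: the dates are less than `max_days` days apart.
    close = abs((a - b).days) < max_days
    return day_month_swapped or month_changed or close
```

For example, `similar_date(date(2020, 7, 2), date(2020, 7, 8))` is `True` because the dates are 6 days apart, while a gap of exactly 7 days does not qualify.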

Example 

similar_date_pattern.png

Similar value

The fields “Invoice Reference”, “Document Date”, and “Vendor Name” match exactly in both of the invoices being compared. The field “Invoice Value” is a fuzzy match. Here's how the fuzzy match is determined:

  1. Allow a small absolute difference between the two values. For example, “5,080” and “5,000” might be a match.

  2. Check for transposed digits. For example, “150,234” and “105,234” might be a match.

Example 

similar_value_pattern.png

Multiple

With the standard fuzzy patterns, each set of documents is expected to have three columns that match exactly and one column that is a fuzzy match. A set consists of two documents. A group can also be formed by multiple patterns, where several sets of documents are connected through different patterns.

The following shows a group formed with multiple patterns - “Similar Reference” and “Similar Vendor”:

DuplicateCheckApp_Muliple.png

In the example, invoices A and B were matched due to the similar “Invoice Reference”. The other three fields, “Document Date”, “Invoice Value”, and “Vendor Name”, each match exactly. At the same time, invoices B and C were matched due to the similar “Vendor Name”. The other three fields, “Invoice Reference”, “Document Date”, and “Invoice Value”, each match exactly. Since invoice B is part of both sets of documents, it acts as a bridge connecting all three invoices in one group.
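The bridging behavior described above corresponds to finding connected components over the matched pairs. The sketch below uses a small union-find structure for this; it is one common way to do such grouping and is not confirmed to be how the Duplicate Invoice Checker builds its groups. The function name and the input format (a list of matched invoice ID pairs) are hypothetical.

```python
def group_invoices(pairs):
    """Merge matched pairs into groups; shared invoices act as bridges."""
    parent = {}

    def find(x):
        # Register unseen invoices as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = {}
    for invoice in list(parent):
        groups.setdefault(find(invoice), set()).add(invoice)
    return list(groups.values())
```

Called with `[("A", "B"), ("B", "C")]`, the function returns a single group containing A, B, and C, because B appears in both pairs.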

The following shows a set of documents that are not grouped:

DuplicateCheckApp_MulipleNoGroup.png

In the example, invoices A and B, as one set of documents, were not matched because both the “Invoice Reference” and the “Vendor Name” are only similar rather than exact. The standard patterns allow only one fuzzy field per set of documents. To match such cases, a custom pattern would need to be defined (see Custom Patterns).
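The rule behind the standard patterns (four exact fields, or three exact fields plus one fuzzy field) can be expressed as a small Python function. The input format, a mapping from field name to `"exact"`, `"fuzzy"`, or `"none"`, is a hypothetical representation chosen for this sketch.

```python
FIELDS = ("Invoice Reference", "Document Date", "Invoice Value", "Vendor Name")

# Pattern name for the single field that is allowed to be fuzzy.
PATTERN_NAMES = {
    "Invoice Reference": "Similar reference",
    "Document Date": "Similar date",
    "Invoice Value": "Similar value",
    "Vendor Name": "Similar vendor",
}


def standard_pattern(comparison):
    """Return the standard pattern name for a pair, or None if none applies."""
    exact = [f for f in FIELDS if comparison[f] == "exact"]
    fuzzy = [f for f in FIELDS if comparison[f] == "fuzzy"]
    if len(exact) == 4:
        return "Exact match"
    if len(exact) == 3 and len(fuzzy) == 1:
        return PATTERN_NAMES[fuzzy[0]]
    # Two or more fuzzy fields, or any non-matching field: no standard pattern.
    return None
```

A pair whose “Invoice Reference” and “Vendor Name” are both fuzzy returns `None` here, which mirrors the ungrouped example above.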