
Celonis Product Documentation

Default duplicate invoice patterns

The Duplicate Invoice Checker's algorithm groups invoices that might be duplicates based on different matching patterns. In the standard setup with the default logic, where the Machine Learning (ML) Sensor fully powers the algorithm, four fields are checked: Invoice Reference, Document Date, Invoice Value, and Vendor Name. You can also customize the algorithm logic by defining custom matching patterns. These customizations are made in the knowledge model and extend the configuration in the ML Sensor (see Custom patterns).

Standard Patterns

By default, the algorithm is standardized across all customers and fully powered by the ML Sensor. In the ML Sensor, you define the four standard columns to be checked and the filters that determine which documents are checked. The standard search (also called matching) patterns are then applied.

Exact match

The fields “Invoice Reference”, “Document Date”, “Invoice Value”, and “Vendor Name” match exactly in both of the invoices being compared.

Similar vendor

The fields “Invoice Reference”, “Document Date”, and “Invoice Value” match exactly in both of the invoices being compared. The field “Vendor Name” is a fuzzy match. Here's how the fuzzy match is determined:

  1. Remove all characters other than uppercase and lowercase letters and digits, such as special characters and whitespace. For example, "AcmeValue" and "Acme Value" might match.

  2. Remove company keywords such as "Corp" and "LLC". For example, "Celonis SE" and "Celonis GmbH" might match.

  3. Compare the normalized strings using the configured string similarity metric. If the similarity score exceeds the configured threshold, the similarity is 1 and the strings are considered an approximate match.
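The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ML Sensor's implementation: the keyword list (`COMPANY_KEYWORDS`), the threshold (`SIMILARITY_THRESHOLD`), and the use of `difflib.SequenceMatcher` as the similarity metric are hypothetical stand-ins for the configured metric and threshold. The sketch also lowercases names and removes keywords as whole words before joining the remaining letters and digits, so that a keyword such as "SE" is not stripped from inside a longer name.

```python
import re
from difflib import SequenceMatcher

# Hypothetical keyword list; the keywords used by the ML Sensor are not documented here.
COMPANY_KEYWORDS = {"corp", "llc", "inc", "ltd", "gmbh", "se", "ag"}

# Hypothetical threshold standing in for the configured value.
SIMILARITY_THRESHOLD = 0.85


def normalize_vendor(name: str) -> str:
    """Steps 1 and 2: drop company keywords, keep only letters and digits."""
    # Split on every run of non-alphanumeric characters (spaces, punctuation, ...).
    tokens = re.split(r"[^A-Za-z0-9]+", name)
    # Discard empty tokens and whole-word company keywords (case-insensitive).
    kept = [t for t in tokens if t and t.lower() not in COMPANY_KEYWORDS]
    # Join what is left and lowercase it (case-insensitivity is an assumption).
    return "".join(kept).lower()


def vendor_similarity(a: str, b: str) -> int:
    """Step 3: return 1 if the similarity score exceeds the threshold, else 0."""
    score = SequenceMatcher(None, normalize_vendor(a), normalize_vendor(b)).ratio()
    return 1 if score > SIMILARITY_THRESHOLD else 0


print(vendor_similarity("Celonis SE", "Celonis GmbH"))  # 1
print(vendor_similarity("AcmeValue", "Acme Value"))  # 1
```

Under these assumptions, "Celonis SE" and "Celonis GmbH" both normalize to "celonis", so they count as an approximate match.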

Field              Invoice A      Match   Invoice B
Invoice Reference  XX0116621912   Exact   XX0116621912
Document Date      2024-06-10     Exact   2024-06-10
Invoice Value      5135.50        Exact   5135.50
Vendor Name        Celonis SE     Fuzzy   Celonis GmbH

Similar reference

The fields “Document Date”, “Invoice Value”, and “Vendor Name” match exactly in both of the invoices being compared. The field “Invoice Reference” is a fuzzy match. Here's how the fuzzy match is determined:

  1. Remove all characters other than uppercase and lowercase letters and digits, such as special characters and whitespace. For example, "A R-AMC1234" and "AR AMC\ 1234" might be a match.

  2. Check whether the invoice references are identical except for up to three extra characters in one of the two records. For example, "AR-AMC1234" and "AMC1234" might be a match.

  3. Check for common scanning errors, such as the letter "B" in place of the number "8". For example, "AR-AMC1238" and "AR-AMC123B" might be a match.

  4. Check for transposed characters. For example, "AR-AMC1234" and "AR-MAC1234" might be a match.
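A minimal Python sketch of the four reference checks follows. The scanning-error table (`SCAN_CONFUSIONS`) is a hypothetical example; step 2 is interpreted as "the shorter reference can be obtained by deleting up to three characters from the longer one", and step 4 is assumed to cover a single swap of two adjacent characters. The actual rules in the ML Sensor may differ.

```python
import re

# Hypothetical table of scanning confusions (letter, digit); not the product's actual list.
SCAN_CONFUSIONS = {("B", "8"), ("I", "1"), ("O", "0"), ("S", "5"), ("Z", "2")}

# Step 2: up to three extra characters in one of the two references.
MAX_EXTRA_CHARS = 3


def normalize_reference(ref: str) -> str:
    """Step 1: keep only letters and digits."""
    return re.sub(r"[^A-Za-z0-9]", "", ref)


def has_few_extra_chars(a: str, b: str) -> bool:
    """Step 2: the shorter reference equals the longer one with 1-3 characters deleted."""
    short, long = sorted((a, b), key=len)
    if not (0 < len(long) - len(short) <= MAX_EXTRA_CHARS):
        return False
    # Subsequence test: each character of `short` must appear in `long`, in order.
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def differs_by_scan_errors(a: str, b: str) -> bool:
    """Step 3: same length, and every differing position is a known confusion."""
    if len(a) != len(b):
        return False
    return all(
        x == y or (x, y) in SCAN_CONFUSIONS or (y, x) in SCAN_CONFUSIONS
        for x, y in zip(a, b)
    )


def differs_by_transposition(a: str, b: str) -> bool:
    """Step 4: same length, and exactly one pair of adjacent characters is swapped."""
    if len(a) != len(b):
        return False
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def references_similar(ref_a: str, ref_b: str) -> bool:
    a, b = normalize_reference(ref_a), normalize_reference(ref_b)
    return (
        a == b
        or has_few_extra_chars(a, b)
        or differs_by_scan_errors(a, b)
        or differs_by_transposition(a, b)
    )


print(references_similar("XX0116621912", "XX011662I912"))  # True
```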

Field              Invoice A      Match   Invoice C
Invoice Reference  XX0116621912   Fuzzy   XX011662I912
Document Date      2024-06-10     Exact   2024-06-10
Invoice Value      5135.50        Exact   5135.50
Vendor Name        Celonis SE     Exact   Celonis SE

Similar date

The fields “Invoice Reference”, “Invoice Value”, and “Vendor Name” match exactly in both of the invoices being compared. The field “Document Date” is a fuzzy match. Here's how the fuzzy match is determined:

  1. Check whether the dates are the same except that the month and day are swapped. For example, "2024-01-02" and "2024-02-01" might match.

  2. Check whether the dates are the same except for the month. For example, "2024-07-02" and "2024-06-02" might match.

  3. Check whether the distance between the dates is less than 7 days. This threshold is based on experience with customers. For example, invoices dated "2024-07-02" and "2024-07-08" might be duplicates if all the other fields match.
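The date rules can be sketched with Python's `datetime.date`. This is an illustrative, literal reading of the three rules above (rules 1 and 2 compare dates within the same year), not the product's implementation; the function name `dates_similar` is hypothetical.

```python
from datetime import date

# "Less than 7 days" per rule 3.
MAX_DAY_DISTANCE = 7


def dates_similar(a: date, b: date) -> bool:
    if a == b:
        return True
    same_year = a.year == b.year
    # Rule 1: month and day swapped, e.g. 2024-01-02 vs 2024-02-01.
    if same_year and a.month == b.day and a.day == b.month:
        return True
    # Rule 2: only the month differs, e.g. 2024-07-02 vs 2024-06-02.
    if same_year and a.day == b.day:
        return True
    # Rule 3: fewer than 7 days apart, e.g. 2024-07-02 vs 2024-07-08.
    return abs((a - b).days) < MAX_DAY_DISTANCE


print(dates_similar(date.fromisoformat("2024-06-10"), date.fromisoformat("2024-07-10")))  # True
```

A date check alone does not flag a duplicate; it only supplies the one fuzzy field, and the other three fields must still match exactly.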

Field              Invoice A      Match   Invoice D
Invoice Reference  XX0116621912   Exact   XX0116621912
Document Date      2024-06-10     Fuzzy   2024-07-10
Invoice Value      5135.50        Exact   5135.50
Vendor Name        Celonis SE     Exact   Celonis SE

Similar value

The fields “Invoice Reference”, “Document Date”, and “Vendor Name” match exactly in both of the invoices being compared. The field “Invoice Value” is a fuzzy match. Here's how the fuzzy match is determined:

  1. Allow a small absolute difference between the two values. For example, “5,080” and “5,000” might be a match.

  2. Check for transposed digits. For example, “150,234” and “105,234” might be a match.
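A minimal sketch of the two value checks, using `decimal.Decimal` to avoid floating-point rounding. The tolerance `MAX_ABS_DIFFERENCE` is a hypothetical value, since the documentation only describes the difference as small, and the transposition check assumes a single swap of two adjacent digits when both values are written with two decimal places.

```python
from decimal import Decimal

# Hypothetical tolerance; the documentation only says "a small absolute difference".
MAX_ABS_DIFFERENCE = Decimal("100")


def digits_transposed(a: Decimal, b: Decimal) -> bool:
    """True if the two values differ only by one swap of adjacent digits."""
    # Format both values with two decimal places so the digit positions line up.
    sa, sb = f"{a:.2f}", f"{b:.2f}"
    if len(sa) != len(sb):
        return False
    diffs = [i for i, (x, y) in enumerate(zip(sa, sb)) if x != y]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and sa[diffs[0]] == sb[diffs[1]]
        and sa[diffs[1]] == sb[diffs[0]]
    )


def values_similar(a: Decimal, b: Decimal) -> bool:
    # Rule 1: small absolute difference.
    if abs(a - b) <= MAX_ABS_DIFFERENCE:
        return True
    # Rule 2: transposed digits.
    return digits_transposed(a, b)


print(values_similar(Decimal("150234"), Decimal("105234")))  # True
```

Here 150,234 and 105,234 are 45,000 apart, so only the transposition rule can match them.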

Field              Invoice A      Match   Invoice E
Invoice Reference  XX0116621912   Exact   XX0116621912
Document Date      2024-06-10     Exact   2024-06-10
Invoice Value      5135.50        Fuzzy   5153.50
Vendor Name        Celonis SE     Exact   Celonis SE

Multiple

With the standard fuzzy patterns, three columns are expected to match exactly and one column to match fuzzily within a set of documents, where a set consists of two documents. When several sets share a document, they can be connected through different patterns into one group.
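The rule that a set needs three exact fields and one fuzzy field can be expressed as a small classifier. In this sketch, the per-field match levels are assumed to be computed beforehand by fuzzy checks like the ones described for each pattern; the names `Level`, `classify_pair`, and the field keys are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Level(Enum):
    EXACT = "exact"
    FUZZY = "fuzzy"
    NO_MATCH = "no match"


# Which standard pattern applies when this field is the single fuzzy one.
PATTERN_BY_FUZZY_FIELD = {
    "invoice_reference": "Similar reference",
    "document_date": "Similar date",
    "invoice_value": "Similar value",
    "vendor_name": "Similar vendor",
}


def classify_pair(levels: dict[str, Level]) -> str | None:
    """Return the standard pattern that matches a pair of invoices, or None."""
    exact = [f for f in PATTERN_BY_FUZZY_FIELD if levels[f] is Level.EXACT]
    fuzzy = [f for f in PATTERN_BY_FUZZY_FIELD if levels[f] is Level.FUZZY]
    if len(exact) == 4:
        return "Exact match"
    if len(exact) == 3 and len(fuzzy) == 1:
        return PATTERN_BY_FUZZY_FIELD[fuzzy[0]]
    # Two or more fuzzy fields, or any non-matching field: no standard pattern applies.
    return None


print(classify_pair({
    "invoice_reference": Level.EXACT,
    "document_date": Level.EXACT,
    "invoice_value": Level.EXACT,
    "vendor_name": Level.FUZZY,
}))  # Similar vendor
print(classify_pair({
    "invoice_reference": Level.EXACT,
    "document_date": Level.FUZZY,
    "invoice_value": Level.EXACT,
    "vendor_name": Level.FUZZY,
}))  # None
```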

The following shows a group formed by two patterns, “Similar reference” and “Similar vendor”:

Field              Invoice A      Match (A, C)   Invoice C      Match (C, F)   Invoice F
Invoice Reference  XX0116621912   Fuzzy          XX011662I912   Exact          XX011662I912
Document Date      2024-06-10     Exact          2024-06-10     Exact          2024-06-10
Invoice Value      5135.50        Exact          5135.50        Exact          5135.50
Vendor Name        Celonis SE     Exact          Celonis SE     Fuzzy          Celonis GmbH

In the example, invoices A and C were matched by the “Similar reference” pattern: their Invoice References are similar, and the other three fields, “Document Date”, “Invoice Value”, and “Vendor Name”, match exactly. At the same time, invoices C and F were matched by the “Similar vendor” pattern: their Vendor Names are similar, and the other three fields, “Invoice Reference”, “Document Date”, and “Invoice Value”, match exactly. Because invoice C belongs to both sets of documents, it acts as a bridge that connects all three invoices in one group.
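Forming groups from matched sets amounts to finding connected components: each matched pair is an edge, and invoices linked through a chain of pairs end up in the same group. The sketch below uses a union-find (disjoint-set) structure, one common way to compute such components. The documentation does not state how the Duplicate Invoice Checker builds its groups, and `group_invoices` is a hypothetical name.

```python
from __future__ import annotations


def group_invoices(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Merge matched pairs into groups (connected components) with union-find."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    groups: dict[str, set[str]] = {}
    for invoice in list(parent):
        groups.setdefault(find(invoice), set()).add(invoice)
    return list(groups.values())


# A-C matched via "Similar reference", C-F via "Similar vendor": one group of A, C, and F.
print(group_invoices([("A", "C"), ("C", "F")]))
```

Because grouping is transitive, A and F land in the same group even though they were never compared as a pair.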

The following shows a set of documents that are not grouped:

Field              Invoice A      Match   Invoice G
Invoice Reference  XX0116621912   Exact   XX0116621912
Document Date      2024-06-10     Fuzzy   2024-07-10
Invoice Value      5135.50        Exact   5135.50
Vendor Name        Celonis SE     Fuzzy   Celonis GmbH

In the example, invoices A and G, as one set of documents, were not matched because two fields, “Document Date” and “Vendor Name”, are only similar rather than exact, while the standard patterns allow only one fuzzy field per set. To match such cases, a custom pattern would need to be defined (see Custom Patterns).