Celonis Product Documentation

The Duplicate Checking patterns

The Duplicate Checking App's algorithm groups together invoices that might be duplicates, based on different matching patterns across four fields: Invoice Reference, Document Date, Invoice Value, and Vendor Name. This topic describes each pattern and its determination logic. For a high-level overview of the algorithm, see The algorithm.

Exact Match:

The fields Invoice Reference, Document Date, Invoice Value, and Vendor Name each match exactly in both of the invoices being compared.

Similar Vendor:

The fields Invoice Reference, Document Date, and Invoice Value each match exactly in both of the invoices being compared. The field Vendor Name is a fuzzy match. Here's how the fuzzy match is determined:

  1. Remove all characters other than letters (uppercase and lowercase) and digits. This strips special characters and white space. For example, "AcmeValue" and "Acme Value" might be a match.

  2. Remove all company key words like "Corp", "LLC", and so on. For example, "Celonis SE" and "Celonis GmbH" might be a match.

  3. Compare the resulting strings using the given string similarity metric and the given threshold. If the score from the metric is greater than the threshold, the similarity is set to 1 and the strings are considered an approximate match.

Example:

similar_vendor_pattern.png
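
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the app's implementation: the keyword list, the default threshold of 0.8, and the use of difflib's SequenceMatcher as the similarity metric are all assumptions. The sketch also filters key words before joining the remaining tokens, and matches them case-insensitively, so that a key word such as "SE" is only removed when it stands as a separate word.

```python
import re
from difflib import SequenceMatcher

# Hypothetical keyword list for illustration; the app's real list is not shown in this topic.
COMPANY_KEYWORDS = {"corp", "llc", "inc", "ltd", "gmbh", "se"}


def normalize_vendor(name: str) -> str:
    """Apply steps 1 and 2: keep only letters and digits, drop company key words."""
    # Split on anything that is not a letter or digit, so key words are still
    # separate tokens when they are filtered out (an assumption of this sketch).
    tokens = re.findall(r"[A-Za-z0-9]+", name)
    kept = [t for t in tokens if t.lower() not in COMPANY_KEYWORDS]
    return "".join(kept)


def vendor_similarity(a: str, b: str, threshold: float = 0.8) -> int:
    """Apply step 3: return 1 if the metric's score exceeds the threshold, else 0."""
    # SequenceMatcher.ratio() stands in for "the given string similarity metric".
    score = SequenceMatcher(None, normalize_vendor(a), normalize_vendor(b)).ratio()
    return 1 if score > threshold else 0
```

Under these assumptions, vendor_similarity("Celonis SE", "Celonis GmbH") returns 1, because both names normalize to "Celonis".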

Similar Reference:

The fields Document Date, Invoice Value, and Vendor Name each match exactly in both of the invoices being compared. The field Invoice Reference is a fuzzy match. Here's how the fuzzy match is determined:

  1. Remove all characters other than letters (uppercase and lowercase) and digits. This strips special characters and white space. For example, "A R-AMC1234" and "AR AMC\ 1234" might be a match.

  2. Check whether the invoice references are exactly equal with the exception of 0-3 extra characters in one of the two records. For example, "AR-AMC1234" and "AMC1234" might be a match.

  3. Check for common scanning errors such as the letter "B" in place of the number "8". For example, "AR-AMC1238" and "AR-AMC123B" might be a match.

  4. Check for transposed characters. For example, "AR-AMC1234" and "AR-MAC1234" might be a match.

Example:

similar_reference_pattern.png
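
The four checks above could be combined roughly as shown below. This is a sketch under stated assumptions: the table of look-alike characters is hypothetical apart from the "B"/"8" pair named above, extra characters are allowed anywhere in the longer reference, and a transposition means exactly two adjacent characters swapped. All function names are illustrative.

```python
import re

# Hypothetical look-alike table; only "B" vs "8" is named in this topic.
SCAN_CONFUSIONS = str.maketrans({"B": "8", "O": "0", "I": "1", "S": "5"})


def clean(ref: str) -> str:
    """Step 1: keep only letters and digits."""
    return re.sub(r"[^A-Za-z0-9]", "", ref)


def differs_by_extra_chars(a: str, b: str, max_extra: int = 3) -> bool:
    """Step 2: the longer string is the shorter one plus 0 to max_extra characters."""
    short, long_ = sorted((a, b), key=len)
    if len(long_) - len(short) > max_extra:
        return False
    # Subsequence test: every character of `short` appears in `long_`, in order.
    remaining = iter(long_)
    return all(ch in remaining for ch in short)


def is_scan_error(a: str, b: str) -> bool:
    """Step 3: strings become equal once look-alike characters are unified."""
    return len(a) == len(b) and a.translate(SCAN_CONFUSIONS) == b.translate(SCAN_CONFUSIONS)


def is_transposition(a: str, b: str) -> bool:
    """Step 4: strings differ only by two adjacent characters being swapped."""
    if len(a) != len(b):
        return False
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (len(diff) == 2 and diff[1] == diff[0] + 1
            and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])


def similar_reference(a: str, b: str) -> bool:
    a, b = clean(a), clean(b)
    return differs_by_extra_chars(a, b) or is_scan_error(a, b) or is_transposition(a, b)
```

With this sketch, similar_reference("AR-AMC1234", "AR-MAC1234") returns True through the transposition check, and similar_reference("AR-AMC1234", "AMC1234") returns True because the cleaned strings differ by two extra characters.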

Similar Date:

The fields Invoice Reference, Invoice Value, and Vendor Name each match exactly in both of the invoices being compared. The field Document Date is a fuzzy match. Here's how the fuzzy match is determined:

  1. Check whether the dates are the same except that the month and day are swapped. For example, "2020-01-02" and "2020-02-01" might be a match.

  2. Check whether the dates are the same except that the month has been swapped for another. For example, "2020-07-02" and "2020-06-02" might be a match.

  3. Check whether the distance between the dates is less than 7 days. This threshold is based on experience with customers. For example, invoices dated "2020-07-02" and "2020-07-08" might be duplicates if all the other fields are exact matches.

Example:

similar_date_pattern.png
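
As a rough illustration, the three date checks can be written with Python's standard datetime module. The function names are hypothetical, and the sketch assumes the month check requires the same year and day, which this topic implies but does not state outright.

```python
from datetime import date


def day_month_swapped(a: date, b: date) -> bool:
    """Check 1: same year, with month and day exchanged."""
    return a != b and a.year == b.year and a.month == b.day and a.day == b.month


def month_differs_only(a: date, b: date) -> bool:
    """Check 2: same year and day, different month."""
    return a.year == b.year and a.day == b.day and a.month != b.month


def within_days(a: date, b: date, max_days: int = 7) -> bool:
    """Check 3: the dates are strictly less than max_days apart."""
    return abs((a - b).days) < max_days


def similar_date(a: date, b: date) -> bool:
    return day_month_swapped(a, b) or month_differs_only(a, b) or within_days(a, b)
```

For example, similar_date(date.fromisoformat("2020-07-02"), date.fromisoformat("2020-07-08")) returns True because the dates are 6 days apart, while dates exactly 7 days apart do not pass the third check.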

Similar Value:

The fields Invoice Reference, Document Date, and Vendor Name each match exactly in both of the invoices being compared. The field Invoice Value is a fuzzy match. Here's how the fuzzy match is determined:

  1. Compare the numeric values by computing their similarity. The implemented algorithms are 'step', 'linear', 'exp', 'gauss', and 'squared'. If the values agree, the similarity is 1. If they disagree completely, the similarity is 0.

  2. Check for transposed digits. For example, 150,234 and 105,234 might be a match.

Example:

similar_value_pattern.png
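
The sketch below illustrates the idea behind two of the named methods, 'step' and 'linear', together with the transposed-digit check. The exact formulas and parameters used by the app are not given in this topic, so the offset and scale parameters, their defaults, and the function names are assumptions chosen only to show a score that runs from 1 (agreement) to 0 (complete disagreement).

```python
def step_similarity(a: float, b: float, offset: float = 0.0) -> float:
    """'step' style: 1 if the values are within `offset` of each other, else 0."""
    return 1.0 if abs(a - b) <= offset else 0.0


def linear_similarity(a: float, b: float, scale: float = 100.0) -> float:
    """'linear' style: score falls linearly from 1 to 0 as the gap grows to `scale`."""
    return max(0.0, 1.0 - abs(a - b) / scale)


def has_transposed_digits(a: int, b: int) -> bool:
    """True if the decimal digits differ only by two adjacent digits being swapped."""
    x, y = str(a), str(b)
    if len(x) != len(y):
        return False
    diff = [i for i, (p, q) in enumerate(zip(x, y)) if p != q]
    return (len(diff) == 2 and diff[1] == diff[0] + 1
            and x[diff[0]] == y[diff[1]] and x[diff[1]] == y[diff[0]])
```

With these definitions, has_transposed_digits(150234, 105234) returns True, matching the example in step 2.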