Celonis Product Documentation

Configuring the AI Annotation for the object-centric Duplicate Invoice Checker

AI Annotations provide actionable Signals that highlight opportunities for business improvement. In the context of the object-centric Duplicate Invoice Checker, the AI Annotation performs the duplicate invoice check. The AI Annotation serves as a stable and scalable backend, while also offering a configuration interface where you can define filter dimensions and customize the duplicate invoice search patterns.

The AI Annotation replaces the Machine Learning Sensor and the Knowledge Model custom pattern configuration that were used in earlier versions of the object-centric Duplicate Invoice Checker, and are still used in the case-centric Duplicate Invoice Checker. If you haven’t already, follow the steps in Prerequisites for the object-centric Duplicate Invoice Checker to get the AI Annotation for the Duplicate Invoice Checker enabled for your team.

Here’s how to view and configure the AI Annotation for the Duplicate Invoice Checker:

  1. Go to Data > Machine Learning and select AI Annotations. If you can’t see the AI Annotations button, open the page directly at https://[team].[cluster].celonis.cloud/machine-learning/ui/ml-signals, replacing [team] and [cluster] with your Celonis team name and cluster.

  2. Click the Duplicate Invoice Checker Signal. The General Information section shows the name and description as well as the scheduling options.

  3. Validate the default search patterns that the Duplicate Invoice Checker uses to identify matches. Algorithm and default search patterns for the object-centric Duplicate Invoice Checker describes the default patterns, which are:

    • Exact Match (EXACT)

    • Similar Reference (REF_FUZZY)

    • Similar Value (VALUE_FUZZY)

    • Similar Date (DATE_FUZZY)

    • Similar Vendor (VENDOR_FUZZY)

    Click on a search pattern to see what attributes it’s looking at. For example, the VENDOR_FUZZY search pattern is looking at the attributes ReferenceDocumentNumber, Amount, DocumentDate, and VendorName.

    Click on an attribute to see what comparer the search pattern uses for it, and the parameters that the comparer is using. Comparers and parameters at the end of this topic explains what each comparer does and what its parameters are.

  4. Customize the default search patterns if your business requirements call for it.

    • Change the parameter values of the default patterns as needed.

    • Add any attributes you want the pattern to look at by clicking Add Attributes. Select an attribute of the VendorAccountItem object, and the comparer you want to use, then choose appropriate parameters. See Comparers and parameters for the explanation of what each comparer does and what its parameters and their defaults are.

    • Change any attributes you want to swap for another by selecting another attribute from the list of available attributes of the VendorAccountItem object type.

    • Click Save at the bottom of the page when you’ve finished making changes to the configuration.

  5. Add any custom search patterns you want to use in addition to (or instead of) the default patterns. Any pattern must have at least one exact comparer. We recommend you don’t use any more than 15 search patterns in total.

    • Click Add Search Patterns next to the list of search patterns to add a new pattern. We'll give it a name.

    • Click Add Attributes to add each of the attributes (columns) you want the pattern to look at.

      Tip

      The more attributes a pattern is checking, the more resources it uses to check the same number of documents.

      We recommend you don’t check any more than 7 attributes in a single pattern. It’s less resource-intensive to create more patterns. This has a relatively lower impact on memory usage because the patterns work independently of each other.

      Be careful you don’t go too far in the other direction, and create patterns with very few attributes and low thresholds for matching - you might end up with groups containing thousands of documents.

    • Use the dropdowns to select an attribute of the VendorAccountItem object, and the comparer you want to use. Then choose appropriate parameters. See Comparers and parameters for the explanation of what each comparer does and what its parameters and their defaults are.

    • If you need to remove an attribute again, click Delete in the Attribute options popup.

    • Click Save at the bottom of the page when you’ve finished making changes to the configuration.

  6. Set filters that you want to apply to the invoices before checking - excluded invoices won’t be checked by the algorithm. The only filters set by default are a minimum reference length of 5, and excluding internal vendors. Click Save at the bottom of the page when you’ve finished changing the configuration.

    The available standard filter dimensions are:

    • Start date: The start date (inclusive) from which documents are considered. Use YYYY-MM-DD format (for example, 2024-05-01). The start date for an item is in the field "o_DuplicateInvoiceChecker_VendorAccountItem"."DocumentDate", which in turn is taken from "o_celonis_VendorAccountCreditItem"."DocumentDate" and "o_celonis_VendorAccountDebitItem"."DocumentDate".

    • Exclude internal vendors: Whether to exclude (the default) or include documents from internal vendors. The flag is "o_DuplicateInvoiceChecker_VendorAccountItem__Vendor"."InternalFlag", which in turn is taken from "o_celonis_Vendor"."InternalFlag".

    • Minimum reference length: Documents with a vendor’s reference number shorter than the threshold are excluded. By default, this is set to 5. Specify a whole number.

    • Maximum reference frequency: Documents where the reference occurs more than the specified frequency throughout the dataset are removed. Specify a whole number.

    • Amount threshold: Remove documents with an invoice amount below the specified threshold. Specify a whole or decimal number (for example, 100.50).

    • Company code: The company codes to be excluded. Specify the codes as a comma-separated list (for example '1000','2000').

    • Vendor name: The vendor names to be excluded. Specify the names as they appear in the data, including any suffixes, as a comma-separated list (for example 'Celonis SE','Celo GmbH').

    • Document type: The document types to be excluded. Specify the codes as a comma-separated list (for example 'KR','RE','KG').

    • Transaction type: The transaction types to be checked. If any transaction types are defined in this field, all other types will be excluded. Specify the types as a comma-separated list (for example 'InvoiceItem','CreditMemoItem').

    If the standard filter dimensions don't meet your requirements, you can add custom PQL filters to remove documents from checking. Click Add to create a custom filter in the PQL editor. Use the syntax for the PQL function FILTER (see FILTER). We'll validate your syntax when you save the configuration.

    Tip

    It’s also possible to add custom filters in the Duplicate Invoice Checker's Action View to hide groups. It's better to exclude documents from checking by pre-filtering them in the AI Annotation before they appear in the results, because then you won't have wasted resources. Post-filtering in the Action View is useful as a backup if you don't want to reset the result table or modify your backend settings, or if you want to handle a false positive that only comes up occasionally. Editing views for the object-centric Duplicate Invoice Checker has the instructions to do it.

  7. Click Run at the top right to run the AI Annotation manually and check your results in the Studio package.

    Tip

    The Duplicate Invoice Checker app won't re-evaluate documents it's already checked. If you want to re-check all documents with the new search pattern configuration, you’ll need to reset the result tables. Resetting the result tables explains how.

  8. When you’ve run the AI Annotation, click on the Signal again and view the Logs section, which displays the logs from the last execution. A green checkmark shows a successful execution, while a red cross shows an unsuccessful one, along with information about the error.

  9. When you’re happy with the overall configuration, make sure you’ve saved your changes. Then change the schedule option to “On Data Model Reload”, which runs the job automatically every time the data model is reloaded. You can still run the job on demand using the Run button when the schedule is set to “On Data Model Reload”.
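As a rough illustration of how the standard filter dimensions in step 6 narrow down the documents before checking, here is a minimal Python sketch. It is not the AI Annotation's implementation: the `Item` record, its field names, and the `prefilter` function are hypothetical, only some of the filter dimensions are covered, and the sketch assumes that reference frequency is counted across all documents before any other filter is applied.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    # Hypothetical, simplified stand-in for a VendorAccountItem record.
    reference: str
    amount: float
    document_date: date
    vendor_name: str
    company_code: str
    internal_vendor: bool


def prefilter(items, *, start_date=None, exclude_internal=True,
              min_reference_length=5, max_reference_frequency=None,
              amount_threshold=None, excluded_company_codes=(),
              excluded_vendor_names=()):
    """Return the items that would still be checked after filtering."""
    # Assumption: frequency is counted over the whole input, not the survivors.
    ref_counts = Counter(item.reference for item in items)
    kept = []
    for item in items:
        if start_date is not None and item.document_date < start_date:
            continue  # before the (inclusive) start date
        if exclude_internal and item.internal_vendor:
            continue  # internal vendors are excluded by default
        if len(item.reference) < min_reference_length:
            continue  # reference shorter than the minimum length
        if (max_reference_frequency is not None
                and ref_counts[item.reference] > max_reference_frequency):
            continue  # reference occurs too often in the dataset
        if amount_threshold is not None and item.amount < amount_threshold:
            continue  # amount below the threshold
        if item.company_code in excluded_company_codes:
            continue
        if item.vendor_name in excluded_vendor_names:
            continue
        kept.append(item)
    return kept


items = [
    Item("INV-1001", 250.0, date(2024, 6, 1), "Acme GmbH", "1000", False),
    Item("123", 250.0, date(2024, 6, 1), "Acme GmbH", "1000", False),
    Item("INV-1002", 250.0, date(2024, 6, 1), "Internal Co", "1000", True),
    Item("INV-1003", 250.0, date(2024, 4, 1), "Acme GmbH", "1000", False),
    Item("INV-1004", 50.0, date(2024, 6, 1), "Acme GmbH", "1000", False),
    Item("INV-1005", 250.0, date(2024, 6, 1), "Acme GmbH", "2000", False),
]
kept = prefilter(items, start_date=date(2024, 5, 1), amount_threshold=100.5,
                 excluded_company_codes={"2000"})
print([item.reference for item in kept])  # ['INV-1001']
```

Each sample item after the first trips exactly one filter, which makes it easy to see which setting removes which kind of document.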

Resetting the result tables

If you’re already using the Duplicate Invoice Checker in production, when you change the search pattern configuration, the app does not re-evaluate documents it’s already checked. Changes will only be applied to newly checked documents. If you want to re-check all documents with the new search pattern configuration, you’ll need to carry out a manual reset of the result tables.

Warning

Clearing the results gives the groups new UUIDs, even for groups formed from the same documents. Back up the tables first if you want to preserve groups that have user feedback on them.

To reset the result tables, first perform a DELETE FROM on the Duplicate Invoice Checker’s three result tables in the data pool. The SQL to delete the results looks like this:

DELETE FROM <%=OCPM_SCHEMA%>."o_DuplicateInvoiceChecker_VendorAccountItem";
DELETE FROM <%=OCPM_SCHEMA%>."o_DuplicateInvoiceChecker_CheckedInvoice";
DELETE FROM <%=OCPM_SCHEMA%>."o_DuplicateInvoiceChecker_DuplicateGroup";

Then run the transformation named auto_generated_dic_combine_vendor_credit_debit_account_items, which populates the table of VendorAccountItem objects. Finally, load the perspective.

Comparers and parameters

Each search pattern for the Duplicate Invoice Checker consists of a combination of attributes, comparers and parameters.

To add an attribute to a search pattern, click Add by the list of existing attributes. Then choose an attribute from the available attributes of the VendorAccountItem object type.

Note

VendorAccountItem is a Celonis object type supplied with the dedicated perspective for this app, perspective_DuplicateInvoiceChecker_DuplicateInvoiceCheckerApp. To create it, we combine the VendorAccountCreditItem and VendorAccountDebitItem object types from the standard Accounts Payable perspective. VendorAccountItem only contains the attributes that are part of those Celonis object types as supplied - it won’t contain any custom attributes that you added to extend those object types.

Select a comparer for the attribute, then set the parameters you want for it. The defaults are shown in the descriptions of the comparers in this topic. Click Save at the bottom of the page when you’ve finished making changes to the configuration.

The available comparers are the same as for the case-centric version of the Duplicate Invoice Checker:

Tip

Exact and Different attribute checks are less resource-intensive than fuzzy checks. The more columns checked for exact and different matches, the less memory is used. For instance, a pattern with three exact attribute checks uses less memory than a pattern with only one exact attribute check. Each additional fuzzy check in a pattern significantly increases memory usage.

Exact

Find items with the same value. To meet this condition, the value of this attribute must be exactly the same on both invoices.

Different

Find items with different values. To meet this condition, the value of this attribute must be different on both invoices.

Company Name

Find similar company names. This can be used as a general string comparer.

The parameters are:

  • Similarity metric: The method used for calculating the string similarity score. The options are “jaro” (Jaro distance) or “levenshtein” (Levenshtein distance). The default is “jaro”.

  • Similarity threshold: The similarity score that decides whether a pair counts as a fuzzy match. Whether pairs must score above or below the threshold depends on the similarity metric:

    • For the Jaro distance (“jaro”), the range is from a maximum of 1 (an exact match) to a minimum of 0 (no similarities) - the higher, the stricter. Pairs scoring above the specified threshold are selected by the algorithm. The pattern’s supplied default of 0.85 is suitable for this scoring method.

    • The Levenshtein distance (“levenshtein”) is the minimum number of single-character edits required to make two strings match. The threshold is the maximum number of edits allowed - the lower, the stricter. 0 is an exact match, and the maximum is the length of the longer string. Pairs scoring at or below the specified threshold are selected by the app. For example, a similarity threshold of 8 means that pairs with up to 8 character edits are selected by the app.

  • Company suffixes: A list of company suffixes to be removed before comparing the company names. Suffix removal is case-insensitive. The default comprises these suffixes:

    gmbh,ag,llc,inc,ltd,limited,sdn,bhd,se,corporation,corp,sl,coltd,group,mbh,co,kg,ltda,sa,sro,des,sas,sasu,zoo,sp,sau,cokg
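To make the Company Name parameters more concrete, the following Python sketch strips company suffixes and then compares the remaining names with the Levenshtein distance. It is an illustration under assumptions, not the app's code: the function names are hypothetical, suffixes are assumed to be removed only as trailing words after punctuation is dropped, and the threshold is treated as inclusive ("up to N edits"). The Jaro metric is left out to keep the example short.

```python
import re

# Default suffix list as given in the parameter description.
DEFAULT_SUFFIXES = frozenset(
    "gmbh,ag,llc,inc,ltd,limited,sdn,bhd,se,corporation,corp,sl,coltd,group,"
    "mbh,co,kg,ltda,sa,sro,des,sas,sasu,zoo,sp,sau,cokg".split(",")
)


def strip_suffixes(name, suffixes=DEFAULT_SUFFIXES):
    """Lowercase the name and drop trailing company suffixes (assumed rule)."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    # Keep at least one token so a name is never reduced to nothing.
    while len(tokens) > 1 and tokens[-1] in suffixes:
        tokens.pop()
    return " ".join(tokens)


def levenshtein(a, b):
    """Minimum number of single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def company_match(a, b, max_edits):
    """True if the suffix-stripped names are within max_edits edits."""
    return levenshtein(strip_suffixes(a), strip_suffixes(b)) <= max_edits


print(strip_suffixes("Celo GmbH"))                          # celo
print(company_match("Celonis SE", "Celonis Inc.", 0))       # True
print(company_match("Acme Corp", "Acne Corporation", 1))    # True
```

With the Levenshtein metric a lower threshold is stricter, which is the opposite direction from the Jaro score described above.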

Date

Find similar invoice dates. This can be used as a general date comparer. The time part of datetime fields is ignored.

The parameter is:

  • Maximum day difference: The maximum difference in days between two dates for them to be identified as a fuzzy match. The default is 7.

Apart from looking for a matching date within the allowed difference in days, the date comparer also checks for commonly swapped months, and for swapped days and months. Commonly swapped months are June and July as well as September and October. Swapped days and months are, for example, 06.03 and 03.06. Each of these checks (day difference, swapped day and month, commonly swapped months) is made separately. A match is declared when any of these is true.
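The three date checks can be sketched in Python as follows. This is a simplified reading of the description, not the actual comparer: the function names are hypothetical, the day difference is treated as inclusive, and the swap checks assume that the other date components (year, and day for the month swap) are identical, which the text does not state.

```python
from datetime import date

# Month pairs described as commonly swapped: June/July and September/October.
COMMONLY_SWAPPED_MONTHS = {frozenset({6, 7}), frozenset({9, 10})}


def within_day_difference(a, b, max_days=7):
    """True if the dates are at most max_days apart."""
    return abs((a - b).days) <= max_days


def swapped_day_and_month(a, b):
    """True if day and month are exchanged, for example 06.03 and 03.06."""
    return a.year == b.year and a.day == b.month and a.month == b.day


def commonly_swapped_month(a, b):
    """True if only the month differs and the pair is commonly confused."""
    return (a.year == b.year and a.day == b.day
            and frozenset({a.month, b.month}) in COMMONLY_SWAPPED_MONTHS)


def date_match(a, b, max_days=7):
    """Each check is made separately; any one of them is enough for a match."""
    return (within_day_difference(a, b, max_days)
            or swapped_day_and_month(a, b)
            or commonly_swapped_month(a, b))


print(date_match(date(2024, 1, 1), date(2024, 1, 8)))    # True: 7 days apart
print(date_match(date(2024, 3, 6), date(2024, 6, 3)))    # True: day/month swapped
print(date_match(date(2024, 6, 15), date(2024, 7, 15)))  # True: June vs. July
print(date_match(date(2024, 2, 15), date(2024, 4, 15)))  # False
```

For datetime values, calling `.date()` on each value first would mirror the statement that the time part is ignored.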

Reference

Find similar invoice references.

Tip

The Reference comparer is particularly resource-intensive. If possible, avoid adding multiple fuzzy checks with the “Reference” comparer in a single pattern.

The parameters are:

  • Confusion characters: Pairs of letters and numbers that are often recognized as each other by mistake, for example, “8” and “B”. Use two lists of characters where each position in the character list forms a pair. The default lists are “86i10oqsz” and “bgllddd52” - here “8” and “b” form the first pair. The lists and the comparison are case-insensitive.

  • Maximum skipped characters: The maximum number of characters that can be skipped in one of the strings so that it is equal to the other string. The default is 3, meaning the app can make a fuzzy match by skipping up to three characters from the longer string to reduce it to match the shorter string.

    For example, take the two references “abcd1234” and “abcd124”. If we skip the character “3” in the first string, the two strings are identical, so this match requires one skipped character. Matching the strings “abc123” and “abca12z3” requires two skipped characters - if we skip the second “a” and “z” in the second string, the two strings are identical.

    The strings “abc123” and “abg123” would never be considered a match, because skipping “c” does not result in the two strings matching (“ab123” vs. “abg123”), and skipping “g” does not result in the two strings matching (“abc123” vs. “ab123”).

  • Maximum swapped characters: The maximum number of swapped character pairs. The default is 3, meaning the app can make a fuzzy match that includes up to three swapped character pairs. This requires the two strings to have the same characters.

    For example, the two references “ABC” and “BAC” contain two swapped character pairs, because “A and B” and “B and A” each represent a pair of inverted characters. Two characters need to be swapped to turn “ABC” into “BAC” and vice versa.

    Note

    If there is a numerical inversion in a reference such as “01” and “10”, we automatically consider this not to be a match. For instance, the references “ABC01” and “ABC10” do not match. This prevents false positives with recurring invoices, which often have incremental numbers. If you want to consider swapping errors in numbers, use the Value comparer.
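The Reference parameters can be illustrated with three small Python helpers that reproduce the examples above. They are a sketch under assumptions rather than the comparer itself: the function names are hypothetical, confusion characters are handled by mapping each character in the first list onto its partner in the second, swapped pairs are counted as the number of positions that differ between two strings with the same characters, and a numerical inversion is detected as a differing position where both characters are digits. How the app combines these checks is not described here, so the helpers are kept separate.

```python
from collections import Counter

# Default confusion lists; position i in each list forms a pair.
CONFUSION_FIRST = "86i10oqsz"
CONFUSION_SECOND = "bgllddd52"


def normalize_confusions(ref, first=CONFUSION_FIRST, second=CONFUSION_SECOND):
    """Lowercase and replace each confusable character by its partner."""
    return ref.lower().translate(str.maketrans(first, second))


def skipped_characters(a, b):
    """Characters to skip in the longer string to get the shorter, or None."""
    short, long_ = sorted((a, b), key=len)
    remaining = iter(long_)
    # `ch in remaining` consumes the iterator, so order is respected.
    if all(ch in remaining for ch in short):
        return len(long_) - len(short)
    return None


def swapped_characters(a, b):
    """Number of positions that differ, or None if this is not a pure swap."""
    a, b = a.lower(), b.lower()
    if len(a) != len(b) or Counter(a) != Counter(b):
        return None  # swaps require the same characters in both strings
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if any(x.isdigit() and y.isdigit() for x, y in diffs):
        return None  # numerical inversion such as "01" vs "10": never a match
    return len(diffs)


print(normalize_confusions("INV8") == normalize_confusions("invb"))  # True
print(skipped_characters("abcd1234", "abcd124"))   # 1
print(skipped_characters("abc123", "abca12z3"))    # 2
print(skipped_characters("abc123", "abg123"))      # None
print(swapped_characters("ABC", "BAC"))            # 2
print(swapped_characters("ABC01", "ABC10"))        # None
```

A result would count as a fuzzy match when it is not `None` and does not exceed the corresponding maximum (3 by default for both parameters).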

Value

Find similarities between invoice values. This can be used as a general numeric comparer. Each of the following checks is made separately. A match is declared when any of the checks is true.

The parameters are:

  • Maximum value difference: The maximum allowed absolute difference between the two numeric values. The default is 80.

  • Maximum swapped numbers: The maximum number of swapped number pairs. The default is 3, meaning the app can make a fuzzy match that includes up to three swapped number pairs. This requires the two numbers to have the same digits.

    For example, there are four swapped number pairs in “21500” and “12005”, where “2 and 1”, “1 and 2”, “5 and 0” and “0 and 5” each represent a pair of inverted digits. To turn “21500” into “12005” and vice versa, four digits need to be swapped.

  • Maximum skipped characters: The maximum number of skipped digits when comparing two numeric values. The default value is 0, meaning that the app doesn't check for skipped digits. If you define a value of 1 or more, it does. For example, with a value of 2, “21500” and “200” match.
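The three Value checks can be sketched in Python in the same spirit. This is an assumption-laden illustration, not the app's implementation: the function names are hypothetical, values are assumed to be non-negative whole numbers so that their digits can be compared as strings, swapped pairs are counted as differing digit positions, and all limits are treated as inclusive.

```python
from collections import Counter


def swapped_digits(a, b):
    """Number of digit positions that differ, or None if digits don't match."""
    sa, sb = str(a), str(b)
    if len(sa) != len(sb) or Counter(sa) != Counter(sb):
        return None  # swaps require both numbers to have the same digits
    return sum(x != y for x, y in zip(sa, sb))


def skipped_digits(a, b):
    """Digits to skip in the longer number to get the shorter, or None."""
    short, long_ = sorted((str(a), str(b)), key=len)
    remaining = iter(long_)
    if all(ch in remaining for ch in short):
        return len(long_) - len(short)
    return None


def value_match(a, b, max_diff=80, max_swapped=3, max_skipped=0):
    """Each check is made separately; any one of them is enough for a match."""
    if abs(a - b) <= max_diff:
        return True
    swapped = swapped_digits(a, b)
    if swapped is not None and swapped <= max_swapped:
        return True
    if max_skipped > 0:  # a value of 0 disables the skipped-digit check
        skipped = skipped_digits(a, b)
        if skipped is not None and skipped <= max_skipped:
            return True
    return False


print(value_match(1000, 1050))                   # True: difference of 50
print(swapped_digits(21500, 12005))              # 4
print(value_match(21500, 12005))                 # False: 4 exceeds the default of 3
print(value_match(21500, 200, max_skipped=2))    # True: skip "1" and "5"
```

The examples reuse the numbers from the parameter descriptions, so the counts can be checked against the text.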