Celonis Product Documentation

Web Page Data Extractions

By default, Task Mining captures only the elements a user interacts with (and related details). For example, when the user clicks a button, Task Mining captures a left click on the button "Submit" in the application Salesforce, together with related information such as the ActiveWindow title.

Custom data extractions let you define additional data to capture even if the user does not interact with it, e.g. a support order ID shown on a web page that might be useful for the analysis. To do so, you define custom rules for extracting specific data from the website. With each interaction on the website, this data is extracted and appended to the user interaction data.


  • Extractions are currently only supported when using the Task Mining Browser Extension.

  • Extractions only run when the user visits a web page while the extension is enabled.


  • The analyst defines data extractions in the "Client Settings" during the project setup.

  • Afterwards, when users run Task Mining and visit a web page for which extractions are set up, the extractions are run and the results are stored in the column "WebPageExtractions".

When to Use

Event Processing Rules

Event Processing Rules capture information about the elements the user interacted with and add it to the event log.

Web Page Data Extractions

If the captured information (even when capturing all attributes) is not sufficient, consider capturing additional information.

Use Web Page Data Extractions to add, for example, a headline or a label showing an ID on a website to the event log. Event Processing Rules do not capture this information because the user does not interact with it.

In summary, use Web Page Data Extractions to add further information to the event log.


The web page data extractions can be defined in the configuration editor. Each data extraction consists of three parts:

  • Key: An ID under which the extracted data item is stored. It must be unique within the config file.

  • URL: A regular expression that describes the URL of the websites for which the data item should be extracted. The data item will only be extracted for web pages that match the URL. The regular expression must be in JavaScript syntax.

  • Path: A path to define which element/value to extract.

    • An XPath expression that describes the path of the data item in the document structure of the web page.

    • Or a JQuery selector that describes the path of the data item in the document structure of the web page.

    • Example for extracting the support case ID in Salesforce (both expressions produce the same result):

      XPath path expression

      //div[@class="split-right"]/section[contains(@class, "active")]//p[contains(text(),"Case Number")]/following-sibling::p/slot/lightning-formatted-text/text()

      jQuery path expression

      jquery:.slds-form-element__label:contains("Case Number") + div.slds-form-element__control > .slds-form-element__static
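The three parts of a data extraction can be illustrated with a minimal sketch. The object shape (key, url, path), the rule key, the URL pattern, and the helper functions rulesForUrl and duplicateKeys are hypothetical and only mirror the description above; the actual config file format may differ.

```javascript
// Hypothetical rule shape: the field names mirror the three parts described
// above (Key, URL, Path). The real config file format may differ.
const extractionRules = [
  {
    key: "SalesforceCaseNumber",
    // JavaScript-syntax regular expression, stored as a string (assumed URL).
    url: "^https://[^/]+\\.lightning\\.force\\.com/lightning/r/Case/.*",
    path:
      'jquery:.slds-form-element__label:contains("Case Number")' +
      " + div.slds-form-element__control > .slds-form-element__static",
  },
];

// Return the rules whose URL pattern matches the visited page.
function rulesForUrl(rules, pageUrl) {
  return rules.filter((rule) => new RegExp(rule.url).test(pageUrl));
}

// Keys must be unique within the config, so report any duplicates.
function duplicateKeys(rules) {
  const seen = new Set();
  const duplicates = new Set();
  for (const { key } of rules) {
    if (seen.has(key)) duplicates.add(key);
    seen.add(key);
  }
  return [...duplicates];
}

// Made-up page URL used only for illustration.
const visitedUrl =
  "https://example.lightning.force.com/lightning/r/Case/500xx0000000001/view";
console.log(rulesForUrl(extractionRules, visitedUrl).map((rule) => rule.key));
console.log(duplicateKeys(extractionRules));
```

Testing a pattern against a few real URLs in this way before adding it to the configuration helps confirm that the rule only applies to the intended pages.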

XPath Path Expression


  • The XPath query is run against the HTML document and only accesses HTML attributes, not JavaScript properties.

    • For example, $('#firstname').value is a property that contains the text the user entered in the text field.

    • In contrast, $x("//*[@id='firstname']/@value") returns the value that was set in the HTML document attribute, which is not updated when the user types text.

  • To let you capture user-entered text, Task Mining checks whether each result node has a 'value' property. If it does, this value is returned. Otherwise, the textContent (the text visible within the node) is returned by default.
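The fallback described above can be sketched as follows. This is an illustration of the stated rule, not the actual Task Mining implementation; the function name readNodeResult is hypothetical, and plain objects stand in for DOM nodes so the sketch runs outside a browser.

```javascript
// Sketch of the fallback described above, using plain objects in place of
// real DOM nodes. Not the actual Task Mining implementation.
function readNodeResult(node) {
  // Prefer the live 'value' property (user-entered text) when the node has one.
  if ("value" in node) {
    return node.value;
  }
  // Otherwise fall back to the text visible within the node.
  return node.textContent;
}

// A text field: the HTML attribute still holds the initial (empty) value,
// while the property reflects what the user typed.
const inputLike = { attributes: { value: "" }, value: "Jane", textContent: "" };

// A paragraph-like node has no 'value' property, so its text content is used.
const paragraphLike = { textContent: "00001026" };

console.log(readNodeResult(inputLike));
console.log(readNodeResult(paragraphLike));
```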

jQuery Path Expression

Example of Salesforce Full Expression to Extract the Case Number:

jquery:.slds-form-element__label:contains("Case Number") + div.slds-form-element__control > .slds-form-element__static






.slds-form-element__label:contains("Case Number")

Selects the label that contains the text "Case Number". This is the starting element.

+ div.slds-form-element__control

From there, moves to the immediately following sibling div with the class .slds-form-element__control.

> .slds-form-element__static

From there, selects the direct child with the class .slds-form-element__static.

Storage Format

All extracted data items are stored within the user interaction event in the WebPageExtractions column in JSON format. The root element is a list of JSON objects, where each object represents the result of one extraction rule. Each object consists of a key attribute identifying the web page extraction rule and a data attribute containing the extracted data. Because an XPath expression might return multiple data items, the data attribute is a list of strings, each representing one result item of the XPath expression.
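As an illustration of this format, the sketch below parses one cell of the WebPageExtractions column. The sample content, the rule keys, and the function name parseWebPageExtractions are made up; the attribute names "key" and "data" follow the description above, but their exact spelling in real data is an assumption.

```javascript
// Hypothetical cell content. The attribute names "key" and "data" follow the
// description above; their exact spelling in real exports is an assumption.
const webPageExtractionsCell =
  '[{"key":"SalesforceCaseNumber","data":["00001026"]},' +
  '{"key":"OpenTasks","data":["Call customer","Send quote"]}]';

// Turn the list of {key, data} objects into a Map keyed by the rule ID.
function parseWebPageExtractions(cell) {
  const results = new Map();
  for (const { key, data } of JSON.parse(cell)) {
    results.set(key, data);
  }
  return results;
}

const parsedExtractions = parseWebPageExtractions(webPageExtractionsCell);
// Each entry is a list of strings, even when the rule returned a single item.
console.log(parsedExtractions.get("SalesforceCaseNumber"));
console.log(parsedExtractions.get("OpenTasks"));
```

Because the data attribute is always a list, downstream logic should handle zero, one, or several result items per rule rather than assuming a single value.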

For examples of Web Page Data Extractions, see Examples of Web Page Data Extraction.