Task Mining data processing and scheduling

Task Mining data processing overview

You can configure how and when your Task Mining data is processed. You can choose to process new Task Mining data only or to reprocess all data when new data is added. If only new Task Mining data is processed, you can optionally schedule when this processing is performed.

For efficiency, Task Mining data is processed in batches of up to 10 million rows. Processing is limited by the number of:

  • Rows the data model can load into one table (typically two billion). 

  • Concurrent users sending data per realm (up to 30,000 users for larger realms).
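As a rough illustration of the batching arithmetic, the sketch below estimates how many batches a given row count needs, assuming every batch is filled to the 10 million row maximum. The function names and constants are hypothetical and are not part of any Celonis API; the numbers are taken from the limits stated above.

```python
BATCH_SIZE = 10_000_000          # maximum rows per batch (from the text above)
TABLE_ROW_LIMIT = 2_000_000_000  # typical per-table load limit (from the text above)


def batches_needed(total_rows: int, batch_size: int = BATCH_SIZE) -> int:
    """Return the minimum number of batches needed to process total_rows."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (total_rows + batch_size - 1) // batch_size


def fits_in_one_table(total_rows: int, limit: int = TABLE_ROW_LIMIT) -> bool:
    """Check a row count against the (typical) per-table load limit."""
    return total_rows <= limit
```

For example, 25 million rows would need three batches under this assumption, and a table at the typical two billion row limit would need 200.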

For more information, see the Workforce Productivity app. For information on tables, see the Task Mining table reference.

Task Mining data processing options

Note

You access the data processing options from the Task Mining project home page by selecting Run & Schedule and choosing an option from the Run dropdown. Any issues are displayed in the Run History on the Run & Schedule screen.

Process new Task Mining data only (delta)

  • Description: Processes new incoming data only.

  • Use case: Data needs to be updated as quickly as possible while ensuring the data processing duration is as short as possible.

Reprocess existing data (full)

  • Description: Reprocesses all existing data to apply new or updated rules, labels, business events or tasks.

  • Use case: Data updates need to be consistently applied across the entire data set to ensure analyses are always consistent. As the entire data set is being processed, the data processing duration is longer than if new data alone were being processed.

Processing new Task Mining data only (delta) works as follows:

  1. The Task Mining Client software sends new data to the user_interaction_event_log table.

  2. Default and custom Labels are applied to raw events.

  3. The resulting events are stored in the TM_Labeled_Data table.

  4. Tasks are applied and:

    • The list of task names defined by the user is stored in the Tasks table.

    • Task instances found for Tasks are stored in the Task_Instances table.

    • Tasks_Join is an n:n join table between a task instance and TM_Labeled_Data events.

  5. The Data Model automatically reloads to reflect the newly-processed data.
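The steps in the list directly above can be pictured with a toy in-memory model. This is only a sketch under loud assumptions: the `Event` class, the labeling rules, the task definitions, and the "one instance per matching task" logic are all hypothetical stand-ins, and real task detection is certainly more involved. Only the table names in the comments come from the text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: int
    application: str
    label: str = ""


# Hypothetical labeling rules: application name -> label.
LABEL_RULES = {"excel.exe": "Spreadsheet", "outlook.exe": "Email"}

# Hypothetical task definitions: task name -> labels that count toward it.
TASK_DEFINITIONS = {"Reporting": {"Spreadsheet"}, "Communication": {"Email"}}


def process_delta(new_events):
    """Label new raw events, then derive task instances and join rows."""
    # Steps 2-3: apply labels; the result stands in for TM_Labeled_Data.
    labeled = [
        Event(e.event_id, e.application, LABEL_RULES.get(e.application, "Unlabeled"))
        for e in new_events
    ]

    tasks = sorted(TASK_DEFINITIONS)  # stands in for the Tasks table
    task_instances = []               # stands in for Task_Instances
    tasks_join = []                   # stands in for Tasks_Join

    # Step 4 (simplified): one instance per task that matched any event.
    for task_name in tasks:
        matching = [e for e in labeled if e.label in TASK_DEFINITIONS[task_name]]
        if not matching:
            continue
        instance_id = len(task_instances) + 1
        task_instances.append({"instance_id": instance_id, "task": task_name})
        # Each join row links one task instance to one labeled event.
        tasks_join.extend(
            {"instance_id": instance_id, "event_id": e.event_id} for e in matching
        )
    return labeled, tasks, task_instances, tasks_join
```

The join rows are kept separate from the instances because one instance can cover many events and, with overlapping task definitions, one event can belong to several instances.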

Reprocessing existing data (full) works as follows:

  1. A user triggers reprocessing in the Run & Schedule screen.

  2. Temporary tables are created in the background.

    These temporary tables are essentially copies of existing tables and do not contribute to APC consumption.

  3. The union of the user_interaction_event_log and user_interaction_event_log_history tables is queried in batches of 10 million rows until all sessions that were available when re-processing was triggered have been processed.

  4. Default and custom Labels are applied to raw events.

  5. The resulting events are stored in the TM_Labeled_Data_reprocessing table.

  6. Tasks and Business Events are applied.

  7. Steps 3 to 6 are performed for new data in batches of up to 10 million rows until all raw events have been processed.

  8. The Data Model automatically reloads to reflect the newly-processed data.

  9. If reprocessing is:

    • Successful, the temporary reprocessing tables, which contain the reprocessed results, are renamed and replace the original tables until reprocessing is performed again.

    • Unsuccessful, all temporary tables are deleted and the re-processing job execution status in the Run & Schedule screen is set to Failed.
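The "build in a temporary table, then swap on success or discard on failure" pattern described in the list directly above can be sketched as follows. This is a simplified illustration, not the product's implementation: the `reprocess` function, the dictionary standing in for a database, and the `transform` callback are hypothetical, and the later pass over data that arrived after the run was triggered is left out.

```python
def reprocess(tables, raw_events, transform, batch_size=10_000_000):
    """Rebuild a table in a temporary copy; swap on success, discard on failure.

    tables: dict mapping table name -> list of rows (stand-in for a database).
    transform: callable applied to each raw event (stands in for labeling etc.).
    Returns the job status, "Success" or "Failed".
    """
    temp_name = "TM_Labeled_Data_reprocessing"
    tables[temp_name] = []
    # Snapshot the input so only events available at trigger time are covered.
    snapshot = list(raw_events)
    try:
        # Work through the snapshot one batch at a time.
        for start in range(0, len(snapshot), batch_size):
            batch = snapshot[start:start + batch_size]
            tables[temp_name].extend(transform(event) for event in batch)
    except Exception:
        # Failure: delete the temporary table and leave the original untouched.
        del tables[temp_name]
        return "Failed"
    # Success: the temporary table replaces the original.
    tables["TM_Labeled_Data"] = tables.pop(temp_name)
    return "Success"
```

Writing to a separate table until the whole run succeeds means a failed run never leaves the original table half-updated.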

Task Mining processing scheduling options

Tip

You access the processing scheduling options from the Task Mining project home page by selecting Run & Schedule and selecting Schedule.

Run when new data is uploaded

  • Description: Captured Task Mining data is processed every time the Task Mining Client software uploads new data to the Task Mining Data Pool. There may be a delay of up to 20 minutes between the Task Mining Client software showing the status as uploaded and the processing run starting. Available only when processing new Task Mining data only (delta).

  • Use case: Data must be available in Studio as quickly as possible and there are no resource utilization issues that interfere with other data transformations.

Run by schedule

  • Description: Captured Task Mining data is processed at a specified time and date. Available only when processing new Task Mining data only (delta).

  • Use case: For performance reasons, data processing is run when users are not working, for example, overnight.

No schedule

  • Description: Data processing is triggered by a user manually selecting Run in Run & Schedule. Available both for processing new Task Mining data only (delta) and for reprocessing existing Task Mining data (full).

  • Use case: Gives flexibility when there are no specific timing or performance constraints.
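The availability rules for the scheduling options can be summarized as a small lookup. The enum and function names below are hypothetical and exist only for illustration; the combinations encode what this section states, namely that the two scheduled options apply to delta processing only, while manual runs support both delta and full processing.

```python
from enum import Enum


class Mode(Enum):
    DELTA = "Process new Task Mining data only (delta)"
    FULL = "Reprocess existing data (full)"


class Schedule(Enum):
    ON_UPLOAD = "Run when new data is uploaded"
    BY_SCHEDULE = "Run by schedule"
    NONE = "No schedule"


# Which processing modes each scheduling option supports, per this section.
ALLOWED_MODES = {
    Schedule.ON_UPLOAD: {Mode.DELTA},
    Schedule.BY_SCHEDULE: {Mode.DELTA},
    Schedule.NONE: {Mode.DELTA, Mode.FULL},
}


def is_allowed(schedule: Schedule, mode: Mode) -> bool:
    """Return True if the scheduling option supports the processing mode."""
    return mode in ALLOWED_MODES[schedule]
```

For instance, `is_allowed(Schedule.BY_SCHEDULE, Mode.FULL)` evaluates to `False`, because full reprocessing can only be started manually.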