Activity Table and Case Table

Description

In this section you can find the definition and description of activity tables and case tables.

Background

Process mining is based on event logs. An event log is a collection of events. An event is described by a number of attributes. The following three event attributes are always required for process mining.

Case
Activity
Timestamp

There is also one optional attribute, Sorting.

Case

An instance of a process is called a case. The case attribute indicates which process instance the event belongs to. A case usually consists of multiple events and is often a (non-negative) number.

Activity

The activity attribute describes the action that is captured by the event in text; for example: "Sales order created".

Timestamp

The timestamp indicates the time when the event took place.

Sorting

The sorting column is optional and identifies sorting if the timestamps are not unique. See Activity table sorting for more information.

A sequence of events, ordered by their timestamps, that belong to the same case is called a trace. The traces of all the different cases with the same activity sequence represent a variant. The throughput time between two events of a case is the time difference between the corresponding timestamps. Therefore, the throughput time of a case is equal to the throughput time between the first and the last event of the corresponding trace.

Activity table

The data model normally contains an event log, which we call the activity table. The activity table always contains the three columns of the required event log attributes, while additional columns may be present. Within one case, the corresponding rows in the activity table are always sorted based on the timestamp column.

Usually, the activity table is not directly present in the source systems and therefore needs to be generated depending on the business process being analyzed. Since the source system is a relational database in most cases, this is usually done in SQL in the so-called transformation step. The transformation result can be a database view. However, a persisted table is usually created for performance reasons.

This procedure is comparable to the extract, transform, load procedure (ETL) in data warehouses. Like all the other tables, the resulting activity table is then imported into the data model. The user can specify the case and activity table in a user interface and mark the corresponding columns of the activity table as the case, activity and timestamp columns.

You can create a data model without an activity table. You don't need to add one when you create a data model. Furthermore you can add multiple activity tables to one data model. We call this a Multi Event log data model. You can use a Multi Event log to connect multiple processes within one data model. There are operators that help in these scenarios such as MERGE_EVENTLOG, Automerge or TRANSIT_COLUMN.

Case table

The case table provides information about each case and acts as the fact table in the snowflake schema. It always includes the case column, containing all distinct case IDs, while other columns provide additional information on the cases. There is a 1:N relationship between the case table and the activity table. If the case table is not specified in the data model, it will be generated automatically during the data model load. The case table then consists of one column containing all distinct case IDs from the activity table. This guarantees that a case table always exists, and the Celonis PQL functions and operators can rely on it.