Throughput Times

Description

These examples show various ways of throughput time calculations in PQL.

Throughput time calculations are vital in powerful process mining applications. In the following, different kinds of throughput times and possible calculations using PQL are presented:

Throughput Times per Case: Learn how to calculate throughput times inside a case, e.g. the time from the start of each case to the end of the case or between specified activities within a case.
Throughput Times between Activities: Learn how to calculate the throughput times between consecutive activities inside a case.
Throughput Times over multiple Cases: Learn how to calculate throughput times over multiple cases, e.g. the throughput times of orders consisting of multiple cases.

Throughput Times per Case

Throughput time calculations on case level can be solved using CALC_THROUGHPUT. It is usually used together with REMAP_TIMESTAMPS, which adds many options for calendar specifications.

[1]

Here, the case throughput time from the first activity to the last activity of each case is calculated in days. As DAYS is specified as the time unit, the minutes of the input timestamps are ignored:

Query

Column1

         "Cases"."Case"

Column2

         CALC_THROUGHPUT ( CASE_START TO CASE_END , REMAP_TIMESTAMPS ( "Activities"."Timestamp" , DAYS ) )

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Wed Jan 01 2020 00:00:00.000
1	'C'	Mon Jan 06 2020 00:00:00.000
1	'C'	Tue Jan 07 2020 12:00:00.000
2	'A'	Fri Jan 03 2020 00:00:00.000
2	'A'	Thu Jan 09 2020 12:00:00.000
2	'C'	Sat Jan 11 2020 00:00:00.000

Cases

Case : int
1
2

Foreign Keys

Activities.Case

Cases.Case

Result

Column1 : int	Column2 : int
1	6
2	8

[2]

In this example, the throughput time between the last A-activity and the first C-activity of each case is calculated:

Query

Column1

         "Cases"."Case"

Column2

         CALC_THROUGHPUT ( LAST_OCCURRENCE [ 'A' ] TO FIRST_OCCURRENCE [ 'C' ] , REMAP_TIMESTAMPS ( "Activities"."Timestamp" , DAYS ) )

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Wed Jan 01 2020 00:00:00.000
1	'C'	Mon Jan 06 2020 00:00:00.000
1	'C'	Tue Jan 07 2020 12:00:00.000
2	'A'	Fri Jan 03 2020 00:00:00.000
2	'A'	Thu Jan 09 2020 12:00:00.000
2	'C'	Sat Jan 11 2020 00:00:00.000

Cases

Case : int
1
2

Foreign Keys

Activities.Case

Cases.Case

Result

Column1 : int	Column2 : int
1	5
2	2

[3]

If the case throughput time should be calculated in days, but the hours should not be ignored, we can calculate the throughput times in hours first and then divide the result by 24 (hours per day):

Query

Column1

         "Cases"."Case"

Column2

         CALC_THROUGHPUT ( CASE_START TO CASE_END , REMAP_TIMESTAMPS ( "Activities"."Timestamp" , HOURS ) ) / 24

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Wed Jan 01 2020 00:00:00.000
1	'C'	Mon Jan 06 2020 00:00:00.000
1	'C'	Tue Jan 07 2020 12:00:00.000
2	'A'	Fri Jan 03 2020 00:00:00.000
2	'A'	Thu Jan 09 2020 12:00:00.000
2	'C'	Sat Jan 11 2020 00:00:00.000

Cases

Case : int
1
2

Foreign Keys

Activities.Case

Cases.Case

Result

Column1 : int	Column2 : float
1	6.5
2	8.0

[4]

In this example, only weekdays should be counted towards the case throughput time, ignoring Saturdays and Sundays. This can be done using the WEEKDAY_CALENDAR in REMAP_TIMESTAMPS:

Query

Column1

         "Cases"."Case"

Column2

         CALC_THROUGHPUT ( CASE_START TO CASE_END , REMAP_TIMESTAMPS ( "Activities"."Timestamp" , DAYS , WEEKDAY_CALENDAR ( MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY ) ) )

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Wed Jan 01 2020 00:00:00.000
1	'C'	Mon Jan 06 2020 00:00:00.000
1	'C'	Tue Jan 07 2020 12:00:00.000
2	'A'	Fri Jan 03 2020 00:00:00.000
2	'A'	Thu Jan 09 2020 12:00:00.000
2	'C'	Sat Jan 11 2020 00:00:00.000

Cases

Case : int
1
2

Foreign Keys

Activities.Case

Cases.Case

Result

Column1 : int	Column2 : int
1	4
2	6

[5]

The activity table of this example contains several Error-activities. This example shows how to calculate the throughput time between the case start and the first Error-activity of each case. This can be done by mapping all Error-activities (those are all activities starting with Error) to a common string ( ErrorActivity in this example) using a CASE WHEN in the last argument of the CALC_THROUGHPUT function. As the result of this CASE WHEN is now taken as the activity column to be used in CALC_THROUGHPUT, the ErrorActivity can now be used inside the FIRST_OCCURRENCE range specifier:

Query

Column1

         "Cases"."Case"

Column2

         CALC_THROUGHPUT ( CASE_START TO FIRST_OCCURRENCE [ 'ErrorActivity' ] , REMAP_TIMESTAMPS ( "Activities"."Timestamp" , DAYS ) , CASE WHEN "Activities"."Activity" LIKE 'Error%' THEN 'ErrorActivity' ELSE "Activities"."Activity" END )

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Wed Jan 01 2020 00:00:00.000
1	'C'	Mon Jan 06 2020 00:00:00.000
1	'Error X'	Tue Jan 07 2020 12:00:00.000
2	'A'	Fri Jan 03 2020 00:00:00.000
2	'Error Y'	Thu Jan 09 2020 12:00:00.000
2	'B'	Sat Jan 11 2020 00:00:00.000

Cases

Case : int
1
2

Foreign Keys

Activities.Case

Cases.Case

Result

Column1 : int	Column2 : int
1	6
2	6

Throughput Times between Activities

Throughput times between consecutive activities can be calculated using SOURCE and TARGET, usually combined with a DateTime Difference function or REMAP_TIMESTAMPS. The Process Explorer and the Variant Explorer use this approach to calculate the throughput time for each edge in the graph.

[6]

This simple example shows how to calculate the number of days between consecutive activities of each case using SOURCE/TARGET and DAYS_BETWEEN:

Query

Column1

         "Cases"."Case"

Column2

         SOURCE ( "Activities"."Activity" )

Column3

         TARGET ( "Activities"."Activity" )

Column4

         DAYS_BETWEEN ( SOURCE ( "Activities"."Timestamp" ) , TARGET ( "Activities"."Timestamp" ) )

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Thu Jan 02 2020 00:00:00.000
1	'B'	Sat Jan 04 2020 00:00:00.000
1	'C'	Mon Jan 06 2020 00:00:00.000
2	'A'	Thu Jan 02 2020 00:00:00.000
2	'B'	Mon Jan 20 2020 00:00:00.000
2	'D'	Sat Jan 25 2020 00:00:00.000
2	'C'	Tue Jan 28 2020 00:00:00.000

Cases

Case : int
1
2

Foreign Keys

Activities.Case

Cases.Case

Result

Column1 : int	Column2 : string	Column3 : string	Column4 : float
1	'A'	'B'	2.0
1	'B'	'C'	2.0
2	'A'	'B'	18.0
2	'B'	'D'	5.0
2	'D'	'C'	3.0

[7]

Here, we use REMAP_TIMESTAMPS on the SOURCE/TARGET timestamps and subtract those values to get the difference between consecutive timestamps in days. By using the WEEKDAY_CALENDAR option, we only take weekdays into account, meaning that Saturday and Sunday do not count towards the time difference:

Query

Column1

         "Cases"."Case"

Column2

         SOURCE ( "Activities"."Activity" )

Column3

         TARGET ( "Activities"."Activity" )

Column4

         REMAP_TIMESTAMPS ( TARGET ( "Activities"."Timestamp" ) , DAYS , WEEKDAY_CALENDAR ( MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY ) ) - REMAP_TIMESTAMPS ( SOURCE ( "Activities"."Timestamp" ) , DAYS , WEEKDAY_CALENDAR ( MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY ) )

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Thu Jan 02 2020 00:00:00.000
1	'B'	Sat Jan 04 2020 00:00:00.000
1	'C'	Mon Jan 06 2020 00:00:00.000
2	'A'	Thu Jan 02 2020 00:00:00.000
2	'B'	Mon Jan 20 2020 00:00:00.000
2	'D'	Sat Jan 25 2020 00:00:00.000
2	'C'	Tue Jan 28 2020 00:00:00.000

Cases

Case : int
1
2

Foreign Keys

Activities.Case

Cases.Case

Result

Column1 : int	Column2 : string	Column3 : string	Column4 : int
1	'A'	'B'	2
1	'B'	'C'	0
2	'A'	'B'	12
2	'B'	'D'	5
2	'D'	'C'	1

Connection to the Process Explorer

The SOURCE and TARGET functions are used in the Process Explorer and Variant Explorer to calculate the edge KPIs, as shown in the following examples:

[8]

This example shows the query that is used to calculate the average throughput time between activities in days. Therefore, the difference of the two timestamps is first calculated in SECONDS and then converted to DAYS by dividing the difference by 60*60*24. Finally, the result is rounded. The corresponding Process Explorer would look like this:

Query

Column1

         SOURCE ( "Activities"."Activity" )

Column2

         TARGET ( "Activities"."Activity" )

Column3

         ROUND ( AVG ( ( REMAP_TIMESTAMPS ( TARGET ( "Activities"."Timestamp" ) , SECONDS ) - REMAP_TIMESTAMPS ( SOURCE ( "Activities"."Timestamp" ) , SECONDS ) ) / ( 60 * 60 * 24 ) ) )

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Thu Jan 02 2020 00:00:00.000
1	'B'	Sat Jan 04 2020 00:00:00.000
1	'C'	Mon Jan 06 2020 00:00:00.000
2	'A'	Thu Jan 02 2020 00:00:00.000
2	'B'	Mon Jan 20 2020 00:00:00.000
2	'D'	Sat Jan 25 2020 00:00:00.000
2	'C'	Tue Jan 28 2020 00:00:00.000

Cases

Case : int
1
2

Foreign Keys

Activities.Case

Cases.Case

Result

Column1 : string	Column2 : string	Column3 : int
'A'	'B'	10
'B'	'C'	2
'B'	'D'	5
'D'	'C'	3

[9]

The query below calculates the average number of days between consecutive activities, with activity 'B' being ignored. This is the query that is used when activities are ignored using the 'Eye' icon in the Process Explorer:

The corresponding Process Explorer looks like this:

Query

Column1

         SOURCE ( "Activities"."Activity" , REMAP_VALUES ( "Activities"."Activity" , [ 'B' , null ] ) )

Column2

         TARGET ( "Activities"."Activity" )

Column3

         ROUND ( AVG ( ( REMAP_TIMESTAMPS ( TARGET ( "Activities"."Timestamp" ) , SECONDS ) - REMAP_TIMESTAMPS ( SOURCE ( "Activities"."Timestamp" ) , SECONDS ) ) / ( 60 * 60 * 24 ) ) )

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Thu Jan 02 2020 00:00:00.000
1	'B'	Sat Jan 04 2020 00:00:00.000
1	'C'	Mon Jan 06 2020 00:00:00.000
2	'A'	Thu Jan 02 2020 00:00:00.000
2	'B'	Mon Jan 20 2020 00:00:00.000
2	'D'	Sat Jan 25 2020 00:00:00.000
2	'C'	Tue Jan 28 2020 00:00:00.000

Cases

Case : int
1
2

Foreign Keys

Activities.Case

Cases.Case

Result

Column1 : string	Column2 : string	Column3 : int
'A'	'C'	4
'A'	'D'	23
'D'	'C'	3

Throughput Times over multiple Cases

Throughput times over multiple cases can be calculated using PU_MIN and PU_MAX, combined with a DateTime Difference function or REMAP_TIMESTAMPS. This can be used to calculate the time range covered by all cases that are grouped together based on a certain property.

[10]

This example shows how to calculate the throughput time of each order. An order consists of one or more order items, which are the cases in the example. The throughput time of an order is the time difference between the first activity among all cases related to that order and the last activity among all cases of that order. We can use PU_MIN and PU_MAX to get the smallest (earliest) and largest (latest) timestamps related to each order and compare those using DAYS_BETWEEN:

Query

Column1

         "Orders"."ID"

Column2

         DAYS_BETWEEN ( PU_MIN ( "Orders" , "Activities"."Timestamp" ) , PU_MAX ( "Orders" , "Activities"."Timestamp" ) )

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Wed Jan 01 2020 00:00:00.000
1	'B'	Mon Jan 06 2020 00:00:00.000
1	'C'	Tue Jan 07 2020 00:00:00.000
2	'A'	Fri Jan 03 2020 00:00:00.000
2	'C'	Thu Jan 09 2020 00:00:00.000
3	'A'	Fri Jan 03 2020 00:00:00.000
3	'B'	Sun Jan 05 2020 12:00:00.000

Cases

Case : int	Order : int
1	100
2	100
3	200

Orders

ID : int
100
200

Foreign Keys

Activities.Case	Cases.Case
Cases.Order	Orders.ID

Result

Column1 : int	Column2 : float
100	8.0
200	2.5

[11]

This example shows how to calculate the average throughput time of an order, using an AVG around the query that calculates the throughput time of each order. Instead of using DAYS_BETWEEN like in the previous example, we can also use REMAP_TIMESTAMPS again to get the number of days between two timestamps. As explained above, REMAP_TIMESTAMPS only counts full days with the DAYS specification. Using REMAP_TIMESTAMPS allows you to take calendars into account, as already shown above:

Query

Column1

         AVG ( REMAP_TIMESTAMPS ( PU_MAX ( "Orders" , "Activities"."Timestamp" ) , DAYS ) - REMAP_TIMESTAMPS ( PU_MIN ( "Orders" , "Activities"."Timestamp" ) , DAYS ) )

Input

Output

Activities

Case : int	Activity : string	Timestamp : date
1	'A'	Wed Jan 01 2020 00:00:00.000
1	'B'	Mon Jan 06 2020 00:00:00.000
1	'C'	Tue Jan 07 2020 00:00:00.000
2	'A'	Fri Jan 03 2020 00:00:00.000
2	'C'	Thu Jan 09 2020 00:00:00.000
3	'A'	Fri Jan 03 2020 00:00:00.000
3	'B'	Sun Jan 05 2020 12:00:00.000

Cases

Case : int	Order : int
1	100
2	100
3	200

Orders

ID : int
100
200

Foreign Keys

Activities.Case	Cases.Case
Cases.Order	Orders.ID

Result

Column1 : float
5.0

Throughput Times

Description

Throughput Times per Case

Throughput Times between Activities

Connection to the Process Explorer

Throughput Times over multiple Cases

Search results