Skip to main content

Celonis Product Documentation

LINK_PATH
Description

Warning

To use this feature, Object Link needs to be configured with Object Link in the data model.

The LINK_PATH operator creates an internal activity and case table to represent individual paths calculated by traversing the Object Link graph. The resulting activity table is joined towards the input table and to the internal case table in a N:1 fashion. Additional tables containing link attributes are generated and can be accessed via the LINK_PATH_SOURCE/TARGET operators. An example of the created tables and how they are joined is shown below:

Object_Link_Join_Example.png

Warning

In some settings, this operator may require excessive CPU time. If the execution time exceeds 10 minutes, the execution is stopped and an error is reported.

The output of LINK_PATH is a column containing all objects of all calculated paths, where each row's data corresponds to the value of the object in the input column. If only individual objects are of interest and a complete path is not required, use LINK_SOURCE/LINK_TARGET instead. As the generated tables are joined to the input table, all objects, as specified in the Object Link mapping table, must reside within this one table in order to avoid join cycles in the data model.

Syntax
 LINK_PATH ( input_table.column [, direction ] [, CONSTRAINED BY ( [ START ( start_objects_expression ) ] [, END ( end_objects_expression ) ] [, LENGTH ( comparison length ) ] [, ALL ( all_objects_expression ) ] [, WITH/WITHOUT CYCLES] ) ] )
  • LINK_PATH acquires object attributes from the specified input_table.column. Therefore, all objects, as they are specified in the Object Link mapping table, must be contained in this table. Entries of the input object table that are not mentioned in the mapping table are not legitimate objects for the Object Link graph and are ignored.

  • direction specifies the traversal direction. Valid values for the parameter are FORWARDS (default) and BACKWARDS.

  • The graph traversal can further be CONSTRAINED BY:

    • start_objects_expression is a condition to specify the start objects of the graph traversal.

    • end_objects_expression is a condition to specify the end objects of the graph traversal.

    • comparison is one of = (equal), != (not equal), <> (not equal), < (less than), <= (less than or equal), >(greater than) or >= (greater than or equal). In conjunction with length, which is a positive integer, this can be used to specify the desired path lengths (as measured by the number of objects).

    • all_objects_expression is a condition to specify which objects of the graph may be traversed.

    • WITH CYCLES enables the traversal of cycles and WITHOUT CYCLES disables the traversal of cycles. The default is WITHOUT CYCLES.

NULL handling
  • Object Link mapping table entries with NULL values in the IN or OUT column will register the object specified in the none-NULL entry. However, no link will be added to the Object Link graph.

  • NULL values in the input_table.column are retained in the output.

Scenarios

The Object Link graph traversed by LINK_PATH can be arbitrarily complex and large. Formally, the graph is a directed multigraph, meaning that each link has a source and target object as well as an identity which allows multiple unique links between the same source and target pair. We refer to a group of links that share the same source and target objects as a multi-link. Additionally, the graph may contain cycles and self-loops. Objects therefore can appear multiple times within a path.

Object Link graphs typically have objects that do not have INCOMING or OUTGOING links, these are called implicit START or END objects respectively and can also be found via the LINK_OBJECTS operator. In the following illustrations, implicit START objects are indicated with an incoming link without a source and END objects are marked with a thick border.

Simple, acyclic scenario

This scenario shows an Object Link graph without cycles and multi-links. Note that the isolated Object A can still be traversed. The implicit START objects are A and B, the implicit END objects are G and H.

Object_Link_Acyclic_Scenario.png
Cyclic, multi-link scenario

This scenario shows an Object Link graph containing cycles and multi-links. The implicit START objects are A and F, the implicit END objects are D and E.

Object_Link_Cyclic_Multi_Edge_Scenario.png
Constraints

Parameters within the the CONSTRAINED BY clause are called constraints and control the traversal performed by LINK_PATH. They are intended to limit the result of the operator by specifying the critical areas within the graph. Also see the chapter about "Why does LINK_PATH hit the table row limit?". Each constraint may only appear once but can otherwise be freely combined with each other.

Unconstrained

A LINK_PATH call that does not specify any constraints is called unconstrained and has the following default behavior:

  • START constraint: Use implicit START objects of the Object Link graph.

  • END constraint: Use implicit END objects of the Object Link graph.

  • LENGTH constraint: Use default LENGTH limit of 10. A warning will be emitted if longer paths exist.

  • ALL constraint: Allow ALL objects to be traversed.

  • CYCLES constraint: Enable/Disable cycles to be traversed.

[1]

LINK_PATH internally creates new tables, most importantly an activity table in which each individual path is considered as a case. This allows process mining operators to be applied to graph-based data. This example shows how we apply process operators like CASE_ID_COLUMN and TIMESTAMP_COLUMN on the result of LINK_PATH. Note that the ACTIVITY_COLUMN will contain only NULL values and that we can acquire any object attribute that is defined in the input_table. Since traversal from object A will finish immediately due to being both a start and end object we get a path of length 1, only containing said object. More interesting paths result from traversals starting at B.

Query

Column1

         CASE_ID_COLUMN ( LINK_PATH ( "Case"."ID" ) )
        

Column2

         LINK_PATH ( "Case"."ID" )
        

Column3

         MILLIS_BETWEEN ( {t 0 } , TIMESTAMP_COLUMN ( LINK_PATH ( "Case"."ID" ) ) )
        

Column4

         LINK_PATH ( "Case"."OBJ_ATTR" )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

3

'F'

4

'G'

4

'H'

4

'J'

null

MAPPING

OUT : string

IN : string

'A'

null

'B'

'C'

'C'

'D'

'C'

'E'

'D'

'F'

'D'

'G'

'E'

'G'

'E'

'H'

'F'

'G'

Result

Column1 : int

Column2 : string

Column3 : float

Column4 : int

0

'A'

0.0

0

1

'B'

0.0

1

1

'C'

1.0

2

1

'D'

2.0

3

1

'F'

3.0

4

1

'G'

4.0

4

2

'B'

0.0

1

2

'C'

1.0

2

2

'D'

2.0

3

2

'G'

3.0

4

3

'B'

0.0

1

3

'C'

1.0

2

3

'E'

2.0

3

3

'G'

3.0

4

4

'B'

0.0

1

4

'C'

1.0

2

4

'E'

2.0

3

4

'H'

3.0

4

[2]

Besides the activity table, LINK_PATH also generates a new case table accordingly. We can use the generated tables together with Process and PU-functions to aggregate object information on a path level.

Query

Column1

         CASE_TABLE(LINK_PATH("Case"."ID"))."PATH_ID"
        

Column2

         VARIANT ( LINK_PATH ( "Case"."ID" ) )
        

Column3

         PU_COUNT ( CASE_TABLE ( LINK_PATH ( "Case"."ID" ) ) , LINK_PATH ( "Case"."ID" ) )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

3

'F'

4

'G'

4

'H'

4

'J'

null

MAPPING

OUT : string

IN : string

'A'

null

'B'

'C'

'C'

'D'

'C'

'E'

'D'

'F'

'D'

'G'

'E'

'G'

'E'

'H'

'F'

'G'

Result

Column1 : int

Column2 : string

Column3 : int

0

'A'

1

1

'B, C, D, F, G'

5

2

'B, C, D, G'

4

3

'B, C, E, G'

4

4

'B, C, E, H'

4

[3]

The resulting LINK_PATH activity table is also connected to the input table containing the traversed objects. With the following query we can find out which objects occur most often in all paths.

Query

Column1

         "Case"."ID"
        

Column2

         COUNT ( LINK_PATH ( "Case"."ID" ) )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

3

'F'

4

'G'

4

'H'

4

'J'

null

MAPPING

OUT : string

IN : string

'A'

null

'B'

'C'

'C'

'D'

'C'

'E'

'D'

'F'

'D'

'G'

'E'

'G'

'E'

'H'

'F'

'G'

Result

Column1 : string

Column2 : int

'A'

1

'B'

4

'C'

4

'D'

2

'E'

2

'F'

1

'G'

3

'H'

1

START and END

Traversal of the Object Link graph can be defined by specifying conditions on the input_table. Conditions in the START constraint specify where the traversal begins and the END constraint indicates where a path is finished. Note while specifying these constraints that the conditions are not automatically combined with the implicit START or END objects. If the implicit objects without INCOMING or OUTGOING links shall be used with additional conditions, we can utilize the LINK_OBJECTS operator to explicitily formulate the implicit objects.

[4]

By specifying only object C with attribute 2 as START and objects F, G and H as END objects, we limit the traversal to a smaller part of the Object Link graph. Note that we gained paths that are subsets of each other, namely C, D, F which is included in C, D, F, G. If possible try to avoid the duplication of information in order to reduce computational complexity and the resulting table size.

Query

Column1

         VARIANT ( LINK_PATH ( "Case"."ID" , CONSTRAINED BY ( START ( "Case"."OBJ_ATTR" = 2 ) , END ( "Case"."ID" IN ( 'F' , 'G' , 'H' ) ) ) ) )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

3

'F'

4

'G'

4

'H'

4

'J'

null

MAPPING

OUT : string

IN : string

'A'

null

'B'

'C'

'C'

'D'

'C'

'E'

'D'

'F'

'D'

'G'

'E'

'G'

'E'

'H'

'F'

'G'

Result

Column1 : string

'C, D, F'

'C, D, F, G'

'C, D, G'

'C, E, G'

'C, E, H'

[5]

In the following example, we use every implicit START object that also has outgoing links. A is disqualified as a START object since it is an isolated object.

Query

Column1

         VARIANT ( LINK_PATH ( "Case"."ID" , CONSTRAINED BY ( START ( LINK_OBJECTS("Case"."ID")."INCOMING" = 0 AND LINK_OBJECTS("Case"."ID")."OUTGOING" > 0 ) ) ) )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

3

'F'

4

'G'

4

'H'

4

'J'

null

MAPPING

OUT : string

IN : string

'A'

null

'B'

'C'

'C'

'D'

'C'

'E'

'D'

'F'

'D'

'G'

'E'

'G'

'E'

'H'

'F'

'G'

Result

Column1 : string

'B, C, D, F, G'

'B, C, D, G'

'B, C, E, G'

'B, C, E, H'

LENGTH

Arbitrarily long paths could result from traversal on an Object Link graph. To allow and disallow paths of certain lengths we can make use of the LENGTH constraint.

[6]

In this example, we only want paths whose length (as measured by the number of objects) is less than or equal to 4, and the paths must start with object B.

Query

Column1

         VARIANT ( LINK_PATH ( "Case"."ID" , CONSTRAINED BY ( LENGTH ( <= 4 ) , START ( "CASE"."ID" = 'B' ) ) ) )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

3

'F'

4

'G'

4

'H'

4

'J'

null

MAPPING

OUT : string

IN : string

'A'

null

'B'

'C'

'C'

'D'

'C'

'E'

'D'

'F'

'D'

'G'

'E'

'G'

'E'

'H'

'F'

'G'

Result

Column1 : string

'B, C, D, G'

'B, C, E, G'

'B, C, E, H'

ALL

Thanks to the ALL constraint, we can guarantee that all objects in the resulting paths fulfill a certain condition.

[7]

If we want every path that does not involve traversing certain objects, we can avoid them via the ALL constraint. Here we gain every path that never visits object D.

Query

Column1

         VARIANT ( LINK_PATH ( "Case"."ID" , CONSTRAINED BY ( ALL ( "Case"."ID" != 'D' ) ) ) )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

3

'F'

4

'G'

4

'H'

4

'J'

null

MAPPING

OUT : string

IN : string

'A'

null

'B'

'C'

'C'

'D'

'C'

'E'

'D'

'F'

'D'

'G'

'E'

'G'

'E'

'H'

'F'

'G'

Result

Column1 : string

'A'

'B, C, E, G'

'B, C, E, H'

CYCLES

With the cycles constraint we can enable or disable the traversal of cycles in the created graph. A cycle is detected when the object would occur twice in the traversed path. By default, no cycles are traversed.

[8]

If we enable cycles, we get additional paths that contain the cycle between B and C.

Query

Column1

         VARIANT ( LINK_PATH ( "Case"."ID" , CONSTRAINED BY ( WITH CYCLES ) ) )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

1

'F'

1

MAPPING

OUT : string

IN : string

LINK_ATTR : int

'A'

'B'

1

'A'

'B'

2

'B'

'C'

3

'B'

'C'

4

'B'

'C'

6

'B'

'E'

7

'C'

'B'

5

'C'

'D'

8

'F'

'B'

9

Result

Column1 : string

'A, B, C, B, C, D'

'A, B, C, B, E'

'A, B, C, D'

'A, B, E'

'F, B, C, B, C, D'

'F, B, C, B, E'

'F, B, C, D'

'F, B, E'

[9]

If we disable cycles, we reduce the amount of resulting paths. Paths visiting B and C twice are not in the result set.

Query

Column1

         VARIANT ( LINK_PATH ( "Case"."ID" , CONSTRAINED BY ( WITHOUT CYCLES ) ) )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

1

'F'

1

MAPPING

OUT : string

IN : string

LINK_ATTR : int

'A'

'B'

1

'A'

'B'

2

'B'

'C'

3

'B'

'C'

4

'B'

'C'

6

'B'

'E'

7

'C'

'B'

5

'C'

'D'

8

'F'

'B'

9

Result

Column1 : string

'A, B, C, D'

'A, B, E'

'F, B, C, D'

'F, B, E'

Why does LINK_PATH hit the table row limit?

When the LINK_PATH operator traverses a large Object Link graph, it stores the paths it finds as rows in a table. If this table becomes too large, an error is returned indicating that the table row limit has been reached. To demonstrate how quickly the number of paths can grow, let's look at a simple example production process for making chocolate strawberry cake, as shown in the following figure.

Object_Link_Chocolate_Strawberry_Cake_Graph.png

When this Object Link graph is traversed without constraints using the query below, the result consists of 45 paths and 180 rows, which is quite a lot for such a small graph.

 LINK_PATH ( "Materials"."Description" )

However, if our use case does not require packaged cakes, we can add a constraint to stop traversing the graph when we reach the unpackaged chocolate strawberry cake as shown in the following query. This drastically reduces the size of the result, which is now only 14 paths and 44 rows. That's three times fewer paths and four times fewer rows just by omitting three packaged cakes. In real processes with thousands of objects and links, the number of paths will grow even faster. Therefore, it is critical to limit the traversal of the Object Link graph to only those objects that are essential to a use case.

 LINK_PATH ( "Materials"."Description" , CONSTRAINED BY ( END ( "Materials"."Description" = 'Chocolate strawberry cake' ) ) )

The following figure gives a rough indication of when the table row limit might be reached for Object Link graphs of different sizes if no constraints are used. For example, for an Object Link graph with 10,000 objects, the table row limit is likely to be reached when the number of links exceeds 26,000. In general, we strongly recommend the use of constraints once the ratio of links to objects is greater than 2.

Object_Link_Link_Path_Table_Row_Limit_Plot.png

Use the following queries to calculate the number of links, the number of objects, and the ratio of links to objects for your Object Link graph.

 COUNT ( LINK_SOURCE ( table.column ) )
 COUNT_TABLE ( LINK_OBJECTS ( table.column ) )
 COUNT ( LINK_SOURCE ( table.column ) ) / GLOBAL ( COUNT_TABLE ( LINK_OBJECTS ( table.column ) ) )
Advanced Examples

[10]

LINK_PATH can also handle more complex graphs that contain cycles and multiple links between two objects. When traversing an Object Link graph like the one above, LINK_PATH has two mentionable features:

  1. Since the output of LINK_PATH is object-based, there is no clear distinction between two possible subpaths A ➔ B over link 1 or link 2. This prevents identical paths from being duplicated, which drastically reduces the operator's output. Traversed links and their attributes can still be accessed using the LINK_PATH_SOURCE and LINK_PATH_TARGET operators.

  2. Although the output is link-independent, LINK_PATH marks the links as they are traversed so that they are not used a second time. For the given example, this means that object B can only be visited once from object C, breaking the B ➔ C ➔ B ➔ C ➔ B ➔ ... loop.

Query

Column1

         CASE_ID_COLUMN ( LINK_PATH ( "Case"."ID" , CONSTRAINED BY ( START ( "Case"."ID" = 'A' ) , WITH CYCLES ) ) )
        

Column2

         LINK_PATH ( "Case"."ID" , CONSTRAINED BY ( START ( "Case"."ID" = 'A' ) , WITH CYCLES ) )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

1

'F'

1

MAPPING

OUT : string

IN : string

LINK_ATTR : int

'A'

'B'

1

'A'

'B'

2

'B'

'C'

3

'B'

'C'

4

'B'

'C'

6

'B'

'E'

7

'C'

'B'

5

'C'

'D'

8

'F'

'B'

9

Result

Column1 : int

Column2 : string

0

'A'

0

'B'

0

'C'

0

'B'

0

'C'

0

'D'

1

'A'

1

'B'

1

'C'

1

'B'

1

'E'

2

'A'

2

'B'

2

'C'

2

'D'

3

'A'

3

'B'

3

'E'

[11]

The result of LINK_PATH are individual paths. However, sometimes we want to get a combined view of paths that have certain properties in common. The following example shows how to distinctively sum up the object attributes for all paths that share the same end object.

Query

Column1

         PU_LAST ( CASE_TABLE ( LINK_PATH ( "Case"."ID" ) ) , LINK_PATH ( "Case"."ID" ) )
        

Column2

         SUM ( CASE WHEN INDEX_ORDER ( UNIQUE_ID ( LINK_PATH ( "Case"."ID" ) , PU_LAST ( CASE_TABLE ( LINK_PATH ( "Case"."ID" ) ) , LINK_PATH ( "Case"."ID" ) ) ) , PARTITION BY ( UNIQUE_ID ( LINK_PATH ( "Case"."ID" ) , PU_LAST ( CASE_TABLE ( LINK_PATH ( "Case"."ID" ) ) , LINK_PATH ( "Case"."ID" ) ) ) ) ) = 1 THEN LINK_PATH ( "Case"."OBJ_ATTR" ) ELSE NULL END )
        

Input

Output

Case

ID : string

OBJ_ATTR : int

'A'

0

'B'

1

'C'

2

'D'

3

'E'

3

'F'

4

'G'

4

'H'

4

'J'

null

MAPPING

OUT : string

IN : string

'A'

null

'B'

'C'

'C'

'D'

'C'

'E'

'D'

'F'

'D'

'G'

'E'

'G'

'E'

'H'

'F'

'G'

Result

Column1 : string

Column2 : int

'A'

0

'G'

17

'H'

10

See also: