# Celonis Product Documentation

##### MATCH_PROCESS_REGEX
###### Description

Filters the variants based on a regular expression defined over the activities.

MATCH_PROCESS_REGEX matches the variants of a process based on a regular expression. The regular expression defines a pattern over the activities of the variant.

If the regular expression contains an activity name that does not exist, then a warning is displayed.

### Warning

Computation Times In some settings, this operator may require excessive CPU time. If the execution time exceeds 10 minutes, the execution is stopped and an according error is reported.

[1]

Regular expression with non-existing activity and non-matching wildcard: Empty result and warnings.

Query

Filter

         FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'X' >> LIKE '%foo%' ) = 1;


Column1

         "Table1"."CASE_ID"


Column2

         "Table1"."ACTIVITY"


Input

Output

Table1

CASE_ID : string

ACTIVITY : string

startTimestamp : date

'1'

'Activity A'

Fri Jan 01 2016 01:00:00.000

'1'

'Activity D'

Fri Jan 01 2016 02:00:00.000

'2'

'Activity A'

Fri Jan 01 2016 03:00:00.000

'2'

'Activity C'

Fri Jan 01 2016 04:00:00.000

'3'

'Activity A'

Fri Jan 01 2016 05:00:00.000

'3'

'Activity B'

Fri Jan 01 2016 06:00:00.000

(empty table)

### Warning

MATCH_PROCESS_REGEX: Could not find activity ['X'].

### Warning

MATCH_PROCESS_REGEX: Could not find activity like ['%foo%'].

###### Result

MATCH_PROCESS_REGEX returns an INT value for each case which is 1 if the variant matches the pattern or 0 if it does not match. The resulting column is a temporary column of the case table.

###### Tips
• You can define an alias for an activity and use this alias as an abbreviation for the activity name inside the regular expression. This improves the readability of the pattern if you have long activity names.

• You can also define an alias for a subexpression. This enables you to give a logical name to a certain subexpression and use this subexpression at different places inside the regular expression. This allows you to structure your regular expressions and improve its understandability

• MATCH_PROCESS_REGEX is very useful if you want to filter on variants by a very specific and/or complex pattern, e.g. containing sequences or loops. If you want to filter for the occurrence of a single activity or very simple patterns, it is far more efficient to use normal conditional expressions instead of regular expressions.

• Instead of specifying the activity column, it is also possible to use another string column of the activity table. For example, you can write a process regex that matches cases based on the user type.

###### Syntax
 MATCH_PROCESS_REGEX ( activity_table.string_column, regular_expression )

• activity_table.string_column: A string column of an activity table. Usually, the activity column of an activity table is used.

• regular_expression: a regular expression to match the variants against. The patterns that can be used inside the regular expression are described below.

###### Patterns

This is an overview over all patterns that can be used within the regular expression. Full example queries as well as more detailed descriptions can be found below.

SyntaxMeaningExample
' 'Case contains the activity'A'
^Case starts with the activity^ 'Scan Invoice'
$Case ends with the activity'B'$
>>Activities directly follow'A' >> 'B'
|Logical OR'A' | 'B'
( )Group of activities('A' | 'B') >> ('C' >> 'D')
*0 or more occurrences('A' >> 'B')*
+1 or more occurrences('A' >> 'B')+
?0 or 1 occurrences('A' >> 'B')?
{<from>, <to>}Between <from> and <to> occurrences('A' >> 'B'){1, 3}
'*'Any activity matches'A' >> ('*')+ >> 'B'
ANYAny activity matches'A' >> (ANY)+ >> 'B'
.Any activity matches'A' >> . >> 'C'
LIKE '%...%'Activities that contain string'A' >> LIKE '% Invoice%'
[' ',' ']Set of activities of which one needs to match'A' >> ['B','D','E'] >> 'C'
[! ' ']Set of activities of which none matches'A' >> [! 'B'] >> 'C'
ASGives an alias to a regex('A' >> (ANY)*) AS sequence, sequence >> 'B'
###### Example

[2]

Matches any variant that starts with activity 'A' followed by activity 'B', followed by one or more activities 'A' or 'B' in arbitrary order. After that, an activity 'C' has to occur. However, before and after that activity any number (zero or more) of any possible activity may occur. Finally, the variant has to end with an activity 'H'.

Query

Filter

 [ ^ ] <regular_expression> [ $]  ###### Examples [8] Matches every variant that starts with an activity 'A' followed by an activity 'B'. Query Filter  FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , ^ 'A' >> 'B' ) = 1;  Column1  "Table1"."CASE_ID"  Column2  "Table1"."ACTIVITY"  Input Output Table1 CASE_ID : string ACTIVITY : string startTimestamp : date '1' 'A' Fri Jan 01 2016 01:00:00.000 '1' 'B' Fri Jan 01 2016 02:00:00.000 '1' 'C' Fri Jan 01 2016 03:00:00.000 '1' 'D' Fri Jan 01 2016 04:00:00.000 '2' 'A' Fri Jan 01 2016 05:00:00.000 '2' 'A' Fri Jan 01 2016 06:00:00.000 '2' 'B' Fri Jan 01 2016 07:00:00.000 '2' 'C' Fri Jan 01 2016 08:00:00.000 Result Column1 : string Column2 : string '1' 'A' '1' 'B' '1' 'C' '1' 'D' [9] Matches every variant that ends with an activity 'C' followed by an activity 'D'. Query Filter  FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'C' >> 'D'$ ) = 1;


Column1

         "Table1"."CASE_ID"


Column2

         "Table1"."ACTIVITY"


Input

Output

Table1

CASE_ID : string

ACTIVITY : string

startTimestamp : date

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'1'

'D'

Fri Jan 01 2016 04:00:00.000

'2'

'A'

Fri Jan 01 2016 05:00:00.000

'2'

'A'

Fri Jan 01 2016 06:00:00.000

'2'

'B'

Fri Jan 01 2016 07:00:00.000

'2'

'C'

Fri Jan 01 2016 08:00:00.000

'3'

'A'

Fri Jan 01 2016 09:00:00.000

'3'

'C'

Fri Jan 01 2016 10:00:00.000

'3'

'D'

Fri Jan 01 2016 11:00:00.000

'3'

'A'

Fri Jan 01 2016 12:00:00.000

'3'

'C'

Fri Jan 01 2016 13:00:00.000

'3'

'D'

Fri Jan 01 2016 14:00:00.000

Result

Column1 : string

Column2 : string

'1'

'A'

'1'

'B'

'1'

'C'

'1'

'D'

'3'

'A'

'3'

'C'

'3'

'D'

'3'

'A'

'3'

'C'

'3'

'D'

###### Activity sets

Activity sets are an alternative way to express a choice between different activities. An activity set matches any activity that is given in its defintion. Activity sets are defined as a (optionally comma-separated) list of activity names enclosed by square brackets. In contrast to choices, activity sets can only be defined for activities, but not for arbitrary regular expressions. However, activity sets can be inverted by adding a ! between the opening bracket and the first activity name, so the activity set matches the complement of the defined activities. This way, it is easy to express that all activities except one ore more certain activities should match.

###### Syntax
 "[" [ ! ] <activity_name> ( , <activity_name> )* "]"

###### Examples

[10]

Matches each variant that contains an activity 'A', 'B', or 'F'.

Query

Filter

         FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , [ 'A' , 'B' , 'F' ] ) = 1;


Column1

         "Table1"."CASE_ID"


Column2

         "Table1"."ACTIVITY"


Input

Output

Table1

CASE_ID : string

ACTIVITY : string

startTimestamp : date

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'2'

'E'

Fri Jan 01 2016 04:00:00.000

'2'

'F'

Fri Jan 01 2016 05:00:00.000

'2'

'G'

Fri Jan 01 2016 06:00:00.000

'3'

'X'

Fri Jan 01 2016 07:00:00.000

'3'

'Y'

Fri Jan 01 2016 08:00:00.000

'3'

'Z'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : string

Column2 : string

'1'

'A'

'1'

'B'

'1'

'C'

'2'

'E'

'2'

'F'

'2'

'G'

[11]

Matches each variant that contains an activity which is not 'A', 'B', or 'X'.

Query

Filter

         FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , [ ! 'A' , 'B' , 'X' ] ) = 1;


Column1

         "Table1"."CASE_ID"


Column2

         "Table1"."ACTIVITY"


Input

Output

Table1

CASE_ID : string

ACTIVITY : string

startTimestamp : date

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'2'

'A'

Fri Jan 01 2016 04:00:00.000

'2'

'B'

Fri Jan 01 2016 05:00:00.000

'2'

'A'

Fri Jan 01 2016 06:00:00.000

'3'

'X'

Fri Jan 01 2016 07:00:00.000

'3'

'Y'

Fri Jan 01 2016 08:00:00.000

'3'

'Z'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : string

Column2 : string

'1'

'A'

'1'

'B'

'1'

'C'

'3'

'X'

'3'

'Y'

'3'

'Z'

###### Grouping

Regular expressions which contain multiple sub-expressions (e.g., choices or concatenations) can be bracketed to group them in order to to clearify the structure. Grouping may be also necessary to nest regular expressions.

###### Syntax
 "(" <regular_expression> ")"

###### Example

[12]

Matches a sequence of Activities 'A', 'B', and 'C'. The regular expression is grouped by a pair of brackets (without quantifier).

Query

Filter

         FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , ( 'A' >> 'B' >> 'C' ) ) = 1;


Column1

         "Table1"."CASE_ID"


Column2

         "Table1"."ACTIVITY"


Input

Output

Table1

CASE_ID : string

ACTIVITY : string

startTimestamp : date

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'1'

'D'

Fri Jan 01 2016 04:00:00.000

'2'

'A'

Fri Jan 01 2016 05:00:00.000

'2'

'A'

Fri Jan 01 2016 06:00:00.000

'2'

'B'

Fri Jan 01 2016 07:00:00.000

'2'

'C'

Fri Jan 01 2016 08:00:00.000

'3'

'A'

Fri Jan 01 2016 09:00:00.000

'3'

'B'

Fri Jan 01 2016 10:00:00.000

'3'

'B'

Fri Jan 01 2016 11:00:00.000

'3'

'C'

Fri Jan 01 2016 12:00:00.000

Result

Column1 : string

Column2 : string

'1'

'A'

'1'

'B'

'1'

'C'

'1'

'D'

'2'

'A'

'2'

'A'

'2'

'B'

'2'

'C'

###### Quantifiers

A quantifier defines that a regular expression may occur a variable number of times. Quantifiers must always be applied to a group expression. There are three different quantifiers available:

• * The regular expression may occur an arbitrary number of times, i.e. it occurs zero or more times.

• + The regular expression occurs at least once, i.e. it occurs one ore more times.

• ? The regular expression is optional, i.e. it may occur exactly once or not.

• {<from>, <to>} The regular expression may occur between <from> and <to> times (ie. <from>, <from>+1, <from>+2, ..., <to>-1, <to>). Note that both <from> and <to> are included in the range. For example, ('A'){1, 3} will match all variants that contain the activity 'A', one or two or three times. You can use this quantifier to make your regular expressions more readable:

• 'A' >> 'A' >> ('A')? >> ('A')? becomes ('A'){2, 4},

• ('A' >> 'A' >> 'A') | ('A' >> 'A' >> 'A' >> 'A') becomes ('A'){3,4},

• ('A')? | ('A' >> 'A') | ('A' >> 'A' >> 'A' >> 'A') | ('A' >> 'A' >> 'A' >> 'A' >> 'A') becomes ('A'){0,2} | ('A'){4,5}.

If <from> and <to> are equal then it's enough to specify <from>, for example ('A'){3, 3} can be written as ('A'){3}. The arguments <from> and <to> must satisfy all of the following conditions:

• both <from> and <to> must be integers that are greater or equal zero,

• <from> and <to> can't both be zero,

• <from> must be less or equal <to>.

###### Syntax
 "(" <regular_expression> ")" ( "+" | "?" | "*" | "{" <from> ["," <to>] "}")

###### Examples

[13]

Matches each variant that has an arbitrary number of activities 'B' between activities 'A' and 'C'.

Query

Filter

         FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' >> ( 'B' ) * >> 'C' ) = 1;


Column1

         "Table1"."Case"


Column2

         "Table1"."Activity"


Input

Output

Table1

Case : int

Activity : string

Timestamp : date

1

'A'

Fri Jan 01 2016 01:00:00.000

1

'B'

Fri Jan 01 2016 02:00:00.000

1

'C'

Fri Jan 01 2016 03:00:00.000

2

'A'

Fri Jan 01 2016 04:00:00.000

2

'B'

Fri Jan 01 2016 05:00:00.000

2

'B'

Fri Jan 01 2016 06:00:00.000

2

'C'

Fri Jan 01 2016 07:00:00.000

3

'A'

Fri Jan 01 2016 08:00:00.000

3

'C'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : int

Column2 : string

1

'A'

1

'B'

1

'C'

2

'A'

2

'B'

2

'B'

2

'C'

3

'A'

3

'C'

[14]

Matches each variant that has at least one activity 'B' between activities 'A' and 'C'.

Query

Filter

         FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' >> ( 'B' ) + >> 'C' ) = 1;


Column1

         "Table1"."Case"


Column2

         "Table1"."Activity"


Input

Output

Table1

Case : int

Activity : string

Timestamp : date

1

'A'

Fri Jan 01 2016 01:00:00.000

1

'B'

Fri Jan 01 2016 02:00:00.000

1

'C'

Fri Jan 01 2016 03:00:00.000

2

'A'

Fri Jan 01 2016 04:00:00.000

2

'B'

Fri Jan 01 2016 05:00:00.000

2

'B'

Fri Jan 01 2016 06:00:00.000

2

'C'

Fri Jan 01 2016 07:00:00.000

3

'A'

Fri Jan 01 2016 08:00:00.000

3

'C'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : int

Column2 : string

1

'A'

1

'B'

1

'C'

2

'A'

2

'B'

2

'B'

2

'C'

[15]

Matches each variant that has no or one activity 'B' between activities 'A' and 'C'.

Query

Filter

         FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' >> ( 'B' ) ? >> 'C' ) = 1;


Column1

         "Table1"."Case"


Column2

         "Table1"."Activity"


Input

Output

Table1

Case : int

Activity : string

Timestamp : date

1

'A'

Fri Jan 01 2016 01:00:00.000

1

'B'

Fri Jan 01 2016 02:00:00.000

1

'C'

Fri Jan 01 2016 03:00:00.000

2

'A'

Fri Jan 01 2016 04:00:00.000

2

'B'

Fri Jan 01 2016 05:00:00.000

2

'B'

Fri Jan 01 2016 06:00:00.000

2

'C'

Fri Jan 01 2016 07:00:00.000

3

'A'

Fri Jan 01 2016 08:00:00.000

3

'C'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : int

Column2 : string

1

'A'

1

'B'

1

'C'

3

'A'

3

'C'

[16]

Matches each variant that has 1 or 2 consecutive 'B' activities between activities 'A' and 'C'.

Query

Filter

         FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' >> ( 'B' ) { 1 , 2 } >> 'C' ) = 1;


Column1

         "Table1"."CASE_ID"


Column2

         "Table1"."ACTIVITY"


Input

Output

Table1

CASE_ID : string

ACTIVITY : string

startTimestamp : date

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'2'

'A'

Fri Jan 01 2016 04:00:00.000

'2'

'B'

Fri Jan 01 2016 05:00:00.000

'2'

'B'

Fri Jan 01 2016 06:00:00.000

'2'

'C'

Fri Jan 01 2016 07:00:00.000

'3'

'A'

Fri Jan 01 2016 08:00:00.000

'3'

'C'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : string

Column2 : string

'1'

'A'

'1'

'B'

'1'

'C'

'2'

'A'

'2'

'B'

'2'

'B'

'2'

'C'

###### Pitfall
• If not specified to match exactly at the start and/or at the end of the variant, a regular expression may match at any position in the variant as an arbitrary number of activities at beginning / end may be skipped. Consequently, a ? quantifier for example may also match any number of occurrence of its subexpression, if it is the only subexpression or the first/last subexpression of a concatenation.

###### Example

[17]

Matches each variant that has an activity 'B'. Because there is no exact match at start or end of the variant defined, all given variants are matched as the additional activity 'B' for variant 2 is implicitly skipped.

Query

Filter

         FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , ( 'B' ) ? ) = 1;


Column1

         "Table1"."Case"


Column2

         "Table1"."Activity"


Input

Output

Table1

Case : int

Activity : string

Timestamp : date

1

'A'

Fri Jan 01 2016 01:00:00.000

1

'B'

Fri Jan 01 2016 02:00:00.000

1

'C'

Fri Jan 01 2016 03:00:00.000

2

'A'

Fri Jan 01 2016 04:00:00.000

2

'B'

Fri Jan 01 2016 05:00:00.000

2

'B'

Fri Jan 01 2016 06:00:00.000

2

'C'

Fri Jan 01 2016 07:00:00.000

3

'A'

Fri Jan 01 2016 08:00:00.000

3

'C'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : int

Column2 : string

1

'A'

1

'B'

1

'C'

2

'A'

2

'B'

2

'B'

2

'C'

3

'A'

3

'C'

###### Defining and referencing an alias

Sometimes, similar subexpressions may occur at different positions in the regular expression. This may be simple activities as well as complex subexpressions. An alias provides an elegant way to avoid repeating these subexpressions at different positions in the regular expression. It is a simple name that is assigned to a specific regular expression. By this name, the assigned regular expression can be referneced in order to re-use it in another regular expresssion. The regular expressions and their related aliases are given as a comma-separated list. The last regular expression of this list is the root expression which is the entry point for the pattern matching. Therefore, it cannot be referenced by another regular expression. Consequently, the root expression is never assigned to an alias.

###### Tips/Pitfalls
• Avoid to create cyclic definitions of regular expressions when using aliases as cycles will result in an error.

• Aliases can be used as a shortcut for long activity names. This may save a lot of typing and increase the readability of complex regular expressions.

• Be aware that the match at start symbol must be placed before the first aliased expressions, the match at end symbol must be placed after the root expression.

• It is not possible to reference aliases defined in another PROCESS_MATCH_REGEX operator. Only aliases within the same operator can be referenced.

###### Syntax
 <regular_expression> ( AS <alias_name> "," <regular_expression> )*

###### Empty Cases

Empty cases, meaning cases with no activities or only null activities, do never match.

[18]

Example for empty cases (no activities or all activities are null) which are never a match.

Query

Column1

         "CaseTable"."CASE_ID"


Column2

         MATCH_PROCESS_REGEX ( "ActivityTable"."ACTIVITY" , [ ! 'B' ] )


Input

Output

ActivityTable

CASE_ID : string

ACTIVITY : string

TIMESTAMP : date

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'2'

'B'

Fri Jan 01 2016 02:00:00.000

'3'

null

Fri Jan 01 2016 03:00:00.000

CaseTable

CASE_ID : string

'1'

'2'

'3'

'4'

Foreign Keys

 CaseTable.CASE_ID ActivityTable.CASE_ID

Result

Column1 : string

Column2 : int

'1'

1

'2'

0

'3'

0

'4'

0

###### Example

[19]

Matches every variant that contains a sequence of 'A', 'B', 'C', and 'D'. Activities 'A' and 'B' are renamed to 'AliasA' and 'AliasB', respectively.

Query

Filter

         FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'B' AS AliasB , 'A' AS AliasA , AliasA >> AliasB >> 'C' >> 'D' ) = 1;


Column1

         "Table1"."CASE_ID"


Column2

         "Table1"."ACTIVITY"


Input

Output

Table1

CASE_ID : string

ACTIVITY : string

startTimestamp : date

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'1'

'D'

Fri Jan 01 2016 04:00:00.000

'2'

'E'

Fri Jan 01 2016 05:00:00.000

'2'

'F'

Fri Jan 01 2016 06:00:00.000

'2'

'G'

Fri Jan 01 2016 07:00:00.000

'2'

'I'

Fri Jan 01 2016 08:00:00.000

Result

Column1 : string

Column2 : string

'1'

'A'

'1'

'B'

'1'

'C'

'1'

'D'