Skip to main content

Celonis Product Documentation

Join functionality

Description

Celonis performs implicit joins when a query accesses several tables. This means that the user does not have to define the join as part of a query, but the tables are joined as defined in the Data Model Editor.

Implicit joins

[1]

Here ActivityTable and CaseTable are queried. The customer information of the CaseTable is implicitly joined to the activities of the ActivityTable:

activity_case_table.png

Joins are always executed as left outer equi joins. In this example the ActivityTable is on the left side of the join.

Therefore the activity D, which doesn't have a join partner in the case table, is part of the result, while Case ID 4 is not. Also it is noteworthy that NULL values are never joined, as it is the case for activity E.

It depends on the join cardinality which table is on the left side of the join. Celonis supports 1:N relationships. 1:N relationships means that if for example Table A and B are joined together, every row in B can have zero or one join partner in table A, but not more. There is no limitation in the other direction. Every row in A can have an arbitrary number of join partners.

In N:1 relationships, Celonis automatically detects the table on the N-side based on the loaded data. 1:1 relationships are not natively supported, but are treated as a special case of a 1:N relationship. N:M joins are not supported because they are rare, because they can be transformed into simpler relationships by adding a mapping table in between the two tables, and because they would make it much harder to understand the result.

The N-side is always put on the left side of the join. In this example, it is easy to see that case 1 and case 2 have two join partners in the activity table, while every activity relates to exactly one case. Therefore the ActivityTable is on the left side of the join.

In other words, ActivityTable is the fact table, and CaseTable is the dimension table in that join. These terms are explained in more detail below.

Query

Column1

         "ActivityTable"."Activity"
        

Column2

         "CaseTable"."Customer"
        

Input

Output

ActivityTable

Activity : string

Case : int

'A'

1

'B'

1

'A'

2

'C'

2

'D'

3

'E'

null

CaseTable

ID : int

Customer : string

1

'Customer - X'

2

'Customer - Y'

4

'Customer - Z'

null

'Customer - NULL'

Foreign Keys

CaseTable.ID

ActivityTable.Case

Result

Column1 : string

Column2 : string

'A'

'Customer - X'

'B'

'Customer - X'

'A'

'Customer - Y'

'C'

'Customer - Y'

'D'

null

'E'

null

Join Relationships

Not every table can be joined to every other table within one data model. To understand which tables can be joined together one has to know that Celonis supports an extended snowflake schema for a data model. A snowflake schema consists of one fact table which can have multiple N:1 relationships to dimension tables. These dimension tables can themselves have N:1 relationships to other tables. Celonis extends the classic snowflake schema by supporting multiple fact tables, but no cyclic join graphs. To resolve a cyclic join graph you may have to add a table multiple times to the data model.

The following join graph shows an example of a snowflake schema:

snowflake_schema.png

More information on the snowflake schema can be found in the respective Wikipedia article.

The Common Table

To process a join, Celonis first identifies the common table. In general, the common table of two tables A and B is the table on the N-side among all tables in between A and B, including A and B themselves.

In the above example schema, for Dimension Table C and Dimension Table D, the Dimension Table A is identified as the common table, because it is the table on the N-side between the two tables. This means that if a column from Dimension Table C should be joined with a column from Dimension Table D, Celonis identifies the Dimension Table A as the common table. It then joins the needed columns from Dimension Table C and Dimension Table D to the Dimension Table A. We call this step pulling up the data.

The common table of two tables does not need to be a third table in between, but can also be one of those two tables. In the above example schema, for Dimension Table C and the Fact Table, the Fact Table is identified as the common table because it is the table on the N-side of the join graph between the involved tables.

This works perfectly as long as you have only one possible table on the N-side of the join graph. If you have multiple tables on the N-side, it is not possible to identify a single common table for your join.

The following example schema shows this situation:

snowflake_schema_no_common_table.png

In this example, there is no common table for Fact Table 1 and Fact Table 2, because both tables are on the N-side of the join graph. The same holds for Dimension Table A and Fact Table 2. Please note that all other table combinations do have a common table, i.e.

  • Dimension Table A and Dimension Table B (common table: Fact Table 1),

  • Dimension Table A and Fact Table 1 (common table: Fact Table 1),

  • Dimension Table B and Fact Table 1 (common table: Fact Table 1),

  • Dimension Table B and Fact Table 2 (common table: Fact Table 2).

Understanding the "No common table" error message

In Celonis, two tables can only be joined when they have a common table. So if all tables which are used inside a query do not have a common table, they cannot be joined together, and the following error is returned:

Warning

No common table could be found. The tables ["Fact Table 1"] and ["Fact Table 2"] are connected, but have no common table. This means that they do not have a direct (or indirect) 1:N or N:1 relationship. Join path: [Fact Table 1]N <-- 1![Dimension Table B]!1 --> N[Fact Table 2]. For more information on the join path, search for "Join functionality" in PQL documentation.

As mentioned, the join graph is acyclic, which means there is exactly one join path between any two connected tables. In cases where no common table can be determined between two connected tables, the join path is also shown in the error message (see above). The join path contains the following information:

  • Table name: [Fact Table 1]

  • A table surrounded by exclamation marks breaks the possibility of a common table: ![Dimension Table B]!

  • Join direction: --> or <--

  • Join cardinalities: 1 --> N or N <-- 1

Impact on KPI calculations

When introducing the join to another table inside one KPI, this join affects all other dimensions and KPIs of that particular query. Take a look at the following three-part example:

[2]

This query calculates the average throughput time using the CALC_THROUGHPUT function. Since case 1 has a throughput time of 8 days, and case 2 has a throughput time of 2 days, we get the expected result of 5 days:

Query

Column1

         AVG ( CALC_THROUGHPUT ( CASE_START TO CASE_END , REMAP_TIMESTAMPS ( "ACTIVITIES"."TIMESTAMP" , DAYS ) ) )
        

Input

Output

ACTIVITIES

CASE_ID : int

ACTIVITY : string

TIMESTAMP : date

1

'A'

Fri Jan 01 2021 13:00:00.000

1

'B'

Tue Jan 05 2021 13:00:00.000

1

'C'

Wed Jan 06 2021 13:00:00.000

1

'D'

Sat Jan 09 2021 13:00:00.000

2

'B'

Fri Jan 01 2021 13:00:00.000

2

'D'

Sun Jan 03 2021 13:00:00.000

CASES

CASE_ID : int

1

2

Foreign Keys

CASES.CASE_ID

ACTIVITIES.CASE_ID

Result

Column1 : float

5.0

[3]

Now we add another KPI to our query. It should count the overall number of D-activities. As you can see, the result of the throughput time calculation is now 6 days, although the formula for this KPI and the input data did not change. The result of CALC_THROUGHPUT is on case level (i.e. the throughput time is calculated per case). This means that when we only use CALC_THROUGHPUT in the query, the common table of that query is the Case table.

However, when adding the KPI counting the D-activities, the join to the Activity table is introduced, which means that all columns of the Case table (including the CALC_THROUGHPUT result) are pulled up to the Activity table. The common table of the query has changed; it is now the Activity table. Due to the join, the CALC_THROUGHPUT values were "multiplied" by the number of activities of each case, as the CALC_THROUGHPUT result of one particular case is now related to every single row of that case in the Activity table. The result of the throughput time KPI is therefore calculated as (8+8+8+8+2+2) / 6 (4 times 8 because case 1 has 4 activities and a throughput time of 8; 2 times 2 because case 2 has 2 activities and a throughput time of 2, and this sum is divided by 6 because we calculated the sum based on 6 values):

Query

Column1

         AVG ( CALC_THROUGHPUT ( CASE_START TO CASE_END , REMAP_TIMESTAMPS ( "ACTIVITIES"."TIMESTAMP" , DAYS ) ) )
        

Column2

         COUNT ( CASE WHEN "ACTIVITIES"."ACTIVITY" = 'D' THEN 1 ELSE NULL END )
        

Input

Output

ACTIVITIES

CASE_ID : int

ACTIVITY : string

TIMESTAMP : date

1

'A'

Fri Jan 01 2021 13:00:00.000

1

'B'

Tue Jan 05 2021 13:00:00.000

1

'C'

Wed Jan 06 2021 13:00:00.000

1

'D'

Sat Jan 09 2021 13:00:00.000

2

'B'

Fri Jan 01 2021 13:00:00.000

2

'D'

Sun Jan 03 2021 13:00:00.000

CASES

CASE_ID : int

1

2

Foreign Keys

CASES.CASE_ID

ACTIVITIES.CASE_ID

Result

Column1 : float

Column2 : int

6.0

2

[4]

To get the expected values for both KPIs while having them inside the same query, we could get rid of the join with the Activity table again by calculating the number of D-activities for each case first in a PU_COUNT. As we specify the Case table as the target table of the PU function, the result of the PU function belongs to the Case table rather than the Activity table:

Query

Column1

         AVG ( CALC_THROUGHPUT ( CASE_START TO CASE_END , REMAP_TIMESTAMPS ( "ACTIVITIES"."TIMESTAMP" , DAYS ) ) )
        

Column2

         SUM ( PU_COUNT ( "CASES" , CASE WHEN "ACTIVITIES"."ACTIVITY" = 'D' THEN 1 ELSE NULL END ) )
        

Input

Output

ACTIVITIES

CASE_ID : int

ACTIVITY : string

TIMESTAMP : date

1

'A'

Fri Jan 01 2021 13:00:00.000

1

'B'

Tue Jan 05 2021 13:00:00.000

1

'C'

Wed Jan 06 2021 13:00:00.000

1

'D'

Sat Jan 09 2021 13:00:00.000

2

'B'

Fri Jan 01 2021 13:00:00.000

2

'D'

Sun Jan 03 2021 13:00:00.000

CASES

CASE_ID : int

1

2

Foreign Keys

CASES.CASE_ID

ACTIVITIES.CASE_ID

Result

Column1 : float

Column2 : int

5.0

2