Custom Data Pipeline Monitoring

Note

The tables of the Custom Data Pipeline Monitoring are created with the first events captured after enabling this feature. No past events are captured. The table data_consumption_updated_events is created with the first APC calculation for the team, which can take up to several days. Until then, the "Data Consumption Monitoring" Data Job will fail. The table replication_execution_finished_events is created with your first Replication Cockpit Execution. If the Replication Cockpit is not set up, the "Replication Cockpit Monitoring" Data Job will fail.

How does it work?

The Custom Data Pipeline Monitoring allows you to leverage Views and Analyses to monitor your data pipeline health.

System events of Data Integration are collected and made available in a dedicated Data Pool called "Monitoring Pool" in the form of four tables. You can adjust these tables to your needs using Transformations and load them into Data Models to use the information in Analyses, Views, or Signals.
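
For example, a custom Transformation could reduce the raw event stream before loading it into a Data Model. The following is only a minimal sketch, assuming a standard SQL dialect in the Monitoring Pool Transformations and using the data_job_state_updated_events table documented below; the target table name MONITORING_EVENTS_RECENT is illustrative and not part of the product.

-- Minimal sketch: keep only the last 90 days of Data Job events
-- before loading them into a Data Model.
-- MONITORING_EVENTS_RECENT is an illustrative name.
DROP TABLE IF EXISTS MONITORING_EVENTS_RECENT;

CREATE TABLE MONITORING_EVENTS_RECENT AS
SELECT *
FROM data_job_state_updated_events
WHERE created_at >= CURRENT_DATE - 90;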

Starting with the enablement of the feature, the events of all Data Pools are collected centrally in the Monitoring Pool, in one comprehensive table for each use case: Data Consumption, Data Job Executions, Data Model Loads, and Replication Cockpit Executions.

The Monitoring Pool comes with pre-configured Transformations and Data Models, and a related, ready-to-use Studio package called Data Pipeline & Consumption Monitor can be installed from the Marketplace.

How does the Monitoring Pool differ from a normal Data Pool?

The dedicated Monitoring Pool:

  1. Does not allow any data intake, i.e. Data Connections, File Uploads, Data Transfer - Import (Export is allowed), Replication Cockpit, Streaming Cockpit, and Extractor Builder are disabled for this Data Pool.

  2. Is excluded from Data Model license calculations.

  3. Cannot be copied or exported.

  4. Versions cannot be copied into this pool or from this pool to another pool.

  5. Is not included in the APC calculation.

  6. Contains pre-configured Transformations and Data Models that prepare the data for easier usage.

  7. Works with the ready-to-use Views and Analyses that can be downloaded from the Marketplace.

    Warning

    Bug in Data Consumption Monitor

    Please check the PQL formula in the Data Consumption Monitor for the KPI Current_APC_per_Table_Bytes.

    It used to be:

    PU_LAST("data_tables", "data_consumption_events"."size_in_bytes", order by "data_consumption_events"."created_at")

    To accommodate deleted tables, this was changed to:

    PU_MAX(
      "data_tables",
      "data_consumption_events"."size_in_bytes",
      "data_consumption_events"."created_at" = PU_MAX(CONSTANT(), "data_consumption_events"."created_at")
    )

    If you still use the old calculation in your analysis, please replace it with the new one.

How to set it up
Enabling the Custom Data Pipeline Monitoring
  1. Go to Data Integration.

  2. Click Monitoring in the top bar (indicated by the Monitoring icon; only visible to Admins).

  3. Click the Enable Custom Monitoring button.

  4. A new Data Pool called "Monitoring Pool" is automatically created.

    Warning

    Do not rename the new Data Pool.

  5. Navigate to the new "Monitoring Pool" Data Pool.

  6. When opening the "Monitoring Pool" Data Pool, you will see the tag "Monitoring Pool" in the top left.

Disabling the Custom Data Pipeline Monitoring
  1. Go to Data Integration.

  2. Click Monitoring on the left-hand side navigation.

  3. Click the Disable Custom Monitoring button.

  4. The monitoring events will no longer be pushed to the Monitoring Pool; the Monitoring Pool is not deleted automatically.

  5. You can delete the Monitoring Pool in the same way as any other Data Pool on the three-dot menu. If you delete the Monitoring Pool without manually disabling the Custom Monitoring, it will be disabled automatically.

  6. If you enable the Custom Monitoring again and the Monitoring Pool still exists, the Monitoring Events are pushed there again.

  7. If you enable the Custom Monitoring again and the Monitoring Pool was deleted, a new Monitoring Pool is installed to which the Monitoring Events are pushed.

Migration for teams that used the old setup

If you are using the old setup of the Data Pipeline Monitoring, i.e. you set one of your Data Pools as Monitoring Pool by clicking "Set as Monitoring Target" on that Data Pool, we kindly ask you to migrate to the new setup. The "Set/Unset as Monitoring Target" button is no longer available. Nevertheless, if you selected a Data Pool as Monitoring Pool before, the monitoring events are still pushed there until you migrate to the new setup.

Why should I migrate to the new setup?

The newly created dedicated Monitoring Pool:

  1. is excluded from the APC calculation

  2. contains pre-configured Transformations and Data Models that prepare the data for easier usage

  3. works with the ready-to-use Views and Analyses that can be downloaded from the Marketplace

How can I migrate to the new setup?

Follow the steps described in the Enabling the Custom Data Pipeline Monitoring section. Once you have clicked the Enable Custom Monitoring button and confirmed the message, the migration is carried out automatically. The monitoring tables will no longer be pushed to the Data Pool you selected manually; instead, they will be pushed to a newly installed, dedicated Data Pool called "Monitoring Pool". This action cannot be undone. The data collected so far is retained in the manually selected Data Pool of the old setup; new tables are created in the new Monitoring Pool, i.e. the existing data is not migrated.

Collected Data

Data is collected for Data Consumption updates, Data Job executions, Data Model loads, and Replication Cockpit loads.

Data Consumption

When the data consumption is calculated, an event is generated per table and Data Pool, including the information on how many bytes the table contains. The recalculation frequency of the APC varies. New calculations create new events, which are then added to the table data_consumption_updated_events.

The table contains the following columns:

id: Unique identifier for each entry
schema_version: The version of the table configuration of the table data_consumption_updated_events
created_at: When Data Integration logged this event
data_pool_id: Corresponding Data Pool ID of the table
data_pool_name: Corresponding Data Pool name of the table
data_connection_id: Corresponding Connection ID of the table
data_connection_name: Corresponding Connection name of the table
data_connection_type: Corresponding Connection type of the table, e.g. SAP ECC
table_name: Table name
size_in_bytes: APC consumption in bytes
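
A Transformation on this table could, for example, keep only the most recent consumption event per table to report the current APC footprint. The following is a minimal sketch, assuming standard SQL window functions are available; the target table name CURRENT_TABLE_SIZES is illustrative.

-- Sketch: latest consumption event per table (illustrative target name).
DROP TABLE IF EXISTS CURRENT_TABLE_SIZES;

CREATE TABLE CURRENT_TABLE_SIZES AS
SELECT data_pool_name, table_name, size_in_bytes, created_at
FROM (
    SELECT
        data_pool_name,
        table_name,
        size_in_bytes,
        created_at,
        ROW_NUMBER() OVER (
            PARTITION BY data_pool_id, table_name
            ORDER BY created_at DESC
        ) AS rn
    FROM data_consumption_updated_events
) latest
WHERE rn = 1;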

Data Job Executions

For each Data Job execution step, i.e. for each status an execution can attain (QUEUED, RUNNING, SUCCESS, CANCEL, FAIL), an event is generated. These events are added to the table data_job_state_updated_events every five minutes.

The table contains the following columns:

id: Unique identifier for each entry
schema_version: The version of the table configuration of the table data_job_state_updated_events
created_at: When Data Integration logged this event
data_pool_id: Corresponding Data Pool ID of the event
data_connection_id: Corresponding Connection ID of the event
data_connection_name: Corresponding Connection name of the event
data_connection_type: Corresponding Connection type of the event, e.g. SAP ECC
execution_id: Execution ID of the overall execution that triggered this execution
execution_item_id: Execution ID of a single object - either an extraction, transformation, schedule, or data job
scheduling_id: Schedule ID, only filled if the event relates to a schedule
data_job_id: Data Job ID, only filled if the event relates to a data job
task_id: Task ID, only filled if the event relates to an extraction or transformation
step_id: Step ID, only filled if the event relates to an extraction step, which corresponds to a table
name: Name of the execution item
status: The execution status: success, fail, queued, running, cancel
message: Stores the error message if the execution failed
execution_type: The type of the execution: data job - JOB, transformation/extraction - TASK, schedule - SCHEDULE, extraction step (table) - STEP
mode: Extraction mode: full load - FULL, delta load - DELTA
data_pool_name: Corresponding Data Pool name of the event
scheduler_cron_pattern: Cron pattern for schedules
task_type: A task can be either a transformation (TRANSFORMATION) or an extraction (EXTRACTION)
scheduling_name: Schedule name, only filled if the event relates to a schedule
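
Because every status change is a separate event, a common Transformation is to condense the events into one row per Data Job execution with its start, end, and final status. The following is a minimal sketch under that assumption; JOB_EXECUTION_SUMMARY is an illustrative target name, and the exact casing of the status values should be checked against your data.

-- Sketch: one row per data job execution with start, end, and final status.
-- JOB_EXECUTION_SUMMARY is an illustrative name; status casing may differ.
DROP TABLE IF EXISTS JOB_EXECUTION_SUMMARY;

CREATE TABLE JOB_EXECUTION_SUMMARY AS
SELECT
    execution_item_id,
    data_pool_name,
    name,
    MIN(CASE WHEN status = 'RUNNING' THEN created_at END) AS started_at,
    MAX(CASE WHEN status IN ('SUCCESS', 'FAIL', 'CANCEL') THEN created_at END) AS finished_at,
    MAX(CASE WHEN status IN ('SUCCESS', 'FAIL', 'CANCEL') THEN status END) AS final_status
FROM data_job_state_updated_events
WHERE execution_type = 'JOB'
GROUP BY execution_item_id, data_pool_name, name;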

Data Model Loads

For each Data Model load step, i.e. for each status an execution can attain (RUNNING, SUCCESS, ERROR, WARNING, CANCELED), an event is generated. These events are added to the table data_model_load_events every five minutes.

The table contains the following columns:

id: Unique identifier for each entry
schema_version: The version of the table configuration of the table data_model_load_events
created_at: When Data Integration logged this event
data_model_load_id: Each triggered data model load has a unique ID on the compute side, which is set when the engine load starts
data_model_id: Each data model has a unique ID
load_history_id: Each triggered data model load has a unique ID, shared among all the events in one execution
status: The data model load status: running, success, error, warning, canceled
execution_type: Execution step: the table export is done first (TABLE_EXPORT), then the engine load follows (ENGINE_LOAD), which results in an engine table (ENGINE_TABLE). All status changes during this process are tracked with the execution type DATA_MODEL_LOAD
load_type: A data model can be loaded from cache (CACHE), from scratch (COMPLETE), or partially (PARTIAL)
data_model_name: Name of the data model that was loaded
data_pool_id: Corresponding Data Pool ID of the data model
data_pool_name: Name of the Data Pool in which the data model was loaded
table_name: Data Models are loaded by table - the individual table name
table_alias: Corresponding table alias to the table name
data_connection_id: Corresponding data connection ID from which the table is loaded
data_connection_name: Corresponding data connection name from which the table is loaded
data_connection_type: Corresponding data connection type, e.g. SAP ECC, from which the table is loaded
row_count: Number of rows in a table in the data model (calculated in the step execution_type ENGINE_TABLE)
message: Stores the message if the data model load fails or throws a warning
execution_id: Execution ID of the overall execution that triggered this execution
execution_item_id: Execution ID of the data model, if the data models are integrated in the data jobs
job_execution_item_id: Execution ID of the data job if the data model load was triggered as part of a job execution
data_pool_version: Version number of the Data Pool version the load ran on
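
The load events can be condensed in the same way into one row per Data Model load. This is a minimal sketch, assuming the overall status changes are tracked with the execution type DATA_MODEL_LOAD as described above; DATA_MODEL_LOAD_SUMMARY is an illustrative target name.

-- Sketch: one row per data model load with first event, last event, and final status.
-- DATA_MODEL_LOAD_SUMMARY is an illustrative name; status casing may differ.
DROP TABLE IF EXISTS DATA_MODEL_LOAD_SUMMARY;

CREATE TABLE DATA_MODEL_LOAD_SUMMARY AS
SELECT
    load_history_id,
    data_model_name,
    data_pool_name,
    MIN(created_at) AS first_event_at,
    MAX(created_at) AS last_event_at,
    MAX(CASE WHEN status IN ('SUCCESS', 'ERROR', 'WARNING', 'CANCELED') THEN status END) AS final_status
FROM data_model_load_events
WHERE execution_type = 'DATA_MODEL_LOAD'
GROUP BY load_history_id, data_model_name, data_pool_name;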

Replication Cockpit Executions

For each finished Replication Cockpit execution, an event is generated that contains, among other things, the status and the start and end times of the Extractions and Transformations. These events are added to the table replication_execution_finished_events every five minutes.

The table contains the following columns:

id: Unique identifier for each entry
created_at: The time when the event was created
schema_version: The version of the table configuration of the table replication_execution_finished_events
execution_id: Each execution has a unique ID
replication_id: Each Replication (table) has a unique ID
table_name: The name of the table that is being replicated
data_pool_id: The ID of the Data Pool
data_pool_name: The name of the Data Pool
data_source_id: The ID of the Data Source (Data Connection)
data_source_name: The name of the Data Source (Data Connection)
start_time: Time when the replication started
end_time: Time when the replication ended
extraction_start_time: Time when the extraction started
push_job_created_time: Time when the Data Push Job was created
push_job_executed_time: Time when the Data Push Job was executed
extraction_end_time: Time when the extraction finished
transformation_start_time: Time when the transformation started
transformation_end_time: Time when the transformation ended
status: Overall status of the replication
extraction_status: Status of the extraction
transformation_status: Status of the transformation
cycle: Execution ID of the overall execution that triggered this execution
extracted_records: The number of records that got extracted
extracted_deletion_records: The number of deleted records that got extracted
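
Since the events already contain the start and end times, replication latency can be derived directly from this table. The following is a minimal sketch, assuming a Vertica-style DATEDIFF function (adjust it to your SQL dialect); REPLICATION_LATENCY is an illustrative target name.

-- Sketch: end-to-end and extraction latency in seconds per finished replication.
-- REPLICATION_LATENCY is an illustrative name; DATEDIFF syntax depends on the SQL dialect.
DROP TABLE IF EXISTS REPLICATION_LATENCY;

CREATE TABLE REPLICATION_LATENCY AS
SELECT
    data_pool_name,
    table_name,
    status,
    start_time,
    end_time,
    DATEDIFF('second', start_time, end_time) AS total_seconds,
    DATEDIFF('second', extraction_start_time, extraction_end_time) AS extraction_seconds
FROM replication_execution_finished_events;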