Celonis Product Documentation

Extracting and transforming data

Once a connection with your data source is established, you need to create and execute data jobs. Data jobs are tasks that extract and transform the data you require from your source systems, ensuring that only the relevant data is integrated with the Celonis Platform. Data jobs, and their related tasks, can be either created manually or added to your data pool as part of a process connector.

To create and manage data jobs manually, see: Creating and managing data jobs.

Once created, a data job can consist of three task components:

  • Extraction tasks

  • Transformation tasks

  • Data model load tasks

For more information about data job task components, see: Data job task components.

Creating and managing data jobs

To create a data job from your data pool diagram:

  1. Click Data Jobs.

  2. Click Add Data Job.

  3. Enter a unique name and select your data job type:

    • Data connection jobs: These are based on existing data connections, allowing you to extract and transform data from a specified source system.

    • Global data jobs: These allow you to transform existing data only and are therefore best used when unifying data from multiple connections or tables.

  4. Click Save.

    The data job is now saved to your data pool and available to configure further. For more information about configuring your data jobs, see: Data job task components.

Managing existing data jobs

To manage an existing data job, click Options.


The following options are available:

  • Rename: Update the name of the data job.

  • Change data connection: Available for data connection jobs only. Changes the data connection associated with the job.

  • Duplicate: Create a duplicate of the data job and all existing content within the same data pool.

  • Copy To: Create a copy of the data job within other Celonis Platform teams or data pools. The following content is not copied: data model loads, job alerts, and data pool parameters.

  • Execute Data Job: Manually execute the job on demand. Available for data jobs containing at least one extraction task. For more information about executing data jobs, see: Executing data jobs.

  • Configure alerts: Configure when to send and receive email alerts for this data job, allowing you to be notified based on specified events. For more information, see: Enabling data job alerts.

  • Force Cancel Executions: Cancels the execution of the data job that is currently running. For more information about executing data jobs, see: Executing data jobs.

  • Execution settings: Configure further details about your data job executions. For more information about executing data jobs, see: Executing data jobs.

  • Delete: Permanently deletes the data job and all associated tasks. Deleted data jobs cannot be recovered.

Data job task components

Data jobs consist of three task components: extraction tasks, transformation tasks, and data model load tasks.

Extraction tasks (data connection jobs only)

Extractions allow you to select the data tables to extract from your source system and import into the Celonis Platform. When configuring your extractions, you can apply time filters, add pseudonymization, and join tables to each other.

For more information about extraction tasks, see: Creating extraction tasks.

For information about using partitioned extractions for large tables, see: Enabling partitioned extractions of large tables.
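
The three extraction options mentioned in this section (time filters, pseudonymization, and joins) can be pictured with a small conceptual sketch. This is not how the Celonis Platform implements extractions, which are configured in the user interface. The table contents, the field names, the extract and pseudonymize helpers, and the choice of SHA-256 hashing are all assumptions made for illustration.

```python
import hashlib
from datetime import date

# Hypothetical source data; names and values are illustrative only.
orders = [
    {"order_id": "A1", "customer_id": "C1", "created": date(2023, 11, 20)},
    {"order_id": "A2", "customer_id": "C2", "created": date(2024, 2, 1)},
]
customers = {
    "C1": {"name": "Alice Example"},
    "C2": {"name": "Bob Example"},
}


def pseudonymize(value: str) -> str:
    """Replace a value with its SHA-256 digest (an assumed technique)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def extract(rows, lookup, since):
    """Apply a time filter, join to a lookup table, and pseudonymize a column."""
    result = []
    for row in rows:
        # Time filter: skip rows created before the cutoff date.
        if row["created"] < since:
            continue
        # Join: look up the customer record for this order.
        customer = lookup[row["customer_id"]]
        result.append({
            "order_id": row["order_id"],
            "created": row["created"],
            # Pseudonymization: store a digest instead of the real name.
            "customer_name": pseudonymize(customer["name"]),
        })
    return result


extracted = extract(orders, customers, since=date(2024, 1, 1))
for row in extracted:
    print(row["order_id"], row["created"], row["customer_name"][:12])
```

In this sketch only order A2 passes the time filter, and its customer name is replaced with a digest before the row is kept.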

Transformation tasks

Transformations are used to create event logs from your extracted data. They are written in SQL, using the Vertica SQL syntax. Transformations clean up, restructure, and process data so that it can be used in data models. These data models are then consumed by other Celonis Platform features, such as Studio.

For more information about transformation tasks, see: Creating transformation tasks.
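
As a minimal sketch of what an event log transformation can look like, the following example builds an activity table with one row per case, activity, and timestamp. The orders table, its columns, and the activity names are hypothetical. To keep the example runnable without a Celonis environment, the SQL is executed against an in-memory SQLite database through Python; it uses only generic SQL, and a real transformation would need to follow the Vertica SQL syntax, which may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical extracted table; names and values are illustrative only.
CREATE TABLE orders (
    order_id   TEXT,
    created_at TEXT,
    shipped_at TEXT
);
INSERT INTO orders VALUES
    ('A1', '2024-01-02 09:00:00', '2024-01-05 14:30:00'),
    ('A2', '2024-01-03 11:15:00', NULL);

-- Transformation: turn each timestamp column into its own activity row.
CREATE TABLE activities AS
SELECT order_id AS case_id, 'Create Order' AS activity, created_at AS event_time
FROM orders
WHERE created_at IS NOT NULL
UNION ALL
SELECT order_id, 'Ship Order', shipped_at
FROM orders
WHERE shipped_at IS NOT NULL;
""")

# Read the resulting event log back, ordered by case and time.
event_log = conn.execute(
    "SELECT case_id, activity, event_time FROM activities "
    "ORDER BY case_id, event_time"
).fetchall()

for row in event_log:
    print(row)
```

Order A2 has no shipping timestamp, so it contributes only a Create Order row to the event log.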

Data model load tasks

You can then load your extracted and transformed data directly into a data model, simplifying the process of making your data usable in other Celonis Platform features. Loading data directly into a data model is particularly useful when configuring schedules, ensuring that your data models are based on the latest data at predefined and regular intervals.

For more information about data model load tasks, see: Creating data model load tasks.