
Celonis Product Documentation

Creating extraction tasks

As the name suggests, extraction tasks allow you to select the data tables to be extracted from your source system (and imported into the Celonis Platform). When configuring your extractions, you can apply time filters, add pseudonymization, and join tables to each other.

You can either create extraction tasks manually from your data jobs or edit existing extraction tasks (whether they were created manually or as part of a process connector).

Creating and managing extraction tasks

To create an extraction task from your data pool diagram:

  1. Click Data Jobs and select an existing data connection scope.

  2. In the extraction row, click + Add.

  3. Add an extraction task name (an internal reference only) and click Save.

    The task is created and displayed.

  4. Edit your extraction task configuration as required. See: Extraction task configuration.


Managing existing extraction tasks

You can manage existing extraction tasks by clicking Options.

You have the following options here:

  • Rename: Update the name of the extraction task.

  • Enable / disable: Control whether the extraction task is included when the data job is executed.

  • Move up / down: Change the order in which this task is performed in a full execution.

  • Duplicate: Create a copy of the extraction task in the existing data job.

  • Execute: This allows you to manually execute just this task on demand. For more information about executing data jobs, see: Executing data jobs.

  • Execute from here: This allows you to manually execute this and all following tasks on demand. For more information about executing data jobs, see: Executing data jobs.

  • Convert to template / copy to regular task: The task becomes a template and can be added to other data jobs or used to extend the template. If the task is already a template, you can create a regular task from it. For more information about task templates, see: Creating task templates.

  • Delete: This deletes the task and all associated content, with no recovery possible.

  • Download table configuration: This gives you offline access to a zipped file containing any relevant Excel workbook copies of your table configuration.
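The ordering and execution options above can be pictured with a small model. This is only an illustrative sketch, not the platform's implementation: the `Task` class, the function names, and the task names are hypothetical, and it assumes that disabled tasks are skipped during execution.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical stand-in for an extraction task in a data job.
    name: str
    enabled: bool = True


def move_up(tasks, index):
    """Swap the task at `index` with the one before it (no-op at the top)."""
    if index > 0:
        tasks[index - 1], tasks[index] = tasks[index], tasks[index - 1]


def execute_from_here(tasks, index):
    """Return the names of the task at `index` and all following tasks.

    Assumption: disabled tasks are skipped.
    """
    return [t.name for t in tasks[index:] if t.enabled]


tasks = [
    Task("orders"),
    Task("order_items", enabled=False),
    Task("customers"),
]

# Move "customers" one position up, ahead of the disabled "order_items".
move_up(tasks, 2)

print(execute_from_here(tasks, 0))
print(execute_from_here(tasks, 1))
```

In this model, "Execute" corresponds to running a single task, while "Execute from here" runs the slice of the ordered list starting at the selected task.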

Extraction task configuration

When creating or editing an extraction task, you have the following configuration options available:

Table configuration

This is where you select and configure the data tables you want to extract. Depending on your data connection type, you have the following table configuration options:

  • Column subset: Specify which columns should be extracted by clicking Configure next to the column count.

  • Pseudonymized columns: In the same manner as the column subset, you can specify which columns, if any, should be pseudonymized during the extraction.

  • Primary key columns: You can override the primary keys of the source system to be used during delta loading.
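As a rough illustration of what a column subset and pseudonymized columns do to a row, consider the sketch below. It assumes a salted SHA-256 hash as the pseudonymization method, which this page does not specify; the function name `extract_row`, the salt value, and the sample columns are all hypothetical.

```python
import hashlib


def extract_row(row, columns, pseudonymized, salt="example-salt"):
    """Keep only `columns` and hash the values of `pseudonymized` columns.

    Assumption: pseudonymization is modelled as a salted SHA-256 digest.
    """
    out = {}
    for col in columns:
        value = row[col]
        if col in pseudonymized:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8"))
            value = digest.hexdigest()
        out[col] = value
    return out


source_row = {"ORDER_ID": 1001, "CUSTOMER_NAME": "Jane Doe", "NOTES": "internal"}

extracted = extract_row(
    source_row,
    columns=["ORDER_ID", "CUSTOMER_NAME"],  # column subset: NOTES is dropped
    pseudonymized={"CUSTOMER_NAME"},        # hashed during extraction
)
print(extracted)
```

Because the hash is deterministic for a given salt, the same source value always maps to the same pseudonym, so the column can still be used for grouping and joining.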

Join configuration

You can add one or multiple join partners to the table. Each join partner can either be joined through the primary keys of the tables or through a custom join path. In order for the primary key join to work, the primary keys of the table to be joined need to be included in the primary keys of the base table. You can also add a filter for each joined table.
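The primary key requirement can be expressed as a subset check. The following sketch is illustrative only: the function names and table names are hypothetical, and it assumes the key columns have the same names in both tables.

```python
def can_join_on_primary_keys(base_pk, joined_pk):
    """True if every primary key column of the joined table is also
    a primary key column of the base table."""
    return set(joined_pk) <= set(base_pk)


def primary_key_join_condition(base, joined, joined_pk):
    """Build an SQL-style join condition over the joined table's key columns.

    Assumption: key columns are named identically in both tables.
    """
    return " AND ".join(f"{base}.{col} = {joined}.{col}" for col in joined_pk)


base_pk = ["ORDER_ID", "ITEM_NO"]  # primary key of the base table
joined_pk = ["ORDER_ID"]           # primary key of the join partner

if can_join_on_primary_keys(base_pk, joined_pk):
    print(primary_key_join_condition("ORDER_ITEMS", "ORDERS", joined_pk))
```

When the check fails, for example because the join partner's key contains a column the base table does not have in its key, a custom join path is the option to use instead.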

Time filter

  • Creation date filter: A filter on a date column that is used in both full and delta loads. This filter is combined with the "Filter Statement" under "Additional Filters" with an AND condition, so both conditions must be met.

  • Change date filter: A filter on a date column that automatically looks for the maximum date in the existing table and sets a filter to only extract data newer than this maximum date. This filter will be combined with the "Delta Filter Statement" under "Additional Filters" with an AND condition.
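The change date logic can be sketched with an in-memory SQLite database. This is a simplified model rather than the extractor's actual queries: the table names `source_orders` and `target_orders`, the column `changed_on`, and the function `delta_rows` are hypothetical, and dates are stored as ISO strings so that they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_orders (id INTEGER, changed_on TEXT);
    CREATE TABLE target_orders (id INTEGER, changed_on TEXT);
    INSERT INTO source_orders VALUES
        (1, '2024-01-01'), (2, '2024-01-05'), (3, '2024-01-09');
    -- The target already holds the rows from an earlier load.
    INSERT INTO target_orders VALUES
        (1, '2024-01-01'), (2, '2024-01-05');
    """
)


def delta_rows(conn):
    """Return source rows newer than the maximum change date already loaded."""
    (max_date,) = conn.execute(
        "SELECT MAX(changed_on) FROM target_orders"
    ).fetchone()
    if max_date is None:
        # Empty target table: nothing to compare against, so take everything.
        return conn.execute(
            "SELECT id, changed_on FROM source_orders ORDER BY id"
        ).fetchall()
    return conn.execute(
        "SELECT id, changed_on FROM source_orders "
        "WHERE changed_on > ? ORDER BY id",
        (max_date,),
    ).fetchall()


new_rows = delta_rows(conn)
print(new_rows)
```

Only the row changed after the latest date in the target table is picked up, which is the behavior the change date filter describes for delta loads.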

Additional filters

  • Filter statement: Using SQL syntax, specify which rows will be extracted, e.g. COLUMN1 > 5.

  • Delta filter statement: Using SQL syntax, specify which additional filters should be applied when the job executes a delta load. This statement is combined with the regular filter statement using the logical AND operator.
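How the two statements combine can be shown with a small helper. This is a minimal sketch under stated assumptions: the function `build_where` and the example column `COLUMN2` are hypothetical, and the exact SQL the platform generates may differ.

```python
def build_where(filter_statement=None, delta_filter=None, delta_load=False):
    """Combine the filter statements into a single WHERE condition.

    The regular filter always applies; the delta filter is added with AND
    only when the load is a delta load.
    """
    parts = []
    if filter_statement:
        parts.append(f"({filter_statement})")
    if delta_load and delta_filter:
        parts.append(f"({delta_filter})")
    return " AND ".join(parts)


full_load = build_where("COLUMN1 > 5", "COLUMN2 = 'X'", delta_load=False)
delta_load = build_where("COLUMN1 > 5", "COLUMN2 = 'X'", delta_load=True)

print(full_load)   # (COLUMN1 > 5)
print(delta_load)  # (COLUMN1 > 5) AND (COLUMN2 = 'X')
```

Wrapping each statement in parentheses keeps an OR inside one statement from changing the meaning of the combined condition.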

Debug mode

Once enabled, debug mode writes detailed log information about the data extraction job to the execution logs. This allows for more transparency and easier troubleshooting. Debug mode stays active for three days, after which the logs it created are deleted.