Creating extraction tasks using the Extractions Editor and AI Assistant
Limited availability
This functionality is currently in limited availability. If you’re interested in trying it out, get in touch with us at Celopeers.
Privacy Statement and Customer Consent
The use of the AI Assistant for the Extractions Editor is optional. The feature can also be deactivated at any time by a Celonis Platform team admin.
The AI Assistant for Extractions Editor does not qualify as a high-risk AI system as defined in the EU AI Act and is not designed for or intended to be used for any purpose(s) specified in Article 6 of the EU AI Act or otherwise prohibited under the EU AI Act. Customers are solely responsible for implementing appropriate technical and organizational measures to ensure that AI Features are not used for such purposes.
Customers accept responsibility for their queries and data transmission to the AI Assistant, including any impacts on their infrastructure and data. Customers must agree to follow best practices, using the least-privileged database user to connect to the Celonis Platform and the least-privileged operating system user to run their database systems, in order to minimize the impact of any compromise.
For further information about the Large Language Models used, consult the AI model cards of the respective provider.
Extraction tasks allow you to select the data tables to be extracted from your source system (and imported into the Celonis Platform). You can now create, validate, and preview extraction tasks for JDBC source systems using the Extractions Editor and AI Assistant, allowing you to dynamically write SELECT SQL queries and use your source system's SQL dialect.
For more information about the Extractions Editor and AI Assistant, see: Extractions Editor and AI Assistant overview
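Because the editor uses your source system's SQL dialect, the same query can look different across systems. As a minimal sketch, assuming a hypothetical source table named SALES_ORDERS, a row-limited query might be written as:

-- Hypothetical table SALES_ORDERS; the row-limit syntax depends on your dialect.
-- PostgreSQL / MySQL:
SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE FROM SALES_ORDERS LIMIT 100;
-- SQL Server:
SELECT TOP 100 ORDER_ID, CUSTOMER_ID, ORDER_DATE FROM SALES_ORDERS;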
To create extraction tasks using the Extractions Editor and AI Assistant from your Data Pool diagram:
Click Data Jobs and select an existing data connection scope.
In the extraction row, click + Add.
Enter an extraction task name, select SQL editor, and then click Save.
The SQL editor loads.
Using the Extractions Editor and AI Assistant, configure your extraction task by writing an SQL statement (a sketch of such a statement follows these steps).
For more information about the features available here and the task settings, see: Extraction task configuration and settings.
Click Save.
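For illustration, here is a minimal sketch of an extraction SQL statement, assuming hypothetical source tables SALES_ORDERS and CUSTOMERS; your table names, column names, and dialect will differ:

-- Hypothetical source tables; adjust names and dialect to your source system.
SELECT
    o.ORDER_ID,
    o.ORDER_DATE,
    o.CUSTOMER_ID,
    c.CUSTOMER_NAME,
    c.COUNTRY
FROM SALES_ORDERS o
JOIN CUSTOMERS c ON c.CUSTOMER_ID = o.CUSTOMER_ID
WHERE o.ORDER_DATE >= '2023-01-01';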
The extraction task is saved. You can manage existing extraction tasks by clicking Options, which gives you the following choices:
Rename: Update the name of the extraction task.
Enable / disable: Control whether the extraction task should be enabled or disabled for executions.
Move up / down: Change the order in which this task is performed in a full execution.
Duplicate: Create a copy of the extraction task in the existing data job.
Execute: This allows you to manually execute just this task on demand. For more information about executing data jobs, see: Executing data jobs.
Execute from here: This allows you to manually execute this and all following tasks on demand. For more information about executing data jobs, see: Executing data jobs.
Convert to template / Copy to regular task: The task becomes a template and can be added to other data jobs or used to extend the template. If the task is already a template, you can create a regular task from it. For more information about task templates, see: Creating task templates.
Delete: This deletes the task and all associated content, with no recovery possible.
Download table configuration: This gives you offline access to a zipped file containing any relevant Excel workbook copies of your table configuration.
Extractions Editor and AI Assistant features
When using the SQL editor and AI Assistant to create your extraction tasks, the following features are available:
Schema explorer: The schema explorer allows you to search for and select tables from your Data Model.
Target table configuration: This allows you to set up and configure the table into which your extracted data will be imported. You can assign primary keys and choose whether columns should be pseudonymized. Providing a name for the table is mandatory.
Validate SQL: This validates your current SQL statement and either confirms that it would run successfully or highlights errors for you to address. Note that this feature is still under development and will be updated further.
Preview SQL: This runs your current SQL statement and gives you a preview of the extracted records and their associated logs. This is not a complete extraction: the query is run with a limit of 100 records to verify that it works and can fetch data. The full extraction runs once the task is saved and executed.
Download SQL: This gives you offline access to your SQL statement as a .sql file.
AI Assistant: To generate an SQL query using the AI Assistant, select the tables you would like to include in the query and then type prompts (see the sketch after this list). We recommend limiting the selection to 4-5 tables for optimal performance.
Undo / redo: Efficiently undo or redo the most recent changes to your SQL statement.
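As an illustration of the AI Assistant workflow, here is a sketch of a prompt and the kind of SQL it might generate, assuming the hypothetical SALES_ORDERS and CUSTOMERS tables are selected. The actual output depends on your schema, your dialect, and the underlying model, so always validate and preview before saving.

Prompt: Join orders with customers and return only orders placed in 2023.

-- Sketch of a possible generated statement; review before saving.
SELECT o.ORDER_ID, o.ORDER_DATE, c.CUSTOMER_NAME
FROM SALES_ORDERS o
JOIN CUSTOMERS c ON c.CUSTOMER_ID = o.CUSTOMER_ID
WHERE o.ORDER_DATE >= '2023-01-01'
  AND o.ORDER_DATE < '2024-01-01';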
Extraction settings - Debug mode, delta load configuration, connector parameters
The available extraction settings depend on the source system that you're connecting to.
Debug mode: Once enabled, debug mode provides detailed log information for the data extraction job, which is displayed in the execution logs. This allows for more transparency and easier troubleshooting. The mode stays active for three days, after which the logs created are deleted.
Convert to a delete job: Converting this extraction job to a delete job causes all records found by this job to be deleted from your data pool.
Delta load configuration: Select from either of the following options (a sketch of a delta-style query follows them):
Option A - Standard: The delta load is cancelled when metadata in your source system changes (e.g. new columns) compared to the data already extracted into the Celonis Platform. Metadata changes are not ignored. To clear this conflict, a full reload or manual changes to your table are necessary.
Option B - Including metadata changes: The delta load includes metadata changes (e.g. new columns) and runs through with a warning. Only the rows loaded in the delta will contain data for the newly added columns; rows that are not part of the delta load will have the newly added columns set to null. This can lead to inconsistency in your data, so a full load is recommended for data consistency.
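To illustrate the delta load idea, here is a sketch of a delta-style WHERE clause. It assumes a hypothetical data job parameter named last_load_time, referenced with the <%= %> placeholder syntax, that holds the timestamp of the previous successful extraction; check your connection's documentation for the exact parameter mechanism.

-- Hypothetical parameter last_load_time; only rows changed since the
-- previous successful extraction are fetched.
SELECT ORDER_ID, CUSTOMER_ID, CHANGED_AT
FROM SALES_ORDERS
WHERE CHANGED_AT > <%=last_load_time%>;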
Connector parameters: The following connector parameters can be configured:
Batch size: Allows specifying the batch size (in records) for one extraction request.
Max string length: Allows you to modify the default length (80 characters) of string-type columns. This is configured using the parameter MAX_STRING_LENGTH.
Binary data type handling: Table columns with a binary data type can be represented in two ways: UTF-8 or HEX_NOTATION. Depending on the value specified here, the binary value is converted accordingly. For example, the bytes 0x43 0x50 would be imported as "CP" with UTF-8, or as "4350" with HEX_NOTATION.
Change default metadata source: Depending on your database, you can select from:
DRIVER_METADATA: This metadata source is supported by all source systems and is usually the default. The driver internally runs the metadata query against the source system and fetches the result set.
SAMPLE_QUERY: This metadata source is supported by all source systems. It works the same way as DRIVER_METADATA, but uses a different query.
INFORMATION_SCHEMA: This metadata source is supported mainly by Oracle systems and is the default metadata source for Oracle 11g.
PG_CATALOG: This metadata source is supported by Amazon Redshift, where it is the default metadata source.
Limit total number: Set the maximum number of records to be extracted per job.