Standard Data Ingestion API
Limited availability
This functionality is currently in limited availability. If you’re interested in trying it out, get in touch with us through celopeers.com/s/support.
Celonis Academy course available
To learn more about the Standard Data Ingestion API, we recommend this Celonis Academy course:
The Standard Data Ingestion API allows you to push real-time data to the Celonis Platform using your existing IT systems. This AWS S3-compatible API is event-driven: whenever a new file reaches the API, a notification is triggered and the file is automatically picked up and processed into a data pool.
To help you decide when to use the Standard Data Ingestion API, see the following:
And when you're ready to use the Standard Data Ingestion API, see: Using the Standard Data Ingestion API.
Important
With the Standard Data Ingestion API, the data is still stored in exactly the same infrastructure as before. The main difference is that the API gateway exposes an S3-style interface. For more information about S3 buckets, see: The difference between a standard S3 bucket and an S3 compatible bucket.
If you're currently using Microsoft Azure, note that although the API has an AWS S3 flavor, no data is stored on AWS. For details about the AWS S3 API call being made, see: AWS - PutObject.
Use the following decision diagram to choose a data integration method for connecting to the Celonis Platform:
The Standard Data Ingestion API ensures a seamless integration into your existing IT landscape. This includes:
Full control over the data you send to Celonis: The Standard Data Ingestion API focuses on push-based integration scenarios, so you decide exactly which data is pushed.
Reuse your existing data applications: Celonis is just another consumer on top of your existing data architecture and doesn’t require any client to be installed.
Use multiple source systems: As the Standard Data Ingestion API can be used with tools like Informatica and Talend, you can efficiently integrate data from multiple source systems.
Using the Standard Data Ingestion API also has the following benefits:
Increased data integration options: The Standard Data Ingestion API is built on top of the Amazon S3 API, so existing applications and software libraries can interact with it out of the box.
Faster data transfer: The Standard Data Ingestion API significantly speeds up data integration into Celonis compared to previous methods.
Infrastructure built for scale: The Standard Data Ingestion API is built on top of cloud-native object storage to scale for massive data volumes.
Support for nested data, e.g. nested JSON arrays: Nested data such as JSON is unnested automatically on the fly so it can directly be consumed in a columnar format.
UI integration: The Standard Data Ingestion API is embedded into the UI so you can configure your desired table schema (column names, primary keys, etc.).
Automatic schema evolution: If the schema changes, the target schema evolves automatically without affecting the existing pipeline. For example, if the pushed data contains a new column, that column is automatically added to the target table in Celonis. The same applies when a column was initially defined as VARCHAR(80) and a new record exceeding 80 characters is pushed: the target column is automatically extended to the required VARCHAR length so that no record is cut off.
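
The schema evolution behavior described above can be pictured with a small, purely illustrative Python sketch. It is not how Celonis implements the feature. The `evolve_schema` function, the representation of a schema as a mapping from column name to VARCHAR length, and the choice to size a newly added column to the length of its first value are all assumptions made for this example.

```python
from typing import Dict, Mapping


def evolve_schema(schema: Mapping[str, int], record: Mapping[str, str]) -> Dict[str, int]:
    """Return a widened copy of `schema` so that `record` fits without truncation.

    `schema` maps column name -> VARCHAR length (hypothetical representation).
    """
    evolved = dict(schema)
    for column, value in record.items():
        needed = len(value)
        if column not in evolved:
            # New column in the pushed data: add it to the target table.
            # Sizing it to the first value is an assumption of this sketch.
            evolved[column] = needed
        elif needed > evolved[column]:
            # Value exceeds the declared length: extend the column
            # instead of cutting off the record.
            evolved[column] = needed
    return evolved


# COMMENT was declared as VARCHAR(80); the new record carries 120 characters
# and an additional REGION column.
schema = {"ORDER_ID": 10, "COMMENT": 80}
record = {"ORDER_ID": "4711", "COMMENT": "x" * 120, "REGION": "EMEA"}

evolved = evolve_schema(schema, record)
print(evolved)
```

In this sketch, columns are only ever added or widened, which mirrors the statement that schema changes have no impact on the existing pipeline.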
In addition to the benefits of using the Standard Data Ingestion API, you should also consider the following implications:
An increased dependency on IT departments
To successfully ingest data via this method, IT departments must provide the source system data via ETL tools or cloud data lakes.
Expanding into a new process domain will most likely require the IT department's involvement again to add the additional data.
Unable to take advantage of native data integration capabilities:
Capabilities provided by Celonis extractors such as defining extraction filters, removing columns, and changing data types are not provided and therefore must be handled by users.
Scheduling of data ingestion must also be handled by users, and you may need to rely on the scheduling capabilities provided by your source systems (for example, Snowflake or Databricks).
The following integration options are possible:
Integrating with Databricks
The following video demonstrates using the Standard Data Ingestion API with Databricks:
For more information about connecting to Databricks, see: Databricks.
Standard S3 bucket | S3 compatible bucket (Standard Data Ingestion API) |
---|---|
For a standard Amazon S3 bucket, the endpoint URL follows a specific format based on the AWS region where the bucket is located. Here’s a typical structure: https://<bucket-name>.s3.<region>.amazonaws.com/<object-name> | For an S3-compatible bucket (used by the Standard Data Ingestion API), the endpoint URL usually has a vendor-specific format. Here's the typical structure: https://<team>.<cluster>.celonis.cloud/api/data-ingestion/<bucket-name>/<objectName> |
Example: If you have a bucket named my-bucket located in the us-west-2 region with an object called test, the endpoint URL would be: https://my-bucket.s3.us-west-2.amazonaws.com/test | Example: For the team dev on cluster us-1 with connection 4b3433b3-3135-499f-a35e-5ea7a22f0cbf and target table snowflake1: https://dev.us-1.celonis.cloud/api/data-ingestion/continuous/connection/4b3433b3-3135-499f-a35e-5ea7a22f0cbf/snowflake1/ |
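
To make the S3-compatible URL structure in the table more concrete, the following Python sketch splits such a URL into the three pieces an S3 client typically asks for: an endpoint, a bucket name, and an object key. The helper name `split_ingestion_url` is hypothetical, and the mapping is inferred only from the template `https://<team>.<cluster>.celonis.cloud/api/data-ingestion/<bucket-name>/<objectName>` shown above, so treat it as an assumption rather than a confirmed client configuration.

```python
from urllib.parse import urlsplit

# Path prefix taken from the URL template in the table above.
INGESTION_PREFIX = "/api/data-ingestion/"


def split_ingestion_url(url: str) -> dict:
    """Split an S3-compatible ingestion URL into endpoint, bucket, and key."""
    parsed = urlsplit(url)
    if not parsed.path.startswith(INGESTION_PREFIX):
        raise ValueError(f"URL path does not start with {INGESTION_PREFIX!r}")
    # The first path segment after the prefix is the bucket name;
    # everything after it is the object key.
    remainder = parsed.path[len(INGESTION_PREFIX):]
    bucket, _, key = remainder.partition("/")
    endpoint = f"{parsed.scheme}://{parsed.netloc}{INGESTION_PREFIX.rstrip('/')}"
    return {"endpoint_url": endpoint, "bucket": bucket, "key": key}


# Example URL from the table: team "dev", cluster "us-1".
example = (
    "https://dev.us-1.celonis.cloud/api/data-ingestion/continuous/connection/"
    "4b3433b3-3135-499f-a35e-5ea7a22f0cbf/snowflake1/"
)
pieces = split_ingestion_url(example)
for name, value in pieces.items():
    print(f"{name}: {value}")
```

Under this reading, the example URL from the table has the bucket name `continuous`, and the connection ID and target table form the beginning of the object key. With an S3 client library such as boto3, these values would presumably correspond to the `endpoint_url` argument of the client and the `Bucket` and `Key` arguments of `put_object`, with a file name appended to the key. That mapping is an assumption and should be checked against Using the Standard Data Ingestion API.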