
Celonis Product Documentation

Commit Policy

The connector accumulates data into files before uploading them to EMS. See the How it works section for details.

The commit policy is the set of rules the connector applies to decide when data is uploaded. The goal is to avoid uploading small files (only kilobytes in size) while also not delaying records for too long.

Three configuration parameters control this behavior:

  • Parquet file size

  • number of records in the file

  • time since the last write

Once a record has been written to the file associated with a source topic partition, the sink checks whether the file should be committed. The file is uploaded if either of the first two criteria (file size or number of records) is met.
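The check performed after each write can be sketched as follows. This is a minimal illustration, not the connector's actual code: the names `WriteThresholds` and `should_commit_after_write` are hypothetical, and it assumes a threshold counts as met when the value reaches or exceeds it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteThresholds:
    """Hypothetical container for the two write-driven limits."""
    max_file_size_bytes: int
    max_records: int


def should_commit_after_write(file_size_bytes: int,
                              record_count: int,
                              limits: WriteThresholds) -> bool:
    """Return True when either the size or the record-count limit is reached.

    Assumption: a limit is considered met at ">=", not strictly ">".
    """
    return (file_size_bytes >= limits.max_file_size_bytes
            or record_count >= limits.max_records)
```

In this sketch the function would be called once per written record, with the current size and record count of the file for that topic partition.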

The time since the last write is key to limiting how long data waits before it is uploaded. Data does not always arrive on a Kafka topic every few milliseconds or seconds. Depending on the context, there can be gaps of minutes or even hours between records, and in the extreme case no further record ever arrives on the topic. When such gaps are common, the first two criteria may take hours to be met, or may never be met at all. Accumulated data should not be held back from EMS for that long, so the time since the last write acts as a stop-gap that ensures the data is always uploaded.

Example commit policies:

  • Every 10 MB, every 10k records, or 30 seconds since the last write

  • Every 25 MB, every 100k records, or 120 seconds since the last write
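To make the interaction of the three criteria concrete, the sketch below models the two example policies above and reports which rule, if any, would trigger an upload. It is only an illustration: `CommitPolicy`, `commit_reason`, and the two constant names are hypothetical, 1 MB is assumed to be 1024 * 1024 bytes, and thresholds are assumed to be met at ">=". It also assumes an empty file is never uploaded.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes


@dataclass(frozen=True)
class CommitPolicy:
    """Hypothetical model of the three commit-policy thresholds."""
    max_file_size_bytes: int
    max_records: int
    max_seconds_since_last_write: float


# The two example policies, expressed with the hypothetical model.
LOW_LATENCY_POLICY = CommitPolicy(10 * MB, 10_000, 30)
HIGH_THROUGHPUT_POLICY = CommitPolicy(25 * MB, 100_000, 120)


def commit_reason(policy: CommitPolicy,
                  file_size_bytes: int,
                  record_count: int,
                  seconds_since_last_write: float) -> Optional[str]:
    """Return the name of the rule that triggers an upload, or None."""
    if record_count == 0:
        return None  # assumption: nothing accumulated, nothing to upload
    if file_size_bytes >= policy.max_file_size_bytes:
        return "size"
    if record_count >= policy.max_records:
        return "records"
    # Stop-gap: the topic has gone quiet, so upload what has accumulated.
    if seconds_since_last_write >= policy.max_seconds_since_last_write:
        return "time"
    return None
```

For example, a 2 MB file holding 500 records that has been idle for 45 seconds would be uploaded under the first policy (the 30-second limit has passed) but would keep waiting under the second.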