Commit Policy
The connector accumulates data into files before uploading them to EMS. See the How it works section for details.
The commit policy is the set of rules the connector applies to decide when data is uploaded. The goal is to avoid uploading small files (only a few kilobytes in size) without delaying records for too long.
Three configuration parameters control this behavior:
parquet file size (connect.ems.commit.size.bytes)
number of records in the file (connect.ems.commit.records)
time since the last write (connect.ems.commit.interval.ms)
Once a record has been written to the file associated with a source topic partition, the sink checks whether the file should be committed. The file is uploaded as soon as either of the first two criteria (file size or record count) is met.
The time since the last write is key to reducing how long data waits before it is uploaded. Records do not always arrive on a Kafka topic every few milliseconds or seconds. Depending on the context, there can be a gap of minutes or even hours before new data arrives, and in the extreme case no further record ever arrives. When such gaps are common, the first two criteria can take hours to be reached, or may never be reached at all. Accumulated data should not be held back from EMS for that long, so the time since the last write acts as a stop-gap that ensures the data is always uploaded.
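The interplay of the three criteria can be sketched as a single predicate. This is a minimal illustration, not the connector's actual code: the CommitPolicy class and the should_commit function are hypothetical names, and the use of "greater than or equal" for each threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitPolicy:
    """Hypothetical holder for the three thresholds (not the connector's own class)."""
    max_size_bytes: int   # mirrors connect.ems.commit.size.bytes
    max_records: int      # mirrors connect.ems.commit.records
    max_interval_ms: int  # mirrors connect.ems.commit.interval.ms


def should_commit(policy, file_size_bytes, record_count, ms_since_last_write):
    """Return True when any one of the three thresholds has been reached.

    Assumption: thresholds are inclusive (>=); the real connector may differ.
    """
    if record_count == 0:
        return False  # nothing accumulated, so there is nothing to upload
    return (
        file_size_bytes >= policy.max_size_bytes
        or record_count >= policy.max_records
        or ms_since_last_write >= policy.max_interval_ms
    )


# Thresholds taken from the first example: 10 MB, 10k records, 30 seconds.
policy = CommitPolicy(max_size_bytes=10_000_000, max_records=10_000, max_interval_ms=30_000)
```

With these thresholds, a file holding only five records is still committed once 30 seconds have passed since the last write, which is the stop-gap behavior described above.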
Examples
Every 10 MB, every 10k records, or 30 seconds since the last write
connect.ems.commit.size.bytes=10000000
connect.ems.commit.records=10000
connect.ems.commit.interval.ms=30000
Every 25 MB, every 100k records, or 120 seconds since the last write
connect.ems.commit.size.bytes=25000000
connect.ems.commit.records=100000
connect.ems.commit.interval.ms=120000