
Celonis Product Documentation

Optimal Data Job Scheduling

If you have multiple scheduled Data Jobs, the following two changes are the lowest-hanging fruit: they speed up your data pipeline and increase the freshness of your data.

Don’t schedule everything at the same time

People instinctively schedule everything on the hour (e.g. 10:00). This causes spikes in the utilization of computing resources on the hour, while in the time leading up to it the resources are underutilized.

If you have scheduled more than one job, make sure they are not set to run at the same time. For example, assume you have 3 jobs, each scheduled to run every hour on the hour and each taking 20 minutes. The compute resource is then used heavily for 20 minutes and remains idle for the other 40 minutes of every hour. Instead, schedule the jobs to run one after another: one on the hour (x:00), one at 20 minutes past (x:20) and one at 40 minutes past (x:40). After the change, you will most likely notice that the jobs finish in less than 20 minutes, because they no longer compete with each other for the same resources. To check when your jobs actually run, check the Schedules and the Data Jobs logs.
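To make the effect of staggering concrete, here is a minimal Python sketch (not a Celonis feature) that compares the peak number of jobs running in the same minute when all three 20-minute jobs start on the hour versus when they are staggered. The helpers `staggered_offsets` and `peak_concurrency` are hypothetical, and the durations are assumed to be typical run times you read from the Data Job logs. Any spare minutes in the hour are split evenly into gaps between the jobs.

```python
def staggered_offsets(durations_min, period_min=60):
    """Hypothetical helper: start minute (within each period) for each job,
    so the jobs run one after another with equal gaps between them."""
    total = sum(durations_min)
    if total > period_min:
        raise ValueError("jobs do not fit into one period; lower the frequency")
    # Split the spare time evenly; integer minutes keep the schedule readable.
    gap = (period_min - total) // len(durations_min)
    starts, t = [], 0
    for duration in durations_min:
        starts.append(t)
        t += duration + gap
    return starts


def peak_concurrency(starts_min, durations_min, period_min=60):
    """Hypothetical helper: highest number of jobs running in the same minute."""
    busy = [0] * period_min
    for start, duration in zip(starts_min, durations_min):
        for minute in range(start, start + duration):
            busy[minute % period_min] += 1
    return max(busy)


durations = [20, 20, 20]  # assumed typical run times in minutes
print(peak_concurrency([0, 0, 0], durations))  # all on the hour: 3
offsets = staggered_offsets(durations)
print(offsets)  # [0, 20, 40]
print(peak_concurrency(offsets, durations))  # staggered: 1
```

With uneven durations such as `[10, 15, 5]`, the spare 30 minutes become 10-minute gaps, so the jobs would start at x:00, x:20 and x:45.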

In case of idle time, increase the schedule frequency

For example, if you run your Data Jobs once a day and they take 3 hours (after applying the recommendation above), the compute resource remains idle for the remaining 21 hours. In that case, why not increase the schedule frequency? The data will be more up to date, and in every company there will always be a moment when someone needs fresher data. Schedule your jobs so that the idle periods are short, but leave a “buffer” between runs to compensate for fluctuations in run time. How long should the buffer be? That depends on how much the durations of your Data Jobs fluctuate, which you can calculate from the data in the Data Job logs.
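The arithmetic above can be sketched in Python as well. The helpers below are hypothetical, not part of Celonis, and rest on stated assumptions: the whole chain of Data Jobs runs sequentially, a run takes its mean logged duration, and the buffer is two sample standard deviations of the logged durations. The factor `k=2` is an assumption; a larger `k` tolerates bigger fluctuations at the cost of fewer runs per day. The sample durations are made up for illustration.

```python
import math
import statistics


def idle_hours_per_day(runs_per_day, run_hours):
    """Hours per day in which the compute resource is not used by the jobs."""
    return 24 - runs_per_day * run_hours


def buffer_hours(durations_h, k=2.0):
    """Assumed rule of thumb: buffer = k sample standard deviations
    of the run times observed in the Data Job logs."""
    return k * statistics.stdev(durations_h)


def max_runs_per_day(durations_h, k=2.0):
    """How many evenly spaced runs fit into 24 hours, each followed by a buffer."""
    slot = statistics.mean(durations_h) + buffer_hours(durations_h, k)
    return math.floor(24 / slot)


# Example from the text: one daily run that takes 3 hours.
print(idle_hours_per_day(1, 3))  # 21

# Hypothetical run times (hours) read off the Data Job logs.
observed = [2.8, 3.0, 3.2, 3.0]
runs = max_runs_per_day(observed)
print(runs, "runs per day, one every", round(24 / runs, 2), "hours")
# 7 runs per day, one every 3.43 hours
```

With seven runs a day, the resource is idle for about 24 − 7 × 3 = 3 hours instead of 21, and most of that remaining idle time is the buffer itself.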