Temporal CRON latency

Hi Team,

We’ve scheduled our Data Streaming service as a Temporal CRON workflow, which is responsible for pulling data from various sources into our internal data lake. Initially, this workflow was set to run every 10 minutes.

However, this introduces latency: if a run takes longer than 10 minutes (e.g., 11 minutes), the next scheduled run is skipped, so the following execution starts only 9 minutes after the previous one finishes. We understand this is expected behavior.
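The latency described above can be reproduced with a small model. The sketch below assumes that triggers fire at exact multiples of the interval, that a trigger arriving while a run is still in progress is dropped rather than queued, and that a trigger landing exactly at a run's end does start a new run. It illustrates the skip behavior described in this post and is not Temporal's implementation; the function names are hypothetical.

```python
import math


def next_start_after_skip(run_start: float, duration: float, interval: float) -> float:
    """Start time (minutes) of the next run when overlapping triggers are skipped.

    Hypothetical model: triggers fire at multiples of `interval`, and any trigger
    that arrives before the current run finishes is dropped.
    """
    run_end = run_start + duration
    # First trigger at or after the moment the current run completes.
    return math.ceil(run_end / interval) * interval


def idle_gap(duration: float, interval: float, run_start: float = 0.0) -> float:
    """Minutes between a run finishing and the next run starting."""
    return next_start_after_skip(run_start, duration, interval) - (run_start + duration)
```

With an 11-minute run, `idle_gap(11, 10)` returns 9.0 minutes of idle time before the next run, while `idle_gap(11.5, 1)` returns 0.5, which is the "at most 1 minute" bound the shorter interval is meant to provide.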

Since this latency has become a concern, we reduced the CRON interval to 1 minute so that, even when a run is skipped because the previous one is still in progress, the added latency is at most 1 minute.

In practice, each run takes at least 5 minutes, so we expect most of the 1-minute CRON triggers to be skipped because an execution is still in progress.
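To see how many triggers are skipped and how often runs actually start, one can count them over a window. This is a minimal sketch under the same assumptions as a simple skip-on-overlap model (triggers at multiples of the interval, overlapping triggers dropped, a constant run duration); the 5.5-minute duration used below is an illustrative value, since the post only says "at least 5 minutes", and `simulate_skip` is a hypothetical helper.

```python
import math


def simulate_skip(duration: float, interval: float, horizon: float) -> tuple[list[float], int]:
    """Simulate skip-on-overlap triggering over [0, horizon) minutes.

    Returns (run_start_times, skipped_trigger_count). Hypothetical model,
    not Temporal's scheduler.
    """
    starts: list[float] = []
    skipped = 0
    busy_until = 0.0  # time at which the current run finishes
    for k in range(math.ceil(horizon / interval)):
        t = k * interval
        if t < busy_until:
            skipped += 1  # a run is still in progress: drop this trigger
            continue
        starts.append(t)
        busy_until = t + duration
    return starts, skipped


runs_1m, skipped_1m = simulate_skip(5.5, 1, 60)
runs_10m, skipped_10m = simulate_skip(5.5, 10, 60)
```

Under these assumptions, one hour at a 1-minute interval gives 10 runs (one every 6 minutes) and 50 skipped triggers, versus 6 runs and no skipped triggers at a 10-minute interval. In this model the number of runs is bounded by the run duration rather than by the trigger count.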

From a functional perspective, this setup helps minimize latency. However, after testing this in one of our clusters, we observed a noticeable increase in CPU usage. While this could be partly due to the streaming service frequently attempting to pull data, we are wondering if Temporal’s internal scheduling or polling logic (due to frequent CRON triggers) could also be contributing to the elevated resource usage.

Could you please let us know:

  1. Is it advisable to set the CRON schedule to run every minute when we know the workflow almost never completes within that time?
  2. Are there any best practices in Temporal for handling such frequent schedules where most runs will be skipped?
  3. Could frequently skipped CRON executions lead to long-term performance issues or unnecessary load on the Temporal server?

Thank you.

Have you considered using the BufferOne overlap policy?
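As described in the Temporal Schedules documentation, the BufferOne overlap policy starts the next run as soon as the current one completes, buffering at most one pending start; further triggers that arrive while a start is already buffered are dropped. The sketch below models that behavior only, assuming triggers at multiples of the interval and a constant run duration; `simulate_buffer_one` is a hypothetical helper, not part of any Temporal SDK.

```python
import math


def simulate_buffer_one(duration: float, interval: float, horizon: float) -> list[float]:
    """Run start times (minutes) under a BufferOne-style policy over [0, horizon).

    Hypothetical model: a trigger that arrives while a run is in progress is
    buffered (at most one), and the buffered start fires when that run completes.
    """
    starts: list[float] = []
    busy_until = 0.0  # time at which the current run finishes
    buffered = False
    for k in range(math.ceil(horizon / interval)):
        t = k * interval
        # A buffered start fires at the moment the previous run completed.
        if buffered and busy_until <= t:
            starts.append(busy_until)
            busy_until += duration
            buffered = False
        if t >= busy_until:
            starts.append(t)
            busy_until = t + duration
        else:
            buffered = True  # extra triggers collapse into the single buffered start
    return starts
```

With an 11-minute run and a 10-minute interval, the model starts runs at minutes 0, 11 and 22 within the first 40 minutes, so each run begins as soon as the previous one finishes. Skipping overlapping triggers under the same assumptions would start runs only at 0 and 20, which is why BufferOne can keep a 10-minute interval without the extra idle time.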

Hi Maxim,

We are using Temporal CRON, and this overlap policy seems to apply only to Schedules.

How can we migrate from Temporal CRON to Schedules? Do you have any code references for this?

If switching from CRON to Temporal schedules is a straightforward change, we are open to doing that. However, if it’s a disruptive change, we would prefer to continue with CRON for now and get things running.

Please let us know.

I’m not sure I can help you evaluate how disruptive the change is, as that is very subjective.

You can also always stop using CRON and write your own workflow that calls sleep in a loop and implements whatever scheduling logic you need.
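One possible scheduling rule for such a loop is "run, then sleep for whatever is left of the interval, or start again immediately if the run overran". The sketch below computes the start times that rule would produce; it is a hypothetical illustration of the idea, not a Temporal API. In an actual workflow the wait would be a durable timer (for example `workflow.Sleep` in the Go SDK), and a long-running loop would typically continue-as-new periodically to keep its event history bounded.

```python
def loop_start_times(durations: list[float], interval: float) -> list[float]:
    """Start times (minutes) produced by a hypothetical run-then-sleep loop.

    Each iteration runs for durations[i] minutes, then sleeps for the remainder
    of `interval`, or not at all if the run took longer than the interval.
    """
    starts: list[float] = []
    t = 0.0
    for elapsed in durations:
        starts.append(t)
        sleep_for = max(0.0, interval - elapsed)  # never a negative sleep
        t += elapsed + sleep_for
    return starts
```

For runs of 11, 4 and 5 minutes with a 10-minute target interval, the loop starts at minutes 0, 11 and 21: the overrunning first run is followed immediately by the next one, and no triggers are ever generated just to be skipped.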

Okay, I understand. We will definitely reconsider our CRON setup.

One last question: could you please clarify this one: “Could frequently skipped CRON executions lead to long-term performance issues or unnecessary load on the Temporal server?” I’m just curious about how this affects Temporal.

My guess is that you started running your workflows 10 times more frequently by changing the CRON interval from 10 minutes to 1 minute. This should certainly put additional load on your system.