Hi Team,
We’ve scheduled our Data Streaming service as a Temporal CRON workflow, which is responsible for pulling data from various sources into our internal data lake. Initially, this workflow was set to run every 10 minutes.
However, this introduces latency: if a run takes longer than 10 minutes (e.g., 11 minutes), the next scheduled run is skipped, adding a 9-minute delay before the next execution. I understand this is expected behavior.
Because this latency has become a bit of a concern, we reduced the CRON interval to 1 minute, so that even if a run is skipped because the previous one is still in progress, the maximum added latency is only 1 minute.
In practice, the workflow takes at least 5 minutes to complete, so we expect that most of the time, the new CRON triggers will be skipped due to the ongoing execution.
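To make the arithmetic behind this concrete, here is a minimal Python sketch of how we are reasoning about the added delay. It is only a model, not Temporal code: `added_delay` is a hypothetical helper, and it assumes a run starts exactly on a CRON tick and that the next run starts at the first tick at or after the previous run completes.

```python
def added_delay(duration_min, interval_min):
    """Minutes spent idle between a run finishing and the next CRON tick.

    Assumption (ours, not taken from the Temporal docs): the run starts on a
    tick at minute 0, and the next run starts at the first tick at or after
    the moment the run finishes.
    """
    # Time remaining until the next multiple of the interval.
    return (-duration_min) % interval_min


# 10-minute schedule, 11-minute run: the minute-10 tick is skipped,
# so the next run starts at minute 20.
print(added_delay(11, 10))   # 9

# 1-minute schedule, 5.5-minute run: the next run starts at minute 6.
print(added_delay(5.5, 1))   # 0.5
```

Under these assumptions the idle time is always less than one interval, which is why we moved to the 1-minute schedule.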
From a functional perspective, this setup helps minimize latency. However, after testing this in one of our clusters, we observed a noticeable increase in CPU usage. While this could be partly due to the streaming service frequently attempting to pull data, we are wondering if Temporal’s internal scheduling or polling logic (due to frequent CRON triggers) could also be contributing to the elevated resource usage.
Could you please let us know:
- Is it ideal or advisable to set the CRON schedule to run every 1 minute when we know the workflow almost never completes within that time?
- Are there any best practices in Temporal for handling such frequent schedules where most runs will be skipped?
- Could frequent skipped CRON executions lead to any long-term performance issues or unnecessary load on the Temporal server?
Thank you.