Starting new workflows from the queue vs. retrying failed workflows

Hi Folks,

I recently started using Temporal to orchestrate some of our use cases with workflows. The use case involves triggering thousands of workflows within a minute or two, and in some cases workflows fail and are retried depending on the output of the activities. For now, these retries don’t have a maximumRetry set. I do use a proper backoff coefficient and initial interval in the retry policy.

  • If my application is generating a lot of new workflows at the same time as most of the existing workflows are failing and need to be retried, how will Temporal decide between starting a new workflow that is pending in the queue and retrying a failed workflow? Does it add the workflow that needs to be retried to the same queue to maintain ordering?
  • Is there some sort of guarantee that every workflow will be given a chance to execute at least once, rather than the failed workflows being retried again and again, which might cause some of the workflows in the queue to starve?

These might be very basic questions, but I wanted to get them cleared up to improve my understanding. Thanks in advance for all the help!

There is no “queue of workflows”. There is a queue of workflow tasks. Each workflow generates a task when it has to decide what to do next. If a workflow is waiting on something, such as an activity completion or a timer, then no tasks are pending for that workflow and it is not competing with other workflows for worker resources. Currently, there are no ordering guarantees for workflow tasks. So, if you have a lot of open workflows, there is no guarantee that the tasks for newly added workflows will be prioritized higher.
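To make the distinction concrete, here is a toy model in plain Python. It is only a sketch of the idea described above, not how Temporal is implemented: the `ToyWorkflow` class and `pending_tasks` function are made-up names, and a real task queue lives on the server, not in a Python set. The point it shows is that a workflow sitting in a retry backoff or waiting on an activity contributes nothing to the set of pending tasks.

```python
from dataclasses import dataclass


@dataclass
class ToyWorkflow:
    """Hypothetical stand-in for an open workflow (not a Temporal type)."""
    name: str
    # True while the workflow is blocked on an activity, a timer, or a retry backoff.
    waiting: bool = False


def pending_tasks(workflows):
    """Return the names of workflows that currently have a task to run.

    A set is used on purpose: the model carries no ordering between tasks.
    """
    return {w.name for w in workflows if not w.waiting}


workflows = [
    ToyWorkflow("new-1"),
    ToyWorkflow("new-2"),
    ToyWorkflow("retrying-1", waiting=True),  # in backoff, so it has no task
]

before = pending_tasks(workflows)

# Once the backoff timer fires, the retried workflow produces a task
# and competes for workers like any other workflow.
workflows[2].waiting = False
after = pending_tasks(workflows)

print(sorted(before))
print(sorted(after))
```

In this model the retried workflow only starts competing for workers after its wait ends, and nothing in the set says whether it runs before or after the new ones.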

Gotcha, so even if a workflow needs to restart there is no pending action from it until it actually starts creating new events for its retries. Thanks for the quick response!
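For anyone who wants to see how long a retried workflow stays idle between attempts, here is a rough sketch of an exponential backoff schedule. It assumes the usual formula (initial interval multiplied by the backoff coefficient for each additional attempt, capped at a maximum interval). The function name `retry_delay` and the numbers are illustrative only, so check the retry policy settings of your SDK for the real defaults.

```python
def retry_delay(attempt, initial_interval=1.0, backoff_coefficient=2.0,
                maximum_interval=100.0):
    """Seconds to wait before retry number `attempt` (1-based).

    Hypothetical helper: all parameter values here are illustrative.
    """
    delay = initial_interval * backoff_coefficient ** (attempt - 1)
    return min(delay, maximum_interval)


# Delays before the first eight retries.
delays = [retry_delay(n) for n in range(1, 9)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 100.0]
```

During each of these waits the workflow has nothing pending, so with no maximum set, the growing delay is what keeps repeated failures from flooding the workers.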