Executing a workflow with the same WorkflowID as another one that is hitting WorkflowTaskTimeout

Hi,

I have a workflow running that sometimes hits WorkflowTaskTimedOut.
The Workflow gets retried when hitting this error.

However, when I try to execute a new workflow with the same WorkflowID, the command returns no error, and the new execution shares the same RunID as the previous one and still keeps the previous one’s input.
That means the second workflow execution is dropped.

While that happens, I also get this error message:
Cached state staled, new task has unexpected events.

What should I do if I want a new workflow with the same WorkflowID to run after the first one has completed?

Thanks a lot.

Hi!

If the Workflow Task is timing out but the Workflow Execution is still running, you won’t be able to start a new Workflow Execution with the same Id.

By default, once the first Workflow Execution completes, you can start another one with the same Workflow Id (see “What is a Workflow Id Reuse Policy?” in the Temporal documentation).
This is because “Allow Duplicate” is the default Workflow Id Reuse Policy.
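The rule above can be illustrated with a toy in-memory model. This is only a sketch of the behaviour described in this thread, not Temporal's implementation: `ToyServer`, `start`, `complete`, and `input_of` are hypothetical names, and handing back the existing RunID while the Id is still running is an assumption modelled on what the original poster observed.

```python
import itertools


class ToyServer:
    """Toy model of Workflow Id reuse under an "Allow Duplicate"-style policy."""

    def __init__(self):
        self._runs = {}  # workflow_id -> {"run_id", "input", "running"}
        self._counter = itertools.count(1)

    def start(self, workflow_id, workflow_input):
        current = self._runs.get(workflow_id)
        if current is not None and current["running"]:
            # Still running: no new execution is created. The caller gets the
            # existing run back and the new input is dropped (assumption).
            return current["run_id"]
        run_id = f"run-{next(self._counter)}"
        self._runs[workflow_id] = {
            "run_id": run_id,
            "input": workflow_input,
            "running": True,
        }
        return run_id

    def complete(self, workflow_id):
        # Mark the current execution for this Id as closed.
        self._runs[workflow_id]["running"] = False

    def input_of(self, workflow_id):
        return self._runs[workflow_id]["input"]


server = ToyServer()
first = server.start("order-42", "input A")
second = server.start("order-42", "input B")  # first run is still open
input_after_second = server.input_of("order-42")
server.complete("order-42")
third = server.start("order-42", "input C")   # first run has closed
```

In this model `second` is the same run as `first` and the input stays "input A", which matches the symptom in the question; only after `complete` does the same Id produce a fresh run.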

It sounds like the problem you are facing is that the Workflow Task is timing out but the Workflow Execution is not finishing, and the “unexpected events” error you are getting sounds like it might be a non-determinism problem in your workflow code.

Which SDK are you using?
Have you tried replaying the history using a Workflow Replayer?

Just to add, could you share the workflow history for this execution?
You can get it via tctl:

tctl wf show -w <wfid> -r <runid> --output_filename myhistory.json
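Once the history is exported, a short script can list the timeout events. This is a minimal sketch, assuming the file is a JSON object with an `events` list whose entries carry `eventId` and `eventType` fields; the exact field names and the spelling of event types depend on the tctl and server version, so treat both as assumptions and adjust them to your file. `find_events` is a hypothetical helper, and the inline sample is made-up data, not the history from this thread.

```python
import json


def _normalize(event_type):
    # Treat "WorkflowTaskTimedOut" and "EVENT_TYPE_WORKFLOW_TASK_TIMED_OUT" alike.
    name = str(event_type).replace("_", "").lower()
    return name[len("eventtype"):] if name.startswith("eventtype") else name


def find_events(history, wanted="WorkflowTaskTimedOut"):
    """Return the ids of events whose type matches `wanted`."""
    target = _normalize(wanted)
    return [
        int(event["eventId"])
        for event in history.get("events", [])
        if _normalize(event.get("eventType", "")) == target
    ]


# Inline sample standing in for json.load(open("myhistory.json")).
sample = json.loads("""
{"events": [
  {"eventId": "9",  "eventType": "WorkflowTaskScheduled"},
  {"eventId": "10", "eventType": "WorkflowTaskTimedOut"},
  {"eventId": "11", "eventType": "EVENT_TYPE_WORKFLOW_TASK_SCHEDULED"}
]}
""")
timed_out = find_events(sample)
scheduled = find_events(sample, "WorkflowTaskScheduled")
```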

The Workflow gets retried when hitting this error.

Unlike activities, workflows do not have a default retry policy; you have to enable one via RetryOptions in WorkflowOptions. I assume it is your workflow task that is being retried while it waits for a fix for the failure.

Thanks @tihomir and @Cully ,

Here is the workflow history I exported

The timeout is at eventID 10.

By the way, I try to spin up 3 workers in one process/program, but all workers come back with the same WorkerID. Is that expected, or did I do something incorrectly?

Btw, I updated the code to use the Workflow Replayer, and the replay doesn’t report any errors.

Thanks a lot.

By the way, I try to spin up 3 workers in one process

There isn’t a need to set up multiple workers within a single process. You can run multiple worker processes to achieve fault tolerance and scalability.

Looking at your history (specifically event 10, WorkflowTaskTimedOut), it seems that after your FindQualifiedReward activity completed, the next workflow task was scheduled (event 9), but your worker was either down or overloaded and didn’t pick it up within 5s (the ScheduleToStart timeout for the sticky queue). The task then timed out, and the server placed it on the global task queue for any worker to pick up (see event 11, taskQueue kind “Normal”).
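The sequence described above can be sketched as a small function. This is a toy model of the sticky-queue fallback as explained in this reply, not server code: `dispatch_workflow_task` and its parameter are hypothetical, the event strings are simplified labels, and the 5-second value is simply the timeout quoted above.

```python
STICKY_SCHEDULE_TO_START_SECONDS = 5.0  # value quoted in the reply above


def dispatch_workflow_task(seconds_until_sticky_poll):
    """Return the simplified event sequence for one workflow task.

    `seconds_until_sticky_poll` is how long the worker that owns the sticky
    queue takes to poll; None means it never polls (for example, it is down).
    """
    events = ["WorkflowTaskScheduled (taskQueue kind: Sticky)"]
    picked_up_in_time = (
        seconds_until_sticky_poll is not None
        and seconds_until_sticky_poll < STICKY_SCHEDULE_TO_START_SECONDS
    )
    if picked_up_in_time:
        events.append("WorkflowTaskStarted")
        return events
    # Nobody picked the task up in time: it times out and is rescheduled on
    # the normal task queue, where any worker can take it.
    events.append("WorkflowTaskTimedOut")
    events.append("WorkflowTaskScheduled (taskQueue kind: Normal)")
    return events


healthy = dispatch_workflow_task(0.2)
overloaded = dispatch_workflow_task(12.0)
worker_down = dispatch_workflow_task(None)
```

In this sketch the overloaded and worker-down cases produce the same Scheduled, TimedOut, Scheduled pattern, which is the shape reported around events 9 to 11.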

Take a look at the worker tuning guide in the docs and set up SDK metrics so you can better see what’s going on.
