Hi everyone, I’ve been running some learning tests to figure out whether Temporal is a good fit for us.
I’ve been testing how failures are handled, for example when a worker shuts down in the middle of running an activity. If I have a second worker running, everything works as expected: the activity is retried on the second worker and completes successfully.
My issue is with the case where I only have one worker. The order of actions is as follows:
- run worker
- run workflow with activity that takes ~2sec
- stop worker
- start worker
- wait for workflow to finish
From the logs and the Temporal UI I can see that the worker is not picking up the activity, and we end up with only 2 attempts (the second one never actually runs) and a NonRetryableFailure.
After the timeout, the workflow keeps running without anything happening (unless we terminate it or define a timeout for it).
My questions are:
- Is this the desired behaviour? That is, if no worker is running to execute the activity, will it fail or stay around as a “zombie”?
- Is there any configuration that can change it?