I am using session affinity for a long-running workflow to ensure my activities execute on the same worker. However, I’m facing an issue with graceful worker shutdowns. My current approach involves a ‘health check’ activity at the beginning of the workflow. If the worker is draining, this check fails, but due to session affinity, the workflow gets stuck and cannot be reassigned to another worker.
My current workaround is a complex system of per-worker task queues, which I’d like to replace.
I believe the recommended best practice is to:
- Remove the application-level health check.
- Allow the activity to fail when the worker shuts down, which will cause a SessionFailedError.
- Catch this error in the workflow.
- Create a new session, which will be assigned to a healthy worker.
- Retry the business logic.
Could you confirm if this is the idiomatic and recommended approach for a self-hosted Temporal environment? Are there any other best practices or considerations for this scenario?
Thank you!