Hi all,
I’m running a self-hosted Temporal Service, and my Activity Workers intermittently fail to process Activity invocations. Here’s a rough overview of what happens:
- In the Temporal Web UI, the “ActivityTaskStarted” event appears and shows the identity of the Worker that started the task
- However, the logs from that Activity Worker show that it is not running any Activity and is simply idling (the logging level is set to “debug”, and the Temporal Python SDK produces no logs during this idle period)
- Eventually, an “ActivityTaskTimedOut” event is recorded because the startToCloseTimeout is reached
The tricky bit is that this behavior is intermittent: it seems to happen at random, with no discernible pattern.
Some additional context:
- The Activity Worker processes have not crashed during these periods (they are running and healthy)
- There are no error logs coming from the Temporal Python SDK used in the Activity Worker processes
- I am not using Activity Heartbeats, as the Activities are short-running
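One possible cause consistent with these symptoms is a blocked asyncio event loop in the Worker process, which would apply if the Activities are `async def` functions that make blocking calls. Below is a minimal sketch of a watchdog that could confirm or rule this out. It uses only the standard library; `watch_event_loop`, `demo`, and the interval and threshold values are hypothetical names chosen for illustration, not part of the Temporal SDK.

```python
import asyncio
import contextlib
import logging
import time

logger = logging.getLogger("loop_watchdog")  # hypothetical logger name


async def watch_event_loop(interval=0.5, threshold=1.0, lags=None):
    """Warn whenever the event loop wakes this task up later than expected.

    A late wake-up means something else held the loop (e.g. a blocking call).
    `lags` is an optional list that collects the measured delays.
    """
    while True:
        start = time.monotonic()
        await asyncio.sleep(interval)
        lag = time.monotonic() - start - interval
        if lag > threshold:
            logger.warning("event loop was blocked for about %.2fs", lag)
            if lags is not None:
                lags.append(lag)


async def demo():
    """Simulate a blocking call and return the delays the watchdog recorded."""
    lags = []
    task = asyncio.create_task(watch_event_loop(0.05, 0.1, lags))
    await asyncio.sleep(0.1)   # let the watchdog start ticking
    time.sleep(0.3)            # blocking call: stalls the whole event loop
    await asyncio.sleep(0.1)   # give the watchdog a chance to notice
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return lags


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(asyncio.run(demo()))
```

In a real Worker process, the watchdog would presumably be started with `asyncio.create_task(watch_event_loop())` just before the Worker's `run()` call. Warnings that line up with the stuck Activities would point at blocking code on the event loop; no warnings would suggest looking elsewhere.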
I’d greatly appreciate any advice or ideas that may point me in the right direction with troubleshooting this issue.
Thanks.