We are having some issues with our workers. The environment is used for tests, so the load is very limited. We need guidance on how to investigate what happened.
On Friday we noticed that some Workflows were “stuck” in Activities that shouldn’t fail (the Activities don’t depend on external APIs yet, so they shouldn’t fail or time out). Over the weekend the task was finally processed, but it took 13h to complete:
As shown in the image above, the Activity made 14 attempts before completing.
The number of attempts makes sense given the 1h scheduleToStartTimeout and the 1h startToCloseTimeout, but why the Activity was “stuck” in the first place is a mystery.
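As a sanity check on the 13h figure, here is a minimal sketch that rebuilds the timeline under stated assumptions: every attempt except the last hit the 1h startToCloseTimeout (Temporal's documentation describes scheduleToStart timeouts as non-retryable, so the retries were most likely driven by startToClose), the default retry policy applied between attempts (initial interval 1s, backoff coefficient 2.0, maximum interval 100s), and the final successful attempt took negligible time. The function `total_elapsed_seconds` is hypothetical and not part of any Temporal SDK.

```python
# Hypothetical timeline reconstruction (not a Temporal API). Assumes:
#  - each failed attempt ran until the 1h startToCloseTimeout fired,
#  - default retry backoff: 1s initial, x2 coefficient, capped at 100s,
#  - the last (successful) attempt took `last_attempt` seconds.

def total_elapsed_seconds(attempts: int,
                          start_to_close: float = 3600.0,
                          initial_interval: float = 1.0,
                          backoff: float = 2.0,
                          max_interval: float = 100.0,
                          last_attempt: float = 0.0) -> float:
    """Wall-clock seconds from the first attempt's start to completion."""
    failed = attempts - 1
    total = failed * start_to_close + last_attempt
    interval = initial_interval
    for _ in range(failed):  # one backoff wait after each failed attempt
        total += min(interval, max_interval)
        interval *= backoff
    return total


elapsed = total_elapsed_seconds(14)
print(f"{elapsed / 3600:.2f} h")  # prints "13.20 h"
```

Under these assumptions, 13 timed-out attempts account for 13h, and the backoff waits add only about 12 minutes, which matches the observed duration.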
This morning we continued looking for an answer and noticed that new Workflows are not being processed by the worker.
It seems that the worker is not handling Workflow Tasks.
We haven’t tried restarting the worker yet, as I want to understand why this is happening.
Using tctl, we get the following for the task queue:
Thanks for the help,