We are currently setting up a Temporal cluster. The server runs on GKE, and workflows and activities are implemented in Go. Overall it runs smoothly, but we've been observing "Workflow task not found" errors in the matching service without a clear cause.
This is an example log, taken from the matching service:
No server pod or worker pod restarted around that time; they had all been running for hours. The workers don't log any errors or warnings, and the database was running fine at the time too.
The workflow has since completed, and there are no obvious issues. Looking at the event history in the Temporal UI, the event with ID 2 (a WorkflowTaskScheduled event) is present and looks normal.
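(In case anyone wants to reproduce that check outside the UI, here is a minimal sketch of how the history can be dumped with the Go SDK. The host/port, namespace, and workflow ID below are placeholders, not our actual values.)

```go
package main

import (
	"context"
	"fmt"
	"log"

	enumspb "go.temporal.io/api/enums/v1"
	"go.temporal.io/sdk/client"
)

func main() {
	// Connect to the cluster; HostPort and Namespace are placeholders.
	c, err := client.Dial(client.Options{
		HostPort:  "temporal-frontend:7233",
		Namespace: "default",
	})
	if err != nil {
		log.Fatalf("unable to create Temporal client: %v", err)
	}
	defer c.Close()

	// Iterate over the full event history of the affected workflow run
	// (empty run ID means the latest run) and print event ID and type.
	iter := c.GetWorkflowHistory(context.Background(), "my-workflow-id", "", false,
		enumspb.HISTORY_EVENT_FILTER_TYPE_ALL_EVENT)
	for iter.HasNext() {
		event, err := iter.Next()
		if err != nil {
			log.Fatalf("failed to read history event: %v", err)
		}
		fmt.Printf("%d\t%s\n", event.GetEventId(), event.GetEventType())
	}
}
```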
Any idea what's going on? Does it mean our workers are somehow doing the same task twice?
Thanks for looking into this! We're observing around 240 occurrences of this error message per day, spread throughout the day. The volume varies slightly, but it has been consistent for days.
We haven't terminated any workflows over the last 3 days, yet we still observe the same number of errors, including on recently started workflows.
The workflow from the first message has since been pruned, but here is the history of one in the exact same situation: