Error Workflow task not found

I didn’t setup the Activity Timeout so it defaulted to 10 seconds. My Activity ran longer than that and I had this error:

2022/07/09 15:50:13 INFO  Task processing failed with error Namespace default TaskQueue DataTQ WorkerID Data Worker #9914@DeliciousBacon WorkerType WorkflowWorker Error Workflow task not found.
2022/07/09 15:50:29 ERROR Workflow panic Namespace default TaskQueue DataTQ WorkerID Data Worker #9914@DeliciousBacon WorkflowType DataWorkflow WorkflowID data#0 RunID f74db9ce-4964-46d1-9f9e-a2aaa3a348fd Attempt 2 Error Potential deadlock detected: workflow goroutine "root" didn't yield for over a second StackTrace process event for DataTQ [panic]:
go.temporal.io/sdk/internal.(*coroutineState).call(0xc00057f680, 0x2540be400)
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_workflow.go:944 +0x1bd
go.temporal.io/sdk/internal.(*dispatcherImpl).ExecuteUntilAllBlocked(0xc000491620, 0x11211e0?)
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_workflow.go:1033 +0x1a5
go.temporal.io/sdk/internal.executeDispatcher({0x1506b50, 0xc000491680}, {0x1507ef8, 0xc000491620}, 0x0?)
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_workflow.go:606 +0x9f
go.temporal.io/sdk/internal.(*syncWorkflowDefinition).OnWorkflowTaskStarted(0xc0003fcea0?, 0xc00024a900?)
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_workflow.go:576 +0x32
go.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc00001cf78, 0xc00024ad80, 0x60?, 0x1)
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_event_handlers.go:827 +0x203
go.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc009420080, 0xc000488930)
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_task_handlers.go:902 +0xd68
go.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc0003c8420, 0xc000488930, 0xc00010e300)
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_task_handlers.go:749 +0x485
go.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc0003c4a90, 0xc000488930)
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_task_pollers.go:284 +0x2cd
go.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc0003c4a90, {0x10ecdc0?, 0xc000488930?})
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_task_pollers.go:255 +0x6c
go.temporal.io/sdk/internal.(*baseWorker).processTask(0xc0000d9040, {0x10ec980?, 0xc000202180})
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_worker_base.go:400 +0x167
created by go.temporal.io/sdk/internal.(*baseWorker).runTaskDispatcher
        /home/user/go/pkg/mod/go.temporal.io/sdk@v1.15.0/internal/internal_worker_base.go:305 +0xb5
2022/07/09 15:50:29 WARN  Failed to process workflow task. Namespace default TaskQueue DataTQ WorkerID Data Worker #9914@DeliciousBacon WorkflowType DataWorkflow WorkflowID data#0 RunID f74db9ce-4964-46d1-9f9e-a2aaa3a348fd Attempt 2 Error Potential deadlock detected: workflow goroutine "root" didn't yield for over a second

After I saw “Potential deadlock detected”, I stopped the worker, set the proper Timeout in workflow.WithActivityOptions and restarted the worker, but now I get:

2022/07/09 15:57:43 DEBUG ExecuteActivity Namespace default TaskQueue DataTQ WorkerID Data Worker #12166@DeliciousBacon WorkflowType ProcessDataWorkflow WorkflowID data#0 RunID f74db9ce-4964-46d1-9f9e-a2aaa3a348fd Attempt 1 ActivityID 59 ActivityType BuildSaveDataActivity
2022/07/09 15:57:43 DEBUG Cached state staled, new task has unexpected events Namespace default TaskQueue DataTQ WorkerID Data Worker #12166@DeliciousBacon WorkflowID data#0 RunID f74db9ce-4964-46d1-9f9e-a2aaa3a348fd Attempt 22 CachedPreviousStartedEventID 57 TaskFirstEventID 1 TaskStartedEventID 57 PreviousStartedEventID 45
2022/07/09 15:57:43 INFO  Task processing failed with error Namespace default TaskQueue DataTQ WorkerID Data Worker #12166@DeliciousBacon WorkerType WorkflowWorker Error Workflow task not found.

It keeps retrying and increasing the Attempt count, but it fails to recover. Is there any way to recover the Workflow from this?

Notes:

  • workflow.ActivityOptions{StartToCloseTimeout: time.Hour * 2562046}
  • worker.Options{DeadlockDetectionTimeout: time.Second * 60}
  • go.temporal.io/api v1.8.0
  • go.temporal.io/sdk v1.15.0

What are you doing in your workflow thread that takes longer than 1 second? The deadlock detector is there for a reason as such code is prohibited.

Wasn’t this an Activity Timeout and not the problem with the Workflow? It timed out after 10 seconds—the Activity default.

Here’s info from the UI:


P.S. I would gladly read some docs on deadlock detector and prohibitions in the Workflow if you could link them. I don’t understand the reasons behind the Deadlocks after 1 second. I use for loops and short functions in the Workflow, which run fine and from what I’ve read are within the deterministic limitations.

Deadlock detector fires when a workflow thread doesn’t yield control back to Temporal for at least 1 second. The control is yielded when Future.Get, ExecuteActivity, etc are called. My guess is that the workflow code hits infinite loop in this case. You can download the history of this workflow and replay it under a debugger to troubleshoot. Use WorkflowReplayer to replay.

So even if there are no external API calls, I should still use Activity instead of handling simple functions in the Workflow?

My guess is that the workflow code hits infinite loop in this case.

I would love to hear some other possible causes since for _, i := range items {} loop cannot be infinite.

Also, could you please link some resources that explain the Deadlock detector and the reasons behind its functionality?

What type of simple function takes over 1 second to execute without talking to any external APIs?

I have two slices of structs that I range through twice and apply conditional logic on. The issue is, the slices can grow very lengthy and it takes longer than 1 second to range through them. :snail:

To demonstrate a simple function that takes over 1 second to execute, here’s one trivial example:

var evens []int
var odds []int
for i := 0; i < 200000000; i++ {
    if i%2 == 0 {
        evens = append(evens, i)
    } else {
        odds = append(odds, i)
    }
}

The point is, it’s not complex, it’s lengthy. Averages to 1.1 seconds on my setup.


To get back to the problem and the solution, should I instead send this work through an Activity?

I had set DeadlockDetectionTimeout: time.Second * 60 but this didn’t resolve the issue either. :thinking:

And thank you for your time, @maxim ! :hugs:


P.S. I have said I use for loops and short* functions in the Workflow—short functions, but the slices I get through Activity results can be long.

P.P.S. I would really love to read on prohibited code practices you mentioned with the Deadlock detector. I don’t want this to hit me in the production.

P.P.P.S. WorkflowReplayer gives me an error “first event is not WorkflowExecutionStarted”:

wrp := worker.NewWorkflowReplayer()

err := wrp.ReplayWorkflowHistoryFromJSONFile(nil, "ab24c5f3-1b91-4db0-809c-c7c94c28743a_events.json")
if err != nil {
    log.Fatal(err)
}

JSON:

{
    "eventId": "1",
    "eventTime": "2022-07-09T17:54:39.007537090Z",
    "eventType": "WorkflowExecutionStarted",
    "version": "0",
    "taskId": "45089033",
    //...
}