Deadlock when Blob data size is over the "Warn" limit

Whenever the Blob data size warning is logged, my worker panics with a "Potential deadlock detected" error.

The Blob data size is nowhere near the 2MB hard (error) limit; it is only slightly over the warning limit.

NOTE: I use a Zstandard converter, which works perfectly fine as long as no “Warn” limit message is triggered.

NOTE #2: I have also tested it with Temporal’s official Zlib converter, and it deadlocks too, although the “wf-size” is larger with that converter.

temporal | 
{"level":"warn",
"ts":"2022-05-20T09:13:36.835Z",
"msg":"Blob data size exceeds the warning limit.",
"service":"processor",
"wf-namespace":"0c99641b-8634-4b67-9ce1-33786b77fbde",
"wf-id":"85baa805-bc08-446b-b161-bf0e49bc2280",
"wf-run-id":"9f78fcfa-f0e2-491f-b1c2-3342ca90beec",
"wf-size":634207,
"blob-size-violation-operation":"RespondActivityTaskCompleted",
"logging-call-at":"util.go:592"}
2022/05/20 11:13:37 ERROR Workflow panic Namespace default TaskQueue ProcessIDs WorkerID 27392@User@ WorkflowType ProcessIDsWorkflow WorkflowID 85baa805-bc08-446b-b161-bf0e49bc2280 RunID 9f78fcfa-f0e2-491f-b1c2-3342ca90beec Attempt 1 Error Potential deadlock detected: workflow goroutine "root" didn't yield for over a second StackTrace process event for ProcessIDs [panic]:
go.temporal.io/sdk/internal.(*coroutineState).call(0xc00055a280, 0x3b9aca00)

C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_workflow.go:925 +0x19e
go.temporal.io/sdk/internal.(*dispatcherImpl).ExecuteUntilAllBlocked(0xc00055a230, 0x147ad00?)

C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_workflow.go:1014 +0x1a5
go.temporal.io/sdk/internal.executeDispatcher({0x16efb58, 0xc000550180}, {0x16f0cc8, 0xc00055a230}, 0x0?)

C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_workflow.go:602 +0x9f
go.temporal.io/sdk/internal.(*syncWorkflowDefinition).OnWorkflowTaskStarted(0xc000570100?, 0xc001630d40?)

C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_workflow.go:575 +0x32
go.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc000454138, 0xc001630e00, 0xe0?, 0x1)

C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_event_handlers.go:815 +0x203
go.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc0001e2000, 0xc0004317d0)

C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_task_handlers.go:878 +0xca8
go.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc0000f5e40, 0xc0004317d0, 0xc00054cb40)

C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_task_handlers.go:727 +0x485
go.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc0001a9930, 0xc0004317d0)

C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_task_pollers.go:284 +0x2cd
go.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc0001a9930, {0x145b800?, 0xc0004317d0?})
 
C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_task_pollers.go:255 +0x6c
go.temporal.io/sdk/internal.(*baseWorker).processTask(0xc000234000, {0x145b3c0?, 0xc028de2050})

C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_worker_base.go:398 +0x167
created by go.temporal.io/sdk/internal.(*baseWorker).runTaskDispatcher

C:/Users/U/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_worker_base.go:302 +0xb5

workflow goroutine “root” didn’t yield for over a second

It seems that your converter might be blocking for over a second and triggering the deadlock detector timeout.
Are you getting this during debugging only?

There is a new workflow.DataConverterWithoutDeadlockDetection feature which will be introduced in the next Go SDK release to help with this.

If the data converter is not the issue in your case, it would be helpful if you could provide a reproduction so we can see what else might be blocking.

It does look like the deadlock timeout: my converter compresses >20MB of payload into ~500KB, so the compression takes longer than a second.

I will wait for the release that adds the converter without deadlock detection.

NOTE: I am not running in debugging mode; this is a normal Workflow execution.
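
To confirm that compression alone can exceed the one-second budget named in the panic message, the converter's work can be timed outside of Temporal. The following is only a rough sketch: it uses the standard library's zlib as a stand-in for the Zstandard converter, the helper name compress and the constant deadlockBudget are made up for illustration, and the synthetic 20MB payload is far more repetitive than real data, so actual timings will differ.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"time"
)

// deadlockBudget mirrors the "didn't yield for over a second" message in the
// panic above. The name is hypothetical and not part of the Temporal SDK.
const deadlockBudget = time.Second

// compress zlib-compresses data and reports how long the compression took.
func compress(data []byte) ([]byte, time.Duration, error) {
	start := time.Now()
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, 0, err
	}
	// Close flushes the remaining compressed bytes into buf.
	if err := w.Close(); err != nil {
		return nil, 0, err
	}
	return buf.Bytes(), time.Since(start), nil
}

func main() {
	// Roughly 20MB of synthetic, highly repetitive data (an assumption;
	// substitute a real payload to get meaningful numbers).
	chunk := []byte("temporal-payload-")
	payload := bytes.Repeat(chunk, 20*1024*1024/len(chunk))

	out, elapsed, err := compress(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("compressed %d bytes to %d bytes in %v\n", len(payload), len(out), elapsed)
	if elapsed > deadlockBudget {
		fmt.Println("compression alone exceeds the one-second budget")
	}
}
```

If the measured time is close to or above one second for realistic payloads, the converter is the likely cause of the panic.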

Take a look at the snappycompress sample. It might provide a faster algorithm and could be worth trying.

my converter converts >20MB

An alternative could be to store this large data externally, for example in an S3 bucket, and return only a reference to it as the result of your activity.
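
A minimal sketch of that idea, assuming a hypothetical BlobStore interface: the in-memory memStore below only stands in for an S3 bucket, and none of these names (BlobStore, BlobRef, storeResult, loadResult) come from the Temporal SDK or the AWS SDK. The activity would return the small BlobRef instead of the payload itself, so only a few dozen bytes pass through the data converter.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// BlobStore is a hypothetical abstraction over external storage such as an
// S3 bucket. A real implementation would call the storage service's client.
type BlobStore interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

// memStore is an in-memory stand-in used only to keep this sketch runnable.
type memStore struct {
	mu    sync.Mutex
	blobs map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{blobs: make(map[string][]byte)}
}

func (s *memStore) Put(key string, data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Copy the data so later changes by the caller do not affect the store.
	s.blobs[key] = append([]byte(nil), data...)
	return nil
}

func (s *memStore) Get(key string) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	data, ok := s.blobs[key]
	if !ok {
		return nil, errors.New("blob not found: " + key)
	}
	return append([]byte(nil), data...), nil
}

// BlobRef is the small value an activity would return instead of the payload.
type BlobRef struct {
	Key  string // content hash used as the object key
	Size int    // original payload size in bytes
}

// storeResult uploads data and returns a reference keyed by its SHA-256 hash.
func storeResult(store BlobStore, data []byte) (BlobRef, error) {
	sum := sha256.Sum256(data)
	ref := BlobRef{Key: hex.EncodeToString(sum[:]), Size: len(data)}
	if err := store.Put(ref.Key, data); err != nil {
		return BlobRef{}, err
	}
	return ref, nil
}

// loadResult fetches the payload that a reference points to.
func loadResult(store BlobStore, ref BlobRef) ([]byte, error) {
	return store.Get(ref.Key)
}

func main() {
	store := newMemStore()
	payload := make([]byte, 20*1024*1024) // placeholder for the large result

	ref, err := storeResult(store, payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("activity returns a reference: key=%s size=%d\n", ref.Key, ref.Size)

	got, err := loadResult(store, ref)
	if err != nil {
		panic(err)
	}
	fmt.Printf("consumer loaded %d bytes\n", len(got))
}
```

Keying the object by its content hash is just one possible choice; it makes repeated uploads of the same data idempotent, which suits activities that may be retried.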