"context deadline exceeded" when running ExecuteActivity after running ExecuteChildWorkflow ~3000 times

I have a Temporal workflow written in Go. It first runs ExecuteChildWorkflow ~3000 times (the count varies with the size of the workload), then runs ExecuteActivity, and finally calls Get() on the Future returned by the activity. When the number of child workflows is low (say, a few dozen), the workflow completes without issues, but with 3000 child workflows, neither the activity nor the child workflows are actually executed.

Here is what I observe:

  1. In the OTEL logs, I see “ExecuteChildWorkflow” printed 3000 times, followed by one “ExecuteActivity”, and then “Task processing failed with error” with “context deadline exceeded” as the Error.

  2. Neither the activity nor the child workflows are actually executed.

  3. On the worker, I see this log entry:

{"time":"2025-04-10T21:12:39.199104-07:00","level":"INFO","msg":"Task processing failed with error","build-info":{"version":"","time":"2025-04-03T01:54:02+0000"},"Namespace":"default","TaskQueue":"queue","WorkerID":"worker","WorkerType":"WorkflowWorker","Error":"context deadline exceeded"}
  4. The workflow keeps running, and the sequence described above repeats several times every hour.

  5. In the Temporal UI, I see “WorkflowTaskTimedOut” in the workflow events, but it appears only a couple of times, not each time the failure described above occurs.

What can I do to find out what is causing this issue?

In the loop where I run ExecuteChildWorkflow 3000 times, if I call workflow.Sleep(ctx, 1*time.Millisecond) after kicking off every 100 child workflows, I no longer observe this issue. But I’m wondering whether there is a more proper fix for this.
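For reference, the workaround has roughly the following shape. This is a minimal sketch that uses only the standard library so it runs without the Temporal SDK: `startInBatches`, `start`, and `yield` are hypothetical names, not SDK functions. In the real workflow, `start` would presumably wrap the ExecuteChildWorkflow call and `yield` would wrap workflow.Sleep(ctx, 1*time.Millisecond).

```go
package main

import "fmt"

// startInBatches calls start(i) for every i in [0, total) and calls yield
// after each full batch of batchSize calls.
// Hypothetical helper: in a real workflow, start would wrap
// workflow.ExecuteChildWorkflow and yield would wrap workflow.Sleep.
func startInBatches(total, batchSize int, start func(i int), yield func()) {
	if batchSize <= 0 {
		batchSize = 1 // avoid a modulo-by-zero panic; yield after every call
	}
	for i := 0; i < total; i++ {
		start(i)
		if (i+1)%batchSize == 0 {
			yield()
		}
	}
}

func main() {
	started, yields := 0, 0
	startInBatches(3000, 100,
		func(int) { started++ }, // stand-in for starting one child workflow
		func() { yields++ },     // stand-in for the short sleep between batches
	)
	fmt.Println(started, yields) // prints "3000 30"
}
```

With 3000 children and a batch size of 100, the loop pauses 30 times, once after each full batch.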

There is a 4 MB limit on gRPC requests. It looks like the inputs of 3k child workflows, sent together, exceed it. Your workaround breaks the work into smaller batches, which is the correct solution.
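To pick a batch size, a back-of-the-envelope estimate along these lines may help. It is only a sketch: the 2048-byte per-child size is a made-up figure (measure your own serialized inputs, and remember that each start request also carries the workflow type, ID, and options), the 4 MB limit is assumed to mean 4 * 1024 * 1024 bytes, and keeping half of the limit as headroom is an arbitrary choice. `totalBytes` and `maxPerBatch` are hypothetical helpers.

```go
package main

import "fmt"

// Assumption: the 4 MB limit is interpreted as 4 * 1024 * 1024 bytes.
const grpcLimitBytes = 4 * 1024 * 1024

// totalBytes estimates the combined size of all child inputs if they were
// sent in a single request.
func totalBytes(children, perChildBytes int) int {
	return children * perChildBytes
}

// maxPerBatch estimates how many children fit within budgetBytes.
// It returns 0 when perChildBytes is not positive.
func maxPerBatch(perChildBytes, budgetBytes int) int {
	if perChildBytes <= 0 {
		return 0
	}
	return budgetBytes / perChildBytes
}

func main() {
	perChild := 2048 // hypothetical serialized size of one child's input

	// 3000 * 2048 = 6,144,000 bytes, which is above 4,194,304.
	fmt.Println(totalBytes(3000, perChild) > grpcLimitBytes) // prints "true"

	// Keep half of the limit as headroom for everything else in the request.
	fmt.Println(maxPerBatch(perChild, grpcLimitBytes/2)) // prints "1024"
}
```

Under these assumed numbers, a batch size of 100 stays far below the limit, which fits with the workaround making the error go away.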

Thank you, Maxim!