We noticed that some of our workflows were completing early without any error or timeout. When checking our Temporal server logs for the same time period, we see many errors like the one below.
Around this time we were starting on the order of 100K abandoned child workflows, each of which executes an activity that polls for status by returning an error and retrying until the desired status is received.
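For reference, the polling activity follows roughly this pattern (a simplified sketch with illustrative names, timeouts, and retry values, not our exact code):

```go
package app

import (
	"context"
	"errors"
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// PollStatusActivity checks an external status and returns an error while the
// resource is not yet in the desired state, so the server schedules an
// ActivityRetryTimer for each retry until the status is reached.
func PollStatusActivity(ctx context.Context, resourceID string) error {
	status, err := fetchStatus(ctx, resourceID) // stand-in for the real status lookup
	if err != nil {
		return err
	}
	if status != "DONE" {
		return errors.New("status not ready yet")
	}
	return nil
}

// fetchStatus is a placeholder for the real external status query.
func fetchStatus(ctx context.Context, resourceID string) (string, error) {
	return "PENDING", nil
}

// ChildWorkflow is started as an abandoned child and runs the polling activity
// with an unlimited retry policy.
func ChildWorkflow(ctx workflow.Context, resourceID string) error {
	ao := workflow.ActivityOptions{
		StartToCloseTimeout: 30 * time.Second,
		RetryPolicy: &temporal.RetryPolicy{
			InitialInterval:    10 * time.Second,
			BackoffCoefficient: 1.0,
			MaximumAttempts:    0, // retry until the desired status is returned
		},
	}
	ctx = workflow.WithActivityOptions(ctx, ao)
	return workflow.ExecuteActivity(ctx, PollStatusActivity, resourceID).Get(ctx, nil)
}
```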
Is this something we can address with a configuration change?
Please let us know if more information is needed.
{
   address: xxx.xxx.xxx.xxx:7234
   cluster-name: active
   component: timer-queue-processor
   error: context deadline exceeded
   level: error
   lifecycle: ProcessingFailed
   logging-call-at: taskProcessor.go:326
   msg: Fail to process task
   queue-task-id: 133055103
   queue-task-type: ActivityRetryTimer
   queue-task-visibility-timestamp: 1625173712064589600
   service: history
   shard-id: 4
   shard-item: 0xc000cf6580
   stacktrace: go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error
	/temporal/common/log/loggerimpl/logger.go:138
go.temporal.io/server/service/history.(*taskProcessor).handleTaskError
	/temporal/service/history/taskProcessor.go:326
go.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1
	/temporal/service/history/taskProcessor.go:212
go.temporal.io/server/common/backoff.Retry
	/temporal/common/backoff/retry.go:103
go.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck
	/temporal/service/history/taskProcessor.go:238
go.temporal.io/server/service/history.(*taskProcessor).taskWorker
	/temporal/service/history/taskProcessor.go:161
   ts: 2021-07-01T21:14:05.130Z
   wf-id: 29bc0b91-3547-473e-9844-bd48b70d9745_17
   wf-namespace-id: c27e3385-bb4c-45f2-abc6-e7d20a76028e
   wf-run-id: b6673fab-d36e-4b67-9abe-6502514fb8e7
   wf-timeout-type: Unspecified
   xdc-failover-version: 0
}