Activity not executed because history service fails to process task

Hi,

What are the possible reasons for "Fail to process task" and "context deadline exceeded" errors from the history service?

I have a workflow that was working before, but I just noticed it is now halted after WorkflowExecutionStarted and WorkflowTaskScheduled. I have checked the task queue, and there are workers sitting idle.
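For reference, this is roughly how I checked (a minimal sketch using the Go SDK; the host/port, namespace, workflow ID, and task queue name are placeholders for my setup):

package main

import (
	"context"
	"fmt"
	"log"

	enumspb "go.temporal.io/api/enums/v1"
	"go.temporal.io/sdk/client"
)

func main() {
	// Connect to the cluster (placeholder address and namespace).
	c, err := client.NewClient(client.Options{
		HostPort:  "temporal-frontend:7233",
		Namespace: "default",
	})
	if err != nil {
		log.Fatalf("unable to create client: %v", err)
	}
	defer c.Close()

	ctx := context.Background()

	// 1. Dump the event history of the stuck run: it ends at WorkflowTaskScheduled.
	iter := c.GetWorkflowHistory(ctx, "my-workflow-id", "", false,
		enumspb.HISTORY_EVENT_FILTER_TYPE_ALL_EVENT)
	for iter.HasNext() {
		event, err := iter.Next()
		if err != nil {
			log.Fatalf("history iteration failed: %v", err)
		}
		fmt.Println(event.GetEventId(), event.GetEventType())
	}

	// 2. Confirm workers are actually polling the workflow task queue.
	resp, err := c.DescribeTaskQueue(ctx, "my-task-queue",
		enumspb.TASK_QUEUE_TYPE_WORKFLOW)
	if err != nil {
		log.Fatalf("describe task queue failed: %v", err)
	}
	for _, p := range resp.Pollers {
		fmt.Println("poller:", p.GetIdentity(), "last access:", p.GetLastAccessTime())
	}
}

The history stops at WorkflowTaskScheduled and the task queue does list pollers, so the workers appear healthy.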

When any new workflow is created, the history service fails and retries a few times, then logs "Critical error processing task", followed by "Error updating ack level for shard" ("cluster-name":"active","error":"Failed to update shard. Previous range ID: 27; new range ID: 28","operation-result":"OperationFailed"), and the workflow times out. In this environment I run only one workflow at a very low RPS, so I doubt a resource constraint could cause this.
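From what I understand, the ack-level update is a conditional write keyed on the shard's range ID, so this failure would mean something else has already bumped the range ID, i.e. another history instance (or another cluster pointed at the same database) claimed the shard. Just to check that I am reading the error right, I picture the mechanism roughly like this (my own rough illustration, not the actual server code):

package main

import (
	"errors"
	"fmt"
)

// Rough illustration only: a shard record whose updates are guarded by an
// optimistic check on the range ID, similar in spirit to the error in my logs.
type shardStore struct {
	rangeID  int64
	ackLevel int64
}

var errShardOwnershipLost = errors.New("failed to update shard: range ID changed")

// updateAckLevel only succeeds if the caller still owns the shard,
// i.e. its expected range ID matches what is stored.
func (s *shardStore) updateAckLevel(expectedRangeID, newAckLevel int64) error {
	if s.rangeID != expectedRangeID {
		return fmt.Errorf("%w: previous range ID: %d; new range ID: %d",
			errShardOwnershipLost, expectedRangeID, s.rangeID)
	}
	s.ackLevel = newAckLevel
	return nil
}

func main() {
	s := &shardStore{rangeID: 27}

	// Another history instance (or a second cluster sharing the same
	// persistence store) acquires the shard and bumps the range ID to 28...
	s.rangeID = 28

	// ...so the original owner's update now fails, much like in my logs.
	if err := s.updateAckLevel(27, 100); err != nil {
		fmt.Println(err)
	}
}

If that reading is correct, the question becomes what else in this environment could be acquiring the same shards.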

thanks

{"level":"error","ts":"2021-11-09T23:58:45.198Z","msg":"Fail to process task","service":"history","shard-id":1473,"address":"10.32.31.200:7934","shard-item":"0xc00041d680","component":"transfe │
│ r-queue-processor","cluster-name":"active","shard-id":1473,"queue-task-id":6291467,"queue-task-visibility-timestamp":"2021-11-09T23:55:27.540Z","xdc-failover-version":0,"queue-task-type":"Tran │
│ sferWorkflowTask","wf-namespace-id":"09f36679-7181-4b98-a281-ec819c6436bc","wf-run-id":"98fce692-9e8e-4f4c-a5ee-b7a6be4d5158","error":"context deadline exceed │
│ ed","lifecycle":"ProcessingFailed","logging-call-at":"taskProcessor.go:332","stacktrace":" go.temporal.io/server/common/log.(*zapLogger).Error \n\t/temporal/common/log/zap_logger.go:143\ngo.temp │
oral.io/server/service/history.(*taskProcessor).handleTaskError \n\t/temporal/service/history/taskProcessor.go:332\ ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func
│ 1\n\t/temporal/service/history/taskProcessor.go:218\ ngo.temporal.io/server/common/backoff.Retry \n\t/temporal/common/backoff/retry.go:103\ ngo.temporal.io/server/service/history.(*taskProcessor)
│ .processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:244\ ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker \n\t/temporal/service/history/taskProcessor.go:167"}

@maxim do you have any ideas for debugging this? Thanks!

Could it be the cluster mixup issue?