Errors on Temporal History Server

Hi team,
After a Cassandra node crashed, we started getting the errors below on the Temporal history nodes in our stage environment. They occur very frequently. We are running Temporal server version 1.15.2.

2023-07-02T20:13:18+05:30	{"level":"error","ts":"2023-07-02T14:43:18.261Z","msg":"Critical error processing task, retrying.","shard-id":458,"address":"10.1.26.77:7234","component":"timer-queue-processor","cluster-name":"active","component":"timer-queue-processor","shard-id":458,"queue-task-id":23071095,"queue-task-visibility-timestamp":"2023-04-28T10:47:54.083Z","queue-task":{"NamespaceID":"ad3353c9-8633-448c-ac9b-a43645d8a19f","WorkflowID":"PWAUAU_6369c1ae01c9f00653640abe_9f8371d9-2689-4f7f-b636-942fb7e02a0f","RunID":"16cbe255-4fef-428f-9491-0fbd7506b909","VisibilityTimestamp":"2023-04-28T10:47:54.083917521Z","TaskID":23071095,"Version":0},"error":"unable to get workflow completion event","operation-result":"OperationCritical","queue-task":{"NamespaceID":"ad3353c9-8633-448c-ac9b-a43645d8a19f","WorkflowID":"PWAUAU_6369c1ae01c9f00653640abe_9f8371d9-2689-4f7f-b636-942fb7e02a0f","RunID":"16cbe255-4fef-428f-9491-0fbd7506b909","VisibilityTimestamp":"2023-04-28T10:47:54.083917521Z","TaskID":23071095,"Version":0},"logging-call-at":"taskProcessor.go:233","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:233\ngo.temporal.io/server/common/backoff.Retry.func1\n\t/temporal/common/backoff/retry.go:104\ngo.temporal.io/server/common/backoff.RetryContext\n\t/temporal/common/backoff/retry.go:125\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:105\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:254\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:177"}

2023-07-02T20:13:18+05:30	{"level":"error","ts":"2023-07-02T14:43:18.261Z","msg":"Fail to process task","shard-id":458,"address":"10.1.26.77:7234","component":"timer-queue-processor","cluster-name":"active","component":"timer-queue-processor","shard-id":458,"queue-task-id":23071095,"queue-task-visibility-timestamp":"2023-04-28T10:47:54.083Z","queue-task":{"NamespaceID":"ad3353c9-8633-448c-ac9b-a43645d8a19f","WorkflowID":"PWAUAU_6369c1ae01c9f00653640abe_9f8371d9-2689-4f7f-b636-942fb7e02a0f","RunID":"16cbe255-4fef-428f-9491-0fbd7506b909","VisibilityTimestamp":"2023-04-28T10:47:54.083917521Z","TaskID":23071095,"Version":0},"error":"unable to get workflow completion event","lifecycle":"ProcessingFailed","logging-call-at":"taskProcessor.go:340","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history.(*taskProcessor).handleTaskError\n\t/temporal/service/history/taskProcessor.go:340\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:228\ngo.temporal.io/server/common/backoff.Retry.func1\n\t/temporal/common/backoff/retry.go:104\ngo.temporal.io/server/common/backoff.RetryContext\n\t/temporal/common/backoff/retry.go:125\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:105\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:254\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:177"}

2023-07-02T20:13:18+05:30	{"level":"error","ts":"2023-07-02T14:43:18.260Z","msg":"Cache unable to retrieve event from store","shard-id":458,"address":"10.1.26.77:7234","component":"events-cache","error":"Workflow execution history not found.","wf-id":"PWAUAU_6369c1ae01c9f00653640abe_9f8371d9-2689-4f7f-b636-942fb7e02a0f","wf-run-id":"16cbe255-4fef-428f-9491-0fbd7506b909","wf-namespace-id":"ad3353c9-8633-448c-ac9b-a43645d8a19f","wf-history-event-id":13,"logging-call-at":"cache.go:131","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history/events.(*CacheImpl).GetEvent\n\t/temporal/service/history/events/cache.go:131\ngo.temporal.io/server/service/history/workflow.(*MutableStateImpl).GetCompletionEvent\n\t/temporal/service/history/workflow/mutable_state_impl.go:790\ngo.temporal.io/server/service/history/workflow.(*DeleteManagerImpl).deleteWorkflowExecutionInternal\n\t/temporal/service/history/workflow/delete_manager.go:165\ngo.temporal.io/server/service/history/workflow.(*DeleteManagerImpl).DeleteWorkflowExecutionByRetention\n\t/temporal/service/history/workflow/delete_manager.go:124\ngo.temporal.io/server/service/history.(*timerQueueTaskExecutorBase).executeDeleteHistoryEventTask\n\t/temporal/service/history/timerQueueTaskExecutorBase.go:108\ngo.temporal.io/server/service/history.(*timerQueueActiveTaskExecutor).execute\n\t/temporal/service/history/timerQueueActiveTaskExecutor.go:111\ngo.temporal.io/server/service/history.(*timerQueueActiveProcessorImpl).process\n\t/temporal/service/history/timerQueueActiveProcessor.go:257\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskOnce\n\t/temporal/service/history/taskProcessor.go:276\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:227\ngo.temporal.io/server/common/backoff.Retry.func1\n\t/temporal/common/backoff/retry.go:104\ngo.temporal.io/server/common/backoff.RetryContext\n\t/temporal/common/backoff/retry.go:125\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:105\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:254\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:177"}

Is there any way to fix this error? We do not have the option to recreate the database.

Critical error processing task, retrying.
unable to get workflow completion event

we are running on 1.15.2

I would try to upgrade the server version if possible. For the standby timer queue, this could happen before server release 1.17 when adding new clusters; see the fix PR here.
For the active timer queue this should not happen under normal operation. Are you still getting the errors in your staging env?

Thanks @tihomir.
Yes, we are still getting the same error, but for the active timer queue, as you mentioned, we are not facing any issue.
Can we directly upgrade from release 1.15.2 to 1.18.0, or is it better to go one minor version at a time and then to 1.18.0? I have also seen that there are schema changes for Cassandra.

can we directly upgrade from release 1.15.2 to 1.18.0

Not directly; see the docs here.
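Roughly, the sequence we'd expect is one minor version at a time (1.15.2 -> 1.16.x -> 1.17.x -> 1.18.0), applying the Cassandra schema update shipped with each release before rolling the server out on that version. A minimal sketch, assuming the default keyspace names and the versioned schema directories bundled with each release; your Cassandra endpoint, credentials, and paths will differ:

# repeat for each release in order: 1.16.x, then 1.17.x, then 1.18.0
temporal-cassandra-tool --ep <cassandra-host> -k temporal update-schema -d ./schema/cassandra/temporal/versioned
temporal-cassandra-tool --ep <cassandra-host> -k temporal_visibility update-schema -d ./schema/cassandra/visibility/versioned
# then roll the frontend/history/matching/worker services to that server version before moving to the next release

Also check the release notes for each version in between, since some releases call out additional upgrade steps beyond the schema update.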