"Workflow execution history not found" errors

Hi,
We had an issue with stuck crons on Temporal 1.15.3 (with MySQL). We recreated the crons under different names, and the new crons work well. However, after that we started to see the following errors in the history service logs (Temporal has since been upgraded to v1.17.6, but the errors were there in the previous version too):

{
    "component": "events-cache",
    "error": "Workflow execution history not found.",
    "level": "error",
    "logging-call-at": "cache.go:132",
    "message_key": null,
    "msg": "Cache unable to retrieve event from store",
    "shard-id": 97,
    "stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history/events.(*CacheImpl).GetEvent\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/events/cache.go:132\ngo.temporal.io/server/service/history/workflow.(*MutableStateImpl).GetCompletionEvent\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/workflow/mutable_state_impl.go:845\ngo.temporal.io/server/service/history/workflow.(*MutableStateImpl).GetWorkflowCloseTime\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/workflow/mutable_state_impl.go:871\ngo.temporal.io/server/service/history/workflow.(*DeleteManagerImpl).deleteWorkflowExecutionInternal\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/workflow/delete_manager.go:227\ngo.temporal.io/server/service/history/workflow.(*DeleteManagerImpl).DeleteWorkflowExecutionByRetention\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/workflow/delete_manager.go:189\ngo.temporal.io/server/service/history.(*timerQueueTaskExecutorBase).executeDeleteHistoryEventTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/timerQueueTaskExecutorBase.go:149\ngo.temporal.io/server/service/history.(*timerQueueStandbyTaskExecutor).Execute\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/timerQueueStandbyTaskExecutor.go:114\ngo.temporal.io/server/service/history/queues.(*executableImpl).Execute\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/queues/executable.go:164\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:207\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:166\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:1
90\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:167\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:217\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).processTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:195",
    "stream": "stdout",
    "wf-history-event-id": 11,
    "wf-id": "wfid",
    "wf-namespace-id": "nsid",
    "wf-run-id": "runid"
}

{
    "component": "timer-queue-processor",
    "error": "unable to get workflow completion event",
    "level": "error",
    "lifecycle": "ProcessingFailed",
    "logging-call-at": "lazy_logger.go:68",
    "message_key": null,
    "msg": "Fail to process task",
    "queue-task": {
      "NamespaceID": "nsid",
      "RunID": "runid",
      "TaskID": 377,
      "Version": 0,
      "WorkflowID": "wfid"
    },
    "queue-task-id": 377,
    "queue-task-type": "DeleteHistoryEvent",
    "shard-id": 97,
    "stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/log/zap_logger.go:142\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/queues/executable.go:234\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:208\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:166\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:190\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:167\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:217\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).processTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:195",
    "stream": "stdout",
    "wf-history-event-id": 1,
    "wf-id": "wfid",
    "wf-namespace-id": "nsid",
    "wf-run-id": "runid"
}

{
    "component": "timer-queue-processor",
    "error": "unable to get workflow completion event",
    "level": "error",
    "logging-call-at": "lazy_logger.go:68",
    "message_key": null,
    "msg": "Critical error processing task, retrying.",
    "operation-result": "OperationCritical",
    "queue-task": {
      "NamespaceID": "nsid",
      "RunID": "runid",
      "TaskID": 3776,
      "Version": 0,
      "WorkflowID": "wfid"
    },
    "queue-task-id": 3776,
    "queue-task-type": "DeleteHistoryEvent",
    "shard-id": 97,
    "stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/log/zap_logger.go:142\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/queues/executable.go:187\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/queues/executable.go:235\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:208\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:166\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:190\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:167\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:217\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).processTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:195",
    "stream": "stdout",
    "wf-history-event-id": 1,
    "wf-id": "wfid",
    "wf-namespace-id": "nsid",
    "wf-run-id": "runid"
}

These log entries contain workflow IDs and run IDs. I assume the stuck workflows expired after the retention period and were only partially deleted from the DB (or archived), so some old data seems to be left in the DB. If I try to show these workflows with tctl, I get “requested workflow history does not exist”. How can I get rid of these errors?
Thanks

Hi @jlemesh
Did you resolve this problem? If so, could you share how? Thanks.

Hi,

No, I wasn’t able to figure out the root cause. The DB was reset multiple times after this, and multiple Temporal upgrades were done. The problem did not recur afterwards.
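
For anyone who runs into the same errors, a useful first step may be to list which executions the failing retention tasks actually refer to. Below is a minimal sketch in Python, not a fix. It assumes the history service logs are available as one JSON object per line (the entries above are pretty-printed) and that they use the same field names as shown above. `affected_executions` is a hypothetical helper name, not part of Temporal or tctl.

```python
import json
from collections import Counter


def affected_executions(log_lines):
    """Count failed DeleteHistoryEvent tasks per (namespace ID, workflow ID, run ID).

    Assumes each element of log_lines is one JSON-encoded log entry with the
    field names seen in the log excerpts in this thread.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        # Keep only error-level entries for the retention (delete history) task.
        if entry.get("level") != "error":
            continue
        if entry.get("queue-task-type") != "DeleteHistoryEvent":
            continue
        key = (
            entry.get("wf-namespace-id"),
            entry.get("wf-id"),
            entry.get("wf-run-id"),
        )
        counts[key] += 1
    return counts
```

It can be called on any iterable of lines, for example an open log file, and the resulting counts show whether the errors come from a handful of executions that are retried over and over or from many different ones.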