Hi,
We had an issue with stuck cron workflows on Temporal 1.15.3 (with MySQL). We recreated the crons under different names, and the new crons work well. However, after that we started to see the following errors in the history service logs (Temporal has since been upgraded to v1.17.6, but the errors were already there on the previous version):
{
"component": "events-cache",
"error": "Workflow execution history not found.",
"level": "error",
"logging-call-at": "cache.go:132",
"message_key": null,
"msg": "Cache unable to retrieve event from store",
"shard-id": 97,
"stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history/events.(*CacheImpl).GetEvent\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/events/cache.go:132\ngo.temporal.io/server/service/history/workflow.(*MutableStateImpl).GetCompletionEvent\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/workflow/mutable_state_impl.go:845\ngo.temporal.io/server/service/history/workflow.(*MutableStateImpl).GetWorkflowCloseTime\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/workflow/mutable_state_impl.go:871\ngo.temporal.io/server/service/history/workflow.(*DeleteManagerImpl).deleteWorkflowExecutionInternal\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/workflow/delete_manager.go:227\ngo.temporal.io/server/service/history/workflow.(*DeleteManagerImpl).DeleteWorkflowExecutionByRetention\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/workflow/delete_manager.go:189\ngo.temporal.io/server/service/history.(*timerQueueTaskExecutorBase).executeDeleteHistoryEventTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/timerQueueTaskExecutorBase.go:149\ngo.temporal.io/server/service/history.(*timerQueueStandbyTaskExecutor).Execute\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/timerQueueStandbyTaskExecutor.go:114\ngo.temporal.io/server/service/history/queues.(*executableImpl).Execute\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/queues/executable.go:164\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:207\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:166\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:190\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:167\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:217\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).processTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:195",
"stream": "stdout",
"wf-history-event-id": 11,
"wf-id": "wfid",
"wf-namespace-id": "nsid",
"wf-run-id": "runid"
}
{
"component": "timer-queue-processor",
"error": "unable to get workflow completion event",
"level": "error",
"lifecycle": "ProcessingFailed",
"logging-call-at": "lazy_logger.go:68",
"message_key": null,
"msg": "Fail to process task",
"queue-task": {
"NamespaceID": "nsid",
"RunID": "runid",
"TaskID": 377,
"Version": 0,
"WorkflowID": "wfid"
},
"queue-task-id": 377,
"queue-task-type": "DeleteHistoryEvent",
"shard-id": 97,
"stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/log/zap_logger.go:142\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/queues/executable.go:234\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:208\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:166\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:190\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:167\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:217\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).processTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:195",
"stream": "stdout",
"wf-history-event-id": 1,
"wf-id": "wfid",
"wf-namespace-id": "nsid",
"wf-run-id": "runid"
}
{
"component": "timer-queue-processor",
"error": "unable to get workflow completion event",
"level": "error",
"logging-call-at": "lazy_logger.go:68",
"message_key": null,
"msg": "Critical error processing task, retrying.",
"operation-result": "OperationCritical",
"queue-task": {
"NamespaceID": "nsid",
"RunID": "runid",
"TaskID": 3776,
"Version": 0,
"WorkflowID": "wfid"
},
"queue-task-id": 3776,
"queue-task-type": "DeleteHistoryEvent",
"shard-id": 97,
"stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/log/zap_logger.go:142\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/queues/executable.go:187\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/service/history/queues/executable.go:235\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:208\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:166\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:190\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/backoff/retry.go:167\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).executeTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:217\ngo.temporal.io/server/common/tasks.(*ParallelProcessor).processTask\n\t/go/pkg/mod/go.temporal.io/server@v1.17.6/common/tasks/parallel_processor.go:195",
"stream": "stdout",
"wf-history-event-id": 1,
"wf-id": "wfid",
"wf-namespace-id": "nsid",
"wf-run-id": "runid"
}
These log entries contain workflow IDs and run IDs. I assume the stuck workflows expired after the retention period and were only partially deleted from the DB (or archived), so some old data seems to be left behind. If I try to show these workflows with tctl, I get “requested workflow history does not exist”. How can we get rid of these errors?
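In case it helps with diagnosis, the affected executions can be listed straight from the logs. This is only a minimal sketch, assuming the history service logs are available as one JSON object per line; the function name `collect_failed_deletions` is made up for illustration, and the field names are the ones that appear in the entries above.

```python
import json
from collections import Counter


def collect_failed_deletions(lines):
    """Count error-level DeleteHistoryEvent log entries per execution.

    Assumes one JSON log entry per line; lines that are not JSON objects
    are skipped. Keys are (namespace ID, workflow ID, run ID) tuples.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(entry, dict):
            continue
        if entry.get("level") != "error":
            continue
        if entry.get("queue-task-type") != "DeleteHistoryEvent":
            continue
        key = (
            entry.get("wf-namespace-id"),
            entry.get("wf-id"),
            entry.get("wf-run-id"),
        )
        counts[key] += 1
    return counts


# Example usage (the log file name is hypothetical):
# with open("history-service.log") as f:
#     for (ns, wf, run), n in collect_failed_deletions(f).most_common():
#         print(ns, wf, run, n)
```

This only tells us which executions keep failing the retention delete; it does not remove anything.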
Thanks