Hello, we’ve been seeing the following error recur at about 18 minutes past every hour on our history servers:
{"level":"error","ts":"2022-09-29T05:18:16.597Z","msg":"Operation failed with internal error.","error":"GetWorkflowExecution: failed to get request cancel info. Error: Failed to get request cancel info. Error: context deadline exceeded","metric-scope":5,"logging-call-at":"persistenceMetricClients.go:1579","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/common/persistence.(*metricEmitter).updateErrorMetric\n\t/home/builder/temporal/common/persistence/persistenceMetricClients.go:1579\ngo.temporal.io/server/common/persistence.(*executionPersistenceClient).GetWorkflowExecution\n\t/home/builder/temporal/common/persistence/persistenceMetricClients.go:247\ngo.temporal.io/server/common/persistence.(*executionRetryablePersistenceClient).GetWorkflowExecution.func1\n\t/home/builder/temporal/common/persistence/persistenceRetryableClients.go:228\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:194\ngo.temporal.io/server/common/persistence.(*executionRetryablePersistenceClient).GetWorkflowExecution\n\t/home/builder/temporal/common/persistence/persistenceRetryableClients.go:232\ngo.temporal.io/server/service/history/shard.(*ContextImpl).GetWorkflowExecution\n\t/home/builder/temporal/service/history/shard/context_impl.go:902\ngo.temporal.io/server/service/history/workflow.getWorkflowExecution\n\t/home/builder/temporal/service/history/workflow/transaction_impl.go:425\ngo.temporal.io/server/service/history/workflow.(*ContextImpl).LoadMutableState\n\t/home/builder/temporal/service/history/workflow/context.go:270\ngo.temporal.io/server/service/history.LoadMutableStateForTask\n\t/home/builder/temporal/service/history/nDCTaskUtil.go:142\ngo.temporal.io/server/service/history.loadMutableStateForTimerTask\n\t/home/builder/temporal/service/history/nDCTaskUtil.go:123\ngo.temporal.io/server/service/history.(*timerQueueActiveTaskExecutor).executeActivityTimeoutTask\n\t/home/builder/temporal/service/history/timerQueueActiveTaskExecutor.go:192\ngo.temporal.io/server/service/history.(*timerQueueActiveTaskExecutor).Execute\n\t/home/builder/temporal/service/history/timerQueueActiveTaskExecutor.go:108\ngo.temporal.io/server/service/history/queues.(*executorWrapper).Execute\n\t/home/builder/temporal/service/history/queues/executor_wrapper.go:67\ngo.temporal.io/server/service/history/queues.(*executableImpl).Execute\n\t/home/builder/temporal/service/history/queues/executable.go:201\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:225\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:170\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:194\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:171\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:235\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:211"}
Most of the errors are caused by “context deadline exceeded”, but we don’t have any hourly cron jobs on Temporal.
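In case it helps anyone reproduce the observation, here is a rough sketch of how the minute-of-hour pattern could be checked from the history server logs. It assumes the logs are JSON lines with `level`, `ts`, and `error` fields, as in the entry above; the function name `minute_histogram` and its `needle` parameter are made up for illustration and are not part of Temporal.

```python
import json
from collections import Counter
from datetime import datetime


def minute_histogram(lines, needle="context deadline exceeded"):
    """Count matching error entries per minute-of-hour (0-59).

    Assumes one JSON object per line with "level", "ts" and "error"
    keys, and timestamps like 2022-09-29T05:18:16.597Z.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as plain-text output
        if not isinstance(entry, dict):
            continue
        if entry.get("level") != "error":
            continue
        if needle not in entry.get("error", ""):
            continue
        try:
            when = datetime.strptime(entry.get("ts", ""), "%Y-%m-%dT%H:%M:%S.%fZ")
        except ValueError:
            continue  # skip entries with a missing or differently formatted timestamp
        counts[when.minute] += 1
    return counts
```

Calling `minute_histogram(open("history.log"))` (the file name is just an example) and looking at `most_common(5)` on the result should show whether the errors really cluster around minute 18 or are spread more widely.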
Around the time the errors occur, the MySQL “threads_running” metric spikes.
We tried running “show full processlist” on MySQL every 5 seconds, and near the time the errors occur we can see “START TRANSACTION” statements.
Is there any job that the history service runs every hour and that performs a lot of operations against MySQL?