History server context deadline exceeded errors every hour

Hello, we’ve been seeing the following recurrent error on our history servers. It happens every hour, around 18 minutes past the hour:

{"level":"error","ts":"2022-09-29T05:18:16.597Z","msg":"Operation failed with internal error.","error":"GetWorkflowExecution: failed to get request cancel info. Error: Failed to get request cancel info. Error: context deadline exceeded","metric-scope":5,"logging-call-at":"persistenceMetricClients.go:1579","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/common/persistence.(*metricEmitter).updateErrorMetric\n\t/home/builder/temporal/common/persistence/persistenceMetricClients.go:1579\ngo.temporal.io/server/common/persistence.(*executionPersistenceClient).GetWorkflowExecution\n\t/home/builder/temporal/common/persistence/persistenceMetricClients.go:247\ngo.temporal.io/server/common/persistence.(*executionRetryablePersistenceClient).GetWorkflowExecution.func1\n\t/home/builder/temporal/common/persistence/persistenceRetryableClients.go:228\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:194\ngo.temporal.io/server/common/persistence.(*executionRetryablePersistenceClient).GetWorkflowExecution\n\t/home/builder/temporal/common/persistence/persistenceRetryableClients.go:232\ngo.temporal.io/server/service/history/shard.(*ContextImpl).GetWorkflowExecution\n\t/home/builder/temporal/service/history/shard/context_impl.go:902\ngo.temporal.io/server/service/history/workflow.getWorkflowExecution\n\t/home/builder/temporal/service/history/workflow/transaction_impl.go:425\ngo.temporal.io/server/service/history/workflow.(*ContextImpl).LoadMutableState\n\t/home/builder/temporal/service/history/workflow/context.go:270\ngo.temporal.io/server/service/history.LoadMutableStateForTask\n\t/home/builder/temporal/service/history/nDCTaskUtil.go:142\ngo.temporal.io/server/service/history.loadMutableStateForTimerTask\n\t/home/builder/temporal/service/history/nDCTaskUtil.go:123\ngo.temporal.io/server/service/history.(*timerQueueActiveTaskExecutor).executeActivityTimeoutTask\n\t/home/builder/temporal/service/history/timerQueueActiveTaskExecutor.go:192\ngo.temporal.io/server/service/history.(*timerQueueActiveTaskExecutor).Execute\n\t/home/builder/temporal/service/history/timerQueueActiveTaskExecutor.go:108\ngo.temporal.io/server/service/history/queues.(*executorWrapper).Execute\n\t/home/builder/temporal/service/history/queues/executor_wrapper.go:67\ngo.temporal.io/server/service/history/queues.(*executableImpl).Execute\n\t/home/builder/temporal/service/history/queues/executable.go:201\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:225\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:170\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:194\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:171\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:235\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:211"}

Most of the errors are caused by “context deadline exceeded”, but we don’t have any hourly cron job on Temporal.

At the time the errors occur, the MySQL “threads_running” metric spikes.

We tried executing “show full processlist” on MySQL every 5 seconds; around the time the errors occur we could see “START TRANSACTION” statements.
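For reference, a minimal sketch of that kind of polling (Go with the go-sql-driver/mysql driver; the DSN is a placeholder, and it reads information_schema.PROCESSLIST rather than the literal SHOW FULL PROCESSLIST output so the columns are easier to scan). It samples Threads_running and the currently executing statements every 5 seconds so they can be correlated with the hourly errors:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN; point it at the Temporal persistence database.
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/temporal")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		// Sample the Threads_running status counter that spikes with the errors.
		var name, value string
		if err := db.QueryRow("SHOW GLOBAL STATUS LIKE 'Threads_running'").Scan(&name, &value); err != nil {
			log.Printf("status query failed: %v", err)
			continue
		}
		log.Printf("Threads_running=%s", value)

		// List threads that are executing a statement; INFO holds the statement
		// text (this is where "START TRANSACTION" shows up).
		rows, err := db.Query(`
			SELECT ID, USER, COMMAND, TIME, STATE, INFO
			FROM information_schema.PROCESSLIST
			WHERE INFO IS NOT NULL`)
		if err != nil {
			log.Printf("processlist query failed: %v", err)
			continue
		}
		for rows.Next() {
			var (
				id          int64
				user, cmd   string
				elapsed     int64
				state, info sql.NullString
			)
			if err := rows.Scan(&id, &user, &cmd, &elapsed, &state, &info); err != nil {
				log.Printf("scan failed: %v", err)
				break
			}
			log.Printf("id=%d user=%s cmd=%s time=%ds state=%q stmt=%q",
				id, user, cmd, elapsed, state.String, info.String)
		}
		rows.Close()
	}
}
```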

Is there any cron job that the history service runs every hour that would generate a lot of operations against MySQL?

GetWorkflowExecution: failed to get request cancel info.

I believe this comes from the part of the code that tries to find the workflow executions that your workflows are trying to cancel (send cancel requests for).
It could indicate db corruption; I would check the state of your db and look at logs or anything else that could be going wrong on the db side.
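As a rough illustration of that kind of db check (not from the original thread): a sketch that counts pending cancel-request rows per shard. It assumes the default MySQL execution store schema; the table name request_cancel_info_maps and its shard_id column are assumptions based on that schema, so verify them against your deployed schema before running.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN; point it at the Temporal persistence database.
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/temporal")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Count pending cancel-request rows per shard; an unexpected pile-up on a
	// few shards would point at the workflows feeding the failing reads.
	rows, err := db.Query(`
		SELECT shard_id, COUNT(*) AS pending
		FROM request_cancel_info_maps
		GROUP BY shard_id
		ORDER BY pending DESC
		LIMIT 20`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var shardID, pending int64
		if err := rows.Scan(&shardID, &pending); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("shard %d: %d pending cancel requests\n", shardID, pending)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```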

Also, what server and MySQL versions are you running?

mysql version: 8.0.26-17
temporal version: 1.18.0
temporal was deployed on k8s using the Helm chart

I believe there is no business logic that would try to send a cancel request, because the tasks come from a schedule that runs every 5 minutes.

The problem is solved. It went away after the DBA turned off the binlog archive…