Temporal Application is not establishing connection to Temporal Server inside Kubernetes

Hi all,

Note: I found the (trivial) reason myself; see the end of this post. Sorry for the noise. I'd still like to keep the text as is, because someone else might run into similar symptoms.


I'm also getting these error logs with "context deadline exceeded" after ~70s at PollWorkflowTaskQueue and PollActivityTaskQueue on Temporal server tag v1.16.1, and I'm completely lost as to which wait operation causes the 70s timeout.

{"level":"error","ts":"2022-05-02T20:04:30.602+0200","msg":"Unable to call matching.PollWorkflowTaskQueue.","service":"frontend","wf-task-queue-name":"LPOD:22bb3bbe-f6a3-4fe9-8a67-835b2898380c","timeout":"1m9.997997s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:921", ...

intermixed with these, emitted in bursts and apparently varying only in the "wf-task-queue-name" value:

{"level":"error","ts":"2022-05-02T20:04:30.602+0200","msg":"Unable to call matching.PollActivityTaskQueue.","service":"frontend","wf-task-queue-name":"/_sys/temporal-sys-tq-scanner-taskqueue-0/3","timeout":"1m10s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:1167", ...

Background

The persistence store was freshly initialized: I added the namespace "default", then successfully started and finished exactly one workflow, after which the Temporal server sat idle for several hours. The errors were logged spontaneously, i.e. with no worker processes running, so I suspect some housekeeping work of the service itself triggers the inspection of the task queues.

The configuration is development.yaml with the logging level changed to "error", and I'm running Temporal with my own SQL Server persistence driver. The stack trace does not refer to my code base directly, but of course a DB operation right before the logged stack trace could be what times out. Which one?
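For reference, the only change I made to the stock development.yaml was the log level; assuming the standard Temporal server config layout, the relevant fragment looks roughly like:

```yaml
log:
  stdout: true
  level: error
```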

So what’s my issue here?

I'd like to debug the error and check whether it's connected to my driver code, so I want to reproduce it without waiting several hours for the housekeeping processes to hit the faulty path. Is there a way to speed up these internal processes via configuration, effectively triggering the error situation more often? Which SQL table is relevant for PollWorkflowTaskQueue, and is it a DB query or a DB mutation that times out here?
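One knob I found that might shorten the cycle: Temporal reads overrides from the dynamic-config YAML file referenced in development.yaml. Assuming the key name matching.longPollExpirationInterval still exists and controls how long the matching service holds a poll open, shortening it should make poll timeouts fire much sooner (this is a hypothetical sketch, not verified against v1.16.1):

```yaml
matching.longPollExpirationInterval:
  - value: 10s
```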

Thanks for your attention. KR, Martin.


The reason was: the Windows dev machine went into sleep mode and was woken up at the exact instant of the error. The poll deadlines simply elapsed across the sleep gap, so there is nothing to worry about. This was written to the system event log on wake-up:

The system has resumed from sleep.
--- then ---
The system time has changed to 2022-05-02T18:04:30.500000000Z from 2022-05-02T15:32:40.564565800Z.

Change Reason: System time synchronized with the hardware clock.
Process: '' (PID 4).