mohits
October 15, 2021, 2:34pm
Hi,
One issue I have seen while using Temporal is that the worker is unable to connect to the Temporal server, so all the workflows get stuck in the queue and heartbeats fail in the workflow.
Is there a solution such as automatic worker reconnection, or something like a worker heartbeat, to monitor the connection health?
For health checks, SDKs perform a gRPC health check when you create a client.
You can also run a health check via tctl with:
tctl cluster health
You can also run health checks for different services from code; see for example:
https://github.com/temporalio/temporal/blob/master/tools/cli/clusterCommands.go#L39
where fullWorkflowServiceName can be changed depending on which service you want to health check:
temporal.api.workflowservice.v1.WorkflowService
temporal.api.workflowservice.v1.HistoryService
temporal.api.workflowservice.v1.MatchingService
You can also look at some other good posts with more health check information:
Hello,
We have been running Temporal in our staging environment for some time now. The load is typically small and we have experienced a very stable server (very few restarts). We are running it in Kubernetes and we have written the deployment configuration ourselves (taking inspiration from the provided helm charts).
Now we are planning to deploy Temporal in our production environments soon and we wanted to validate some operational details with you.
Our main concern at the moment…
Hi,
I am currently using temporal-server v1.12.0.
I have deployed the worker service in Kubernetes as an individual pod, and it keeps running and then crashing. What is the issue?
Here is the error:
{"level":"debug","ts":"2021-09-07T12:16:52.304Z","msg":"Membership heartbeat upserted successfully","service":"worker","address":"100.127.29.165","port":6939,"hostId":"7bc42600-0fd5-11ec-82ea-a230519320c5","logging-call-at":"rpMonitor.go:163"}
{"level":"fatal","ts":"2021-09-07T12:17:02.349Z",…
Hope this helps.