Are you able to see anything in your server logs?
Try looking for a fatal error log entry that starts with "error starting scanner".
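If the worker runs in Kubernetes, one way to search for it (the pod and container names below are placeholders for your own deployment) is to pull the logs of the crashed container:
$ kubectl logs <worker-pod> -c worker --previous | grep -i "error starting scanner"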
Do you get this error after a longer period of worker inactivity? If so, then this thread might be helpful.
Checked with the server team, and they asked: can you confirm connectivity between your worker pod and wherever you have the frontend service deployed?
One path to this error is the worker service being unable to validate the existence of its internal namespace with the frontend service (via gRPC).
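A quick way to confirm basic reachability is to shell into the worker pod and test DNS resolution and the TCP connection to the frontend. This is only a sketch: the pod name and frontend host are placeholders, 7233 is the default frontend gRPC port, and nslookup/nc may not be present in your worker image:
$ kubectl exec -it <worker-pod> -c worker -- sh
$ nslookup <frontend-host>
$ nc -zv <frontend-host> 7233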
I have verified service/worker/scanner/scanner.go.
I have increased the timeout to 7 minutes, but the change doesn't take effect and the pod still crashes every 5 minutes. I am not sure what the root cause is. Please let me know.
I see at least two different error messages in your logs. "context deadline exceeded" comes from the SDK worker, which runs inside the worker service (sorry for the overloaded "worker" term), and from what I can see in the code, the only place it can be returned from worker.Start() is when the SDK checks for the namespace. So clearly the worker service doesn't have access to the frontend. I didn't follow the stack trace of the second error, but it seems that startWorkflow fails for the same reason.
To double-check this, you can shell into the worker service container and run tctl cluster health. If it gives you an error, then you need to check your k8s setup.
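For example (the pod name and frontend address are placeholders for your own deployment; tctl talks to 127.0.0.1:7233 when --address is omitted):
$ kubectl exec -it <worker-pod> -c worker -- bash
$ tctl --address <frontend-host>:7233 cluster health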
tctl cluster health works only on the frontend pod.
The rest of the service pods don't show as healthy, even though they have Running status (and the worker pod crashes as mentioned above).
Even the frontend pod doesn't show healthy when I pass an address:
$ kubectl exec -it frontend-84b9b86577-ztk6k -c frontend -- bash
bash-4.2$ tctl --address server-asyncworkflow-local.apps.mt-d2.carl.gkp.net:7233 cluster health
Error: Unable to get "temporal.api.workflowservice.v1.WorkflowService" health check status.
Error Details: rpc error: code = DeadlineExceeded desc = context deadline exceeded
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
bash-4.2$ tctl cluster health
temporal.api.workflowservice.v1.WorkflowService: SERVING
All the other services don't have to be directly accessible from the worker service. The worker "talks" only to the frontend. I am not a network pro, but apparently server-asyncworkflow-local.apps.mt-d2.carl.gkp.net is not accessible from either the worker or the frontend itself. I think your second check is the one that should work: tctl talks to localhost by default, and if you run it on the frontend itself, the frontend should be reachable. That only confirms that the health API on the frontend is working properly. To access it from the worker service, you need to figure out your network/deployment topology and the right DNS name that the worker should use.
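If the worker runs in the same Kubernetes cluster, one common option (an assumption about your setup, not something confirmed in this thread) is to point it at the frontend's ClusterIP service using the standard in-cluster DNS name:
$ kubectl get svc -n <namespace>
A service named, say, temporal-frontend would then typically be reachable from other pods as temporal-frontend.<namespace>.svc.cluster.local:7233.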
History and matching also expose health check endpoints, but tctl doesn't check them; you need some gRPC tool that can call them. AWS load balancers also support gRPC health checks. The worker service doesn't expose one.
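For example, with grpcurl (assuming plaintext gRPC, server reflection enabled, and the default ports; the history/matching hosts and the bare health check are guesses about your deployment, only the frontend service name is confirmed by your tctl output above):
$ grpcurl -plaintext -d '{"service": "temporal.api.workflowservice.v1.WorkflowService"}' <frontend-host>:7233 grpc.health.v1.Health/Check
$ grpcurl -plaintext <history-host>:7234 grpc.health.v1.Health/Check
$ grpcurl -plaintext <matching-host>:7235 grpc.health.v1.Health/Check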