Context deadline exceeded after restarting Temporal server

Hello! Perhaps this is a stupid question with an obvious answer. We are running Temporal on K8S (with Istio). Clients in one of our services connect to Temporal without any problems, but as soon as we redeploy Temporal, those clients start returning “context deadline exceeded” errors when we try to trigger workflows. This is quite problematic because the situation doesn’t seem to resolve itself, and sometimes those deployments simply restart. Can you think of anything that we may have misconfigured on the Temporal client or server side?

We figured it out. We noticed that the connection was always lost after a restart of the Temporal services. Then we found this lead, which basically says that Istio doesn’t yet work with headless services (in this case: Temporal in K8S). We fixed it by running both a headless and a non-headless service for Temporal. The Temporal backends connect to the headless one, while our regular backend applications connect to the non-headless one.
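
For anyone who wants to try the same setup, here is a rough sketch of what such a pair of Services could look like. It is only an illustration, not the exact manifests from this deployment: the Service names, the `app: temporal-frontend` selector label, and the port name are all hypothetical, and 7233 is assumed to be the Temporal frontend gRPC port. The script prints a JSON `List` that can be piped to `kubectl apply -f -`.

```python
import json


def temporal_service(name, headless, port=7233):
    """Build a Kubernetes Service manifest pointing at the Temporal frontend pods."""
    spec = {
        # Hypothetical pod label; use whatever labels your Temporal pods carry.
        "selector": {"app": "temporal-frontend"},
        # Assumption: a "grpc-" port name prefix so Istio treats the traffic as gRPC.
        "ports": [{"name": "grpc-rpc", "port": port, "targetPort": port}],
    }
    if headless:
        # clusterIP "None" is what makes a Service headless:
        # DNS resolves straight to the pod IPs instead of a stable virtual IP.
        spec["clusterIP"] = "None"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": spec,
    }


# Headless Service: used by the Temporal services themselves.
headless = temporal_service("temporal-frontend-headless", headless=True)
# Regular ClusterIP Service: used by application clients behind Istio.
regular = temporal_service("temporal-frontend", headless=False)

manifest_list = {"apiVersion": "v1", "kind": "List", "items": [headless, regular]}
print(json.dumps(manifest_list, indent=2))
```

With something like this in place, application clients would point their Temporal host/port at the regular Service (here `temporal-frontend:7233`), so they keep talking to a stable cluster IP when the Temporal pods are replaced.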

Quite K8S specific, but perhaps this helps someone at some point.
