Temporal spanned across multiple k8s clusters

I’ve found a few posts here along the lines of “how do I run Temporal HA?” or “how can I run Temporal across more than one k8s cluster?”. I tried this out locally using kind, Postgres, Istio, MetalLB, and the Temporal Helm charts. I got a DB cluster up, then two k8s clusters each running Temporal, using the single DB cluster as a shared backend database. Everything came up fine: I created a namespace in one cluster and it appeared in the other just fine. I then tried to deploy a sample app and got an error…

"error":"Not enough hosts to serve the request",

So I started digging into why this happens when all my pods (frontend/history/matching/web/worker) were running just fine. AI suggested this:

What “Not enough hosts to serve the request” usually means in a healthy-looking cluster

This error is thrown when frontend cannot find any history (or matching) hosts in the membership ring for the cluster that should own the request.

In a multi-cluster setup, that often happens when:

  - clusterMetadata.currentClusterName is the same in both clusters, or misconfigured.
  - Global namespaces (including possibly temporal-system) are configured with an active cluster that does not actually match where your traffic is going.
  - The frontend tries to redirect the call to the “active” cluster and fails in a way that surfaces as “Not enough hosts”.
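For context on the first point: currentClusterName lives in the server’s clusterMetadata config, and in a replication setup each cluster must have its own name and failover version. A rough sketch of what that looks like for one of the two clusters (cluster names and addresses here are hypothetical):

```yaml
# Illustrative clusterMetadata for "cluster-a" (names/addresses are hypothetical).
# Each Temporal cluster in a multi-cluster setup needs a distinct
# currentClusterName and a unique initialFailoverVersion.
clusterMetadata:
  enableGlobalNamespace: true
  failoverVersionIncrement: 10
  masterClusterName: "cluster-a"
  currentClusterName: "cluster-a"    # must differ between the two clusters
  clusterInformation:
    cluster-a:
      enabled: true
      initialFailoverVersion: 1
      rpcAddress: "temporal-frontend.cluster-a.example:7233"
    cluster-b:
      enabled: true
      initialFailoverVersion: 2
      rpcAddress: "temporal-frontend.cluster-b.example:7233"
```

If both deployments instead share one name (as in the setup described above), they are a single cluster from Temporal’s point of view, and membership has to work across the k8s boundary.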

The stack trace you pasted includes the redirection and namespace handover interceptors:

  - (*Redirection).handleRedirectAPIInvocation
  - (*NamespaceHandoverInterceptor).Intercept

That’s a big hint that namespace redirection / global-namespace logic is involved.

Both my Temporal clusters DO have the same name (like I said, one big cluster)… So I’m wondering whether what I’m trying to do is even possible with the current architecture. A few questions come to mind:

  1. Can I only have a single frontend running? Like does there have to be a single master that “owns” the namespace?
  2. If in the future I wanted to trim down to just a single cluster, would I be able to?
  3. Has anyone done this successfully?

Any info that anyone can provide on this would be greatly appreciated.

Thank you,

Paul

This configuration is not supported by Temporal. Multiple Temporal clusters cannot share a database. Temporal requires point-to-point connections for all Temporal cluster processes. So, if you run a single Temporal cluster across multiple k8s clusters, you have to set up routing in a way that all Temporal pods can talk to each other directly, even if they belong to different k8s clusters.

See Self-hosted Multi-Cluster Replication | Temporal Platform Documentation for high-availability setups.
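For reference, the replication path from that doc boils down to telling each cluster about the other’s frontend and marking namespaces as global. Roughly (addresses are hypothetical, and the exact flags may vary by tctl version):

```
# Run against cluster A: register cluster B's frontend as a remote cluster
# (then repeat in the other direction against cluster B).
tctl --address temporal-frontend.cluster-a.example:7233 \
  admin cluster upsert-remote-cluster \
  --frontend_address "temporal-frontend.cluster-b.example:7233"

# Register a namespace as global so it replicates, with cluster-a active.
tctl --address temporal-frontend.cluster-a.example:7233 \
  --namespace my-namespace namespace register \
  --global_namespace true \
  --active_cluster cluster-a \
  --clusters cluster-a cluster-b
```

In this model each cluster has its own database, and Temporal replicates namespace and workflow state between them, rather than two clusters reading one shared DB.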

Hi Maxim, thanks for the reply.

So what am I not understanding here… You say:

“if you run a single Temporal cluster across multiple k8s clusters, you have to set up routing in a way that all Temporal pods can talk to each other directly”

I’ve done this, and I thought I’d have to apply the Temporal Helm chart to both clusters, using the same cluster name and deploying all the components to both. Is that not the case? Are there some components that must run in only one cluster? And if in the future I want to decommission one of the k8s clusters, do I have to incur downtime to move the core components from cluster-a to cluster-b?

Thank you.

Temporal doesn’t support multiple clusters sharing a single DB. You can configure a single cluster (from Temporal’s point of view) across multiple k8s clusters, but the default Helm chart and networking are not going to work for this, as they will not allow direct point-to-point communication between every pod of the Temporal cluster.
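The point-to-point requirement maps to the server’s membership settings: every pod advertises an address into the membership ring, and the other pods dial that address directly. A hedged sketch of the relevant config fragment (the field names are from the Temporal server config; the address value is hypothetical):

```yaml
# Illustrative fragment of the Temporal server config (values are hypothetical).
# Each pod gossips its broadcastAddress into the membership ring; peers then
# connect to it directly, so this address must be routable from BOTH k8s
# clusters (e.g. a flat pod network, or per-pod routable IPs via a mesh).
global:
  membership:
    maxJoinDuration: 30s
    broadcastAddress: "10.12.3.45"   # this pod's cross-cluster-routable IP
```

If the frontend in one k8s cluster cannot reach a history pod’s broadcast address in the other, that pod effectively drops out of the ring, which is consistent with “Not enough hosts to serve the request”.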

Hi Maxim,

Is there a doc or guide anywhere that discusses which components need to run where and how they need to be configured, or is this an unsupported path?

Thanks.

Paul

I don’t think there is a specific document, as this is not a recommended configuration. The main idea is that you need to support direct point-to-point communication between all the pods.