I’ve found a few different posts here asking “how do I run Temporal HA?” or “how can I run Temporal across more than one k8s cluster?”. I tried this out locally using kind, Postgres, Istio, MetalLB and the Temporal Helm charts. I got a DB cluster up, then two clusters each running Temporal, both using the single DB cluster as a shared backend database. Everything came up, and when I create a namespace in one cluster it appears in the other just fine. I then tried to deploy a sample app and got an error…
"error":"Not enough hosts to serve the request",
So I started digging into why this happens when all my pods (frontend/history/matching/web/worker) were running just fine. An AI assistant suggested this:
What “Not enough hosts to serve the request” usually means in a healthy-looking cluster
This error is thrown when the frontend cannot find any history (or matching) hosts in the membership ring for the cluster that should own the request.
In a multi-cluster setup, that often happens when:
- clusterMetadata.currentClusterName is the same in both clusters, or misconfigured.
- Global namespaces (possibly including temporal-system) are configured with an active cluster that does not actually match where your traffic is going.
- The frontend tries to redirect the call to the “active” cluster and fails in a way that surfaces as “Not enough hosts”.
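The first and third points above can be sketched in Go as two checks: that no two deployments report the same `currentClusterName`, and a simplified redirect rule (serve a global-namespace call locally if the namespace’s active cluster is the current cluster, otherwise forward it to a cluster listed under `clusterInformation`). `ClusterMetadata`, `ValidateDeployments` and `RouteDecision` are hypothetical names for illustration; the real server config has more fields, and its actual forwarding behavior is configurable.

```go
package main

import "fmt"

// ClusterMetadata is a hypothetical, trimmed-down model of the
// clusterMetadata block in a Temporal server config.
type ClusterMetadata struct {
	CurrentClusterName string
	ClusterNames       []string // names listed under clusterInformation
}

// ValidateDeployments reports an error if two deployments share the same
// currentClusterName (the first point in the list above).
func ValidateDeployments(deployments map[string]ClusterMetadata) error {
	seen := make(map[string]string) // cluster name -> deployment label
	for label, md := range deployments {
		if other, ok := seen[md.CurrentClusterName]; ok {
			return fmt.Errorf("deployments %q and %q both use currentClusterName %q",
				other, label, md.CurrentClusterName)
		}
		seen[md.CurrentClusterName] = label
	}
	return nil
}

// RouteDecision is a simplified model of the redirect rule: local if the
// namespace is active here, otherwise forward to a known cluster.
func RouteDecision(md ClusterMetadata, activeCluster string) (string, error) {
	if activeCluster == md.CurrentClusterName {
		return "local", nil
	}
	for _, name := range md.ClusterNames {
		if name == activeCluster {
			return "redirect:" + name, nil
		}
	}
	return "", fmt.Errorf("active cluster %q is not in clusterInformation", activeCluster)
}

func main() {
	// Two deployments sharing one placeholder name, as in the setup above.
	shared := map[string]ClusterMetadata{
		"k8s-a": {CurrentClusterName: "active", ClusterNames: []string{"active"}},
		"k8s-b": {CurrentClusterName: "active", ClusterNames: []string{"active"}},
	}
	fmt.Println(ValidateDeployments(shared))

	// Distinct names: each side knows about both clusters.
	a := ClusterMetadata{CurrentClusterName: "cluster-a", ClusterNames: []string{"cluster-a", "cluster-b"}}
	fmt.Println(RouteDecision(a, "cluster-a")) // local <nil>
	fmt.Println(RouteDecision(a, "cluster-b")) // redirect:cluster-b <nil>
}
```

Under this simplified rule, two deployments that share a name both treat a namespace active under that name as local, so only a check like `ValidateDeployments` surfaces the duplicate.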
The stack trace you pasted includes the redirection and namespace handover interceptors:
- (*Redirection).handleRedirectAPIInvocation
- (*NamespaceHandoverInterceptor).Intercept
That’s a big hint that namespace redirection / global namespace logic is involved.
Both my Temporal clusters DO have the same name (like I said, the idea is one big cluster)… So I’m wondering whether what I’m trying to do is even possible with the current architecture. A few questions that come to mind:
- Can I only have a single frontend running? That is, does there have to be a single master that “owns” the namespace?
- If in the future I wanted to trim down to just a single cluster, would I be able to?
- Has anyone done this successfully?
Any info that anyone can provide on this would be greatly appreciated.
Thank you,
Paul