Hi all! We’re currently evaluating Temporal for our workflow needs, and so far it checks all the boxes we want. But now that we try to sketch out how it would fit into our current infrastructure architecture, it gets complicated:
The thing is: we run multiple Kubernetes clusters in parallel. For simplicity, let’s just call them A, B and C. All of our apps are deployed to A, B and C and are backed by a Cassandra cluster (outside of K8s). All clusters receive production traffic at all times. So at first glance Temporal looked like the perfect fit for us, since we would simply run the Temporal “cluster” inside our Kubernetes clusters, as recommended.
As a first prototype we tried to connect two Temporal “clusters” running in two separate Kubernetes clusters to the same database, and we quickly ran into the slowness that Maxim already described in this issue:
> Temporal will make sure that data is not corrupted, but such setup will not be functional as these services will be stealing shards from each other all the time. So it will be 100 times slower than a single cluster setup.
In our concrete case, this results in workflows being blocked for up to 5 minutes.
But - we don’t want to run multiple Temporal clusters to achieve HA. We just don’t want to end up in a “Temporal is running only in Kubernetes cluster A” situation. Neither do we want or need something like a failover setup where we would have to deal with an active/passive Temporal cluster. All apps should just be able to talk to their “cluster-local” Temporal frontend.
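To make the “cluster-local” part concrete, here is a minimal sketch of what each of our apps would do, using the Go SDK. The frontend Service DNS name, port and namespace are placeholders, not our actual values:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
)

func main() {
	// Each app talks only to the Temporal frontend running in its own
	// Kubernetes cluster. The host name below is a placeholder for the
	// cluster-local frontend Service; the namespace is just an example.
	c, err := client.Dial(client.Options{
		HostPort:  "temporal-frontend.temporal.svc.cluster.local:7233",
		Namespace: "default",
	})
	if err != nil {
		log.Fatalf("unable to connect to cluster-local Temporal frontend: %v", err)
	}
	defer c.Close()

	// ... start and signal workflows via c as usual ...
}
```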
The architecture guide mentions horizontal scaling for all the components inside of a Temporal cluster, so our next idea would be to basically “stretch” a Temporal cluster across multiple Kubernetes clusters:
So, instead of running multiple replicas of the worker service in Kubernetes cluster A, we would run one replica of the worker in each of A, B and C. The same goes for all the other components.
My question is: Are we way off track here, or is this a reasonable deployment option?
Temporal service processes need to be able to communicate directly, point to point. They don’t really know anything about k8s. So if you can configure your networking so that processes can talk to other Temporal processes in a different cluster directly, then your setup should work.
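A quick way to sanity-check that requirement is to dial the remote processes directly from a pod in the other cluster. The sketch below does only that; the hostnames are placeholders, and 7233/7234/7235/7239 are the default frontend/history/matching/worker gRPC ports, which may differ in your deployment:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Placeholder addresses for the Temporal processes running in the
	// other Kubernetes cluster, exposed however your network allows
	// (flat pod network, per-service load balancer, etc.).
	targets := []string{
		"temporal-frontend.cluster-b.example.internal:7233",
		"temporal-history.cluster-b.example.internal:7234",
		"temporal-matching.cluster-b.example.internal:7235",
		"temporal-worker.cluster-b.example.internal:7239",
	}
	for _, addr := range targets {
		conn, err := net.DialTimeout("tcp", addr, 3*time.Second)
		if err != nil {
			fmt.Printf("NOT reachable: %s (%v)\n", addr, err)
			continue
		}
		conn.Close()
		fmt.Printf("reachable:     %s\n", addr)
	}
}
```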
Just to be sure, let’s leave k8s out of the picture for a moment. Say we have two datacenters, each running a matching service and a worker service:
```
    DC A                  DC B

 Matching A1           Matching B1
      |          X           |
  Worker A2             Worker B2
```
All of these services would be able to talk to each other, but with different latencies. A1 → A2 would be a reliable local connection, but A1 → B2, for example, could see latency spikes and even intermittent connection failures, since the two datacenters sit with different providers.
You’re absolutely right. I hadn’t thought about the full lifetime of a request, and I somewhat hoped that the ring gossip would prefer “closer” nodes, so that e.g. History DC B would prefer Matching DC B because of its low latency, even though it knows about Matching DC A or Matching DC C.
But at any rate - we will probably set up another prototype so we can test this setup and maybe even run a few performance tests to see how it holds up.
The history nodes are stateful: each workflow ID maps to a specific node. So there is no way to prefer closer nodes without a complete architecture redesign.
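Very roughly, the mapping works like the sketch below. This is illustrative only, not the actual server code: the real implementation uses its own hash function and inputs, but the point is that the owning shard (and hence the owning history host) is a pure function of the IDs, with no notion of “closer”:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor is an illustrative stand-in for Temporal's real shard mapping.
// The shard that owns a workflow, and therefore the history host that
// currently owns that shard, is derived from the IDs alone; network
// locality plays no part in the mapping.
func shardFor(namespaceID, workflowID string, numHistoryShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(namespaceID))
	h.Write([]byte(workflowID))
	return h.Sum32()%numHistoryShards + 1 // shard IDs start at 1
}

func main() {
	// The same IDs always land on the same shard, regardless of where
	// the caller or the history node happens to be running.
	fmt.Println(shardFor("my-namespace-id", "order-12345", 512))
}
```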
Oooh okay. Thank you so much for taking the time to explain all this!
I’m going to mark your previous answer as a solution, since it basically answers my “should we give this a try” question.