Running Temporal across multiple Kubernetes clusters

Hi all! We’re currently evaluating Temporal for our workflow needs, and so far it really checks all the boxes we want. But now that we’re trying to sketch out how it would fit into our current infrastructure architecture, it gets complicated:

The thing is: we run multiple Kubernetes clusters in parallel. For simplicity, let’s just call them A, B and C. All of our apps are deployed to A, B and C and are backed by a Cassandra cluster (outside of K8s). All clusters receive production traffic at all times. So at first glance Temporal looked like the perfect fit for us, since we would simply run the Temporal “cluster” inside Kubernetes, as recommended.

As a first prototype, we connected two Temporal “clusters” running in two separate Kubernetes clusters to the same database, and we quickly ran into the slowness Maxim already described in this issue:

Temporal will make sure that data is not corrupted, but such setup will not be functional as these services will be stealing shards from each other all the time. So it will be 100 times slower than a single cluster setup.

In our concrete case, this results in workflows being blocked for up to 5 minutes.


But to be clear: we don’t want to run multiple Temporal clusters to achieve HA. We just don’t want a “Temporal runs only in Kubernetes cluster A” situation, and we neither want nor need a failover setup where we would have to deal with an active/passive Temporal cluster. All apps should simply be able to talk to their “cluster-local” Temporal frontend.
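To make that last point concrete, here is a minimal sketch of what “cluster-local” access would look like from an application’s point of view, using the Temporal Go SDK. The frontend address is a hypothetical in-cluster Service name, not something from this thread:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
)

func main() {
	// Each app connects to the frontend Service in its own Kubernetes
	// cluster; "temporal-frontend.temporal.svc.cluster.local" is a
	// placeholder for whatever the local Service happens to be called.
	c, err := client.Dial(client.Options{
		HostPort:  "temporal-frontend.temporal.svc.cluster.local:7233",
		Namespace: "default",
	})
	if err != nil {
		log.Fatalf("unable to connect to cluster-local Temporal frontend: %v", err)
	}
	defer c.Close()
}
```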

The architecture guide mentions horizontal scaling for all the components inside of a Temporal cluster, so our next idea would be to basically “stretch” a Temporal cluster across multiple Kubernetes clusters:

So, instead of running multiple replicas of the worker service in Kubernetes cluster A, we would run one replica of the worker in each of A, B and C. The same goes for all the other components.

My question is: are we way off track, or is this a reasonable deployment option?

Temporal service processes need to be able to communicate directly, point to point. They don’t really know anything about k8s. So if you can configure your networking in such a way that processes can talk directly to other Temporal processes in a different cluster, then your setup should work.
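As a rough sketch of what such a stretched setup could look like (not an official deployment recipe): each pod would start only the role assigned to it and load a config whose global.membership.broadcastAddress is reachable from the Temporal processes in the other Kubernetes clusters. The option names below are from go.temporal.io/server/temporal as I understand recent server versions; the service name, config path and environment are assumptions:

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"

	"go.temporal.io/server/temporal"
)

func main() {
	// Hypothetical: this pod runs only the matching role; sibling pods in
	// clusters A, B and C run frontend, history and worker the same way.
	// The YAML config loaded from ./config must point every process at the
	// shared Cassandra cluster and set global.membership.broadcastAddress
	// to an address that peers in the other Kubernetes clusters can reach.
	s, err := temporal.NewServer(
		temporal.ForServices([]string{"matching"}),
		temporal.WithConfigLoader("./config", "production", ""),
	)
	if err != nil {
		log.Fatalf("unable to create Temporal server: %v", err)
	}
	if err := s.Start(); err != nil {
		log.Fatalf("unable to start Temporal server: %v", err)
	}

	// Block until the pod is asked to shut down.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	<-sig
	s.Stop()
}
```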


Alright, thanks for your quick reply!

Just to be sure, let’s leave k8s out of the picture for a moment. Say we have two datacenters, each running the same two Temporal services:

   DC A              DC B

Matching A1       Matching B1
     |        X        |
 Worker A2         Worker B2

All of these services would be able to talk to each other, but with different latencies. A1 → A2 would be a reliable local connection, but e.g. A1 → B2 could see latency spikes and even intermittent connection failures, since the two datacenters are hosted by different providers.

Will network latency be an issue in this setup?

It depends on the latency. I certainly wouldn’t recommend running such a setup across US East and US West, for example.

Note that it is not just matching to matching; a request can end up bouncing between datacenters more than once. For example:

  • Frontend is in DCA
  • History is in DCB
  • Matching is in DCA
  • A poll request from a worker in DCB hits Matching in DCA
  • Start activity is then called on History in DCB

So the latency will add up pretty quickly.
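To get a feel for how quickly it adds up, here is a small back-of-the-envelope sketch of the example above. The 60 ms cross-DC round trip is an assumed number, not a measurement, and the hop labels paraphrase the list above:

```go
package main

import "fmt"

func main() {
	// Assumed number, not a measurement: ~60 ms round trip between the DCs.
	const crossDCRTT = 60.0 // ms

	// The cross-DC hops from the example above (labels paraphrased).
	hops := []string{
		"frontend DCA -> history DCB",
		"history DCB -> matching DCA",
		"worker poll from DCB -> matching DCA",
		"matching DCA -> start activity on history DCB",
	}

	total := 0.0
	for _, hop := range hops {
		total += crossDCRTT
		fmt.Printf("%-48s +%.0f ms\n", hop, crossDCRTT)
	}
	fmt.Printf("cross-DC latency added to a single activity dispatch: ~%.0f ms\n", total)
}
```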


You’re absolutely right. I hadn’t thought about the full lifetime of a request, and I somewhat hoped that the ring gossip would prefer “closer” nodes, so that e.g. History DCB would prefer Matching DCB because of its low latency, even though it knows about Matching DCA and Matching DCC.

But at any rate, we will probably set up another prototype so we can test this setup, and maybe even run a few performance tests to see how it holds up. :slight_smile:

The history nodes are stateful: each workflow ID maps to a specific node, so there is no way to prefer closer nodes without a complete architecture redesign.
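As a purely illustrative sketch of why locality cannot be preferred (deliberately simplified, not Temporal’s actual implementation): the workflow ID is hashed to a shard, and the shard determines which history host owns it, regardless of where the caller sits:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor hashes a workflow ID onto a fixed number of shards, loosely
// mirroring how a workflow is pinned to one history shard.
func shardFor(workflowID string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(workflowID))
	return int(h.Sum32()) % numShards
}

func main() {
	// Hypothetical history hosts spread across the three Kubernetes clusters.
	historyHosts := []string{"history-a", "history-b", "history-c"}
	const numShards = 512

	for _, wfID := range []string{"order-1001", "order-1002", "order-1003"} {
		shard := shardFor(wfID, numShards)
		owner := historyHosts[shard%len(historyHosts)]
		// The owner is determined by the hash alone, so a caller in
		// cluster A may well be routed to a history node in cluster B.
		fmt.Printf("workflow %-12s -> shard %3d -> %s\n", wfID, shard, owner)
	}
}
```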

Oooh okay. Thank you so much for taking the time to explain all this!
I’m going to mark your previous answer as a solution, since it basically answers my “should we give this a try” question. :wink: