How to scale Temporal to run across multiple hosts for HA

Hello, I’m looking to run Temporal in a production-ready way.

What we’ve done
In our initial experiment, we run Temporal using docker-compose, with each major component in its own container:

  • temporal
  • temporal-admin-tools
  • temporal-postgres
  • temporal-ui

What we want
Running on a single host, things work fine. Now we are starting to think about HA. Ideally, we want the following scenario: suppose the active Temporal server is running on host A with some workflows in progress, and host A is shut down. The standby Temporal server on host B should then pick up the unfinished workflows and resume them seamlessly. For our own reasons, we don’t want to run Temporal on top of K8s.

What we think
To accomplish this goal, we think a few things need to be ensured:

  1. We need to replicate the state of the temporal-postgres container. Containers running on different hosts should have eventually consistent copies of the data.
  2. We deploy multiple copies of the Temporal services on different hosts. Those copies have to be aware of each other and establish leadership and membership so that only one of them is active at a time. Does Temporal provide this out of the box?

Please advise whether we are on the right track and whether there is anything we need to be mindful of. Also, does Temporal provide any documentation we can follow for this kind of scaled deployment?

I know I’ve asked a lot of questions, but I would really appreciate your help.

Check out “Self-hosted Multi-Cluster Replication” in the Temporal documentation.

The Temporal cluster is already HA if its DB is HA (we recommend Cassandra for this). You can add and remove nodes for each of its roles without downtime.
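
To make the "add and remove nodes without downtime" point concrete, here is a toy sketch of the general idea that all nodes can be active at once, with ownership of work recomputed from whoever is currently a member, rather than one active node and one standby. This is only an illustration and not Temporal's actual membership or shard-assignment code: the rendezvous-hashing scheme, the host names, and the shard count are all assumptions made for the example.

```python
import hashlib


def owner(shard_id: int, hosts: list[str]) -> str:
    """Pick an owner for a shard with rendezvous (highest-random-weight) hashing.

    Hypothetical helper for illustration only.
    """
    def weight(host: str) -> int:
        digest = hashlib.sha256(f"{host}:{shard_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(hosts, key=weight)


def assign(num_shards: int, hosts: list[str]) -> dict[int, str]:
    """Map every shard to its owner given the current set of live hosts."""
    return {shard: owner(shard, hosts) for shard in range(num_shards)}


# Three hosts are live, then host-a is shut down.
before = assign(16, ["host-a", "host-b", "host-c"])
after = assign(16, ["host-b", "host-c"])

# Only the shards that host-a owned change hands; the rest stay where they were.
moved = [shard for shard in before if before[shard] != after[shard]]
print("shards reassigned after host-a left:", moved)
```

In this sketch nothing needs to be "promoted" when host-a disappears: the surviving hosts simply take over its share, and the durable state they resume from lives in the database, which is why the database itself has to be HA.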

If you want multi-region availability, then the multi-cluster setup @tihomir mentioned is the way to go.

@maxim, if we run multiple nodes for the Temporal cluster, and we use the Schedule feature in Temporal, could workflows get double-scheduled?

No, they will be scheduled only once, as they rely on a database for consistency.
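
As a rough illustration of how a database can guarantee "only once" when several nodes race, the sketch below has two nodes try to record the same scheduled run, and a primary-key constraint lets only one of them win. This is not Temporal's schema or implementation: the `runs` table, the `try_start` helper, and the use of SQLite are assumptions chosen to keep the example self-contained.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one row allowed per (schedule, fire time)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE runs (
            schedule_id TEXT NOT NULL,
            fire_time   TEXT NOT NULL,
            started_by  TEXT NOT NULL,
            PRIMARY KEY (schedule_id, fire_time)
        )
        """
    )
    return conn


def try_start(conn: sqlite3.Connection, schedule_id: str, fire_time: str, node: str) -> bool:
    """Return True if this node claimed the run, False if another node already had."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO runs (schedule_id, fire_time, started_by) VALUES (?, ?, ?)",
                (schedule_id, fire_time, node),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected a second claim for the same scheduled run.
        return False


conn = make_db()
first = try_start(conn, "daily-report", "2024-01-01T00:00:00Z", "node-a")
second = try_start(conn, "daily-report", "2024-01-01T00:00:00Z", "node-b")
print(first, second)
```

The point of the sketch is that the decision is made by the database and not by coordination between the nodes, so adding more nodes does not add more chances to double-schedule.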