How to easily deploy a Temporal Server cluster in production?

Dear developers, I am an architect of a banking system. I am trying out Temporal in a new system and am considering promoting it more widely. In the development environment, I use docker-compose directly to deploy a single-node Temporal server, and it works fine.

What I am unsure about is how to deploy a distributed Temporal server cluster in a production environment. If I deploy multiple docker-compose nodes like in the development environment, is it feasible to load-balance between them through DNS resolution? Can multiple servers share the same database?

I learned that Temporal consists of four services (frontend, history, matching, and worker), and the official recommendation is to deploy them separately, but I could not find specific documentation for this kind of setup. I also don't want the deployment plan to be too complicated, because my system does not need very high throughput.

I hope to get help here, thank you.

Hi @guofeng, were you able to go through the info in the docs:

and

It would be nice to know what is missing there that would help you move forward with deploying Temporal in your particular setup.

Hi tihomir, thanks for your reply. These documents are useful and I got some information from them, but I think there should be a step-by-step demo. I see a lot of similar questions in the forum, and based on the available information it is hard to know how a distributed cluster should be deployed.

For example, if I deploy multiple Temporal clusters, how do I handle inter-cluster communication or data synchronization? Will multiple clusters sharing a single datastore cause conflicts? And so on.

@guofeng thanks for the recommendations!
In addition to the links in the previous post, Temporal also provides a Helm charts repo (GitHub - temporalio/helm-charts: Temporal Helm charts)
and a set of Docker Compose files (GitHub - temporalio/docker-compose: Temporal docker-compose files).

If I may add a bit of constructive feedback, it is very hard to provide specific advice on a particular deployment when the details of that deployment are not mentioned in the question. Note that Temporal is open source and, as you have mentioned, many users have managed to deploy it in different environments. Providing specific information about your deployment environment, your strategy, and any errors you are running into would give the community a better picture and allow it to help based on your specific needs.

Dear tihomir, this is my planned deployment. Do you think it can be achieved, or do you have any suggestions?

I would make two suggestions:

  1. Try to deploy the services one per Docker container / k8s pod, i.e.
     • docker / k8s pod 1 runs only frontend
     • docker / k8s pod 2 runs only matching
     • docker / k8s pod 3 runs only history
     • docker / k8s pod 4 runs only worker
  2. Either make sure the MySQL replication (DB-layer replication) is strongly consistent, or try not to use it. Temporal depends on the DB being strongly consistent.
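The one-role-per-container layout in suggestion 1 can be sketched as follows. This is a minimal Go sketch under stated assumptions: the `temporal-server start` command is understood to accept repeated `--service` flags, the `temporalio/server` Docker image is assumed to build those flags from a colon-separated `SERVICES` environment variable, and the ports are Temporal's usual default gRPC and membership ports. `serviceSpec`, `specs`, and `serviceFlags` are hypothetical helpers for illustration only, so check everything against the image and config version you actually run. Note that all four containers form one cluster and share that cluster's persistence config, which is different from running two clusters on one DB.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceSpec (hypothetical) describes one Temporal server role that runs
// alone in its own container or pod. The ports are Temporal's usual defaults
// (gRPC and membership); verify them against your own server config.
type serviceSpec struct {
	Role           string
	GRPCPort       int
	MembershipPort int
}

// specs lists the four roles of ONE cluster; all of them share the same
// persistence (database) configuration.
var specs = []serviceSpec{
	{"frontend", 7233, 6933},
	{"history", 7234, 6934},
	{"matching", 7235, 6935},
	{"worker", 7239, 6939},
}

// serviceFlags (hypothetical) turns a SERVICES value such as "frontend" or
// "frontend:history" into repeated --service flags. ASSUMPTION: this mirrors
// what the temporalio/server image's start script does; commas are treated
// like colons.
func serviceFlags(services string) []string {
	services = strings.ReplaceAll(services, ",", ":")
	var flags []string
	for _, s := range strings.Split(services, ":") {
		if s = strings.TrimSpace(s); s != "" {
			flags = append(flags, "--service="+s)
		}
	}
	return flags
}

func main() {
	// Development: a single container runs every role.
	all := make([]string, 0, len(specs))
	for _, s := range specs {
		all = append(all, s.Role)
	}
	fmt.Println("dev (all-in-one):", serviceFlags(strings.Join(all, ":")))

	// Production suggestion: one role per container/pod, so each role can be
	// scaled and restarted independently.
	for _, s := range specs {
		fmt.Printf("temporal-%s: SERVICES=%s flags=%v grpc=%d membership=%d\n",
			s.Role, s.Role, serviceFlags(s.Role), s.GRPCPort, s.MembershipPort)
	}
}
```

Splitting roles this way mainly lets history (usually the busiest role) be scaled separately from frontend and matching; it does not by itself add a second cluster.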

Thank you, but I need to ensure high availability across two data centers. This is a regulatory requirement: there must be at least two available instances of the service.

@Wenquan_Xing If I run multiple Temporal Server clusters at the same time and let them use the same database, will there be data conflicts? This is the only thing blocking me at the moment.

Temporal will make sure that data is not corrupted, but such a setup will not be functional, because the clusters will be stealing shards from each other all the time. So it will be 100 times slower than a single-cluster setup.
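The shard-stealing effect can be illustrated with a toy model of range-ID fencing. ASSUMPTION: this is a deliberately simplified model, not Temporal's schema or code. It assumes each history shard's row carries a range ID, that an owner bumps it when acquiring the shard, and that writes are rejected when the stored range ID no longer matches the writer's. Two clusters on one DB have separate membership rings, so each keeps believing it owns every shard; `shardRow`, `owner`, and `simulate` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// shardRow is a toy stand-in for one shard's row in the shared database.
type shardRow struct {
	rangeID int64
}

var errShardOwnershipLost = errors.New("shard ownership lost")

// owner is a history host that believes it owns the shard.
type owner struct {
	rangeID int64 // range ID observed when this owner last acquired the shard
}

// acquire takes ownership by bumping the range ID
// (a conditional update / compare-and-swap in a real DB).
func (o *owner) acquire(row *shardRow) {
	row.rangeID++
	o.rangeID = row.rangeID
}

// write succeeds only if nobody else acquired the shard since we did.
func (o *owner) write(row *shardRow) error {
	if row.rangeID != o.rangeID {
		return errShardOwnershipLost
	}
	return nil
}

// simulate alternates writes between numOwners hosts (one per cluster) and
// returns how many times the shard had to be re-acquired.
func simulate(numOwners, writes int) int {
	row := &shardRow{}
	owners := make([]*owner, numOwners)
	for i := range owners {
		owners[i] = &owner{rangeID: -1} // has never acquired the shard
	}
	acquisitions := 0
	for i := 0; i < writes; i++ {
		o := owners[i%numOwners]
		if err := o.write(row); errors.Is(err, errShardOwnershipLost) {
			o.acquire(row) // "steal" the shard back, then retry the write
			acquisitions++
			_ = o.write(row) // always succeeds right after acquiring in this toy
		}
	}
	return acquisitions
}

func main() {
	fmt.Println("one cluster, shard acquisitions:", simulate(1, 1000))
	fmt.Println("two clusters, shard acquisitions:", simulate(2, 1000))
}
```

With one cluster the shard is acquired once; with two clusters interleaving writes, every single write forces a re-acquisition. Nothing is corrupted (stale writes are rejected), but in a real server each re-acquisition also means reloading shard state, which is where the slowdown comes from.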

Dear maxim, it's been a while, but I'm still working on this. May I ask what approach you would suggest? Here are the requirements I need to satisfy:
1. Do not use k8s, but use Docker.
2. Have two clusters that are live at the same time and share one database.
3. Make sure that after one cluster physically goes down, clients can continue to get work from the surviving cluster.

  1. K8s is not required to run Temporal
  2. Not supported. I'm not sure what you are trying to achieve here. 99% of outages are caused by DB unavailability, so having two clusters on top of a single DB is not going to increase your availability much.
  3. Not supported. And as I pointed in (2) not really useful.
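The arithmetic behind point 2 can be made concrete under simplifying assumptions: failures are independent, either app cluster alone is enough, and the availability figures of 0.999 for the DB and for one app cluster are purely illustrative, not measurements. It also ignores the shard stealing described earlier, which would make the two-cluster case worse in practice.

```latex
\begin{aligned}
A_{\text{one cluster}}  &= A_{\text{db}} \cdot A_{\text{app}}
                         = 0.999 \times 0.999 = 0.998001 \\
A_{\text{two clusters}} &= A_{\text{db}} \cdot \bigl(1 - (1 - A_{\text{app}})^2\bigr)
                         = 0.999 \times 0.999999 = 0.998999001 \\
A_{\text{two clusters}} &\le A_{\text{db}} = 0.999
\end{aligned}
```

Doubling the application tier cannot push the total above the shared DB's own availability, which is why a second cluster on the same DB adds little.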

Our MySQL cluster is already highly available through master-slave replication, so if the application layer is also highly available, we don't have to worry about outages.

AFAIK MySQL master-slave replication is not fully consistent during failover events. Temporal cannot function on top of a DB that is not fully consistent.
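Why asynchronous replication is a problem during failover can be shown with a toy primary/replica model. ASSUMPTION: this is a simplified illustration, not MySQL's or Postgres's actual replication protocol. In async mode the primary acknowledges a write before the replica has it; if the primary dies and the replica is promoted, acknowledged writes vanish, so Temporal's persisted state would move backwards. In sync mode the replica applies the write before the acknowledgment. `node`, `cluster`, `commit`, and `failover` are hypothetical names, and the event names are only labels.

```go
package main

import "fmt"

// node is a toy database node holding the writes it has applied.
type node struct{ log []string }

// cluster is a toy primary/replica pair.
type cluster struct {
	primary, replica *node
	sync             bool     // true: acknowledge only after the replica applied the write
	pending          []string // acknowledged writes not yet shipped to the replica (async only)
}

// commit applies a write on the primary; returning means the client got an ack.
func (c *cluster) commit(w string) {
	c.primary.log = append(c.primary.log, w)
	if c.sync {
		c.replica.log = append(c.replica.log, w) // replica has it before the ack
	} else {
		c.pending = append(c.pending, w) // ack now, ship later
	}
}

// failover models the primary crashing before shipping pending writes and the
// replica being promoted. It returns the acknowledged writes that were lost.
func (c *cluster) failover() []string {
	lost := append([]string(nil), c.pending...)
	c.pending = nil
	c.primary = c.replica
	return lost
}

func main() {
	for _, sync := range []bool{false, true} {
		c := &cluster{primary: &node{}, replica: &node{}, sync: sync}
		c.commit("WorkflowExecutionStarted")
		c.commit("ActivityTaskScheduled")
		fmt.Printf("sync=%v, acknowledged writes lost on failover: %v\n", sync, c.failover())
	}
}
```

This is the trade-off behind the synchronous-replication suggestion at the end of the thread: each commit waits for the replica, which costs latency but means a promoted replica has every acknowledged write.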

OK, thank you. Let me ask another question: how do I specify which development.yaml to use when I build the Docker image?

@guofeng Just curious: how did you solve HA across different DCs? I am also looking for a similar solution. We wouldn't have much time to build another DC after the current DC is gone, as it would be our tier 0 application. It should fail over within 5 minutes of a DC failure.

Looks like it's improved since 2022: Self-hosted Multi-Cluster Replication | Temporal Documentation

Thanks. Will take a look at the replication solution

Postgres primary-backup replication should work in synchronous replication mode, and I guess MySQL should have something similar.