Hello Temporal team,
I have already read the following pages:
- Communication between multiple instances of temporal server
- Production HA setup
- Postgres stretched cluster along with Temporal stretched cluster across 2 DCs with ~7ms latency
- What is Multi-Cluster Replication? | Temporal Documentation
We want to achieve high availability for Temporal, meaning we can afford to have a DC down for a certain amount of time without impacting our business.
Below is the deployment topology (the issue marked in red is one I detail later):
Temporal services will be able to communicate with each other as if they were deployed in the same DC, so I won’t detail this part.
Now my questions:
- Is this kind of deployment supported by Temporal?
- If yes, can you confirm that if Kubernetes DC1 and DC2 are not able to reach DC3, while DC3 is still running with its pods still connected to the database and serving traffic, no state can be corrupted? Are shards rebalanced? Can you explain what could happen if that is not the case?
Maybe this approach is not valid, and that is why "What is Multi-Cluster Replication? | Temporal Documentation" describes asynchronous replication with an active-passive model. If there are resources explaining why it is not valid, I would be glad if you could share them.
Regarding multi-cluster replication, is it still in experimental mode?
If it is still experimental, what would be the way to support the high availability requirement described above without using this feature?
If it is still experimental, what remains before it is ready for production?
Finally, I am wondering how Temporal Cloud addresses high availability. Is there anything you can share on this?
Thanks for your time.