Hey Team,
I have read many posts and I’m aware of the following:
A Temporal client cannot connect to two active clusters at once.
Multi-cluster replication is experimental.
Problem statement:
I have two fully isolated OpenShift clusters in two DCs. Each cluster runs the same set of microservices (workers).
I want to deploy Temporal Server (the four Temporal services) on both, so that each cluster has its own Temporal Server. Each Temporal service will scale differently, as it is deployed independently rather than packaged in a single container. These microservices/workers and Temporal services cannot communicate with each other across DCs.
Which of the two options below is good, or is neither?
I ran a few scenarios with Set-up 1 to check cross-site processing. The behaviour is not consistent:
if worker1 is down on site 1, sometimes worker1 on site 2 picks up and completes the workflow, and sometimes it doesn’t.
Set-up 1: Two Temporal Servers sharing one schema. If cross-site interaction works thanks to the common schema, i.e. if a worker on site 2 can process a workflow that was initiated on site 1, then we don’t need to maintain a passive instance, and HA/DR is achieved straight away in case one of the Temporal Servers or workers goes down.
Set-up 2: Two Temporal Servers, each with its own schema (on the same physical DB). There is no cross-site interaction because of the separate schemas, so a passive instance needs to be maintained in case one of the sites (Temporal, the workers, or both) goes down.
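For reference, each worker process dials exactly one frontend address and only polls task queues through that connection (this is the first point above about a client not being able to connect to two active clusters). A minimal sketch with the Go SDK; the host name, task queue, and workflow/activity names are hypothetical:

```go
package main

import (
	"context"
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// Placeholder definitions so the sketch compiles; real workers register
// their actual workflows and activities.
func OrderWorkflow(ctx workflow.Context) error { return nil }

func ChargeActivity(ctx context.Context) error { return nil }

func main() {
	// Each site's workers dial only their local frontend; they never
	// poll the Temporal Server running in the other DC.
	c, err := client.Dial(client.Options{
		HostPort:  "temporal-frontend.temporal.svc.cluster.local:7233", // site-local frontend (hypothetical)
		Namespace: "default",
	})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	w := worker.New(c, "orders-task-queue", worker.Options{})
	w.RegisterWorkflow(OrderWorkflow)
	w.RegisterActivity(ChargeActivity)

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker stopped:", err)
	}
}
```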
All Temporal cluster nodes should be able to communicate with each other directly, so your Set-up 1 is not going to work. I don’t understand the value of Set-up 2 with the same physical DB: why do you want to have a single point of failure in a multi-DC setup? In that case you can run two completely independent clusters, each with its own DB.
If you need the ability to continue already started workflows in case of a DC outage, then a Temporal global namespace (multi-cluster replication) is the only option.
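In that model each site runs its own Temporal cluster with its own DB, the clusters are connected through multi-cluster replication, and workflows live in a global namespace that is active on one cluster and standby on the other. A rough, non-authoritative sketch of registering such a namespace with the Go SDK; the cluster names "site1"/"site2", the namespace name, the endpoint, and the retention period are assumptions, it assumes a recent SDK/API version (protobuf durations), and it assumes the two clusters are already connected for replication:

```go
package main

import (
	"context"
	"log"
	"time"

	replicationpb "go.temporal.io/api/replication/v1"
	workflowservicepb "go.temporal.io/api/workflowservice/v1"
	"go.temporal.io/sdk/client"
	"google.golang.org/protobuf/types/known/durationpb"
)

func main() {
	// The namespace client talks to one cluster's frontend (hypothetical address).
	nc, err := client.NewNamespaceClient(client.Options{
		HostPort: "temporal-frontend.site1.example:7233",
	})
	if err != nil {
		log.Fatalln("unable to create namespace client:", err)
	}
	defer nc.Close()

	// Register a global namespace that is active on site1 and replicated
	// to site2; on a site1 outage it can be failed over to site2.
	err = nc.Register(context.Background(), &workflowservicepb.RegisterNamespaceRequest{
		Namespace:         "orders",
		IsGlobalNamespace: true,
		ActiveClusterName: "site1",
		Clusters: []*replicationpb.ClusterReplicationConfig{
			{ClusterName: "site1"},
			{ClusterName: "site2"},
		},
		WorkflowExecutionRetentionPeriod: durationpb.New(72 * time.Hour),
	})
	if err != nil {
		log.Fatalln("unable to register global namespace:", err)
	}
}
```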
A passive instance needs to be maintained in case one of the sites (Temporal, the workers, or both) goes down.
Temporal requires strong DB consistency. If data can be lost during DB failover, then the cluster can get into a corrupted state, which is not recoverable.
In Set-up 2, the DB is HA in a cluster across the two sites. It’s a dual-site master/multiple-standbys setup using synchronous and asynchronous streaming replication; one of the standbys in DR is a cascading standby.
The master/standby architecture of EDB PostgreSQL does not allow more than a single master at once.
That’s the reason for having two logical DBs for the two independent clusters.
Sorry, I’m not an expert in PostgreSQL replication. So, I don’t know if it provides the necessary consistency in case of host crashes. If it doesn’t, the cluster might get into an unrecoverable state. So please test thoroughly.
Thanks @maxim. I’m trying to understand why the cluster nodes have to talk to each other. Isn’t the DB the source of truth for them? Isn’t Set-up 1 similar to having multiple replicas of each service?
What if, in Set-up 1, the clusters could talk to each other through something like Skupper? The call would go to the other site only if the site 1 Temporal service is not available.
The DB is the source of truth for them, but Temporal allocates shards to history hosts, so direct routing to these shards is necessary. You can watch the Designing a Workflow Engine From First Principles presentation from the @Scale conference, which explains the architecture of the service.
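To make the shard point concrete: every workflow is deterministically hashed to one of a fixed number of history shards, and each shard is owned by exactly one history host at a time, so any request touching that workflow has to be routed to the shard's current owner. An illustrative sketch of the mapping only; the actual server uses its own hash function and shard-count configuration:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a workflow to a shard ID in [1, numHistoryShards].
// Illustrative only: the real server uses a different hash, but the idea
// is the same. Whichever history host currently owns that shard must
// handle the request, which is why all cluster nodes need direct
// connectivity to each other.
func shardFor(namespaceID, workflowID string, numHistoryShards int) int {
	h := fnv.New32a()
	h.Write([]byte(namespaceID + "_" + workflowID))
	return int(h.Sum32())%numHistoryShards + 1
}

func main() {
	// Example with hypothetical IDs and 512 shards.
	fmt.Println(shardFor("ns-1111", "order-42", 512))
}
```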