Hi everyone,
We’re currently conducting a technical assessment for self-hosting Temporal on AWS EKS and would appreciate some guidance on how best to support disaster recovery (DR) in this setup.
Our planned architecture involves deploying the Temporal service across two AWS regions, each with three availability zones. All traffic will be served from the primary region, with failover to the secondary region in the event of a disaster. For persistence, we’ll be using Amazon Aurora PostgreSQL.
Based on our review of the Temporal documentation, we believe that multi-cluster replication may be the right approach for our use case. Could you please confirm if this is correct?
Specifically:
- Should we use Aurora Global Database to replicate workflow state across regions, or is that unnecessary if multi-cluster replication is enabled and Temporal handles asynchronous state replication between clusters?
- Is multi-cluster replication currently considered production-ready for critical workloads?
- If not, would an alternative approach (such as using Aurora Global Database for cross-region replication and starting the Temporal cluster in the DR region when needed) be viable?
Any insights or recommendations would be greatly appreciated.
Thanks in advance!