If every pod can talk to every other pod, including pods in other DCs, then Temporal will function. I cannot confirm how the DB behaves during outages, as that is very DB specific. Temporal requires the DB to remain fully consistent in the presence of any failure, so all DB replication must be synchronous.
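As a concrete illustration, assuming PostgreSQL as the backing store, a fully synchronous setup would look something like the fragment below (the standby names are made up for this example; this is an illustrative sketch of synchronous replication, not a Temporal-specific recommendation):

```
# postgresql.conf on the primary
# Wait until standbys have applied the WAL before acknowledging a commit:
synchronous_commit = 'remote_apply'
# Require acknowledgment from standbys in the other DCs (hypothetical names):
synchronous_standby_names = 'ANY 2 (dc2_standby, dc3_standby)'
```

Note that this trades write latency for consistency: every commit now waits on a cross-DC round trip.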
Some caveats with the approach you propose:
- If Temporal pods get partitioned from each other but can still perform DB writes, performance will be very bad, as they will steal history shards from each other on almost every request.
- Even if all DCs are fully operational, performance might suffer, as Temporal will make multiple cross-DC calls for every update. For example, an update can land on a frontend in DC1, a history service in DC2, and a matching engine in DC3, with the poll request served from DC1 again.
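The shard-stealing problem in the first caveat can be sketched with a toy model. Temporal assigns each history shard to one host based on the cluster membership view; the code below is a hypothetical, simplified stand-in for that assignment (the host names and hashing scheme are invented for illustration), showing that when a partition gives the two sides different membership views, both sides claim the same shards and hand ownership back and forth on every access:

```python
import hashlib

NUM_SHARDS = 16

def owner(shard_id: int, members: list[str]) -> str:
    # Simplified model: rank members by a hash of (shard, member) and
    # assign the shard to the top-ranked member of the current view.
    ranked = sorted(
        members,
        key=lambda m: hashlib.sha256(f"{shard_id}:{m}".encode()).hexdigest(),
    )
    return ranked[0]

full_view = ["history-dc1", "history-dc2", "history-dc3"]
# During a network partition, each side believes the others are gone:
view_dc1 = ["history-dc1"]
view_dc23 = ["history-dc2", "history-dc3"]

# Shards that both sides of the partition claim simultaneously. Each
# access from the "wrong" side forces a shard-ownership handover in
# the DB -- the "stealing" described above.
contested = [
    s for s in range(NUM_SHARDS)
    if owner(s, view_dc1) != owner(s, view_dc23)
]
print(len(contested))  # every shard is contested in this model
```

Since the DC1 side sees only itself, it claims all shards, all of which are simultaneously claimed by the DC2/DC3 side, so every request risks a handover.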
Regarding multi-cluster replication, is it still in experimental mode?
Yes, it is still experimental.
If it is still experimental, what would be the way to support our high-availability requirement described above without using this feature?
There is no real solution at this point.
If it is still experimental, what remains before it is ready for production?
Many of the remaining issues relate to correctness. Setting it up and operating it is also quite hard, and it has not been thoroughly tested.