New feature: Multi-region Namespaces in Temporal Cloud

irina · June 3, 2024, 11:05pm

Multi-region Namespaces are now available in Temporal Cloud!

The feature provides failover capabilities to mitigate service outages due to regional failures.

Key Benefits:

Enhanced Availability: Continuous service availability across multiple regions. 99.99% contractual SLA.
Disaster Recovery: Automated failover to protect data and operations during regional outages. RPO: near zero; RTO: 20 minutes or less
Reduced Downtime: Automatic data replication and failovers to minimize service interruptions.

Visit the Temporal Cloud Documentation to learn more, and feel free to ask any questions here or on the Community Slack.

awwx · June 4, 2024, 5:56am

I have a question about the documentation… The Multi-region Namespace feature guide in the Temporal Cloud documentation says “Critical operations like Signals won’t get lost.”

However, with the replication delay, it sounds like I could send a signal to a workflow and the active region could go down before that event was replicated to the standby region?

By “won’t get lost”, does it mean that if the signal didn’t get replicated, it’ll be processed once the previously active region recovers?

irina · June 4, 2024, 6:02pm

Hi Andrew (@awwx)!

By “won’t get lost”…

Yes. In the extreme scenario where the cluster goes down right after acknowledging the signals, the signals will be recovered once the source region recovers. However, replication lag is on the order of milliseconds, so this scenario is very unlikely.

awwx · June 4, 2024, 11:52pm

OK, that makes sense. Is this correct… my understanding is that signals of the same signal method name are normally delivered to a workflow in order. It sounds like this would no longer be guaranteed with multi-region namespaces?

That would be rare at the level of an individual workflow, of course. But running a lot of simultaneous workflows would raise the probability that a rare event happens to at least one of them.
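To put a rough number on the "many workflows" point: if each workflow independently has a small probability p of being hit (for example, a signal accepted inside the replication window just before a failover), the chance that at least one of n workflows is hit is 1 - (1 - p)^n. This is a minimal sketch under that independence assumption, which is a simplification, since a regional outage affects many workflows at once. The function name and the p values are hypothetical and illustrative only, not measured Temporal Cloud figures.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n workflows is affected, assuming each
    is affected independently with probability p (a simplifying assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of "no workflow is affected".
    return 1.0 - (1.0 - p) ** n


# Illustrative inputs only: a one-in-a-million per-workflow chance.
for n in (1, 1_000, 1_000_000):
    print(n, prob_at_least_one(1e-6, n))
```

With a million workflows, even a one-in-a-million per-workflow event becomes more likely than not to happen somewhere (about 63% under these assumptions).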

On the other hand, a region usually doesn’t just crash; instead, performance degrades first, which would give time for events to be replicated.

The lack of an ordering guarantee for signal delivery would be important to highlight, as a workflow that worked correctly when it processed signals in order might not if signals are reordered.

irina · July 15, 2024, 11:35am

Signal reordering can only happen when there is a failover and conflict resolution takes place. If the workflow execution logic depends on signal order, be aware that ordering is not guaranteed, because signals are ordered by when the server receives and records them. Users have to include some ordering information in the signal payload to avoid race conditions.
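One way to carry "ordering information in the signal payload" is a per-workflow sequence number assigned by the sender, with the workflow buffering any signal that arrives ahead of its turn. The sketch below shows only that buffering logic in plain Python. The payload shape (`SequencedSignal` with a `seq` field) and the `InOrderApplier` class are hypothetical; in a real workflow this state would live in the workflow's fields and `on_signal` would be called from the signal handler, but the Temporal SDK wiring is deliberately omitted and the sender is assumed to number signals 0, 1, 2, … without gaps.

```python
from dataclasses import dataclass, field


@dataclass
class SequencedSignal:
    # Hypothetical payload: the sender assigns seq = 0, 1, 2, ... per workflow.
    seq: int
    value: str


@dataclass
class InOrderApplier:
    """Applies signals strictly in seq order, buffering any that arrive early."""
    next_seq: int = 0
    pending: dict[int, SequencedSignal] = field(default_factory=dict)
    applied: list[str] = field(default_factory=list)

    def on_signal(self, sig: SequencedSignal) -> None:
        # Ignore duplicates and signals that were already applied.
        if sig.seq < self.next_seq or sig.seq in self.pending:
            return
        self.pending[sig.seq] = sig
        # Apply every buffered signal that is now next in sequence.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq).value)
            self.next_seq += 1


applier = InOrderApplier()
# Signal 1 arrives before signal 0, as could happen after a failover.
for s in (SequencedSignal(1, "b"), SequencedSignal(0, "a"), SequencedSignal(2, "c")):
    applier.on_signal(s)
print(applier.applied)  # ['a', 'b', 'c']
```

The trade-off is that a signal with a missing predecessor waits in the buffer until the gap is filled, which is consistent with the guarantee above that acknowledged signals are eventually recovered rather than lost.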

awwx · July 18, 2024, 4:42am

Yes, of course, if I’m aware of the new requirement then I can design my workflows to continue to work correctly in the presence of rare signal reordering.

My point is simply that it would be good to mention this in the documentation, so that people would know what changes they might need to make to their workflow implementations to continue to get reliable execution when using multi-region.
