Problem: Replay issues caused by RPC calls inside executeActivity() of WorkflowOutboundCallsInterceptor
Current Setup
We have implemented WorkflowOutboundCallsInterceptor to override executeActivity() and dynamically determine the task queue for routing activity tasks. This is required by our multi-datacenter architecture, where activities for a user must always run in their currently assigned data center.
To determine the correct task queue, we make an RPC call inside executeActivity() to fetch the user’s current DC assignment from a source-of-truth (SOT) service.
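However, executeActivity() is also invoked during replay, so the RPC call is re-executed. Because the user’s assignment can change between the original execution and a replay, the call may return a different DC than the one used originally, which violates Temporal’s determinism requirements.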
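To make the replay concern concrete, here is a toy model in plain Java. It is not the Temporal SDK and not how Temporal is implemented; RecordedLookup and its methods are hypothetical names invented for this illustration. It only shows the record-then-replay behaviour we think we need: the lookup runs once, and replay reads the recorded value instead of calling the SOT service again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy model only: NOT the Temporal SDK. All names here are hypothetical.
class RecordedLookup {
    // Stands in for the workflow's event history.
    private final List<String> history = new ArrayList<>();
    // Index of the next recorded value to consume during replay.
    private int cursor = 0;

    // First execution: call the lookup and record its result.
    // Replay: return the recorded result without calling the lookup again.
    String lookup(boolean replaying, Supplier<String> rpc) {
        if (replaying) {
            return history.get(cursor++);
        }
        String value = rpc.get();
        history.add(value);
        return value;
    }

    // Rewind to the start of the recorded history before replaying.
    void startReplay() {
        cursor = 0;
    }
}
```

In this model, if the first execution records dc1 and the user later moves to dc2, a replay still sees dc1, so the routing decision made during replay matches the one in history.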
Background: Multi-DC Architecture
We operate multiple Temporal clusters (one per data center) with no replication between them. Each user is assigned to a specific data center, and this assignment can change over time based on traffic distribution.
To maintain read-after-write consistency, all activities for a given user must run in their current DC — even if the workflow started in a different DC.
Example
- A workflow starts in DC1, and activities initially run in DC1.
- Later, the user’s assignment moves to DC2.
- Remaining activities must now run in DC2, though the workflow itself continues running in DC1.
Worker & Task Queue Setup
Each data center runs:
- A local Temporal cluster
- Primary workers polling both workflow and activity queues in the local cluster
- Cross-colo workers polling activity queues from remote clusters
Task Queues
- DC1:
  - task-queue-dc1: polled by local workers
  - cross-colo-task-queue-dc2: polled by workers in DC2
- DC2:
  - task-queue-dc2: polled by local workers
  - cross-colo-task-queue-dc1: polled by workers in DC1
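For concreteness, the routing rule implied by this layout can be sketched as a pure function. This is a minimal sketch, not our production code: the class and method names are made up for illustration, and it assumes DC identifiers are lowercase strings such as dc1 and dc2.

```java
// Hypothetical helper, not part of the Temporal SDK: picks the activity task queue
// from the DC hosting the workflow and the user's currently assigned DC.
final class TaskQueueRouting {
    private TaskQueueRouting() {}

    static String selectTaskQueue(String workflowDc, String userDc) {
        if (workflowDc.equals(userDc)) {
            // Same DC: use the queue polled by local workers.
            return "task-queue-" + workflowDc;
        }
        // Different DC: use the queue in the local cluster that the
        // user's DC polls with its cross-colo workers.
        return "cross-colo-task-queue-" + userDc;
    }
}
```

With this rule, a workflow running in DC1 whose user has moved to DC2 schedules its remaining activities on cross-colo-task-queue-dc2. The function itself is deterministic, so routing is only as replay-safe as the userDc value passed into it.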
Question:
Given the above setup, what’s the best way to:
- Perform external lookups (like our RPC call) without breaking replay determinism?
- Ensure that activity routing remains replay-safe?
Would wrapping the network call in Workflow.sideEffect() be appropriate? Or should we use WorkflowUnsafe.isReplaying() to guard it? Are there better alternatives?
Would appreciate any advice, patterns, or alternatives others have used in similar cross-DC setups!