Routing between Temporal-Clusters -- is this possible?

sdonovan · February 3, 2021, 11:36pm

Greetings. Hopefully I can explain this series of questions clearly.

Imagine two namespaces, N1 and N2, with workflows in each. Workflows in N1 will start child workflows in N2, but not the reverse. When starting a child workflow in N2 from within an activity in N1, we (i) set the namespace in WorkflowClientOptions to the target namespace (N2) and (ii) set the correct task list when instantiating the workflow. When the workflow in N2 completes, an event is written to the parent workflow in N1. I assume this works.

Is it possible to run N1 and N2 on separate instances of Temporal (I1 and I2), i.e. as a cluster, using Kafka or whatever between them (see the attached diagram)? Specifically, from N1, can I start a workflow in N2 and get the usual event notification when it completes?

Going further: imagine that the activity in I1 that starts the workflow in I2 calls doNotComplete() and sends its activity task token to the workflow in I2, either as workflow input or as a signal. When the task token is completed within I2, is the completion routed to the originating activity in I1?
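To make the task-token question concrete, here is a toy in-memory model, not Temporal code. It assumes, consistent with the reply below that cross-cluster routing is not directly supported, that each installation only recognizes task tokens it issued itself. The class and method names (`Cluster`, `startAsyncActivity`, `complete`) are hypothetical; in the Java SDK the real counterparts are `ActivityExecutionContext.doNotCompleteOnReturn()` / `getTaskToken()` on the activity side and `ActivityCompletionClient.complete(taskToken, result)` on the completing side.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model: each "cluster" keeps its own table of pending asynchronous activities,
// keyed by task token. Assumption: a token is only known to the installation that issued it.
class TaskTokenModel {

    static final class Cluster {
        final String name;
        private final Map<String, String> pendingActivities = new HashMap<>(); // token -> activity id
        private final Map<String, String> results = new HashMap<>();           // activity id -> result

        Cluster(String name) {
            this.name = name;
        }

        // An activity that does not complete on return hands out a token instead of a result.
        String startAsyncActivity(String activityId) {
            String token = name + ":" + UUID.randomUUID();
            pendingActivities.put(token, activityId);
            return token;
        }

        // Completing by token only succeeds on the cluster that issued the token.
        boolean complete(String token, String result) {
            String activityId = pendingActivities.remove(token);
            if (activityId == null) {
                return false; // unknown token: this cluster never issued it (or it was already used)
            }
            results.put(activityId, result);
            return true;
        }

        String resultOf(String activityId) {
            return results.get(activityId);
        }
    }

    public static void main(String[] args) {
        Cluster i1 = new Cluster("I1");
        Cluster i2 = new Cluster("I2");
        String token = i1.startAsyncActivity("startChildInI2");

        // The token travels to the workflow in I2, but completing it against I2 goes nowhere.
        System.out.println("complete via I2: " + i2.complete(token, "done")); // false
        // Completing it against I1, the issuing cluster, works.
        System.out.println("complete via I1: " + i1.complete(token, "done")); // true
        System.out.println("result in I1: " + i1.resultOf("startChildInI2")); // done
    }
}
```

In this model the completion only reaches the originating activity if the code in I2 can call I1 directly, which is the same network-access concern raised later in the thread.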

On the diagram, I’m assuming the red-line use cases are not supported in the cluster case (I’m assuming call routing is done by namespace, if it is supported at all).

Many thanks!

Sean

maxim · February 3, 2021, 11:42pm

In the future, we plan to support child workflow and activity invocations directly between clusters.

Currently this is not directly supported. The recommended workaround is to start the workflow in the other cluster through an activity, and to report its completion from an activity, executed by the “child” workflow, that signals the original workflow.
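The shape of this workaround can be sketched as a small in-process model. This is an illustration under stated assumptions, not Temporal API code: the blocking queue stands in for the parent workflow's signal channel in I1, the single-thread executor stands in for the second installation I2, and every method name (`startRemoteChildActivity`, `childWorkflow`, `signalParentActivity`, `parentWorkflow`) is hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process model of "start via activity, signal back on completion".
class StartAndSignalBack {

    // Stands in for the parent workflow's signal channel in installation I1.
    static final BlockingQueue<String> parentSignals = new LinkedBlockingQueue<>();

    // Stands in for installation I2; a daemon thread so the JVM can exit without a shutdown call.
    static final ExecutorService remoteCluster = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "I2");
        t.setDaemon(true);
        return t;
    });

    // Activity in I1: only starts the "child" and returns at once, so it can use a short timeout.
    static void startRemoteChildActivity(String parentId, int input) {
        remoteCluster.execute(() -> childWorkflow(parentId, input));
    }

    // "Child" workflow in I2: does its work, then runs an activity that signals the parent.
    static void childWorkflow(String parentId, int input) {
        int result = input * 2; // placeholder for the child's real work
        signalParentActivity(parentId, "result=" + result);
    }

    // Activity in I2: delivers the signal (in a real deployment this needs network access to I1).
    static void signalParentActivity(String parentId, String payload) {
        parentSignals.add(parentId + ":" + payload);
    }

    // Parent workflow in I1: start the child, then wait for its completion signal.
    static String parentWorkflow(String parentId, int input) {
        startRemoteChildActivity(parentId, input);
        try {
            return parentSignals.poll(5, TimeUnit.SECONDS); // null if no signal arrives in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parentWorkflow("parent-1", 21)); // prints parent-1:result=42
    }
}
```

The point of the structure is that neither activity has to live as long as the child: the start activity finishes as soon as the child is running, and the signal activity finishes as soon as the signal is delivered.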

I would recommend using an activity to start the “child” and having the child signal back, rather than completing an asynchronous activity. The reason is timeouts and retries. If the remote cluster is unreachable for a short period, you want to retry the StartWorkflowExecution call almost immediately. With an asynchronous activity, the timeout has to be as long as the whole “child” workflow execution, so a retry will not happen for a long time.
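The timeout argument can be put into rough numbers. This is a back-of-the-envelope sketch, assuming a failure that surfaces only through the activity's start-to-close timeout (for example a hung call or a crashed worker) and that the retry is then scheduled after the first retry interval; the 10-second, 24-hour, and 1-second values are illustrative choices, not Temporal defaults, and `worstCaseRetryDelay` is a hypothetical helper.

```java
import java.time.Duration;

// Rough model of how long a silent failure delays the retry under each design.
// Assumption: the failure is only noticed when the start-to-close timeout fires,
// after which the retry waits one initial retry interval.
class RetryDelay {

    static Duration worstCaseRetryDelay(Duration startToCloseTimeout, Duration initialRetryInterval) {
        return startToCloseTimeout.plus(initialRetryInterval);
    }

    public static void main(String[] args) {
        Duration initialInterval = Duration.ofSeconds(1);

        // Design A: a short activity that only calls StartWorkflowExecution on the remote cluster.
        Duration startOnly = worstCaseRetryDelay(Duration.ofSeconds(10), initialInterval);

        // Design B: an asynchronous activity whose timeout must cover the whole "child" run,
        // assumed here to take up to 24 hours.
        Duration asyncCompletion = worstCaseRetryDelay(Duration.ofHours(24), initialInterval);

        System.out.println("start-only activity: retry after " + startOnly.getSeconds() + " s");      // 11 s
        System.out.println("async activity:      retry after " + asyncCompletion.getSeconds() + " s"); // 86401 s
    }
}
```

A failure that throws immediately would be retried quickly in either design; the gap only opens for failures that the timeout has to catch, which is exactly where a timeout sized to the whole child run hurts.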

sdonovan · February 3, 2021, 11:47pm

Ah, I get it: we treat them as totally separate installations, not clustered at all, and communicate with async/signals. The downside is that activity code in I2 needs URL/port access to I1, though maybe we can find a solution to that. If the cluster case were supported, communication would be limited to just the Temporal instances.

Thank you!

Sean

maxim · February 3, 2021, 11:51pm

You can always move those specific activities into their own binary and make them part of the core cluster install.

sdonovan · February 3, 2021, 11:54pm

Yup, that’s a good idea!
