Bridging Temporal Clusters: Enabling Remote Child Workflows and Activities

Hi Team,

We are currently integrating Temporal into multiple modules/components within our system, each of which has distinct Temporal configuration requirements, such as retention policies and throughput.

Some of these modules have inherent interdependencies, where one module may start child workflows in another. After a thorough analysis of our Temporal deployment strategy, we believe that running a separate Temporal cluster for each module would give us much more flexibility in configuring the server and its corresponding workers. This approach also offers better isolation and upgradability for each module. However, with multiple Temporal clusters we lose end-to-end observability of the parent workflow through a single flow trace.

One way to bridge this gap would be to allow creating remote child workflows that stay linked to their parent. Currently, however, it is not possible to spawn a child workflow in a different Temporal cluster. We could build a custom framework that starts the remote workflows from activities and notifies the parent of completion via signals, but this approach has no native support within Temporal. We would still lose essential features such as traceability and the linkage between parent and child workflows, as well as functionality like the parent close policy.
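To make the shape of that workaround concrete, here is a minimal in-memory sketch in plain Java. It does not use the Temporal SDK: `RemoteCluster`, `ParentWorkflow`, and `RemoteChildSketch` are hypothetical stand-ins, with a method call playing the role of the activity that starts the remote workflow and a callback playing the role of the completion signal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

/** In-memory stand-in for the remote cluster; hypothetical, not a Temporal API. */
final class RemoteCluster {
    private final Map<String, Runnable> pending = new HashMap<>();

    /** Registers a "workflow" that reports its result through the callback once it completes. */
    String start(String parentId, String input, BiConsumer<String, String> onDone) {
        String remoteId = "remote-" + parentId;
        pending.put(remoteId, () -> onDone.accept(parentId, input + ":done"));
        return remoteId;
    }

    /** Simulates the remote workflow finishing. */
    void complete(String remoteId) {
        pending.remove(remoteId).run();
    }
}

/** Stand-in for the parent workflow: it waits on a future that the "signal" completes. */
final class ParentWorkflow {
    final String id;
    final CompletableFuture<String> childResult = new CompletableFuture<>();
    String remoteChildId;

    ParentWorkflow(String id) {
        this.id = id;
    }

    /** Plays the role of the activity that starts work on the other cluster. */
    void startRemoteChild(RemoteCluster remote, Map<String, ParentWorkflow> parents, String input) {
        remoteChildId = remote.start(id, input,
                (parentId, result) -> parents.get(parentId).onChildCompleted(result));
    }

    /** Plays the role of the signal handler that receives the child's result. */
    void onChildCompleted(String result) {
        childResult.complete(result);
    }
}

final class RemoteChildSketch {
    static String run(String input) {
        RemoteCluster remote = new RemoteCluster();
        Map<String, ParentWorkflow> parents = new HashMap<>();
        ParentWorkflow parent = new ParentWorkflow("order-42");
        parents.put(parent.id, parent);

        parent.startRemoteChild(remote, parents, input); // "activity" starts the remote workflow
        remote.complete(parent.remoteChildId);           // remote side finishes and "signals" back
        return parent.childResult.join();                // parent resumes with the result
    }

    public static void main(String[] args) {
        System.out.println(run("hello"));
    }
}
```

Note that the only link between parent and child here is the ID that application code passes around. Nothing in the sketch corresponds to a parent close policy or a trace spanning both sides, which is exactly the gap described above.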

In light of these considerations, we would like to explore the possibility of supporting multiple Temporal clients. The default client would keep working as it does today, but users could pass an optional parameter, “temporalClient,” when creating a child workflow or invoking an activity. This parameter would determine the Temporal server on which the child workflow or activity is scheduled.
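As a rough illustration of the behaviour we have in mind, the sketch below resolves an optional client name to a target server and falls back to the default when none is given. Everything in it is hypothetical: `TemporalClientRegistry`, its methods, and the host names are made up for this example and are not part of any Temporal SDK, and plain strings stand in for real client objects.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry of named clients; strings stand in for real client objects. */
final class TemporalClientRegistry {
    private final String defaultTarget;
    private final Map<String, String> namedTargets = new HashMap<>();

    TemporalClientRegistry(String defaultTarget) {
        this.defaultTarget = defaultTarget;
    }

    /** Registers an additional cluster under a name that callers can refer to. */
    TemporalClientRegistry register(String name, String target) {
        namedTargets.put(name, target);
        return this;
    }

    /** Returns the target for the requested client, or the default when none is requested. */
    String resolve(String temporalClient) {
        if (temporalClient == null) {
            return defaultTarget; // existing behaviour: schedule on the default cluster
        }
        String target = namedTargets.get(temporalClient);
        if (target == null) {
            throw new IllegalArgumentException("Unknown temporalClient: " + temporalClient);
        }
        return target;
    }
}
```

For example, with a registry built as `new TemporalClientRegistry("orders.internal:7233").register("billing", "billing.internal:7233")`, calling `resolve(null)` yields the default target and `resolve("billing")` yields the billing cluster, so existing code that never sets the parameter would be unaffected.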

Waiting to hear your thoughts on this. Would be great if this can be supported.

Thanks in advance.

We have a project called Nexus planned which is supposed to enable this sort of integration.

You won’t be able to spawn child workflows directly, but you’ll be able to define a handler that your workflow calls and that schedules a workflow using a client.

Nexus abstracts this away using the Operation concept. Operations can be canceled, and when they are, their underlying implementation (e.g. a Temporal workflow) gets notified and can cancel itself if needed.

Note that it’s discouraged to invoke other teams’ workflows and activities directly as some of the invocation options are considered implementation details and should be determined by the implementor of those workflows and activities.

Thank you, @bergundy, for your insightful response.

The Nexus project appears to hold immense promise, and I’ve taken the time to explore the available documentation and timelines. I do have a couple of queries to further understand this exciting development:

I noticed a rough timeline outlined here, indicating that the first MVP is expected by the end of December. Will this MVP be available for us to integrate and test with our specific use cases?

Given that we primarily use Java as our development language, I’m curious whether there is a projected timeline for the integration of Nexus with the Java SDK, and whether we can expect it by December.

With regard to visibility and traceability, I’m keen to understand how Nexus will enable us to visualize inter-cluster or inter-namespace Temporal workflows and activities within the Temporal web UI. Will it provide the capability to trace and visualize all workflows and activities, including those that span multiple namespaces and external Temporal instances, in a unified manner?

Thank you once again for your valuable information.

The timelines are wildly outdated.
This project is only now starting development; I wouldn’t expect it for a while (months).

One of the goals of Nexus is to provide e2e tracing of execution. It might not be available in the first MVP though. The correlation ID (request ID) will be recorded in the handling workflow’s history, and the operation ID will be recorded in the calling workflow’s history, but the UI that links the caller with the handler may be implemented at a later stage as the project matures.