Bridging Temporal Clusters: Enabling Remote Child Workflows and Activities

vikas_kumar · October 7, 2023, 10:53pm

Hi Team,

We are currently integrating Temporal into multiple modules/components within our system, each of which has distinct Temporal configuration requirements, such as retention policies and throughput.

Some of these modules have inherent interdependencies, where one module may start child workflows in another. After a thorough analysis of our Temporal deployment strategy, we believe that a separate Temporal cluster for each module would give us much more flexibility in configuring each server and its workers. This approach also improves the isolation and upgradability of each module. However, with multiple Temporal clusters we lose end-to-end observability of the parent workflow in a single trace.

One way to bridge this gap is to let a workflow create linked child workflows on a remote cluster. Currently, however, a child workflow cannot be spawned in a different Temporal cluster. We could build a custom framework that starts child workflows through activities and notifies the parent of completion via signals, but this approach has no native support in Temporal. We would also still lose essential features such as traceability, the linkage between parent and child workflows, and the parent close policy.

In light of these considerations, we would like to explore the possibility of introducing support for defining multiple Temporal clients. The proposal keeps a default client that functions as it currently does. Users could, however, pass an optional parameter, “temporalClient,” when creating a child workflow or invoking an activity. This parameter would determine the Temporal server on which the child workflow or activity is scheduled.

Waiting to hear your thoughts on this. Would be great if this can be supported.

Thanks in advance.

bergundy · October 25, 2023, 11:03pm

We have a project called Nexus planned which is supposed to enable this sort of integration.

You won’t be able to spawn child workflows directly, but you’ll be able to define a handler, called from your workflow, that can schedule a workflow using a client.

Nexus abstracts this away using the Operation concept. Operations can be canceled and when they’re canceled their underlying implementation (e.g. a Temporal workflow) will get notified and can cancel itself if needed.

Note that it’s discouraged to invoke other teams’ workflows and activities directly as some of the invocation options are considered implementation details and should be determined by the implementor of those workflows and activities.

github.com

nexus-rpc/api/blob/main/SPEC.md

# Nexus RPC HTTP Specification

## Overview

The Nexus protocol, as specified below, is a synchronous RPC protocol. Arbitrary length operations are modelled on top
of a set of pre-defined synchronous RPCs.

A Nexus **caller** calls a **handler**. The handler may respond inline or return a reference for a future, asynchronous
operation. The caller can cancel an asynchronous operation, check for its outcome, or fetch its current state. The caller
can also specify a callback URL, which the handler uses to asynchronously deliver the result of an operation when it
is ready.

## Operation Addressability

An operation is addressed using three components:

- The containing service, a URL prefix (e.g. `http://api.mycompany.com/v1/myservice/`)
- [Operation Name](#operation-name)
- [Operation ID](#operation-id)

This file has been truncated.

github.com

bergundy/nexus-poc/blob/main/poc.go

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/nexus-rpc/sdk-go/nexus"
	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/temporalnexus"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

const serviceName = "infra"

type CreateCellInput struct {
	CellID    string

This file has been truncated.

vikas_kumar · October 27, 2023, 11:03am

Thank you, @bergundy, for your insightful response.

The Nexus project appears to hold immense promise, and I’ve taken the time to explore the available documentation and timelines. I do have a couple of queries to further understand this exciting development:

I noticed a rough timeline outlined here, indicating that the first MVP is expected by the end of December. Will this MVP be available for us to integrate and test with our specific use cases?

Given that we primarily use Java as our development language, I’m curious if there is a projected timeline for the integration of Nexus with the Java SDK, and whether we can expect this by December, or if there is an estimated timeline for this.

With regard to visibility and traceability, I’m keen to understand how Nexus will enable us to visualize inter-cluster or inter-namespace Temporal workflows and activities within the Temporal web UI. Will it provide the capability to trace and visualize all workflows and activities, including those that span multiple namespaces and external Temporal instances, in a unified manner?

Thank you once again for your valuable information.

bergundy · October 31, 2023, 11:40pm

The timelines are wildly outdated.
This project is only now starting development; I wouldn’t expect it for a while (months).

bergundy · October 31, 2023, 11:43pm

One of the goals of Nexus is to provide end-to-end tracing of execution. It might not be available in the first MVP though. The correlation ID (request ID) will be recorded in the handling workflow’s history, and the operation ID will be recorded in the calling workflow’s history, but the UI that links the caller with the handler may be implemented at a later stage as the project matures.

vikas_kumar · August 12, 2024, 6:55am

Hi @bergundy
Just checking in to see if there are any updates on Project Nexus. If possible, could you share any rough timelines for the upcoming milestones? Thanks in advance!

bergundy · August 12, 2024, 12:41pm

Hey, yes there are updates.
Nexus is currently available as a pre-release feature on Temporal Cloud and in the Temporal CLI, using the Go SDK.
It will be available for single-cluster self-hosted deployments in the upcoming 1.25.0 Temporal server release (a week or two away).

More SDKs will follow in the upcoming months.

github.com

temporalio/temporal/blob/main/docs/architecture/nexus.md

# Background

Nexus RPC is an open-source service framework for arbitrary-length operations whose lifetime may extend beyond a
traditional RPC. It is an underpinning connecting durable executions
within and across namespaces, clusters and regions – with an API contract designed with multi-team collaboration in mind.
A service can be exposed as a set of sync or async Nexus operations – the latter provides an operation identifier and a
uniform interface to get the status of an operation or its result, receive a completion callback, or cancel the
operation.

Temporal uses the Nexus RPC protocol to allow calling across namespace and cluster boundaries.
The [Go SDK Nexus proposal](https://github.com/temporalio/proposals/blob/master/nexus/sdk-go.md) explains the user
experience and shows sequence diagrams from an external perspective.

# Nexus RPC in the Temporal Codebase

Temporal server uses the [Nexus Go SDK](https://github.com/nexus-rpc/sdk-go) client and server abstractions which
implement the [Nexus over HTTP Spec](https://github.com/nexus-rpc/api/blob/main/SPEC.md) for internal and cross cluster
communication.

The frontend exposes the following Nexus HTTP routes over the existing HTTP API (default port is `7243`):

This file has been truncated.

github.com

temporalio/samples-go/blob/main/nexus/README.md

# nexus

Nexus RPC is an open-source service framework for arbitrary-length operations whose lifetime may extend beyond a
traditional RPC. It is an underpinning connecting durable executions within and across namespaces, clusters and regions
– with an API contract designed with multi-team collaboration in mind. A service can be exposed as a set of sync or
async Nexus operations – the latter provides an operation identifier and a uniform interface to get the status of an
operation or its result, receive a completion callback, or cancel the operation.

Temporal uses the Nexus RPC protocol to allow calling across namespace and cluster boundaries. The [Go SDK Nexus
proposal](https://github.com/temporalio/proposals/blob/master/nexus/sdk-go.md) explains the user experience and shows
sequence diagrams.

This sample shows how to use Temporal for authoring a Nexus service and call it from a workflow.

### Sample directory structure

- [service](./service) - shared service definition
- [caller](./caller) - caller workflows, worker, and starter
- [handler](./handler) - handler workflow, operations, and worker
- [options](./options) - command line argument parsing utility

This file has been truncated.
