Downstream client patterns with Temporal

Hi Temporal Team,

We have been exploring different patterns for integrating Temporal workflows into our existing microservices, which prompted a pattern question that I’m hoping you can clarify.

Currently, we are thinking of making gRPC or HTTP requests to our downstream microservices, which are Temporal-agnostic. Within our activities, we would make client requests to the downstream services to complete the actual process.

As an example:

// ACTIVITY
func SomeActivity(ctx context.Context, value string) error {
	// Heartbeat in the background while the downstream call is in flight.
	cancelc := make(chan struct{})
	defer close(cancelc)
	go func() {
		ticker := time.NewTicker(1 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-cancelc:
				return
			case <-ticker.C:
				activity.RecordHeartbeat(ctx)
			}
		}
	}()

	// Delegate the actual work to the Temporal-agnostic downstream service.
	_, err := client.SomeMethod(ctx, value)
	return err
}

Questions:

  1. Is this an anti-pattern?
  2. What features, if any, would we lose if we took this approach?
  3. Are there areas to look out for that would become more difficult?

Why should this be an anti-pattern? Temporal activities are free to do anything they want. Having said that, this could really become tricky in many cases:
E.g. there is a workflow that invokes an activity which adds or deletes a resource (say a row in a table), and let’s assume this is done through HTTP PUT or HTTP DELETE. Let’s also assume the upstream was too slow and the HTTP/gRPC request timed out, but in reality the resource creation/deletion actually happened upstream.

In such scenarios the activity code may have to take care of:
a) a set of rules to correctly identify such errors and configure an appropriate retry policy, or
b) some mechanism to ensure that the activity invocation remains largely idempotent (e.g. if the activity is retried and the resource was already created/deleted, the appropriate error codes such as 409/404 may have to be handled), though this could get a bit tricky (see the sketch below).
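To make (b) concrete, here is a minimal sketch of an idempotency-aware create activity in Go. It assumes imports of net/http, fmt, and go.temporal.io/sdk/temporal, plus a hypothetical createResource helper that wraps the downstream HTTP call and returns its status code:

// Hypothetical sketch: treat "already exists" as success so retries stay idempotent.
func CreateResourceActivity(ctx context.Context, id string) error {
	status, err := createResource(ctx, id) // assumed helper wrapping the downstream HTTP PUT
	if err != nil {
		// Transport-level failures (including timeouts) stay retryable.
		return err
	}
	switch status {
	case http.StatusCreated, http.StatusConflict:
		// A 409 on a retry means a previous attempt already created the resource.
		return nil
	case http.StatusBadRequest:
		// No point retrying a malformed request.
		return temporal.NewNonRetryableApplicationError("bad request", "BadRequest", nil)
	default:
		return fmt.Errorf("unexpected status %d", status)
	}
}

A delete activity could do the same, treating 404 on a retry as success.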

Yes, it is an anti-pattern if you own the downstream services.

Obviously, if you are calling into services that are part of another company or organization, then you have no choice and you’ll have to forgo all the benefits outlined below.

Here are some reasons to implement downstream services as activities directly:

Flow Control

  • If an activity worker is down, it is not consuming activity tasks from the associated task queue. So no load on the service is generated and no error logs are produced.
  • Activity workers allow specifying a per-instance rate limit.
  • Activity workers allow specifying a per-instance limit on the number of concurrently executing activities.
  • Activity workers allow specifying a per-task-queue rate limit which is enforced by the service across any number of workers (see the worker-options sketch after this list).
  • If there is a request spike and activities are requested faster than workers can process them (or than configured rate limits allow), requests are backlogged in the task queue and processed as soon as workers have spare capacity.
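For reference, a minimal sketch of where those limits are configured in the Go SDK’s worker.Options (field names from go.temporal.io/sdk/worker; the values are placeholders, and c is an already-created Temporal client):

w := worker.New(c, "downstream-service-task-queue", worker.Options{
	// Per-worker-instance cap on concurrently executing activities.
	MaxConcurrentActivityExecutionSize: 50,
	// Per-worker-instance rate limit on activity executions.
	WorkerActivitiesPerSecond: 100,
	// Rate limit enforced by the service across all workers on this task queue.
	TaskQueueActivitiesPerSecond: 500,
})
w.RegisterActivity(SomeActivity)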

Compare this to the proposed approach of calling downstream gRPC services from activities:

  • If the gRPC service is down, activities are still executed and make requests to the service, possibly overwhelming it with requests.
  • If the gRPC service is overloaded, it has no way to push back on the request rate.
  • Additional load is put on the Temporal service and activity workers to retry the failing activities.
  • There is no support for absorbing traffic spikes without overloading the downstream service.

Routing and Load Balancing

  • You have to maintain a completely separate routing and load-balancing layer for RPC.
  • This layer is not needed (besides the ability for Temporal workers to find Temporal Frontends) when using Temporal activities directly.
  • Temporal supports routing requests to specific workers when needed. It can be achieved through RPC as well but might be nontrivial (a session-based sketch follows this list).
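If you do need worker-specific routing, one option in the Go SDK is sessions; a rough sketch of the workflow side, assuming the worker was started with worker.Options{EnableSessionWorker: true}:

func SomeWorkflow(ctx workflow.Context, value string) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 10 * time.Minute}
	ctx = workflow.WithActivityOptions(ctx, ao)

	// Every activity scheduled on sessionCtx runs on the same worker host.
	so := &workflow.SessionOptions{
		CreationTimeout:  time.Minute,
		ExecutionTimeout: 30 * time.Minute,
	}
	sessionCtx, err := workflow.CreateSession(ctx, so)
	if err != nil {
		return err
	}
	defer workflow.CompleteSession(sessionCtx)

	return workflow.ExecuteActivity(sessionCtx, SomeActivity, value).Get(sessionCtx, nil)
}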

Long-Running Operations

  • RPC services don’t support long-running operations directly.
  • Temporal activities can have unlimited duration.
  • Temporal activities support heartbeating, which enables fast detection of worker failures (see the sketch below).
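Tying this back to the original example: the background heartbeat only buys fast failure detection if the workflow sets a HeartbeatTimeout when scheduling the activity. A minimal sketch of the calling side inside a workflow (ctx is the workflow.Context, value the activity input; the timeout values are placeholders):

ao := workflow.ActivityOptions{
	// The downstream call itself may run for a long time.
	StartToCloseTimeout: time.Hour,
	// If no heartbeat arrives within 30 seconds, the activity attempt is
	// considered failed and is retried, typically on another worker.
	HeartbeatTimeout: 30 * time.Second,
}
ctx = workflow.WithActivityOptions(ctx, ao)
err := workflow.ExecuteActivity(ctx, SomeActivity, value).Get(ctx, nil)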