Best practices for structuring a workflow that involves many independent services

Hi there,
I’m brand new to Temporal and still working my way through the documentation and videos, so I apologize in advance if this is a stupid question 🙂

Here is an outline of my current data pipeline (let’s call it MyDataPipeline), which is a workflow that can take many hours. I am hoping to simplify the structure with Temporal.

  • ServiceA is used to pre-process a large amount of input data. This can take 5-45 minutes depending on the amount of data. Once it is done, it enqueues N tasks into a message queue for ServiceB.
  • These tasks are processed in parallel by ServiceB, and they can take hours. Once all N of these tasks are done, ServiceB enqueues a task for ServiceC.
  • ServiceC performs post-processing that can also take hours.

The business logic for Services A, B, and C involves calls to external systems, any of which can obviously fail. A few other details:

  • We own all 3 services but we do not own all of the external systems that they depend on.
  • Services A, B, and C are used by other data pipelines, not just MyDataPipeline.
  • The work that is performed by A, B, and C is different enough that ideally it would be performed by workers on different machines that are optimized for their respective purposes.

Conceptually, I think of the business logic of this data pipeline with this (oversimplified) pseudocode:

func InvokeDataPipeline(data []byte) {
	serviceBTasks := serviceA.preprocess(data)
	for _, sbt := range serviceBTasks {
		// each sbt can be processed in parallel, but for simplicity I'm showing it
		// sequentially for this pseudocode
		serviceB.process(sbt)
	}
	serviceC.postprocess()
}

This seems like it could be translated fairly directly into Temporal: InvokeDataPipeline would be the Temporal workflow, and each call to services A-C would be an activity. However, as I mentioned, the work performed by services A-C also makes external calls to other systems, and it would be best for these tasks to be performed on dedicated, optimized A/B/C machines.

Is Temporal right for this type of pipeline? If so, I’m hoping to understand what type(s) of Temporal pattern(s) are suggested for this sort of use case.

Thank you in advance!

TL;DR: Yes, Temporal is an ideal fit for this use case.

Temporal supports routing activities to specific services, or even to individual hosts, using task queues. A task queue is a queue internal to the Temporal service; it is created dynamically the first time a worker polls it or a task is dispatched to it.

Temporal also supports child workflows. So if any of the operations require multiple calls to other services and are owned by a different team, they can be implemented as independent workflows. Task queues are also used to route start-workflow requests to different services/processes.

So in your case I would model services A, B, and C as workflows because, as you wrote, they "make external calls to other systems, too, and additionally, it would be best for these tasks to be performed on dedicated and optimized A/B/C machines."

Each of these workflows would invoke activities that call the external systems. Each service can then run separate pools of activity worker processes, one per type of machine, and use the appropriate task queues to route activity invocation requests to those machines.


Thank you so much for the lightning-fast reply!