Hi there,
I'm brand new to Temporal and still working my way through the documentation and videos, so I apologize in advance if this is a stupid question.
Here is an outline of my current data pipeline (let’s call it MyDataPipeline), which is a workflow that can take many hours. I am hoping to simplify the structure with Temporal.
- ServiceA is used to pre-process a large amount of input data. This can take 5-45 minutes depending on the amount of data. Once it is done, it enqueues N tasks into a message queue for ServiceB.
- These tasks are processed in parallel by ServiceB, and they can take hours. Once all N of these tasks are done, ServiceB enqueues a task for ServiceC.
- ServiceC performs post-processing that can also take hours.
The business logic for Services A, B, and C all involves calls to external systems, any of which can fail. A few other details:
- We own all 3 services but we do not own all of the external systems that they depend on.
- Services A, B, and C are used by other data pipelines, not just MyDataPipeline.
- The work that is performed by A, B, and C is different enough that ideally it would be performed by workers on different machines that are optimized for their respective purposes.
Conceptually, I think of the business logic of this data pipeline with this (oversimplified) pseudocode:
func InvokeDataPipeline(data []byte) {
    serviceBTasks := serviceA.preprocess(data)
    for _, sbt := range serviceBTasks {
        // each sbt can be processed in parallel, but for simplicity I'm showing it
        // sequentially for this pseudocode
        serviceB.process(sbt)
    }
    serviceC.postprocess()
}
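To make the parallel part concrete, here is a rough plain-Go sketch of the fan-out/fan-in shape I have in mind (no Temporal involved yet). The preprocess, process, and postprocess functions and their toy bodies are placeholders I made up for illustration; the real services call external systems and run for hours, and the int results are just an assumption so the example has something to combine.

```go
package main

import (
	"fmt"
	"sync"
)

// preprocess is a hypothetical stand-in for ServiceA: it splits the input
// into N independent tasks for ServiceB (here, one task per input byte).
func preprocess(data []byte) [][]byte {
	tasks := make([][]byte, 0, len(data))
	for _, b := range data {
		tasks = append(tasks, []byte{b})
	}
	return tasks
}

// process is a hypothetical stand-in for a single ServiceB task.
func process(task []byte) (int, error) {
	if len(task) == 0 {
		return 0, fmt.Errorf("empty task")
	}
	return int(task[0]), nil
}

// postprocess is a hypothetical stand-in for ServiceC; it combines the
// ServiceB results (here, by summing them).
func postprocess(results []int) int {
	sum := 0
	for _, r := range results {
		sum += r
	}
	return sum
}

// InvokeDataPipeline fans out the ServiceB tasks, waits for all of them to
// finish, and only then runs the ServiceC step.
func InvokeDataPipeline(data []byte) (int, error) {
	tasks := preprocess(data)
	results := make([]int, len(tasks))
	errs := make([]error, len(tasks))

	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		// Each goroutine writes only to its own slot, so no mutex is needed.
		go func(i int, task []byte) {
			defer wg.Done()
			results[i], errs[i] = process(task)
		}(i, task)
	}
	wg.Wait() // fan-in: block until all N tasks are done

	for _, err := range errs {
		if err != nil {
			return 0, err
		}
	}
	return postprocess(results), nil
}

func main() {
	total, err := InvokeDataPipeline([]byte{1, 2, 3})
	if err != nil {
		fmt.Println("pipeline failed:", err)
		return
	}
	fmt.Println("pipeline result:", total)
}
```

In the real pipeline each of these three steps runs on a different fleet of machines, which is exactly the part an in-process sketch like this cannot express.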
This pipeline seems like it could be translated to Temporal fairly easily, with InvokeDataPipeline as the Temporal workflow and each call to services A-C as an activity. However, as I mentioned, the work performed by services A-C makes external calls to other systems, and it would be best for these tasks to be performed on dedicated, optimized A/B/C machines.
Is Temporal right for this type of pipeline? If so, I’m hoping to understand what type(s) of Temporal pattern(s) are suggested for this sort of use case.
Thank you in advance!