I’m working with a Temporal workflow that needs to execute a large number of activities (around 40,000) to move data. To manage this, I’m considering a design where a parent workflow starts four child workflows, each responsible for about 10,000 activities. To avoid exceeding Temporal’s event-history limits, each child workflow would call Continue-As-New after executing 1,000 activities, effectively splitting the work into smaller chunks, so each child workflow would go through roughly ten Continue-As-New runs.
I’d appreciate any advice on this approach, particularly regarding potential implementation issues or whether there might be a more efficient way to handle this volume of activities in Temporal. Thank you
There are three main approaches to implement batch jobs using Temporal.
Heartbeating Activity
When the processing of each record is simple and you don’t expect complex error handling, use a single activity that iterates over the dataset in a loop and processes each record inline. To avoid re-processing the whole dataset on a process/host crash, record the activity’s progress in its heartbeat details. When the activity is retried after a failure, it retrieves the latest recorded heartbeat details and continues from that point.
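Here is a minimal Go sketch of this pattern; `loadPage` and `processRecord` are hypothetical stand-ins for your actual data access and per-record logic:

```go
package batch

import (
	"context"

	"go.temporal.io/sdk/activity"
)

// ProcessDatasetActivity iterates over a dataset and records its progress
// (the next offset to process) in the heartbeat details, so a retry can
// resume where the previous attempt left off instead of starting over.
func ProcessDatasetActivity(ctx context.Context, datasetID string) error {
	// On a retry, resume from the last heartbeated offset.
	var offset int
	if activity.HasHeartbeatDetails(ctx) {
		if err := activity.GetHeartbeatDetails(ctx, &offset); err != nil {
			return err
		}
	}

	for {
		records, err := loadPage(ctx, datasetID, offset)
		if err != nil {
			return err
		}
		if len(records) == 0 {
			return nil // dataset exhausted
		}
		for _, r := range records {
			if err := processRecord(ctx, r); err != nil {
				return err
			}
			offset++
			// Persist progress and let the server detect a stuck or crashed worker.
			activity.RecordHeartbeat(ctx, offset)
		}
	}
}

// Hypothetical helpers standing in for real data access and processing.
func loadPage(ctx context.Context, datasetID string, offset int) ([]string, error) { return nil, nil }
func processRecord(ctx context.Context, record string) error                       { return nil }
```

You would also set a HeartbeatTimeout (and a retry policy) in the activity options, so that a crashed worker is detected promptly and the retry can pick up the recorded details without waiting for the full start-to-close timeout.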
Iterator Workflow
If the processing of each record is not as simple and requires an activity or even a child workflow invocation per record, the Iterator Workflow pattern can be used (a sketch follows the steps below):
1. The workflow uses an activity to load a page of records to process.
2. If there are no records to process, the workflow completes.
3. The records are processed sequentially or in parallel.
4. After the records are processed, the workflow calls continue-as-new, passing the next page offset/token as a parameter to the next run.
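A rough Go sketch of these steps; the `GetRecordPage` and `ProcessRecord` activities (referenced by name) and the `RecordPage` shape are assumptions:

```go
package batch

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// RecordPage is a hypothetical page of records plus the token of the next page.
type RecordPage struct {
	Records       []string
	NextPageToken string
}

// IteratorWorkflow processes one page per run and then continues-as-new with
// the next page token, so each run's event history stays small.
func IteratorWorkflow(ctx workflow.Context, pageToken string) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 10 * time.Minute}
	ctx = workflow.WithActivityOptions(ctx, ao)

	// 1. Load a page of records.
	var page RecordPage
	if err := workflow.ExecuteActivity(ctx, "GetRecordPage", pageToken).Get(ctx, &page); err != nil {
		return err
	}

	// 2. No records left: the whole chain completes.
	if len(page.Records) == 0 {
		return nil
	}

	// 3. Process the page, here in parallel with one activity per record.
	futures := make([]workflow.Future, 0, len(page.Records))
	for _, r := range page.Records {
		futures = append(futures, workflow.ExecuteActivity(ctx, "ProcessRecord", r))
	}
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}

	// 4. Continue-as-new with the next page token.
	return workflow.NewContinueAsNewError(ctx, IteratorWorkflow, page.NextPageToken)
}
```

Because every run starts with a fresh history, the chain can work through an arbitrary number of pages without approaching the history-size limits.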
A child workflow frequently implements this pattern. This relies on the fact that the parent is not notified when a child calls continue-as-new; the parent only learns about the completion of the last run in the child’s continue-as-new chain. It is common to start multiple such children to partition the processing of a large dataset.
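For example, a parent could partition the work like this (a sketch assuming an iterator child similar to the one above, extended to take a partition index as its first argument):

```go
package batch

import (
	"fmt"

	"go.temporal.io/sdk/workflow"
)

// PartitionedBatchWorkflow starts one iterator child per partition and waits
// for the last run of each child's continue-as-new chain to complete.
func PartitionedBatchWorkflow(ctx workflow.Context, partitions int) error {
	futures := make([]workflow.ChildWorkflowFuture, 0, partitions)
	for p := 0; p < partitions; p++ {
		cwo := workflow.ChildWorkflowOptions{
			// A deterministic ID per partition makes the children easy to find.
			WorkflowID: fmt.Sprintf("%s/partition-%d", workflow.GetInfo(ctx).WorkflowExecution.ID, p),
		}
		childCtx := workflow.WithChildOptions(ctx, cwo)
		// Hypothetical child signature: IteratorWorkflow(partition int, pageToken string).
		futures = append(futures, workflow.ExecuteChildWorkflow(childCtx, "IteratorWorkflow", p, ""))
	}
	// Each Get only returns when the final run of that child's chain finishes.
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}
```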
Sliding Window
The limitation of the Iterator Workflow pattern is that it cannot start processing the next page of records until all records of the current page are done. This is not optimal if some records take a long time to process. The Sliding Window approach solves this problem by always keeping a fixed number of child workflows running in parallel: as soon as one child workflow completes, a new one is started.
A child workflow’s completion is not reported to its parent once the parent has called continue-as-new, so this pattern cannot rely on the standard way of awaiting child completion. Instead, each child sends a signal to its parent (addressed by workflow ID only) to notify it of its completion. The signal is received by the currently running parent execution even if the child was started by an earlier execution in the parent’s continue-as-new chain.
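A condensed Go sketch of both sides, in the spirit of the sliding-window batch sample in the Temporal Go samples; the signal name, the `ProcessRecord` activity, and the per-run continue-as-new threshold are assumptions:

```go
package batch

import (
	"fmt"
	"time"

	enumspb "go.temporal.io/api/enums/v1"
	"go.temporal.io/sdk/workflow"
)

const completionSignal = "child-completed" // hypothetical signal name

// SlidingWindowInput carries the state that must survive continue-as-new.
type SlidingWindowInput struct {
	TotalRecords int // total number of records to process
	Offset       int // next record index to start
	WindowSize   int // maximum number of children running in parallel
	InFlight     int // children started by earlier runs that have not signaled yet
}

// RecordChildWorkflow processes a single record and then signals its parent by
// workflow ID only (empty run ID), so the signal reaches whichever run of the
// parent's continue-as-new chain is currently open.
func RecordChildWorkflow(ctx workflow.Context, recordIndex int) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 10 * time.Minute}
	ctx = workflow.WithActivityOptions(ctx, ao)
	if err := workflow.ExecuteActivity(ctx, "ProcessRecord", recordIndex).Get(ctx, nil); err != nil {
		return err
	}
	// ParentWorkflowExecution is non-nil because this always runs as a child.
	parent := workflow.GetInfo(ctx).ParentWorkflowExecution
	return workflow.SignalExternalWorkflow(ctx, parent.ID, "", completionSignal, recordIndex).Get(ctx, nil)
}

// SlidingWindowWorkflow keeps at most WindowSize children running at a time
// and calls continue-as-new after starting a fixed number of children per run.
func SlidingWindowWorkflow(ctx workflow.Context, in SlidingWindowInput) error {
	const childrenPerRun = 1000 // hypothetical threshold before continue-as-new

	running := in.InFlight
	ch := workflow.GetSignalChannel(ctx, completionSignal)
	// Drain completion signals in the background to free slots in the window.
	workflow.Go(ctx, func(ctx workflow.Context) {
		for {
			ch.Receive(ctx, nil)
			running--
		}
	})

	started := 0
	for ; in.Offset < in.TotalRecords; in.Offset++ {
		// Block until a slot in the window is free.
		if err := workflow.Await(ctx, func() bool { return running < in.WindowSize }); err != nil {
			return err
		}
		cwo := workflow.ChildWorkflowOptions{
			WorkflowID: fmt.Sprintf("%s/record-%d", workflow.GetInfo(ctx).WorkflowExecution.ID, in.Offset),
			// ABANDON keeps the children running independently of this parent run.
			ParentClosePolicy: enumspb.PARENT_CLOSE_POLICY_ABANDON,
		}
		childCtx := workflow.WithChildOptions(ctx, cwo)
		// Wait only for the child to start; its completion arrives as a signal.
		if err := workflow.ExecuteChildWorkflow(childCtx, RecordChildWorkflow, in.Offset).
			GetChildWorkflowExecution().Get(ctx, nil); err != nil {
			return err
		}
		running++
		started++
		if started >= childrenPerRun && in.Offset+1 < in.TotalRecords {
			next := in
			next.Offset++
			next.InFlight = running
			return workflow.NewContinueAsNewError(ctx, SlidingWindowWorkflow, next)
		}
	}
	// Last run in the chain: wait for the remaining children to signal completion.
	return workflow.Await(ctx, func() bool { return running == 0 })
}
```

A production version would also carry the IDs of the in-flight children across the continue-as-new boundary and handle signals that race with it; the sketch above only carries the in-flight count.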