Using Temporal for producing and consuming data

Hi,

I am planning to design a Temporal workflow with the following high-level steps:

  1. Call Activity 1 to determine items

  2. Asynchronously call a child workflow to process these items.
    This task is complex: there are several million items. Internally we shard these items, and at any given point in time several shards are processing different sets of items.

    As soon as a shard finishes, the child workflow signals the parent workflow (other shards will still be running).

  3. Acknowledge the signal and call Activity 3 to process the first shard's results. Meanwhile, other shards can finish in the child workflow, and the parent workflow will keep receiving these signals.
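The sharding in step 2 might be done along these lines. This is a hypothetical sketch; the `Sharder` class and the shard-size parameter are assumptions for illustration, not part of the actual workflow:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split the full item list into fixed-size shards
// that can then be processed independently by the child workflow.
public class Sharder {
    // Partition items into shards of at most shardSize items each.
    static List<List<String>> shard(List<String> items, int shardSize) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < items.size(); i += shardSize) {
            shards.add(items.subList(i, Math.min(i + shardSize, items.size())));
        }
        return shards;
    }
}
```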

public class WorkflowImpl implements SampleWorkflow {
    // Queue of completed shard IDs. A single boolean flag plus one shardID
    // field would drop signals that arrive while a previous shard is still
    // being published.
    private final Queue<String> completedShards = new ArrayDeque<>();

    @Override
    public void runWorkflow() {
        String path = activity1.determineItems();
        // Start the child workflow asynchronously via a method reference.
        Async.procedure(childWorkflow::processItems);
        while (true) {
            Workflow.await(() -> !completedShards.isEmpty());
            activity3.publishResults(completedShards.poll());
        }
    }

    @Override
    public void signalShardCompletion(boolean flag, String shardID) {
        completedShards.add(shardID);
    }
}

We would like to process these shards as efficiently as possible, i.e., as soon as a shard finishes we want to start processing its results.

Question 1:
Will this approach work? I am a bit skeptical about how Temporal will behave, since there will be a huge number of shards sending signals at any given point in time.


I thought of another approach as well: have the child workflow publish shard-completion events to Kafka, and have another workflow consume these events and call Activity 3 to publish results shard by shard.
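If the Kafka route is taken, a shard-completion event could carry something like the payload below. The class name and fields are illustrative assumptions, not a defined schema:

```java
// Hypothetical shape of a shard-completion event the child workflow could
// publish to Kafka for the consuming workflow to act on.
public class ShardCompletionEvent {
    final String shardId;
    final long itemCount;

    ShardCompletionEvent(String shardId, long itemCount) {
        this.shardId = shardId;
        this.itemCount = itemCount;
    }

    // Serialize to a simple JSON string for use as the Kafka message value.
    String toJson() {
        return String.format(
            "{\"shardId\":\"%s\",\"itemCount\":%d}", shardId, itemCount);
    }
}
```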

Question 2:
Which of these two approaches would you recommend? Other than these two, are there better ways of handling this?

A single workflow execution has a limited event history size and throughput. Each activity result is also limited to 2 MB.
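One common way to stay under a per-result size limit like this (a generic sketch, not a Temporal API) is to group result records into chunks bounded by serialized byte size rather than item count:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: group result records into chunks whose combined
// UTF-8 size stays under a byte budget (chosen well below the 2 MB limit).
public class ResultChunker {
    static List<List<String>> chunkByBytes(List<String> records, int maxBytes) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String record : records) {
            int size = record.getBytes(StandardCharsets.UTF_8).length;
            // Start a new chunk when adding this record would exceed the budget.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```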

Please check the batch samples to get an idea of how large datasets can be processed efficiently.