Using Temporal for producing and consuming data

Hi,

I am planning to design a Temporal workflow with the following high-level steps:

  1. Call Activity 1 to determine items

  2. Asynchronously call a child workflow to process these items.
    This task is complex: there are several million items. Internally we shard these items, and at any given point in time several shards are processing different sets of items.

    As soon as a shard finishes, the child workflow signals the parent workflow (other shards will still be running).

  3. Acknowledge the signal and call Activity 3 to process the first shard's results. Meanwhile, other shards can finish in the child workflow, and the parent workflow will keep receiving these signals.
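The sharding in step 2 might be done along these lines. This is a hypothetical sketch; the `Sharder` class and the shard-size parameter are assumptions for illustration, not part of the actual workflow:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split the full item list into fixed-size shards
// that can then be processed independently by the child workflow.
public class Sharder {
    // Partition items into shards of at most shardSize items each.
    static List<List<String>> shard(List<String> items, int shardSize) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < items.size(); i += shardSize) {
            shards.add(items.subList(i, Math.min(i + shardSize, items.size())));
        }
        return shards;
    }
}
```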

public class WorkflowImpl implements SampleWorkflow {
    // Queue of completed shard IDs. A single boolean flag plus one shardID
    // field would drop signals that arrive while a previous shard is still
    // being published.
    private final Queue<String> completedShards = new ArrayDeque<>();

    @Override
    public void runWorkflow() {
        String path = activity1.determineItems();
        // Start the child workflow asynchronously via a method reference.
        Async.procedure(childWorkflow::processItems);
        while (true) {
            Workflow.await(() -> !completedShards.isEmpty());
            activity3.publishResults(completedShards.poll());
        }
    }

    @Override
    public void signalShardCompletion(boolean flag, String shardID) {
        completedShards.add(shardID);
    }
}

We would like to process these shards as efficiently as possible, i.e., as soon as a shard finishes we want to start processing its results.

Question 1:
Will this approach work? I am a bit skeptical about how Temporal will behave, since there will be a huge number of shards sending signals at any given point in time.


I thought of another approach as well: have the child workflow publish shard-completion events to Kafka, and have another workflow consume these events and call Activity 3 to publish results shard by shard.
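If the Kafka route is taken, a shard-completion event could carry something like the payload below. The class name and fields are illustrative assumptions, not a defined schema:

```java
// Hypothetical shape of a shard-completion event the child workflow could
// publish to Kafka for the consuming workflow to act on.
public class ShardCompletionEvent {
    final String shardId;
    final long itemCount;

    ShardCompletionEvent(String shardId, long itemCount) {
        this.shardId = shardId;
        this.itemCount = itemCount;
    }

    // Serialize to a simple JSON string for use as the Kafka message value.
    String toJson() {
        return String.format(
            "{\"shardId\":\"%s\",\"itemCount\":%d}", shardId, itemCount);
    }
}
```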

Question 2:
Which of these two approaches would you recommend? Other than these two, are there better ways of handling this?

A single workflow execution has a limited event history size and throughput. Each activity result is also limited to 2 MB.
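One common way to stay under a per-result size limit like this (a generic sketch, not a Temporal API) is to group result records into chunks bounded by serialized byte size rather than item count:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: group result records into chunks whose combined
// UTF-8 size stays under a byte budget (chosen well below the 2 MB limit).
public class ResultChunker {
    static List<List<String>> chunkByBytes(List<String> records, int maxBytes) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String record : records) {
            int size = record.getBytes(StandardCharsets.UTF_8).length;
            // Start a new chunk when adding this record would exceed the budget.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```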

Please check the batch samples to get an idea of how large datasets can be processed efficiently.