Hello there. I am currently developing an app that needs to process one big message, which is time consuming (it can take days). I came up with the idea of splitting that big message into smaller ones and distributing them among workers so they can be processed in parallel (to speed up processing). I should also mention that the number of small messages depends on how big the message is, and I need to be able to track their progress and check that all of them finished successfully (some kind of transaction). I came across this wonderful project, so here I am asking whether it makes sense to apply temporal.io in this situation. I assume the analogy is:
Big message = Main workflow (tracks the child workflows, calculates how many child workflows to start, publishes a message when the whole process is done)
Small messages (split from the big message) = Child workflows with their own activities
What do you think? Does this seem logical to you?
There is no “one size fits all” solution; it depends on your specific requirements. Depending on those requirements, processing each record in the file as a child workflow might be a very good fit, or overkill.
In general, batch processing workflows process a file (or a chunk of it) using a single activity that iterates over the records and processes them directly. This activity heartbeats and includes the last processed record id in each heartbeat. On retry, the activity loads the last processed record id from the last recorded heartbeat and continues from there.
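A minimal sketch of that heartbeat/resume pattern in plain Java (with the Temporal SDK elided): here a static field stands in for the checkpoint that real activity code would record with `Activity.getExecutionContext().heartbeat(lastProcessedId)` and read back from the heartbeat details on retry. The class and method names are illustrative, not Temporal API.

```java
import java.util.ArrayList;
import java.util.List;

class HeartbeatSketch {
    // Stand-in for the last recorded heartbeat detail (null = no prior attempt).
    static Integer lastRecordedHeartbeat = null;

    // Processes record ids [0, totalRecords), resuming after the checkpointed id.
    static List<Integer> processFile(int totalRecords) {
        List<Integer> processed = new ArrayList<>();
        int start = (lastRecordedHeartbeat == null) ? 0 : lastRecordedHeartbeat + 1;
        for (int id = start; id < totalRecords; id++) {
            processed.add(id);           // process the record
            lastRecordedHeartbeat = id;  // heartbeat with the last processed id
        }
        return processed;
    }
}
```

On a retry after a crash mid-file, only the remaining records are processed, so work is not repeated from the beginning.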
If processing a single record is nontrivial, then a child workflow per record does make sense. This can be done by loading a range of records from an activity and then processing them as child workflows. Continue-as-new should be used to ensure that the parent doesn’t exceed the history size limit.
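The continue-as-new pagination described here can be sketched as follows, again without the Temporal SDK: each “workflow run” handles one page of children and then starts a fresh run with the next offset so the history stays bounded. `Workflow.continueAsNew` is modeled as plain recursion, and `pageSize` is an assumed parameter.

```java
class ContinueAsNewSketch {
    // Returns the number of workflow runs needed to cover all records.
    static int run(int offset, int totalRecords, int pageSize) {
        int end = Math.min(offset + pageSize, totalRecords);
        for (int id = offset; id < end; id++) {
            // In real code: start a child workflow for record `id` here.
        }
        if (end >= totalRecords) {
            return 1;                      // all records covered, no continue-as-new
        }
        return 1 + run(end, totalRecords, pageSize); // Workflow.continueAsNew(...)
    }
}
```

Each run only ever starts `pageSize` children, so the event history per run stays small no matter how large the file is.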
See the corresponding batch samples from the samples-java repository.
Here is the sliding window sample in Go.
Hello maxim, thanks for the response. That’s exactly what I was looking for!
I’ve been thinking of doing the processing with some kind of pagination. In my case I think it would be overkill to create a child workflow for each record. Instead, I would create a “parent workflow” that calculates and splits the big list of records into smaller sets, and for each set I would start a “child workflow” in parallel. Amazing Java samples.
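The splitting step of that plan can be sketched in plain Java: the parent workflow would compute consecutive id ranges of at most `setSize` records and start one child workflow per range. The names and the set size are illustrative assumptions, not part of any Temporal API.

```java
import java.util.ArrayList;
import java.util.List;

class SetPlanner {
    // Splits record ids [0, totalRecords) into consecutive half-open
    // ranges {start, end} of at most setSize records each.
    static List<int[]> planSets(int totalRecords, int setSize) {
        List<int[]> sets = new ArrayList<>();
        for (int start = 0; start < totalRecords; start += setSize) {
            sets.add(new int[] { start, Math.min(start + setSize, totalRecords) });
        }
        return sets;
    }
}
```

The parent would then start the child workflows asynchronously (e.g. with `Async.function` in the Java SDK) and wait for all of them, which also gives the “all finished successfully” check from the original question.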