Hello there. I am currently developing an app that needs to process one big message, which is time consuming (it can take days). I came up with the idea of splitting that big message into smaller ones and distributing them among workers so they can be processed in parallel (to speed up processing). I should also mention that the number of small messages depends on how big the message is, and I need to be able to track their progress and check that all of them finished successfully (some kind of transaction). I came across this wonderful project, so here I am asking whether it makes sense to apply temporal.io in this situation. I assume the analogy is:
Big message = Main workflow (tracks the child workflows, calculates how many child workflows to start, publishes a message when the whole process is done)
Small messages (split from the big message) = Child workflows with their own activities
What do you think? Does this seem logical to you?
There is no “one size fits all” solution; it depends on your specific requirements. Depending on those requirements, processing each record in the file as a child workflow might be a very good fit, or overkill.
In general, batch processing workflows process a file (or a chunk of it) using a single activity that iterates over the records and processes them directly. This activity heartbeats and includes the last processed record id in each heartbeat. On retry, the activity loads the last processed record id from the last recorded heartbeat and continues from there.
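A minimal sketch of that heartbeat/resume pattern in plain Java (with the Temporal SDK elided): here a static field stands in for the checkpoint that real activity code would record with `Activity.getExecutionContext().heartbeat(lastProcessedId)` and read back from the heartbeat details on retry. The class and method names are illustrative, not Temporal API.

```java
import java.util.ArrayList;
import java.util.List;

class HeartbeatSketch {
    // Stand-in for the last recorded heartbeat detail (null = no prior attempt).
    static Integer lastRecordedHeartbeat = null;

    // Processes record ids [0, totalRecords), resuming after the checkpointed id.
    static List<Integer> processFile(int totalRecords) {
        List<Integer> processed = new ArrayList<>();
        int start = (lastRecordedHeartbeat == null) ? 0 : lastRecordedHeartbeat + 1;
        for (int id = start; id < totalRecords; id++) {
            processed.add(id);           // process the record
            lastRecordedHeartbeat = id;  // heartbeat with the last processed id
        }
        return processed;
    }
}
```

On a retry after a crash mid-file, only the remaining records are processed, so work is not repeated from the beginning.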
If processing a single record is nontrivial, then a child workflow per record does make sense. This can be done by loading a range of records from an activity and then processing them as child workflows. Continue-as-new should be used to ensure that the parent doesn’t exceed the history size limit.
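The continue-as-new pagination described here can be sketched as follows, again without the Temporal SDK: each “workflow run” handles one page of children and then starts a fresh run with the next offset so the history stays bounded. `Workflow.continueAsNew` is modeled as plain recursion, and `pageSize` is an assumed parameter.

```java
class ContinueAsNewSketch {
    // Returns the number of workflow runs needed to cover all records.
    static int run(int offset, int totalRecords, int pageSize) {
        int end = Math.min(offset + pageSize, totalRecords);
        for (int id = offset; id < end; id++) {
            // In real code: start a child workflow for record `id` here.
        }
        if (end >= totalRecords) {
            return 1;                      // all records covered, no continue-as-new
        }
        return 1 + run(end, totalRecords, pageSize); // Workflow.continueAsNew(...)
    }
}
```

Each run only ever starts `pageSize` children, so the event history per run stays small no matter how large the file is.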
See the corresponding batch samples from the samples-java repository.
Here is the sliding window sample in Go.
Hello maxim, thanks for the response. That’s exactly what I was looking for!
I’ve been thinking of doing the processing with some kind of pagination. In my case I think it would be overkill to create a child workflow for each record. Instead, I would create a “parent workflow” that calculates and splits the big list of records into smaller sets, and for each set I would start a “child workflow” in parallel. Amazing Java samples.
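The splitting step of that plan can be sketched in plain Java: the parent workflow would compute consecutive id ranges of at most `setSize` records and start one child workflow per range. The names and the set size are illustrative assumptions, not part of any Temporal API.

```java
import java.util.ArrayList;
import java.util.List;

class SetPlanner {
    // Splits record ids [0, totalRecords) into consecutive half-open
    // ranges {start, end} of at most setSize records each.
    static List<int[]> planSets(int totalRecords, int setSize) {
        List<int[]> sets = new ArrayList<>();
        for (int start = 0; start < totalRecords; start += setSize) {
            sets.add(new int[] { start, Math.min(start + setSize, totalRecords) });
        }
        return sets;
    }
}
```

The parent would then start the child workflows asynchronously (e.g. with `Async.function` in the Java SDK) and wait for all of them, which also gives the “all finished successfully” check from the original question.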