I want to use Temporal for a data ingestion pipeline. The data arrives in an Excel file that can contain up to 10,000 entries. The parent workflow will parse the file, and for each entry a child workflow will be created that does the end-to-end ingestion for that entry. All of these child workflows should run in parallel.
What would be the most efficient way to implement this? Browsing the forum, I found two options:
- Iterator workflow pattern: It looks like this does not let us start all 10,000 child workflows in parallel, since each iteration reads only a range of entries. Is that understanding correct? Also, is there an example of a Java SDK implementation of this pattern?
- Tree of workflows: We can create 100 child workflows at level 1, each of which starts 100 child workflows at level 2, for a total of 10,000 child workflows at the leaf level. These can all execute in parallel, so I prefer this approach. Are there any drawbacks to this strategy?
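To make my understanding of the iterator pattern concrete, here is a minimal plain-Java sketch of the slicing I have in mind. The page size of 100 and the `pageFor` helper are my own assumptions, not anything from the SDK; in a real Temporal workflow, each page would be one workflow run that ends by calling `Workflow.continueAsNew` with the next page index.

```java
import java.util.ArrayList;
import java.util.List;

public class IteratorPatternSketch {
    static final int TOTAL_ENTRIES = 10_000;
    static final int PAGE_SIZE = 100; // hypothetical page size, my assumption

    // Returns the [start, end) entry range handled by one iteration.
    // In Temporal, the workflow would process this range and then call
    // Workflow.continueAsNew(pageIndex + 1) to hand off to the next run,
    // which is why the pages execute sequentially rather than in parallel.
    static int[] pageFor(int pageIndex) {
        int start = pageIndex * PAGE_SIZE;
        int end = Math.min(start + PAGE_SIZE, TOTAL_ENTRIES);
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        List<int[]> pages = new ArrayList<>();
        for (int p = 0; p * PAGE_SIZE < TOTAL_ENTRIES; p++) {
            pages.add(pageFor(p)); // one continue-as-new hop per page
        }
        System.out.println(pages.size());                   // 100 sequential iterations
        System.out.println(pages.get(pages.size() - 1)[1]); // last entry covered: 10000
    }
}
```

If this matches how the pattern is meant to work, it confirms my reading that the 100 pages run one after another, not concurrently.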
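For the tree option, this is the partitioning I would use (the fan-out of 100 per level is my own choice, not something Temporal prescribes): the parent hands each level-1 child a contiguous block of 100 entries, and each level-1 child starts one leaf child workflow per entry. In the real workflows, each child would be started via `Async.function` on a child workflow stub so that everything at a level runs in parallel.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeOfWorkflowsSketch {
    static final int TOTAL_ENTRIES = 10_000;
    static final int FAN_OUT = 100; // children per node; my assumption

    // [start, end) block of entries assigned to level-1 child i.
    // The parent would start these children in parallel, e.g. with
    // Async.function(childStub::run, start, end).
    static int[] level1Range(int i) {
        return new int[] { i * FAN_OUT, (i + 1) * FAN_OUT };
    }

    // Entry indices handled under level-1 child i: one leaf
    // child workflow per entry, again started in parallel.
    static List<Integer> leafEntries(int i) {
        int[] r = level1Range(i);
        List<Integer> entries = new ArrayList<>();
        for (int e = r[0]; e < r[1]; e++) {
            entries.add(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        int totalLeaves = 0;
        for (int i = 0; i < FAN_OUT; i++) {
            totalLeaves += leafEntries(i).size();
        }
        System.out.println(totalLeaves); // 10000 leaf workflows in total
    }
}
```

I'm mainly asking whether running all 10,000 leaves concurrently like this creates problems (worker capacity, event history size in the intermediate workflows, failure handling across levels) that I'm not seeing.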