Option 1 Pros & Cons
- Con: Cannot process multiple objects in parallel
You can run multiple parallel activities for different parts of the file. For example, for a large file stored in S3, you could download parts of it independently to different hosts.
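For illustration, here is a minimal sketch of that fan-out, assuming the Temporal Go SDK; `DownloadPartActivity`, the 128 MiB part size, and passing the file size as a workflow argument are all hypothetical choices for this example:

```go
import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// ProcessFileWorkflow fans out one activity per byte range of the file.
// Each activity can be picked up by a different worker host.
func ProcessFileWorkflow(ctx workflow.Context, s3Key string, fileSize int64) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 30 * time.Minute,
	})

	const partSize = int64(128 << 20) // 128 MiB per part (illustrative)
	var futures []workflow.Future
	for offset := int64(0); offset < fileSize; offset += partSize {
		end := offset + partSize
		if end > fileSize {
			end = fileSize
		}
		// Starting activities without waiting on them lets them run in parallel.
		futures = append(futures, workflow.ExecuteActivity(ctx, DownloadPartActivity, s3Key, offset, end))
	}
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}

// DownloadPartActivity is a placeholder for downloading and processing one
// byte range, e.g. via an S3 ranged GET.
func DownloadPartActivity(ctx context.Context, s3Key string, start, end int64) error {
	return nil
}
```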
Option 3 Pros & Cons
3. Con: Cannot take advantage of any bulk operations, e.g. bulk saving of objects.
There are ways to buffer events across multiple activity invocations. For example, accumulate results from many activities on a worker and complete all of these activities asynchronously after a bulk operation is done.
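A rough sketch of that pattern, assuming the Temporal Go SDK and its asynchronous activity completion (`activity.ErrResultPending` plus `client.CompleteActivity`); the `Record` type, `bulkSave`, and the batch size are hypothetical, and error handling for the other buffered invocations is omitted:

```go
import (
	"context"
	"sync"

	"go.temporal.io/sdk/activity"
	"go.temporal.io/sdk/client"
)

type Record struct{ ID, Payload string }

// Batcher buffers records from many activity invocations on one worker.
type Batcher struct {
	mu        sync.Mutex
	records   []Record
	tokens    [][]byte
	batchSize int
	client    client.Client
}

// SaveRecord is the activity. It buffers the record, flushes a full batch with
// one bulk operation, and only then completes the buffered invocations.
// Activity timeouts must be long enough to cover the time it takes to fill a batch.
func (b *Batcher) SaveRecord(ctx context.Context, rec Record) (string, error) {
	b.mu.Lock()
	b.records = append(b.records, rec)
	b.tokens = append(b.tokens, activity.GetInfo(ctx).TaskToken)
	var records []Record
	var tokens [][]byte
	if len(b.records) >= b.batchSize {
		records, tokens = b.records, b.tokens
		b.records, b.tokens = nil, nil
	}
	b.mu.Unlock()

	if records != nil {
		if err := bulkSave(ctx, records); err != nil { // one bulk save for the whole batch
			return "", err
		}
		for _, token := range tokens {
			// Report completion for every buffered invocation, including this one.
			if err := b.client.CompleteActivity(context.Background(), token, "saved", nil); err != nil {
				return "", err
			}
		}
	}
	// Tell the worker the result will be (or already was) reported via CompleteActivity.
	return "", activity.ErrResultPending
}

// bulkSave is a placeholder for a real bulk write, e.g. a single batch insert.
func bulkSave(ctx context.Context, records []Record) error { return nil }
```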
Scenarios
Option 1
This is the simplest approach if processing each record of the file is simple and short-lived. An activity implementation can process multiple records in parallel if it helps to speed things up.
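A minimal sketch of such an activity, assuming the Temporal Go SDK; `readRecords`, `processRecord`, and the pool size of 8 are hypothetical:

```go
import (
	"context"

	"go.temporal.io/sdk/activity"
	"golang.org/x/sync/errgroup"
)

// ProcessFileActivity processes every record of the file inside one activity,
// with a bounded pool of goroutines working on records in parallel.
func ProcessFileActivity(ctx context.Context, fileName string) error {
	records, err := readRecords(fileName) // placeholder: load the records
	if err != nil {
		return err
	}

	g, gctx := errgroup.WithContext(ctx)
	g.SetLimit(8) // at most 8 records processed at once
	for i, rec := range records {
		if i%100 == 0 {
			// Heartbeat periodically; with a heartbeat timeout configured,
			// a crashed worker is detected quickly and the activity retried.
			activity.RecordHeartbeat(ctx, i)
		}
		rec := rec
		g.Go(func() error { return processRecord(gctx, rec) })
	}
	return g.Wait()
}

// Placeholders for reading the file and handling a single record.
func readRecords(fileName string) ([]string, error) { return nil, nil }
func processRecord(ctx context.Context, rec string) error { return nil }
```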
Option 2
This approach is still useful if the size of the file is bounded, as it is the simplest one.
There is a variation of this approach that works with files of unlimited size, which I call an iterator workflow. The idea is to process a part of the file and then call continue-as-new to keep processing the rest. This way each run of the workflow processes only a bounded range of records and stays bounded in size. This approach also works if each record requires a child workflow for processing.
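A minimal sketch of an iterator workflow, assuming the Temporal Go SDK; `ProcessRecordsActivity` and the page size are hypothetical. The same structure works if each record in a page is handled by a child workflow instead of an activity:

```go
import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// IteratorWorkflow processes the file one page of records at a time and then
// continues as new, so each run stays bounded in size.
func IteratorWorkflow(ctx workflow.Context, fileName string, offset int) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 10 * time.Minute,
	})

	const pageSize = 1000 // records handled per run (illustrative)
	var processed int
	err := workflow.ExecuteActivity(ctx, ProcessRecordsActivity, fileName, offset, pageSize).Get(ctx, &processed)
	if err != nil {
		return err
	}
	if processed < pageSize {
		return nil // reached the end of the file
	}
	// Continue as new from the next offset; the next run starts with a fresh history.
	return workflow.NewContinueAsNewError(ctx, IteratorWorkflow, fileName, offset+processed)
}

// ProcessRecordsActivity is a placeholder that processes up to count records
// starting at offset and returns how many it actually processed.
func ProcessRecordsActivity(ctx context.Context, fileName string, offset, count int) (int, error) {
	return 0, nil
}
```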
Option 3
The child workflow per record option is needed if each record requires independent orchestration that can take an unpredictable amount of time. If the number of records in the file is large, the options are either to use the iterator workflow approach I described in Option 2 or to use hierarchical workflows. For example, a parent workflow with 1000 children, each of them with 1000 children of its own, allows starting 1 million workflows without hitting any workflow size limits.
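A minimal sketch of such a two-level hierarchy, assuming the Temporal Go SDK; `PartitionWorkflow`, `RecordWorkflow`, and the fan-out of 1000 per level are illustrative:

```go
import (
	"go.temporal.io/sdk/workflow"
)

const fanOut = 1000 // children per level; 1000 * 1000 covers 1 million records

// ParentWorkflow starts fanOut PartitionWorkflow children.
func ParentWorkflow(ctx workflow.Context, fileName string) error {
	var futures []workflow.ChildWorkflowFuture
	for p := 0; p < fanOut; p++ {
		futures = append(futures, workflow.ExecuteChildWorkflow(ctx, PartitionWorkflow, fileName, p))
	}
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}

// PartitionWorkflow starts one RecordWorkflow per record in its partition.
func PartitionWorkflow(ctx workflow.Context, fileName string, partition int) error {
	var futures []workflow.ChildWorkflowFuture
	for i := 0; i < fanOut; i++ {
		recordIndex := partition*fanOut + i
		futures = append(futures, workflow.ExecuteChildWorkflow(ctx, RecordWorkflow, fileName, recordIndex))
	}
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}

// RecordWorkflow is a placeholder for the per-record orchestration.
func RecordWorkflow(ctx workflow.Context, fileName string, recordIndex int) error {
	return nil
}
```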