Activity 1: get a large list of products
Activities 2-N: given the large list of products, do something.
However, I realize this will bloat the event history storage.
Since Activities 2-N all operate on the same large list, is there another (Temporal-native) way for them to get the list without passing it in through their parameters?
1. Pass the list through an external system such as Redis or S3.
2. Cache the list on a host (on its disk or in process memory) and route the activities to that host. If the host goes down, retry the whole sequence from the beginning. See the fileprocessing sample for details.
If I go with option 1, what is the recommended way to handle the workflow being replayed after the list has been cleared from the external system? In this scenario, Activity 1, which initially gets the list and stores it in the external system, has already completed successfully, so it won't be replayed (if my understanding is correct).
Would it be:
To have each subsequent activity fetch the list and store it in the external system if it is not present?
To have a “get if not set” activity that is always replayed?
A way to specify that the workflow needs to be replayed from the beginning in this case?
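A sketch of the first idea, assuming the list can be re-fetched from its original source at any time. Each of Activities 2-N reads through a "get or load" helper, so a missing entry (for example, an evicted Redis key) is repaired by whichever activity notices it, without re-running Activity 1. `listStore` and `getOrLoad` are hypothetical names, and the map stands in for Redis.

```go
package main

import "sync"

// listStore is a hypothetical, concurrency-safe stand-in for Redis or S3.
type listStore struct {
	mu   sync.Mutex
	data map[string][]string
}

// getOrLoad returns the list stored under key. On a miss (for example, the
// entry expired or was evicted) it calls load, which would re-fetch the
// products from their original source, stores the result, and returns it.
// Concurrent misses may each call load, so load must be idempotent.
func (s *listStore) getOrLoad(key string, load func() ([]string, error)) ([]string, error) {
	s.mu.Lock()
	if p, ok := s.data[key]; ok {
		s.mu.Unlock()
		return p, nil
	}
	s.mu.Unlock()

	// Load outside the lock so a slow fetch does not block other keys.
	p, err := load()
	if err != nil {
		return nil, err
	}
	s.mu.Lock()
	s.data[key] = p
	s.mu.Unlock()
	return p, nil
}
```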
For (3), you will hit the single activity result limit of 2 MB. There is also a gRPC request size limit.
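Before relying on returning the list as an activity result, its encoded size can be checked up front. This is a sketch that assumes JSON encoding and takes the 2 MB figure above as 2 × 1024 × 1024 bytes. The real limit depends on the server configuration and the SDK adds some payload overhead, so treat the check as approximate. `checkPayloadSize` and `maxPayloadBytes` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxPayloadBytes is the 2 MB single-result limit quoted in the thread; treat
// it as an assumption and check the server's actual configuration.
const maxPayloadBytes = 2 * 1024 * 1024

// checkPayloadSize JSON-encodes v and returns the encoded size in bytes,
// plus an error if that size exceeds maxPayloadBytes.
func checkPayloadSize(v interface{}) (int, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return 0, err
	}
	if len(b) > maxPayloadBytes {
		return len(b), fmt.Errorf("payload is %d bytes, over the %d-byte limit", len(b), maxPayloadBytes)
	}
	return len(b), nil
}
```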
For (1), you can add an activity at the end of your workflow that cleans up the data in the external system. I'm not sure how this relates to workflow replay, which doesn't re-execute activities.
I don't think I conveyed my concern well. It was not about cleanup, but about how best to handle something going wrong with the external state, such as the value not being found in the Redis cache by a subsequent activity.
If I can guarantee the list does not exceed the activity result limit or the gRPC request limit, would pulling the list out of the event history with the Go SDK's GetWorkflowHistory call and caching it in process memory be a sound approach?
The advantages would be:
No inflation of the event history from passing the list into subsequent activities' parameters.
No need for S3 or Redis. Not that setting them up is hard, but it is nice to use a single transactional system for the state.
No need to route to a single worker: each worker can pull the list out of the event history once and then keep it in process memory.
Am I misunderstanding the benefits, or are there gotchas?