Say I have a big DSL graph that the Temporal workflow executes. I want to push this graph’s state to S3 every time the graph makes progress.
- Do it in the workflow: once the activity returns (e.g. `result = await execute_activity(...)`), update the state in memory, serialize it (around 2 MB), and call `boto3_s3.put_object`.
  a. Problem: the workflow doesn’t allow this, since it breaks determinism on replay.
  b. Possible Solution: run the S3 upload under `with temporalio.workflow.unsafe.sandbox_unrestricted():` if `temporalio.workflow.unsafe.is_replaying()` is `False`, otherwise no-op (first sketch after this list).
- Do it in the activity: the workflow updates the state, serializes it, and passes it to an activity (local or otherwise). The activity is free to do whatever it wants with it (second sketch below).
  a. Problem: prohibitive history size. Since anything the workflow passes to an activity is recorded in history, this will grow to a prohibitive size quickly.
- Do it in the activity, but use a query: the workflow triggers an activity which, instead of receiving the state as input, issues a query back to the workflow. The workflow then calculates and serializes the state and returns it to the activity, which can do whatever it wants with it (third sketch below).
  a. Problem: it feels a bit roundabout, and is there a limit on the gRPC response size of the query result? If the state were, say, 10 MB, it probably wouldn’t work? (Not sure.) Any other drawbacks? (I’m not very familiar with queries.)
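For concreteness, here is a minimal sketch of the first option. The graph shape (`nodes` with `id`s), the `run_node` activity, the `serialize_graph` helper, and the bucket name are all placeholders; the `workflow.unsafe` calls are the ones mentioned above.

```python
import json
from datetime import timedelta

from temporalio import workflow

# Pass boto3 through the workflow sandbox so it isn't re-imported per run.
with workflow.unsafe.imports_passed_through():
    import boto3


def serialize_graph(state: dict) -> bytes:
    # Placeholder: in reality this is the ~2 MB DSL-graph serialization.
    return json.dumps(state).encode()


@workflow.defn
class DslGraphWorkflow:
    def __init__(self) -> None:
        self._state: dict = {}

    @workflow.run
    async def run(self, graph: dict) -> None:
        for node in graph["nodes"]:
            result = await workflow.execute_activity(
                "run_node",  # placeholder activity name
                node,
                start_to_close_timeout=timedelta(minutes=5),
            )
            self._state[node["id"]] = result
            # Only upload on first execution; no-op during replay.
            if not workflow.unsafe.is_replaying():
                # Escape the sandbox just for the non-deterministic S3 call.
                with workflow.unsafe.sandbox_unrestricted():
                    boto3.client("s3").put_object(
                        Bucket="graph-state-bucket",  # placeholder bucket
                        Key=f"{workflow.info().workflow_id}/state.json",
                        Body=serialize_graph(self._state),
                    )
```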
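The second option would look something like this (same placeholder bucket; `upload_graph_state` is a hypothetical activity). The serialized state travels as the activity argument, which is exactly what ends up recorded in history:

```python
import boto3
from temporalio import activity


@activity.defn
def upload_graph_state(serialized: bytes) -> None:
    # Activities run outside the workflow sandbox, so plain (blocking) boto3 is
    # fine here; register this as a sync activity with a ThreadPoolExecutor.
    boto3.client("s3").put_object(
        Bucket="graph-state-bucket",  # placeholder bucket
        Key=f"{activity.info().workflow_id}/state.json",
        Body=serialized,
    )


# In the workflow, after updating the in-memory state:
#
#     await workflow.execute_activity(
#         upload_graph_state,
#         serialize_graph(self._state),  # ~2 MB argument, recorded in history each time
#         start_to_close_timeout=timedelta(minutes=1),
#     )
```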
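And a sketch of the third option: the activity receives no state at all, uses `activity.info()` to query its own workflow for the serialized state, then uploads it. The query name, the bucket, and connecting a fresh `Client` inside the activity are placeholders (presumably one would reuse a client created at worker startup):

```python
import boto3
from temporalio import activity
from temporalio.client import Client


# In the workflow class, expose the serialized state via a query handler:
#
#     @workflow.query
#     def current_state(self) -> bytes:
#         return serialize_graph(self._state)


@activity.defn
async def snapshot_graph_state() -> None:
    info = activity.info()
    # Placeholder: in a real worker, reuse a Client created at startup.
    client = await Client.connect("localhost:7233")
    handle = client.get_workflow_handle(info.workflow_id, run_id=info.workflow_run_id)
    # Ask the calling workflow for its current serialized state.
    serialized: bytes = await handle.query("current_state")
    # Blocking boto3 call; fine for a sketch.
    boto3.client("s3").put_object(
        Bucket="graph-state-bucket",  # placeholder bucket
        Key=f"{info.workflow_id}/state.json",
        Body=serialized,
    )
```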
Which is the preferred option in this case? Or are there any other options?