What to do when an activity cannot proceed without re-running previously completed activities?

sunyizhe · December 18, 2024, 2:35am

Suppose I have a file-processing workflow with three activites A → B → C. Each activity writes their results to the disk and pass file paths to the next activity. However, suppose the worker crashes during activity C, and now the files from activity B is gone. Temporal will retry activity C, but it cannot proceed without the files. What now?

I’m considering the following options:

Fail the entire workflow by throwing a non-retryable exception in activity C, then enable retry on the workflow. But this forces retries even when activities are timing out.
Trigger a reset. This also restarts the workflow from the start, but the python sdk doesn’t have good support for it.

awwx · December 18, 2024, 10:53am

Have your workflow handle the logic of running B again if C doesn’t have the files it needs.

Resetting a workflow is for when a bug in your workflow code is preventing the workflow from running. It isn’t something you want to use when the workflow is running correctly because you lose the state of the workflow.

Having Temporal do the activity retry login for you is a convenience: you always have the option to implement the retry logic in your workflow if you wanted to.

If the retry logic you need to implement is more complicated than what Temporal handles for you, you can implement it in the workflow instead.

sunyizhe · December 18, 2024, 3:46pm

Have your workflow handle the logic of running B again if C doesn’t have the files it needs.

This gets more difficult as the workflow gets more complex. In actual usage, my workflows tend to look like DAGs – activity A depends on B, which depends on C and D, also C depends on E, etc. Re-running an activity means re-running all of its transitive dependencies in the correct order.

I could indeed implement all the retry logic by myself, but I was hoping Temporal could help me out somewhat.

maxim · December 18, 2024, 4:18pm

I would run A->B->C in a retry loop.

awwx · December 20, 2024, 8:13am

Well, in Temporal ideally activities would be idempotent. If activities saved their output to durable storage (S3, persistent disk, etc), then once for example “B” had run it would be done: C could run multiple times and still have the data from B each time. If you’re not able to do that you’re probably going to have to orchestrate something yourself, would be my guess.

Just a thought, perhaps each activity could return which files it was missing when it wasn’t able to run. You’d run “C” first, it would report back “I don’t have files from B”; so you’d run B next (and it would say “I don’t have files from A”, etc)… after you had run the dependencies, you’d run C again, and maybe the worker had crashed at some point and didn’t have the data, and C would report again “I need files from B”.

This way you’d handle the DAG of dependencies but it would be driven by the activities instead of having to keep track yourself which files might have been lost if a worker crashes.

Topic		Replies	Views
Activities in Python Community Support python-sdk , activity	3	402	March 12, 2024
Workflow Retry - Workflow should skip Activities which are successful in previous run Developer Corner java-sdk	8	1081	October 18, 2024
Workflow should be in running status if activity failed due to retry exhausts Community Support	1	21	June 20, 2025
Handling failure/success activities scheduled in parallel on reset/new run Community Support java-sdk , async , activity	8	238	May 22, 2024
Is it possible to resume from a failed state instead of starting over? Community Support	14	3991	December 15, 2020

What to do when an activity cannot proceed without re-running previously completed activities?

Related topics