Hi Temporal Community,
I’m new to Temporal, and I’m having trouble figuring out the best way to restart an entire Workflow from the beginning using the Python SDK.
TL;DR
How do I restart a whole Workflow in the Python SDK when it fails for reasons that have nothing to do with intermittent errors?
Background
I have a Workflow that currently contains two Activities:
1. Make an HTTP request to a REST API to kick off a long-running job. The API returns a job ID that I can use to query the status.
2. Periodically query the API for the status of the job.

In step 2, the job’s status can be one of: ["pending", "running", "successful", "failed"].
Temporal-izing It
If I were not using Temporal, I would accomplish step 2 by repeatedly polling the API for the job’s status using exponential backoff and retries via the backoff package from PyPI. But one of the promises of Temporal is (optionally) unlimited retries in the face of intermittent failures.
So, instead of using my own retry logic, I’ve written the Activity so that it hits the API and raises an Exception unless the job’s status is either “successful” or “failed”, because “pending” and “running” mean the job is not done yet. This technique works, and it’s really nice! The worker will keep retrying my Activity as the job transitions from “pending” to “running” to “successful”. Every failure along the way triggers a retry for the Activity as expected. Once the job is done, the Workflow is marked as completed.
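To make the pattern concrete, here is a minimal plain-Python sketch of the status check described above. It is not my production code: `check_job_status`, `JobNotFinishedError`, and the status sets are illustrative names, and the HTTP call that would fetch the status is left out.

```python
# Hypothetical names throughout; the real Activity would first fetch
# `status` from the REST API before running this check.
TERMINAL_STATUSES = {"successful", "failed"}
KNOWN_STATUSES = {"pending", "running"} | TERMINAL_STATUSES


class JobNotFinishedError(Exception):
    """Raised while the remote job is still pending or running."""


def check_job_status(status: str) -> str:
    """Return a terminal status, or raise so the caller tries again later."""
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected job status: {status!r}")
    if status not in TERMINAL_STATUSES:
        # Not done yet: failing here is what triggers the next retry.
        raise JobNotFinishedError(f"job is still {status}")
    return status
```

In the real Activity this logic would sit inside a function decorated with `@activity.defn`, and the retry schedule would come from the Activity's retry policy rather than from anything in this function.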
My problem is the “failed” case. If the job on the remote service fails, then both Temporal Activities are done, since the job has been launched in step 1 and no more polling is needed for step 2. But my overall Workflow should be considered a failure, because the job that was launched in step 1 ended up failing for reasons outside of my control. For the system in question, that means I need to run the whole Workflow again, repeating both Activities to launch a new job via the same API and check its status.
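The control flow I'm after can be sketched in plain Python, with no Temporal involved. Everything here is hypothetical: `run_until_successful`, `JobFailedError`, the `max_attempts` cap, and the fake job IDs exist only to illustrate that a “failed” job should trigger a fresh launch (step 1) followed by a fresh wait (step 2).

```python
from typing import Callable


class JobFailedError(Exception):
    """Raised when every attempt ended with a 'failed' job."""


def run_until_successful(
    launch_job: Callable[[], str],
    wait_for_terminal_status: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Launch a job and wait for it; on 'failed', start over with a new job."""
    for _ in range(max_attempts):
        job_id = launch_job()                      # step 1: new job each attempt
        status = wait_for_terminal_status(job_id)  # step 2: polling hidden in here
        if status == "successful":
            return job_id
    raise JobFailedError(f"job failed on all {max_attempts} attempts")


# Simulated run: the first job fails, the second one succeeds.
outcomes = {"job-1": "failed", "job-2": "successful"}
launched: list[str] = []


def fake_launch() -> str:
    job_id = f"job-{len(launched) + 1}"
    launched.append(job_id)
    return job_id


result = run_until_successful(fake_launch, lambda job_id: outcomes[job_id])
print(result)  # job-2
```

What I don't know is which Temporal mechanism should play the role of that outer loop.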
Alternatives I’ve Tried
My first thought was that I could package both steps 1 and 2 into a single Activity. After all, they’re kind of atomic in the sense that if one or the other fails, the whole thing should be retried from the start. My problem with that approach is that I would lose my automatic polling in step 2. Rather than letting Temporal repeatedly query the API until a desired state is reached, I would have to program that polling myself with some kind of exponential backoff, which defeats much of the purpose of using Temporal in the first place.
So, my next thought was that I could retry a whole Workflow rather than just a single Activity.
Workflow Retry Best Practices?
Based on my search of this forum and the Temporal docs for the Python SDK, there seem to be a couple of options for restarting Workflows:
- Custom Retry Policies can be used to retry whole Workflows, but the docs suggest that this should be used only rarely. Plus, the Retry Options don’t seem very configurable (e.g. can’t specify from which point to start over).
- This post talks about using a “Reset” function of some sort, but the answer only contains examples for the TypeScript, Go, and Java SDKs, and I can’t find anything about a Reset in the Python SDK docs.
Questions
1. Is my use of Activities idiomatic? For example, when I raise an Exception instead of using some kind of homegrown while loop for repeatedly polling an API, is that an intended use case for Temporal?
2. Should my two steps be squeezed into one Activity? If so, what happens to my nice, automatic retry logic in step 2?
3. Are there any alternative approaches you can recommend for this situation (using the Python SDK specifically)?

Thanks, and sorry for the long post!