Retrying a workflow for a specific error scenario

I want to retry my workflow in only one specific error scenario, for which I am raising a custom error. However, the documentation seems to provide the inverse: if I supply a RetryPolicy, I have to specify the errors for which the workflow should not be retried, not the other way around. Is there any way around this apart from encapsulating all my errors and providing them in the NonRetryableErrorTypes list? Also, what if there are Temporal internal errors for which I want the workflow to exit rather than retry? Is there a list of ErrorTypes that I need to add to the array?

It is a pretty philosophical question. What is the purpose of a workflow retry? Workflows don’t fail due to intermittent infrastructure issues, and “intermittent” on a workflow timeline can mean days. So the main reason for workflow failures is unknown bugs in the code, and for some workflows retrying on unknown bugs makes sense.

Thus the retry policy is designed to retry on any unknown error and to let you exclude a configured list of known errors from retries.

I don’t know your use case, but in general letting workflows fail and retrying them is not a good idea. I know it is not something that comes naturally. We are all wired to retry on error after working with unreliable request/reply services for ages. But workflows should be written in a way that they don’t fail on any known failures. This is usually achieved through unlimited retries of individual activities.

If you really want to retry on a known error, then bake this logic into your workflow code. One approach is to wrap your main workflow function in code that calls continue-as-new if the function returns a specific error.

What you said totally makes sense.

Our use case revolves around triggers for database updates. A trigger starts the workflow, which runs an activity that makes third-party API calls and inserts the result back into the database.
We want to make sure that the workflow retries with exponential backoff only if we hit rate limits or fail to write to the database (which is very unlikely), and otherwise exits gracefully (with either a success or a failure that is recorded in the database).
I’ll check out the Continue as New option as well. Thanks for the immediate reply and for sharing your thoughts on this.

I don’t think in your use case you need workflow retries.

I would recommend attaching a retry policy to the DB update activity and retrying it until the DB call goes through.

I would also recommend running this activity on a separate task queue and using task queue rate limiting to ensure that you don’t hit the database at a rate higher than configured.

Hi Maxim,

I should have been clearer: I meant rate limits for the third-party API calls that we are using. The API returns errors that fall into two categories: errors that can be retried (e.g., unexpected error, crawl failed, request timed out) and errors that should not be retried (e.g., bad request). We’d like the workflow/activity to retry based on the error codes returned by the third-party API.

Thanks

If you use the Temporal task queue rate limiting feature, you can ensure that you don’t call the third-party API above the configured limit.

An activity can mark an error (or an exception, in the Java SDK) as non-retryable by inspecting the error code returned by the third party.

Thanks! That should solve the problem. I’ll check it out

@maxim Is there any example I can follow for the second part -
“An activity can mark an error (or Exception in case of Java SDK) as non retryable looking into the error code returned by the third party.”

@humblefool In Go, return an error created through temporal.NewNonRetryableApplicationError; in Java, throw an exception created through ApplicationFailure.newNonRetryableFailure.
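For readers who want to see the classification step spelled out, here is a minimal Go sketch that uses only the standard library. The `nonRetryableError` type is a local stand-in for the error that `temporal.NewNonRetryableApplicationError` would create inside a real activity, and the mapping of HTTP status codes to retryable or permanent is an assumption made for this example. The real third-party API may use different codes.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// nonRetryableError is a local stand-in for the error that
// temporal.NewNonRetryableApplicationError would create in a real activity.
type nonRetryableError struct{ cause error }

func (e *nonRetryableError) Error() string { return "non-retryable: " + e.cause.Error() }
func (e *nonRetryableError) Unwrap() error { return e.cause }

// classify turns a third-party status code into the error the activity
// returns. Which codes count as permanent is an assumption of this sketch.
func classify(status int) error {
	switch {
	case status >= 200 && status < 300:
		return nil
	case status == http.StatusTooManyRequests,
		status == http.StatusRequestTimeout,
		status >= 500:
		// Plain error: the activity's retry policy applies.
		return fmt.Errorf("transient upstream status %d", status)
	default:
		// For example 400 Bad Request: retrying cannot help.
		return &nonRetryableError{cause: fmt.Errorf("permanent upstream status %d", status)}
	}
}

// isRetryable reports whether err is a failure that should be retried.
func isRetryable(err error) bool {
	var nr *nonRetryableError
	return err != nil && !errors.As(err, &nr)
}

func main() {
	for _, status := range []int{200, 400, 429, 503} {
		err := classify(status)
		fmt.Println(status, err, isRetryable(err))
	}
}
```

The point of the design is that the activity, not the retry policy, decides which failures are permanent, so the policy itself can stay generous for everything else.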

Hi @maxim,

I have tried throwing “ApplicationFailure.newNonRetryableFailure” from my activity for specific exceptions but I see the activity is still retried as specified in retryOptions. Am I missing something?

I assume that you are using the Java SDK. I modified a sample to throw ApplicationFailure.newNonRetryableFailure, and it wasn’t retried.

Could you create a reproduction of your problem and file an issue?