Retrying a workflow for a specific error scenario

ashwani · July 24, 2020, 3:37pm

I want to retry my workflow in only one specific error scenario, for which I am raising a custom error. However, the documentation seems to describe the inverse: if I provide a RetryPolicy, I need to specify the errors for which the workflow should not be retried, not the other way around. Is there any way around this apart from encapsulating all my errors and providing them in the NonRetryableErrorTypes list? Also, what if there are Temporal internal errors for which I do not want to retry the workflow but exit it instead? Is there a list of error types that I need to add to the array?

maxim · July 24, 2020, 3:50pm

It is a pretty philosophical question. What is the purpose of workflow retry? Workflows don’t fail due to intermittent infrastructure issues, and on a workflow’s timeline “intermittent” can mean days. So the main reason for workflow failures is unknown bugs in the code, and for some workflows retrying on unknown bugs makes sense.

Thus the retry policy is designed to retry any unknown error while allowing a configured list of known errors to be excluded from retries.
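To make those semantics concrete, here is a minimal, SDK-free Java model of the decision such a retry policy makes: any error type not in the non-retryable list is retried until the attempt limit is reached. The class and method names are hypothetical and this is not Temporal's implementation; it assumes, as with Temporal's RetryOptions, that a maximum-attempts value of 0 means unlimited.

```java
import java.util.Set;

/** Hypothetical model of "retry everything except a configured list" semantics. */
final class RetryDecision {

    /**
     * @param errorType         type name of the failure that just occurred
     * @param attempt           1-based number of the attempt that failed
     * @param maximumAttempts   attempt limit; 0 is treated as unlimited (assumption)
     * @param nonRetryableTypes error types that must never be retried
     * @return true if another attempt should be scheduled
     */
    static boolean shouldRetry(String errorType, int attempt, int maximumAttempts,
                               Set<String> nonRetryableTypes) {
        // Known, explicitly listed errors stop retries immediately.
        if (nonRetryableTypes.contains(errorType)) {
            return false;
        }
        // Every other (unknown) error is retried until the attempt limit is hit.
        return maximumAttempts == 0 || attempt < maximumAttempts;
    }

    public static void main(String[] args) {
        Set<String> nonRetryable = Set.of("BadRequestError");
        System.out.println(shouldRetry("NullPointerException", 1, 3, nonRetryable)); // true
        System.out.println(shouldRetry("BadRequestError", 1, 3, nonRetryable));      // false
        System.out.println(shouldRetry("NullPointerException", 3, 3, nonRetryable)); // false
    }
}
```

Because the model only has a deny-list, retrying on a single specific error means listing every other known error type, which is exactly the workaround asked about in the question.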

I don’t know your use case, but in general letting workflows fail and retrying them is not a good idea. I know it does not come naturally: we have all been wired for ages to retry on errors when working with unreliable request/reply services. But workflows should be written so that they don’t fail on any known failure. This is usually achieved through unlimited retries of individual activities.

If you really want to retry on a known error, bake this logic into your workflow code. One approach is to wrap your main workflow function in code that calls continue-as-new if the function returned a specific error.
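A minimal sketch of that wrapping approach, kept free of the Temporal SDK so it runs on its own. `RetryableBusinessException`, `ContinueAsNewWrapper` and `runOrRestart` are hypothetical names; in a real Java workflow the `restart` callback would presumably be `Workflow.continueAsNew(input)`, which starts a fresh run of the same workflow and, unlike this stand-in, never returns normally.

```java
import java.util.function.Consumer;
import java.util.function.Function;

final class ContinueAsNewWrapper {

    /** Hypothetical marker for the one known error that should trigger a fresh run. */
    static final class RetryableBusinessException extends RuntimeException {
        RetryableBusinessException(String message) {
            super(message);
        }
    }

    /**
     * Runs the workflow body once. If it fails with the specific known error,
     * requests a new run with the same input instead of failing the workflow.
     * Any other exception propagates unchanged and fails the run.
     *
     * @param body    the "main workflow function"
     * @param input   workflow input, passed again to the new run
     * @param restart stand-in for Workflow.continueAsNew(input) in a real workflow
     * @return the body's result, or null when a restart was requested
     */
    static String runOrRestart(Function<String, String> body, String input,
                               Consumer<String> restart) {
        try {
            return body.apply(input);
        } catch (RetryableBusinessException e) {
            restart.accept(input);
            return null; // a real continueAsNew call would not return here
        }
    }

    public static void main(String[] args) {
        Function<String, String> failing = in -> {
            throw new RetryableBusinessException("rate limited");
        };
        String result = runOrRestart(failing, "order-42",
                in -> System.out.println("continue as new with " + in));
        System.out.println(result); // null
    }
}
```

This inverts the retry policy's default: only the one error you chose causes a new run, while every other failure is surfaced.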

ashwani · July 24, 2020, 4:12pm

What you said totally makes sense.

Our use case revolves around triggers for updates in the database, which start the workflow; the workflow completes an activity by making third-party API calls and inserts the value back into the database.
We want to make sure that the workflow retries with exponential backoff only if we hit rate limits or fail to write to the database (which is very unlikely), and otherwise exits gracefully (with either a success or a failure that is recorded in the database).
I’ll check out the Continue-As-New option as well. Thanks for the quick reply and for sharing your thoughts on this.

maxim · July 24, 2020, 4:16pm

I don’t think your use case needs workflow retries.

I would recommend attaching a retry policy to the DB update activity and retrying it until the DB call goes through.

I would also recommend running this activity on a separate task queue and using task queue rate limiting to ensure that you don’t hit the database at a rate higher than configured.

ashwani · July 24, 2020, 4:29pm

Hi Maxim,

I should have been clearer: I meant rate limits for the third-party API calls that we are using. The API can return errors in two categories: errors that can be retried (e.g., unexpected error, crawl failed, request timed out) and errors that should not be retried (e.g., bad request). We’d like the workflow/activity to retry based on the error codes returned by the third-party API.

Thanks

maxim · July 24, 2020, 4:33pm

If you use the Temporal task queue rate limiting feature, you can ensure that you don’t call the third-party API above the configured limit.

An activity can mark an error (or an exception, in the case of the Java SDK) as non-retryable based on the error code returned by the third party.

ashwani · July 24, 2020, 4:39pm

Thanks! That should solve the problem. I’ll check it out.

humblefool · September 10, 2020, 6:10am

@maxim Is there any example I can follow for the second part:
“An activity can mark an error (or Exception in case of Java SDK) as non retryable looking into the error code returned by the third party.”

maxim · September 10, 2020, 2:35pm

@humblefool In Go, return an error created with temporal.NewNonRetryableApplicationError; in Java, throw an exception created with ApplicationFailure.newNonRetryableFailure.
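A minimal sketch of the classification step, assuming the third-party API reports its failures as string codes (the codes, `ThirdPartyError`, `NonRetryableError` and `toActivityFailure` are all hypothetical). It runs without the SDK; in a real Java activity, the branch that builds `NonRetryableError` would instead throw `ApplicationFailure.newNonRetryableFailure(message, type)`, while retryable codes are thrown as ordinary exceptions so the activity retry policy applies.

```java
import java.util.Set;

final class ThirdPartyErrorClassifier {

    /** Hypothetical exception carrying the error code returned by the third-party API. */
    static final class ThirdPartyError extends RuntimeException {
        final String code;

        ThirdPartyError(String code) {
            super("third-party API error: " + code);
            this.code = code;
        }
    }

    /** Hypothetical stand-in for ApplicationFailure.newNonRetryableFailure(...). */
    static final class NonRetryableError extends RuntimeException {
        NonRetryableError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical codes that a retry cannot fix; everything else is treated as retryable.
    private static final Set<String> NON_RETRYABLE_CODES = Set.of("BAD_REQUEST");

    static boolean isRetryable(String code) {
        return !NON_RETRYABLE_CODES.contains(code);
    }

    /**
     * Returns the exception the activity should throw for a third-party failure:
     * the original error for retryable codes (so the retry policy retries it),
     * or a non-retryable wrapper for codes such as BAD_REQUEST.
     */
    static RuntimeException toActivityFailure(ThirdPartyError error) {
        if (isRetryable(error.code)) {
            return error;
        }
        return new NonRetryableError("not retrying third-party code " + error.code, error);
    }

    public static void main(String[] args) {
        for (String code : new String[] {"REQUEST_TIMED_OUT", "BAD_REQUEST"}) {
            RuntimeException failure = toActivityFailure(new ThirdPartyError(code));
            System.out.println(code + " -> " + failure.getClass().getSimpleName());
        }
    }
}
```

The sketch treats unknown codes as retryable, matching the retry policy's default of retrying anything not explicitly excluded; if your API's unknown codes are more likely to be permanent, invert the set into an allow-list instead.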

humblefool · September 23, 2020, 2:35pm

Hi @maxim,

I have tried throwing “ApplicationFailure.newNonRetryableFailure” from my activity for specific exceptions, but I see that the activity is still retried as specified in retryOptions. Am I missing something?

maxim · September 23, 2020, 3:28pm

I assume that you are using the Java SDK. I modified a sample to throw ApplicationFailure.newNonRetryableFailure, and it wasn’t retried.

Could you create a reproduction of your problem and file an issue?

Benny_Bottema · March 4, 2021, 3:00pm

Continuing the informative discussion on manual retries here: what about forcing Temporal to do a retry outside previously defined retry policies?

Say I want to short-circuit the current back-off cycle and have a flow retried immediately because we fixed a known infra issue. Is there a way to do that? Or do I have to tell my business owner to wait until Temporal is ready to retry? Ideally I would’ve liked a Retry button next to the Terminate button in the server dashboard.

madhu · March 12, 2021, 5:36am

One way I can think of achieving what you want is to reset the workflow to the previous completed event (through the APIs).

maxim · March 13, 2021, 6:39pm

This is not supported, but we have this feature in our backlog. Currently, I recommend setting the RetryOptions maximum interval (setMaximumInterval in the Java SDK) to a reasonable value. This allows timely retries even in the case of a prolonged outage.
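To see why capping the interval helps, here is a small Java calculation of the wait before each retry. It assumes the usual exponential formula, `initialInterval * backoffCoefficient^(n - 1)` capped at the maximum interval; the parameter names mirror RetryOptions, but this is a standalone model, not SDK code.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

final class BackoffSchedule {

    /**
     * Wait before each of the first {@code retries} retries, assuming
     * interval(n) = initialInterval * backoffCoefficient^(n - 1), capped at maximumInterval.
     */
    static List<Duration> intervals(Duration initialInterval, double backoffCoefficient,
                                    Duration maximumInterval, int retries) {
        List<Duration> result = new ArrayList<>();
        double millis = initialInterval.toMillis();
        for (int n = 1; n <= retries; n++) {
            // The cap keeps the wait bounded no matter how many attempts have failed.
            long capped = Math.min((long) millis, maximumInterval.toMillis());
            result.add(Duration.ofMillis(capped));
            millis *= backoffCoefficient;
        }
        return result;
    }

    public static void main(String[] args) {
        // 1s initial interval, doubling, capped at 1 minute.
        System.out.println(intervals(Duration.ofSeconds(1), 2.0, Duration.ofMinutes(1), 8));
        // [PT1S, PT2S, PT4S, PT8S, PT16S, PT32S, PT1M, PT1M]
    }
}
```

With a very large cap, the 20th wait alone would be 2^19 seconds (about six days), so after an infrastructure fix the next attempt could still be days away; a modest cap bounds that delay.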

madhu · March 15, 2021, 9:06am

Hi Maxim, do you mean that

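import io.temporal.api.workflowservice.v1.ResetWorkflowExecutionRequest;
import io.temporal.api.workflowservice.v1.ResetWorkflowExecutionResponse;

// Reset the execution to the given workflow task event; a new run continues from that point.
ResetWorkflowExecutionRequest reset = ResetWorkflowExecutionRequest.newBuilder()
    .setNamespace(nsconfig.getDefaultNamespace())
    .setWorkflowExecution(executionInfo.getExecution())
    .setWorkflowTaskFinishEventId(eventId) // id of the workflow task event to reset to
    .setReason(reason)
    .build();
// workflowClientFactory appears to be a project-specific wrapper around the ResetWorkflowExecution API.
ResetWorkflowExecutionResponse response = workflowClientFactory.resetWorkflow(nsconfig.getDefaultNamespace(), reset);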

won’t work? The eventId can be the last completed event, so one can force a manual re-run from a specific step onwards.

maxim · March 16, 2021, 6:08pm

@madhu I don’t understand the last question. resetWorkflow works as long as the eventId is correct.

madhu · March 19, 2021, 9:43am

Yes, my suggestion is: can’t we use resetWorkflow (with the last completed event id) to achieve what @Benny_Bottema is looking for (i.e., continue/retry from a specific step)?

maxim · March 19, 2021, 3:53pm

Yes, it is possible to use reset to retry workflow from a specific point.

I don’t recommend using reset unless you are trying to work around some bug. In expected failure situations, activity retries should be used to ensure that workflow never fails.

RashmiPant · February 14, 2024, 6:26am

Hi @maxim ,

We have a workflow where we have implemented retrial policies for all the activities. We have a use case where we have to retry on customer request outside the retrial policies. What would you suggest as the best way to accomplish this?

maxim · February 14, 2024, 4:45pm

Would you elaborate on what “retry on customer request outside the retrial policies” means?
