Is having retry for both Workflow and its activity redundant?

Vikas_NS · September 9, 2021, 8:53am

Questions

I have an activity which makes a network call and a workflow that holds this activity. Does setting 20 max retries for both the workflow and the activity inside it result in a total of 20 x 20 network calls? I mean, 20 activity failures constitute 1 workflow failure, so 20 workflow failures should result in 20 x 20 = 400 retries of the network call?
I have 2 activities which make 2 different network calls and a workflow that holds these two activities. Both the activities and the workflow have max 20 retries. In the scenario where the first activity is successful and the second one keeps failing, the second activity gives up after 20 retries. Now when the workflow kicks in for the 2nd time, will the first activity, which was successful, be executed again? Or will just the failed activity be executed?
In what situations should we have retries for both workflow and activity?

tihomir · September 9, 2021, 1:34pm

1, 2: On each workflow retry, the workflow is executed from the beginning, which means already executed activities will be re-executed on the new run. Generally you do not want to fail your workflows on “intermittent” errors, for example activity errors that could be fixed via a new deployment or code changes/fixes.
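To make the multiplication concrete, here is a toy model in plain Java. It is not Temporal SDK code, and the class and method names are hypothetical. It assumes both the workflow and each activity are limited to the same number of attempts, that the first activity always succeeds, that the second always fails, and that the uncaught activity failure fails the workflow run so the next run starts again from the top.

```java
// Toy model (NOT Temporal SDK code; all names are hypothetical).
// Counts network calls when every workflow retry re-runs the workflow
// from the beginning and each activity has its own retry limit.
final class NestedRetryModel {

    /** Number of calls made by each of the two activities. */
    static final class Counts {
        int activityOneCalls;
        int activityTwoCalls;
    }

    /**
     * @param workflowAttempts       max attempts of the workflow (including the first run)
     * @param activityAttempts       max attempts of each activity (including the first try)
     * @param activityTwoAlwaysFails if true, the second activity never succeeds
     */
    static Counts simulate(int workflowAttempts, int activityAttempts, boolean activityTwoAlwaysFails) {
        Counts counts = new Counts();
        for (int run = 1; run <= workflowAttempts; run++) {
            // A new workflow run starts from the top, so activity one runs again
            // even though it already succeeded in the previous run.
            counts.activityOneCalls++; // assumed to succeed on its first attempt

            boolean activityTwoSucceeded = false;
            for (int attempt = 1; attempt <= activityAttempts; attempt++) {
                counts.activityTwoCalls++;
                if (!activityTwoAlwaysFails) {
                    activityTwoSucceeded = true;
                    break;
                }
            }
            if (activityTwoSucceeded) {
                return counts; // workflow completes; no further runs
            }
            // Activity two exhausted its retries, so this run fails and the workflow retries.
        }
        return counts;
    }
}
```

With `simulate(20, 20, true)` the first activity is called 20 times (once per workflow run) and the second 400 times, which is why combining both retry levels multiplies the load on the remote service.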

3: Generally, failing a workflow execution is not the best idea. One use case for workflow retries is when you want to retry a workflow on unknown failures. You should, however, if possible make sure your workflows do not fail on known errors. In the case where you do want to retry on a known failure, you should still handle that error in your workflow and keep control over how to retry (for example, by explicitly using continueAsNew).

Vikas_NS · September 13, 2021, 8:41am

Tihomir,

I’m finding it a bit hard to understand workflow retries.

Please check the attached image.

I’m throwing an exception to fail the child workflow.

The child workflow fails and retries.

Below are my child workflow options:

ChildWorkflowOptions workflowOptions =
          ChildWorkflowOptions.newBuilder()
              .setRetryOptions(RetryOptions.newBuilder().setMaximumAttempts(3).build())
              .build();
      GreetingChild child = Workflow.newChildWorkflowStub(GreetingChild.class, workflowOptions);

Questions

Even though I have the maximum attempts set to 3, the child workflow seems to be retrying more than that. I have added a log just before I throw the exception, and this log appears more than 3 times.
After the child workflow fails, based on your comments, I was expecting a new workflow with a different run ID to be created for the next retry. But it looks like nothing like that happened.

tihomir · September 13, 2021, 2:24pm

Unknown exceptions thrown from your child workflow code are converted to ApplicationFailure (and fail the workflow) only if their types are set in setFailWorkflowExceptionTypes. Otherwise the workflow task is failed and your workflow gets blocked (the exception is treated as a bug that can be fixed in a future deployment, which is what you are seeing in your test).

If you want to throw checked exceptions that are not in that list, you can use:

ApplicationFailure.newFailure
or
ApplicationFailure.newNonRetryableFailure

So in the HelloChild sample you are working from, you can do the following in your child workflow:

throw ApplicationFailure.newFailure("Failing child workflow", "simulated");

(note you do not need that @SneakyThrows Lombok annotation)

and in your workflow you could do:

try {
        return child.composeGreeting("Hello", name);
        // return Async.function(child::composeGreeting, "Hello", name).get();
      } catch (ChildWorkflowFailure failure) {
        Workflow.getLogger(GreetingWorkflowImpl.class)
            .error(failure.getCause().getMessage());
        ...
      }

This should honor your ChildWorkflowOptions->RetryOptions, and you should see the child workflow being attempted 3 times before you catch the error in your workflow.
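Because maximumAttempts counts the first attempt, a limit of 3 means two waits between attempts. The plain-Java sketch below (not SDK code; names are hypothetical) computes those waits using the usual exponential-backoff rule: the delay before retry n is initialInterval * backoffCoefficient^(n - 1), capped at maximumInterval. The ChildWorkflowOptions snippet earlier in the thread sets only the attempt count, so the values in the usage note (1 second initial interval, coefficient 2.0, 100 second cap) are assumed to be the SDK defaults.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Sketch of exponential backoff between retry attempts (illustrative, not SDK code).
final class RetryBackoffSchedule {

    /** Returns the wait before each retry; maximumAttempts includes the first attempt. */
    static List<Duration> delaysBetweenAttempts(int maximumAttempts,
                                                Duration initialInterval,
                                                double backoffCoefficient,
                                                Duration maximumInterval) {
        List<Duration> delays = new ArrayList<>();
        // There are maximumAttempts - 1 retries after the first attempt.
        for (int retry = 1; retry < maximumAttempts; retry++) {
            double millis = initialInterval.toMillis() * Math.pow(backoffCoefficient, retry - 1);
            // Never wait longer than maximumInterval.
            long capped = (long) Math.min(millis, (double) maximumInterval.toMillis());
            delays.add(Duration.ofMillis(capped));
        }
        return delays;
    }
}
```

For example, `delaysBetweenAttempts(3, Duration.ofSeconds(1), 2.0, Duration.ofSeconds(100))` gives waits of 1 s and 2 s, so all three child workflow attempts would finish within a few seconds of each other when each one fails immediately.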

Child workflow invocations always throw ChildWorkflowFailure and you can get the original failure in the cause.

Also note that workflow retries only apply if you define RetryOptions in your WorkflowOptions.
So if you in your main method of the HelloChild Sample define:

GreetingWorkflow workflow =
        client.newWorkflowStub(
            GreetingWorkflow.class,
            WorkflowOptions.newBuilder()
                .setRetryOptions(RetryOptions.newBuilder().setMaximumAttempts(2).build())
                .setWorkflowId(WORKFLOW_ID)
                .setTaskQueue(TASK_QUEUE)
                .build());

and don’t catch the ChildWorkflowFailure in your parent workflow code, you should see your workflow being retried (and have the ContinuedAsNew status, as you pointed out).

Vikas_NS · September 20, 2021, 7:53am

Got it.
Thank you, Tihomir.

Could you please share a snippet of using setFailWorkflowExceptionTypes? I want to know on which object I need to call this method.

tihomir · September 20, 2021, 1:16pm

WorkflowImplementationOptions implementationOptions = WorkflowImplementationOptions.newBuilder()
            .setFailWorkflowExceptionTypes(Throwable.class)
            .build();

....

worker.registerWorkflowImplementationTypes(implementationOptions, <WorkflowClass>.class);

For example, this is what you would use if you want to fail the workflow on any exception for that particular WorkflowClass.

Vikas_NS · November 20, 2021, 11:47am

Hi Tihomir,
What do you mean when you say workflow gets blocked?

Will the retry happen infinitely?
How do I control the retries?

Vikas_NS · November 20, 2021, 2:59pm

Will activity failure exception produce the same effect?

I want to know which default Temporal exceptions produce this behavior.

tihomir · November 20, 2021, 6:09pm

What do you mean when you say workflow gets blocked?
Will the retry happen infinitely?
How do I control the retries?

The workflow task is retried until your defined workflow run/execution timeout is reached, at which point the workflow times out.

Will activity failure exception produce the same effect?

Do you mean if an activity inside the child workflow fails? Yes. If it fails after the activity has exhausted its retries, it will be converted to ActivityFailure. Since this error happened inside a child workflow, it can be caught in your workflow code as a ChildWorkflowFailure, which has the ActivityFailure as its cause, which in turn includes the original exception (let’s say an NPE) as its cause.
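Reaching the original error means following getCause() through that nesting. The helper below is a plain-Java sketch with a hypothetical name; in the usage note, ordinary RuntimeExceptions stand in for ChildWorkflowFailure and ActivityFailure, so it shows only the unwrapping pattern, not the SDK's classes. Depending on how the SDK serializes the original exception, the innermost element may be a failure object describing the NPE rather than the NPE instance itself, so check its type accordingly.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Plain-Java sketch: walk a failure's cause chain down to its innermost exception.
final class FailureChain {

    /** Returns the innermost cause, stopping early if the chain loops back on itself. */
    static Throwable rootCause(Throwable failure) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = failure;
        // seen.add(...) returns false if we've already visited this throwable (a cycle).
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }
}
```

For a chain built as `new RuntimeException("child workflow failed", new RuntimeException("activity failed", new NullPointerException("boom")))`, `rootCause` returns the NullPointerException.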

I want to know which default Temporal exceptions produce this behavior.

All classes that extend TemporalFailure:

TimeoutFailure
ActivityFailure
ApplicationFailure
CancelledFailure
ChildWorkflowFailure
ServerFailure
TerminatedFailure
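The rules described in this thread can be summarized as a small decision function. This is a plain-Java model, not SDK code: TemporalFailureStandIn is a hypothetical placeholder for TemporalFailure and its subclasses, and the varargs parameter plays the role of the types passed to setFailWorkflowExceptionTypes.

```java
// Plain-Java model (NOT Temporal SDK code) of what happens to an exception that
// escapes workflow code, following the rules described in this thread.
final class WorkflowExceptionClassifier {

    enum Outcome {
        FAIL_WORKFLOW_EXECUTION, // the run fails; workflow RetryOptions may retry it, if configured
        FAIL_WORKFLOW_TASK       // the workflow is blocked; the task is retried until timeout
    }

    /** Hypothetical placeholder for TemporalFailure and its subclasses. */
    static class TemporalFailureStandIn extends RuntimeException {
        TemporalFailureStandIn(String message) {
            super(message);
        }
    }

    /**
     * @param thrown                     exception escaping the workflow method
     * @param failWorkflowExceptionTypes types registered via setFailWorkflowExceptionTypes
     */
    static Outcome classify(Throwable thrown, Class<?>... failWorkflowExceptionTypes) {
        // Temporal failures always fail the workflow execution.
        if (thrown instanceof TemporalFailureStandIn) {
            return Outcome.FAIL_WORKFLOW_EXECUTION;
        }
        // Other exceptions fail the execution only if their type (or a supertype) is listed.
        for (Class<?> type : failWorkflowExceptionTypes) {
            if (type.isInstance(thrown)) {
                return Outcome.FAIL_WORKFLOW_EXECUTION;
            }
        }
        // Everything else is treated as a bug: the workflow task fails and is retried.
        return Outcome.FAIL_WORKFLOW_TASK;
    }
}
```

Registering Throwable.class, as in the snippet above, therefore makes every exception fail the execution, while an empty list leaves non-Temporal exceptions blocking the workflow.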

Vikas_NS · November 21, 2021, 1:43pm

How about frequency? I don’t want it to retry very frequently and use up resources. Is it possible to control that?

maxim · November 21, 2021, 8:29pm

How about frequency? I don’t want it to retry very frequently and use up resources. Is it possible to control that?

Currently, it is retried every workflow task timeout. We do have plans to add exponential retry options to this scenario.
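Given that behavior, the number of attempts a blocked workflow makes is roughly its run timeout divided by its workflow task timeout, so the run timeout is the main lever for bounding resource use. The sketch below is just that arithmetic in plain Java (hypothetical class name, not SDK code); the 1 hour run timeout and 10 second task timeout in the usage note are example values, with 10 seconds assumed to be the default workflow task timeout. Real timing can differ.

```java
import java.time.Duration;

// Back-of-the-envelope estimate, assuming a failed workflow task is retried
// once per workflow task timeout until the workflow run timeout expires.
final class WorkflowTaskRetryEstimate {

    static long approximateAttempts(Duration runTimeout, Duration workflowTaskTimeout) {
        if (workflowTaskTimeout.isZero() || workflowTaskTimeout.isNegative()) {
            throw new IllegalArgumentException("workflowTaskTimeout must be positive");
        }
        // Integer division: whole task-timeout intervals that fit in the run timeout.
        return runTimeout.toMillis() / workflowTaskTimeout.toMillis();
    }
}
```

For example, `approximateAttempts(Duration.ofHours(1), Duration.ofSeconds(10))` returns 360, whereas a 5 minute run timeout with the same task timeout gives about 30 attempts.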
