Activity retries and alerting

Nathan · November 26, 2021, 6:13pm

Hi!

I’ve spent quite some time over the last few months learning about Temporal, and now that I’m planning to deploy it, I have some questions about retrying an activity. I read several topics without finding an answer, so here I am!

Let’s say I have a simple workflow that starts an Activity A.
This Activity A just calls an external HTTP API.

From what I understand from the different guidelines I read, the retry policy for the activity A should be “no max retry”, and if the external API is down, we will just retry until it’s up. This approach looks good to me.

But how can I be alerted if there are too many retries for the activity? Is this something I should handle on my own?

I want to easily know when the activity is failing and retrying indefinitely, so I can tell whether the external API is down, or whether I made a mistake in the URL I’m calling, etc.
I’m looking for a way to list “all workflows where an activity has been retried more than 5 times” for example.

Setting a MaxRetry policy of 5 for Activity A would cover this use case, because I could easily list failed workflows, but it does not seem to be the best approach from what I read.

The goal is obviously to know when a strange behaviour is happening during an activity execution and look for the root cause.

Thanks !

tihomir · November 27, 2021, 12:50am

For a single workflow, you can look at the Web UI summary page for that workflow. The information under “Pending Activities” includes the activity type, the retry attempt count, and the last failure info.

The same information is available with the tctl “describe” command, for example:

tctl wf desc -w <my_workflow_id>

Note that you can get the retry attempt inside your activity code as well, for example using the Java SDK:

Activity.getExecutionContext().getInfo().getAttempt();

For all workflows in a namespace, you can use the SDK client API to find all workflows that have pending activities with retries > X, for example:

// Assumes `client` (WorkflowClient) and `service` (WorkflowServiceStubs) are static fields.
// Only the first page of open executions is read; use the next page token to read more.
private static void getActivitiesWithRetriesOver(int retryCount) {
    String namespace = client.getOptions().getNamespace();

    ListOpenWorkflowExecutionsRequest listRequest =
            ListOpenWorkflowExecutionsRequest.newBuilder()
                    .setNamespace(namespace)
                    .build();
    ListOpenWorkflowExecutionsResponse listResponse =
            service.blockingStub().listOpenWorkflowExecutions(listRequest);

    for (WorkflowExecutionInfo info : listResponse.getExecutionsList()) {
        // One describe call per open workflow.
        DescribeWorkflowExecutionRequest describeRequest =
                DescribeWorkflowExecutionRequest.newBuilder()
                        .setNamespace(namespace)
                        .setExecution(info.getExecution())
                        .build();
        DescribeWorkflowExecutionResponse describeResponse =
                service.blockingStub().describeWorkflowExecution(describeRequest);

        for (PendingActivityInfo activityInfo : describeResponse.getPendingActivitiesList()) {
            if (activityInfo.getAttempt() > retryCount) {
                System.out.println("Activity type: " + activityInfo.getActivityType().getName());
                System.out.println("Activity attempt: " + activityInfo.getAttempt());
                System.out.println("Last failure message: " + activityInfo.getLastFailure().getMessage());
                // ...
            }
        }
    }
}

tihomir · November 27, 2021, 1:06am

Yes, you should rely on timeouts rather than RetryOptions->maximumAttempts. By default, retries happen up to the activity ScheduleToCloseTimeout if it is defined. If it is not defined, they can continue up to the workflow run/execution timeout. If that is also not defined, then the retries are “unlimited”.
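
To make that precedence concrete, here is a minimal Go sketch that uses only the standard library. `retryWindow` is a hypothetical helper, not part of any Temporal SDK, and a zero duration stands for “not defined”. It only mirrors the order described above and does not model anything else the server may do.

```go
package main

import "time"

// retryWindow is a hypothetical helper (not a Temporal SDK function) that
// mirrors the precedence described above. A zero duration means "not defined".
// It returns the window in which retries can happen, and unlimited=true when
// neither timeout is set.
func retryWindow(scheduleToClose, workflowTimeout time.Duration) (window time.Duration, unlimited bool) {
	switch {
	case scheduleToClose > 0:
		// The activity's ScheduleToCloseTimeout bounds all attempts.
		return scheduleToClose, false
	case workflowTimeout > 0:
		// Otherwise the workflow run/execution timeout is the bound.
		return workflowTimeout, false
	default:
		// Nothing is defined: retries are effectively unlimited.
		return 0, true
	}
}
```

For example, `retryWindow(10*time.Minute, time.Hour)` gives a 10-minute window, while `retryWindow(0, 0)` reports unlimited retries.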

You can also control which types of failures cause retries. You specify the failures that should not be retried by adding them to ActivityOptions->RetryOptions->DoNotRetry. For example, if you do not want your activity to retry on IllegalArgumentException:


ActivityOptions.newBuilder()
    .setRetryOptions(
        RetryOptions.newBuilder()
            .setDoNotRetry(IllegalArgumentException.class.getName())
            .build())
    .build();

Another option is to throw a non-retryable application failure inside your activity, created via ApplicationFailure.newNonRetryableFailure.

With that, along with the ability to get the retry attempt inside activity code, you can decide, depending on your business logic, at what point retries should stop, and then perform compensation logic inside your workflow or whatever else you need to do.
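
As a rough illustration of combining the attempt number with a non-retryable failure, here is a Go sketch that runs without the SDK. `nonRetryableError`, `runAttempt`, `isNonRetryable` and `errAPIDown` are hypothetical stand-ins. In a real Go activity the attempt would come from activity.GetInfo(ctx).Attempt and the failure from temporal.NewNonRetryableApplicationError.

```go
package main

import (
	"errors"
	"fmt"
)

// errAPIDown is a made-up failure that simulates the external API being down.
var errAPIDown = errors.New("external API unavailable")

// nonRetryableError is a hypothetical stand-in for the SDK's non-retryable
// application failure. It only marks an error as "do not retry".
type nonRetryableError struct{ cause error }

func (e *nonRetryableError) Error() string { return "non-retryable: " + e.cause.Error() }
func (e *nonRetryableError) Unwrap() error { return e.cause }

// isNonRetryable reports whether err is, or wraps, a nonRetryableError.
func isNonRetryable(err error) bool {
	var target *nonRetryableError
	return errors.As(err, &target)
}

// runAttempt executes call once. If it fails and attempt has reached
// stopAfter, the failure is marked non-retryable so retries can stop and the
// workflow can run its compensation logic. Earlier failures stay retryable.
func runAttempt(attempt, stopAfter int32, call func() error) error {
	err := call()
	if err == nil {
		return nil
	}
	if attempt >= stopAfter {
		return &nonRetryableError{cause: err}
	}
	return fmt.Errorf("attempt %d failed: %w", attempt, err)
}
```

With stopAfter set to 5, a failure on attempts 1 to 4 is returned as an ordinary error, and a failure on attempt 5 or later is marked non-retryable.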

In the end, automatic retries are very helpful: you can change your activity method code and its activity options (and restart the worker) to fix errors without breaking workflow determinism.

Nathan · November 30, 2021, 11:14am

Thanks for your answer!

Nathan · November 30, 2021, 2:40pm

Just a small translation of your response into Go (as it’s the language I’m using):

	// `client` is an already-connected Temporal client.
	// Only the first page of open workflows is listed here.
	openWorkflows, err := client.ListOpenWorkflow(context.Background(), &workflowservice.ListOpenWorkflowExecutionsRequest{
		Namespace: "default",
	})
	if err != nil {
		log.Fatalln("failed to list open workflows:", err)
	}

	for _, openWorkflow := range openWorkflows.GetExecutions() {
		execution := openWorkflow.GetExecution()
		describe, err := client.DescribeWorkflowExecution(context.Background(), execution.GetWorkflowId(), execution.GetRunId())
		if err != nil {
			log.Fatalln("failed to describe workflow:", err)
		}

		for _, pendingActivity := range describe.GetPendingActivities() {
			// The getters are nil-safe; the last failure is unset until an attempt has failed.
			log.Println(pendingActivity.GetAttempt(), pendingActivity.GetActivityType().GetName(), pendingActivity.GetLastFailure().GetMessage())
		}
	}

tihomir · November 30, 2021, 3:50pm

Nice! Much less verbose indeed

kishore_kumar · July 10, 2023, 7:39am

Hey, can you please help me do the same (getting the number of retries inside activity code) using TypeScript?
Thank you.

antonio.perez · July 11, 2023, 12:26pm

Hello @kishore_kumar

In TypeScript you have client.workflowService.describeWorkflowExecution, which returns a DescribeWorkflowExecutionResponse that contains pendingActivities. For each pending activity you can read its attempt.

Antonio

Raul-Ronald · November 27, 2023, 3:27pm

This seems to make a separate request for each workflow - at what point does this become a problem/hit rate limits?
