I’ve spent quite some time over the last few months learning about Temporal, and now that I’m planning to deploy it, I have some questions about retrying an activity. I read several topics without finding an answer, so here I am!
Let’s say I have a simple workflow, which starts an Activity A.
This Activity A just calls an external http API.
From what I understand from the different guidelines I read, the retry policy for the activity A should be “no max retry”, and if the external API is down, we will just retry until it’s up. This approach looks good to me.
But how can I be alerted if there are too many retries for the activity? Is this something I should handle on my own?
I want to easily know when the activity is failing and retrying indefinitely, to know if the external API is down, or if I made a mistake on the URL I’m calling, etc.
I’m looking for a way to list “all workflows where an activity has been retried more than 5 times” for example.
Setting a MaxRetry policy of 5 on Activity A would answer this use case, because I could easily list the failed workflows, but it does not seem to be the best approach from what I read.
The goal is obviously to know when a strange behaviour is happening during an activity execution and look for the root cause.
For a single workflow, you can look at the web UI summary page for that workflow. The information under “Pending Activities” includes the activity type, the retry attempt count, and the last failure info.
The same is available with the tctl “describe” command, for example:
tctl wf desc -w <my_workflow_id>
Note that you can get the retry attempt inside your activity code as well, for example with the Java SDK via Activity.getExecutionContext().getInfo().getAttempt().
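As a minimal sketch of reading the attempt inside activity code with the Java SDK: the interface name HttpActivities, the threshold of 5, and the stderr "alert" are hypothetical placeholders for whatever metric or notification you actually use, and the HTTP call is a bare-bones illustration.

```java
import io.temporal.activity.Activity;
import io.temporal.activity.ActivityInterface;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AttemptAwareActivity {

    // Hypothetical activity interface for "Activity A".
    @ActivityInterface
    public interface HttpActivities {
        String callExternalApi(String url);
    }

    public static class HttpActivitiesImpl implements HttpActivities {
        // Assumption: alert once the activity has been attempted more than 5 times.
        static final int ALERT_THRESHOLD = 5;

        static boolean shouldAlert(int attempt) {
            return attempt > ALERT_THRESHOLD;
        }

        @Override
        public String callExternalApi(String url) {
            // The attempt counter starts at 1 for the first execution.
            int attempt = Activity.getExecutionContext().getInfo().getAttempt();
            if (shouldAlert(attempt)) {
                // Placeholder: emit a metric or notify your alerting system here.
                System.err.println("Attempt " + attempt + " calling " + url);
            }
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            try {
                // Note: this sketch does not treat non-2xx responses as failures.
                return HttpClient.newHttpClient()
                        .send(request, HttpResponse.BodyHandlers.ofString())
                        .body();
            } catch (Exception e) {
                // Rethrow checked exceptions so the failure reaches the retry policy.
                throw Activity.wrap(e);
            }
        }
    }
}
```

Because the check runs on every attempt, the alert fires again on each retry past the threshold; you may want to alert only when the attempt equals the threshold plus one.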
Yes, you should rely on timeouts rather than RetryOptions->maximumAttempts. By default, retries happen up to the activity ScheduleToCloseTimeout if it is defined. If it is not defined, they can continue up to the workflow run/execution timeout. If that is also not defined, the retries are “unlimited”.
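To get a feel for how a timeout bounds the number of retries, here is a rough back-of-the-envelope sketch in plain Java (no SDK). It assumes the default retry policy values (1 second initial interval, backoff coefficient 2.0, maximum interval 100 seconds) and, as a simplification, that every attempt fails immediately; real attempts take time, so actual counts will be lower. The class and method names are made up for illustration.

```java
import java.time.Duration;

public class RetrySchedule {

    // Counts how many attempts can start before the ScheduleToClose deadline,
    // assuming each attempt fails instantly (a simplification).
    static int attemptsWithin(Duration initialInterval, double backoffCoefficient,
                              Duration maximumInterval, Duration scheduleToClose) {
        long elapsedMs = 0;
        double intervalMs = initialInterval.toMillis();
        long capMs = maximumInterval.toMillis();
        long deadlineMs = scheduleToClose.toMillis();
        int attempts = 0;
        while (elapsedMs < deadlineMs) {
            attempts++;
            // Wait for the backoff interval, capped at the maximum interval.
            elapsedMs += (long) Math.min(intervalMs, capMs);
            intervalMs *= backoffCoefficient;
        }
        return attempts;
    }

    public static void main(String[] args) {
        Duration initial = Duration.ofSeconds(1);
        Duration max = Duration.ofSeconds(100);
        // Attempts start at 0s, 1s, 3s, 7s, 15s, 31s: six fit in one minute.
        System.out.println(attemptsWithin(initial, 2.0, max, Duration.ofMinutes(1)));
        // With five minutes the 100s cap kicks in: nine attempts fit.
        System.out.println(attemptsWithin(initial, 2.0, max, Duration.ofMinutes(5)));
    }
}
```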
You can also control which types of failures cause retries. You specify the failures that should not be retried by adding them to ActivityOptions->RetryOptions->DoNotRetry. For example, if you do not want your activity to retry on IllegalArgumentException, add that exception’s class name to the list.
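A sketch of what such options could look like with the Java SDK. The timeout values are arbitrary assumptions chosen for illustration, and the class name ActivityOptionsExample is hypothetical.

```java
import io.temporal.activity.ActivityOptions;
import io.temporal.common.RetryOptions;
import java.time.Duration;

public class ActivityOptionsExample {

    static ActivityOptions buildOptions() {
        return ActivityOptions.newBuilder()
                // Assumed values: each attempt may take up to 10 seconds...
                .setStartToCloseTimeout(Duration.ofSeconds(10))
                // ...and retries stop once one hour has passed overall.
                .setScheduleToCloseTimeout(Duration.ofHours(1))
                .setRetryOptions(
                        RetryOptions.newBuilder()
                                // Failures of this type fail the activity without retrying.
                                .setDoNotRetry(IllegalArgumentException.class.getName())
                                .build())
                .build();
    }
}
```

These options would then be passed to Workflow.newActivityStub when creating the activity stub inside the workflow.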
With that, along with the ability to get the retry attempt inside activity code, you can decide, depending on your business logic, at what point retries should stop, and then perform compensation logic inside your workflow or whatever else you need to do.
Having automatic retries is super helpful in the end, as you can change your activity method code and its activity options (and restart the worker) to fix errors without breaking workflow determinism.
In TypeScript you have client.workflowService.describeWorkflowExecution, which returns a DescribeWorkflowExecutionResponse that contains pendingActivities. For each pending activity you can read its attempt.
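The same DescribeWorkflowExecution call is exposed by the Java SDK through the service stubs. The sketch below assumes a Temporal server reachable on the local default address and the "default" namespace. The class name, the threshold of 5, and passing the workflow ID as the first command-line argument are illustrative choices only.

```java
import io.temporal.api.common.v1.WorkflowExecution;
import io.temporal.api.workflow.v1.PendingActivityInfo;
import io.temporal.api.workflowservice.v1.DescribeWorkflowExecutionRequest;
import io.temporal.api.workflowservice.v1.DescribeWorkflowExecutionResponse;
import io.temporal.serviceclient.WorkflowServiceStubs;
import java.util.List;
import java.util.stream.Collectors;

public class PendingActivityCheck {

    // Keeps only the pending activities retried more than maxAttempts times.
    static List<PendingActivityInfo> overThreshold(List<PendingActivityInfo> pending,
                                                   int maxAttempts) {
        return pending.stream()
                .filter(p -> p.getAttempt() > maxAttempts)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumption: a server running locally on the default port.
        WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
        DescribeWorkflowExecutionResponse response =
                service.blockingStub().describeWorkflowExecution(
                        DescribeWorkflowExecutionRequest.newBuilder()
                                .setNamespace("default")
                                .setExecution(WorkflowExecution.newBuilder()
                                        .setWorkflowId(args[0])
                                        .build())
                                .build());
        for (PendingActivityInfo p : overThreshold(response.getPendingActivitiesList(), 5)) {
            System.out.printf("%s: attempt %d, last failure: %s%n",
                    p.getActivityType().getName(),
                    p.getAttempt(),
                    p.getLastFailure().getMessage());
        }
        service.shutdown();
    }
}
```

This only inspects one workflow at a time, so covering "all workflows" would mean running it over the IDs of your open workflows, for instance from a periodic job.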