Retry activity formula

Are activity retries supposed to happen at fixed intervals for every workflow instance? Right now they don't seem to be. Example workflow with an activity defined as follows:

@ActivityStub(startToClose = "PT1H", retryOptions = @RetryActivityOptions(
    backoffCoefficient = 3.0,
    initialInterval = 300,
    maximumAttempts = 5,
    maximumInterval = 86400,
    doNotRetry = {""}))
private MyActivity myActivity;

Here are two examples of this workflow retrying. As you can see, the first one retried 5 times over roughly 2.5 hours, while the second completed all 5 attempts in under an hour. I would expect Temporal to use a defined formula that retries at the same intervals for all instances.

Mar 24, 2022 @ 06:05:36.903
Mar 24, 2022 @ 03:50:31.549
Mar 24, 2022 @ 03:49:08.026
Mar 24, 2022 @ 03:34:03.821
Mar 24, 2022 @ 03:28:59.033

Mar 24, 2022 @ 08:03:31.904
Mar 24, 2022 @ 08:02:32.434
Mar 24, 2022 @ 07:17:29.904
Mar 24, 2022 @ 07:16:32.547
Mar 24, 2022 @ 07:11:31.534
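For reference, the documented backoff formula is initialInterval × backoffCoefficient^(n−1) for the delay before retry n, capped at maximumInterval. A minimal sketch of what that predicts for the options above (the class and method names here are illustrative, not part of any SDK):

```java
public class RetryBackoff {

    /**
     * Delay in seconds before retry attempt n (n = 1 is the first retry),
     * per the documented formula: initialInterval * backoffCoefficient^(n-1),
     * capped at maximumInterval.
     */
    static long delaySeconds(long initialInterval, double backoffCoefficient,
                             long maximumInterval, int attempt) {
        double delay = initialInterval * Math.pow(backoffCoefficient, attempt - 1);
        return (long) Math.min(delay, maximumInterval);
    }

    public static void main(String[] args) {
        // initialInterval = 300s, backoffCoefficient = 3.0, maximumInterval = 86400s
        for (int n = 1; n <= 4; n++) {
            System.out.println("delay before retry " + n + ": "
                    + delaySeconds(300, 3.0, 86400, n) + "s");
        }
        // -> 300s, 900s, 2700s, 8100s
    }
}
```

Since maximumAttempts = 5 means one initial attempt plus four retries, the five timestamps in each run should ideally span 300 + 900 + 2700 + 8100 s ≈ 3.3 h; neither run above matches that schedule.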

Are you 100% sure that the activity worker’s health and load didn’t affect the intervals?

An update: retries work properly in our other environments. Do you know where I can start troubleshooting? Would a high load cause a retry to start earlier than normal? That seems counter-intuitive.

The worker tuning docs page has a metrics section that could help (check the activity_schedule_to_start_latency gauge for possible high latencies).

retries work properly in our other environments

What is the difference if any between those environments and the one you ran on where you noticed the possible issues?

I see "Duplicate activity retry timer task" for these activities in the history service logs. What does it mean? There is no difference between the environments.

Duplicate activity retry timer task

This comes from the history service's timerQueueActiveTaskExecutor.
It means the shard was reloaded after one of your activity retry timer tasks was processed, but before its ack was persisted. After the shard reloads, that task is processed again.

Some scenarios where this can happen: the database is experiencing issues (check your DB logs to see what may be going on), a history pod restarts while your activity is retrying, or Kubernetes adds/removes pods for some reason.