Retry activity formula

Are activity retries supposed to happen at fixed intervals for every workflow instance? Right now they don't seem to. Here is an example workflow with an activity defined as follows:

@ActivityStub(startToClose = "PT1H", retryOptions = @RetryActivityOptions(
    backoffCoefficient = 3.0,
    initialInterval = 300,
    maximumAttempts = 5,
    maximumInterval = 86400, 
    doNotRetry = {"abc.my.custom.Exception"}))
private MyActivity myActivity;
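
(For context: @ActivityStub / @RetryActivityOptions looks like a project-specific wrapper rather than the core SDK API. As a point of reference, a minimal sketch of the equivalent configuration in the plain Temporal Java SDK, assuming initialInterval and maximumInterval above are in seconds:)

import io.temporal.activity.ActivityOptions;
import io.temporal.common.RetryOptions;
import io.temporal.workflow.Workflow;
import java.time.Duration;

// Inside the workflow implementation class:
private final MyActivity myActivity = Workflow.newActivityStub(
    MyActivity.class,
    ActivityOptions.newBuilder()
        .setStartToCloseTimeout(Duration.ofHours(1)) // PT1H
        .setRetryOptions(RetryOptions.newBuilder()
            .setBackoffCoefficient(3.0)
            .setInitialInterval(Duration.ofSeconds(300))
            .setMaximumAttempts(5)
            .setMaximumInterval(Duration.ofSeconds(86400))
            .setDoNotRetry("abc.my.custom.Exception")
            .build())
        .build());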

Here are two examples of this workflow retrying. As you can see, the first one retried 5 times over ~2.5 hours, while the second retried all 5 times in under 1 hour. I would expect Temporal to use a defined formula that retries at the same intervals for all instances.

Mar 24, 2022 @ 06:05:36.903
Mar 24, 2022 @ 03:50:31.549
Mar 24, 2022 @ 03:49:08.026
Mar 24, 2022 @ 03:34:03.821
Mar 24, 2022 @ 03:28:59.033

Mar 24, 2022 @ 08:03:31.904
Mar 24, 2022 @ 08:02:32.434
Mar 24, 2022 @ 07:17:29.904
Mar 24, 2022 @ 07:16:32.547
Mar 24, 2022 @ 07:11:31.534
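
For reference, the backoff Temporal documents is retryInterval = initialInterval × backoffCoefficient^(attempt − 1), capped at maximumInterval. A minimal sketch computing the nominal schedule for the options above (again assuming the intervals are in seconds):

public class ExpectedRetrySchedule {
    public static void main(String[] args) {
        double interval = 300;       // initialInterval
        double coefficient = 3.0;    // backoffCoefficient
        double maxInterval = 86400;  // maximumInterval
        int maxAttempts = 5;         // maximumAttempts

        long total = 0;
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            total += (long) interval;
            System.out.printf("wait before attempt %d: %.0fs (cumulative %ds)%n",
                attempt + 1, interval, total);
            interval = Math.min(interval * coefficient, maxInterval);
        }
        // Prints 300s, 900s, 2700s, 8100s: 5, 15, 45, and 135 minutes,
        // ~3h20m end to end -- the same for every instance.
    }
}

So with these options every instance should wait 5, 15, 45, and 135 minutes between the five attempts; neither of the two histories above follows that schedule exactly.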

Are you 100% sure that the activity worker’s health and load didn’t affect the intervals?

An update: retries work properly in our other environments. Do you know where I can start troubleshooting? Would a high load cause a retry to start earlier than normal? That seems counter-intuitive.

The Worker tuning docs page has a metrics section that could help (check the activity_schedule_to_start_latency gauge for possible high latencies).
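
If the SDK metrics aren't exported yet, a minimal sketch of wiring the Java SDK's metrics scope to a Prometheus registry (class names from temporal-sdk and micrometer-registry-prometheus; the stubs factory method may differ by SDK version):

import com.uber.m3.tally.RootScopeBuilder;
import com.uber.m3.tally.Scope;
import com.uber.m3.util.Duration;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.temporal.common.reporter.MicrometerClientStatsReporter;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

public class WorkerMetrics {
    public static void main(String[] args) {
        // Registry that a Prometheus scrape endpoint can serve.
        PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

        // Report SDK metrics, including activity_schedule_to_start_latency, every 10s.
        Scope scope = new RootScopeBuilder()
            .reporter(new MicrometerClientStatsReporter(registry))
            .reportEvery(Duration.ofSeconds(10));

        WorkflowServiceStubs service = WorkflowServiceStubs.newServiceStubs(
            WorkflowServiceStubsOptions.newBuilder().setMetricsScope(scope).build());
        // ... build the WorkerFactory and workers from `service` as usual.
    }
}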

retries work properly in our other environments

What is the difference if any between those environments and the one you ran on where you noticed the possible issues?

I see “Duplicate activity retry timer task” for these activities in the history service. What does it mean? We don't have any differences between the environments.

Duplicate activity retry timer task

This comes from the history service's timerQueueActiveTaskExecutor.
It means the shard was reloaded after one of your activity retry timer tasks was processed, but before its ack was persisted. After the shard reloads, the task is processed again, which can fire the retry a second time and would explain attempts landing earlier than the backoff schedule predicts.

Some scenarios where this can happen: the DB is experiencing issues (look at your DB logs to see what may be going on), a history pod restarts while your activity is retrying, or k8s decides to add/remove pods for some reason.