Are activity retries supposed to happen at fixed intervals for every workflow instance? Right now they don't seem to. Here is an example workflow with an activity defined as follows:
Here are two examples of this workflow retrying. As you can see, the first one retried 5 times over ~2.5 hours, while the second retried all 5 times in under 1 hour. I would expect Temporal to use a defined formula that retries at the same intervals for all instances.
Example 1 (most recent first):
Mar 24, 2022 @ 06:05:36.903
Mar 24, 2022 @ 03:50:31.549
Mar 24, 2022 @ 03:49:08.026
Mar 24, 2022 @ 03:34:03.821
Mar 24, 2022 @ 03:28:59.033

Example 2 (most recent first):
Mar 24, 2022 @ 08:03:31.904
Mar 24, 2022 @ 08:02:32.434
Mar 24, 2022 @ 07:17:29.904
Mar 24, 2022 @ 07:16:32.547
Mar 24, 2022 @ 07:11:31.534
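The irregularity is easier to see once the two examples are turned into gaps between consecutive timestamps. The sketch below parses the timestamps exactly as listed. For comparison, it also computes the schedule a retry policy would produce if the delay before retry n were initial_interval * backoff_coefficient^(n-1), capped at maximum_interval. That is how we understand Temporal's documented RetryPolicy (InitialInterval, BackoffCoefficient, MaximumInterval) to behave, assuming no jitter. The helper names and the policy values (5 minutes, 2.0, 1 hour) are hypothetical and chosen only for illustration, since the actual activity options are not shown.

```python
from datetime import datetime, timedelta

# Format of the timestamps as they appear in the examples above.
FMT = "%b %d, %Y @ %H:%M:%S.%f"

# Timestamps copied from the two examples above, most recent first.
workflow_1 = [
    "Mar 24, 2022 @ 06:05:36.903",
    "Mar 24, 2022 @ 03:50:31.549",
    "Mar 24, 2022 @ 03:49:08.026",
    "Mar 24, 2022 @ 03:34:03.821",
    "Mar 24, 2022 @ 03:28:59.033",
]
workflow_2 = [
    "Mar 24, 2022 @ 08:03:31.904",
    "Mar 24, 2022 @ 08:02:32.434",
    "Mar 24, 2022 @ 07:17:29.904",
    "Mar 24, 2022 @ 07:16:32.547",
    "Mar 24, 2022 @ 07:11:31.534",
]


def observed_gaps(stamps):
    """Gaps between consecutive timestamps, oldest first."""
    times = sorted(datetime.strptime(s, FMT) for s in stamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def expected_gaps(initial, coefficient, maximum, retries):
    """Delay before retry n = initial * coefficient**(n - 1), capped at maximum.

    Assumes a deterministic geometric backoff with no jitter.
    """
    return [min(initial * coefficient ** (n - 1), maximum) for n in range(1, retries + 1)]


if __name__ == "__main__":
    for name, stamps in (("example 1", workflow_1), ("example 2", workflow_2)):
        gaps = observed_gaps(stamps)
        print(name, [str(g) for g in gaps], "total", sum(gaps, timedelta()))
    # Hypothetical policy values, not the actual configuration from this thread.
    policy = expected_gaps(timedelta(minutes=5), 2.0, timedelta(hours=1), 4)
    print("hypothetical policy", [str(g) for g in policy])
```

With these inputs, example 1's gaps come out to roughly 5 min, 15 min, 1.4 min and 2 h 15 min. Example 2's gaps are roughly 5 min, 1 min, 45 min and 1 min. A geometric backoff with a coefficient of at least 1 never shrinks from one retry to the next, so neither sequence matches such a schedule.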
An update: retries work properly in our other environments. Do you know where I can start troubleshooting? Could high load cause retries to start earlier than normal? That seems counter-intuitive.
I see “Duplicate activity retry timer task” in the history service logs for these activities. What does it mean? There are no differences between our environments.
This message comes from the history service's timerQueueActiveTaskExecutor.
It means the shard was reloaded after one of your activity retry timer tasks was processed but before its ack was persisted. After the shard reloads, that task is processed again.
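A toy model can make that sequence concrete. The sketch below is a hypothetical simplification, not Temporal's implementation. The class Shard, its persisted_ack field and its process method are made-up names. The model rests on three assumptions. First, the workflow state that records which retry attempt already ran is persisted when a task is processed. Second, the queue's ack level is persisted separately and can be lost on a reload. Third, the executor compares a replayed task against that state to recognise it as a duplicate.

```python
"""Toy model of a shard timer queue with a separately persisted ack level.

Hypothetical simplification for illustration; this is not Temporal's code.
"""
from dataclasses import dataclass, field


@dataclass
class Shard:
    # Timer tasks as (task_id, activity_id, attempt), ordered by task_id.
    tasks: list
    # Highest task_id whose ack is durably stored; tasks above it are reloaded.
    persisted_ack: int = -1
    # Assumed-persisted workflow state: activity_id -> latest attempt already started.
    attempts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def process(self, crash_before_persist: bool = False) -> None:
        """Process every task above the persisted ack level.

        With crash_before_persist=True the in-memory ack is dropped, simulating
        a shard reload after the tasks ran but before their ack was persisted.
        """
        in_memory_ack = self.persisted_ack
        for task_id, activity_id, attempt in self.tasks:
            if task_id <= self.persisted_ack:
                continue  # durably acked, so never handed out again
            if self.attempts.get(activity_id, 0) >= attempt:
                # State shows this attempt already ran: skip it instead of retrying twice.
                self.log.append(f"Duplicate activity retry timer task {task_id}")
            else:
                self.attempts[activity_id] = attempt
                self.log.append(f"Retry {activity_id} attempt {attempt}")
            in_memory_ack = task_id
        if not crash_before_persist:
            self.persisted_ack = in_memory_ack


if __name__ == "__main__":
    shard = Shard(tasks=[(1, "act", 2), (2, "act", 3)])
    shard.process(crash_before_persist=True)  # retries run, ack is lost
    shard.process()                           # reloaded shard replays both tasks
    for line in shard.log:
        print(line)
```

Running it logs each of the two retries once. After the simulated reload, it logs both tasks again as duplicates instead of retrying twice more.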
Some scenarios where this can happen: the database is experiencing issues (check your database logs to see what may be going on), a history pod restarts while your activity is retrying, or Kubernetes adds or removes pods for some reason.