Is there any way to reset the retry counter to 0 after a failed activity recovers and starts heartbeating?
I have a long-running Activity (think weeks+) that maintains its liveness via heartbeat. Naturally, the activity gets killed and respawned multiple times due to EC2/pod rebalancing.
Ideally I’d expect the retry counter / max attempts to reset once the activity starts heartbeating again, but it does not. My current workaround is to set a very high retry count (30+) to account for infra failures, but what I really want is for the activity to fail after more than 5 consecutive failures.
Is there any way I can reset the counter once the activity starts heartbeating again? (That should be the default behaviour, IMO.)
What does this activity do? Does it periodically check on the result of some async operation?
Do you use the heartbeat payload as a “resume” mechanism when your pod restarts, the activity times out, and it is retried on a potentially different activity worker?
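To illustrate what I mean by "resume", here is a minimal plain-Python sketch. It is not Temporal SDK code: `process_items`, the `heartbeat` callback, and `last_heartbeat` are hypothetical stand-ins for your activity body, the SDK's heartbeat call, and the heartbeat details a retried attempt receives. It also assumes your work can be split into indexed steps.

```python
from typing import Callable, List, Optional


def process_items(
    items: List[str],
    last_heartbeat: Optional[int],
    heartbeat: Callable[[int], None],
    crash_at: Optional[int] = None,
) -> List[str]:
    """Process items, resuming after the index recorded by the previous attempt.

    last_heartbeat is the index of the last item a prior attempt finished,
    or None on the first attempt. crash_at simulates the pod being killed.
    """
    start = 0 if last_heartbeat is None else last_heartbeat + 1
    done: List[str] = []
    for i in range(start, len(items)):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated pod eviction")
        done.append(items[i].upper())  # stand-in for the real work
        heartbeat(i)  # record progress so a retry can resume from here
    return done


# Simulate two attempts: the first dies at index 3, the second resumes there.
recorded: List[int] = []
items = ["a", "b", "c", "d", "e"]
try:
    process_items(items, None, recorded.append, crash_at=3)
except RuntimeError:
    pass
resumed = process_items(items, recorded[-1], recorded.append)
```

The second attempt only handles the items the first one never finished, so a restart costs you at most one step of repeated work.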
There isn't a way to reset the max attempt count based on your SDK worker pod restarts.
Is there a specific reason to restrict activity retries by attempt count rather than by overall duration (the ScheduleToClose activity timeout)?
In your workflow code, do you handle the activity failure once max attempts is reached?
If you do, you can check whether the last attempt's failure was a timeout (you can also check whether it was specifically a heartbeat timeout) and then decide to schedule the activity again if needed. That is another option.
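A rough sketch of that workflow-side decision, again in plain Python rather than SDK code. `AttemptResult`, `run_with_consecutive_limit`, and `scripted` are hypothetical names, and `run_activity` stands in for scheduling the activity once and waiting for its outcome. How you set `made_progress` is an assumption on my part: one possibility is comparing the heartbeat details attached to the timeout failure with the ones you saw previously, if your SDK exposes them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class AttemptResult:
    """Outcome of one scheduled activity, as seen by the workflow (hypothetical)."""
    succeeded: bool
    timed_out: bool = False      # failure was a timeout, e.g. a heartbeat timeout
    made_progress: bool = False  # attempt heartbeated new progress before failing


def run_with_consecutive_limit(
    run_activity: Callable[[], AttemptResult],
    max_consecutive_failures: int = 5,
) -> bool:
    """Reschedule on timeouts; give up after too many failures without progress."""
    consecutive = 0
    while True:
        result = run_activity()
        if result.succeeded:
            return True
        if not result.timed_out:
            return False  # not an infra-style timeout: surface the failure
        if result.made_progress:
            consecutive = 0  # the activity recovered, so start a new streak
        consecutive += 1
        if consecutive > max_consecutive_failures:
            return False


def scripted(results: Iterable[AttemptResult]) -> Callable[[], AttemptResult]:
    """Test helper: returns the given results one call at a time."""
    it = iter(results)
    return lambda: next(it)


flaky = [AttemptResult(False, timed_out=True, made_progress=True)] * 10
stuck = [AttemptResult(False, timed_out=True)] * 6
recovered_ok = run_with_consecutive_limit(scripted(flaky + [AttemptResult(True)]))
stuck_ok = run_with_consecutive_limit(scripted(stuck))
```

Here ten rebalancing-style failures that each made progress do not exhaust the limit, while six timeouts in a row with no heartbeat progress do. In a real workflow you would likely give each scheduled activity a small max attempts and let this loop own the "consecutive" policy.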