We have a multi-step workflow that runs for several days. Some activities submit a batch job to an external service and poll for the results in a sleep-poll-sleep loop; a job can take days to complete. The polling activity is idempotent: when restarted it resumes polling the last submitted job, and each poll call is accompanied by a Temporal heartbeat. However, the workers are redeployed periodically (CI/CD), so a running worker pod often gets killed and replaced by a new worker pod. Temporal detects the missed heartbeat and reschedules the activity, but counts that as a retry. After 3 retries the workflow is failed by Temporal (our current max_retries is 3), even though the job was still being polled successfully and would have succeeded had the workflow not failed.

Here is the problem: if we set retries to infinite, that also applies to genuine exceptions and timeouts raised by the Python code the activity runs, which should not be retried beyond a reasonable number of attempts. The obvious patch is to handle program exceptions manually (or with tenacity), but that feels like working around Temporal's core strengths instead of using them. Is there another way to do this?
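For reference, the polling activity looks roughly like this sketch (the names `poll_batch_job` and `get_job_status`, and the 60-second interval, are illustrative, not our actual code):

```python
import asyncio

from temporalio import activity


@activity.defn
async def poll_batch_job(job_id: str) -> str:
    """Idempotent poller: on restart it resumes polling the same job_id."""
    while True:
        # Hypothetical client call to the external batch service.
        status = await get_job_status(job_id)
        if status.done:
            return status.result
        # Heartbeat on every poll so Temporal knows the activity is alive;
        # when the worker pod is killed, the missed heartbeat makes Temporal
        # reschedule the activity -- and count it as a retry attempt.
        activity.heartbeat(job_id)
        await asyncio.sleep(60)  # sleep-poll-sleep loop
```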
Restrict retries with the activity's ScheduleToClose timeout rather than an attempt count: set the retry policy to unlimited attempts, so retries caused by heartbeat timeouts (worker redeploys) can never exhaust it, and let ScheduleToClose bound the total time the activity may take. Alternatively, add custom logic in the activity code that raises a non-retryable failure for genuine program errors, based on the activity's initial schedule time, see here.
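A sketch of both suggestions with the Temporal Python SDK, assuming hypothetical names (`poll_until_done`, `BatchWorkflow`) and illustrative timeout values; it cannot run without a worker and server, so treat it as a configuration sketch:

```python
from datetime import datetime, timedelta, timezone

from temporalio import activity, workflow
from temporalio.common import RetryPolicy
from temporalio.exceptions import ApplicationError


@activity.defn
async def poll_batch_job(job_id: str) -> str:
    try:
        return await poll_until_done(job_id)  # hypothetical polling helper
    except Exception as err:
        # Option 2: custom non-retryable logic for genuine program errors.
        # Info.scheduled_time is the activity's initial schedule time
        # (Info.current_attempt_scheduled_time is the current attempt's).
        elapsed = datetime.now(timezone.utc) - activity.info().scheduled_time
        if elapsed > timedelta(hours=1):  # illustrative error budget
            raise ApplicationError(str(err), non_retryable=True) from err
        raise  # early failures stay retryable


@workflow.defn
class BatchWorkflow:
    @workflow.run
    async def run(self, job_id: str) -> str:
        return await workflow.execute_activity(
            poll_batch_job,
            job_id,
            # A killed worker pod is detected via the missed heartbeat...
            heartbeat_timeout=timedelta(minutes=2),
            # ...and retried without an attempt cap (0 = unlimited),
            # so redeploys alone can never fail the workflow:
            retry_policy=RetryPolicy(maximum_attempts=0),
            # Option 1: bound the total duration instead of the attempts.
            schedule_to_close_timeout=timedelta(days=5),
        )
```

With unlimited attempts, exceptions you know are fatal can also be listed in `RetryPolicy(non_retryable_error_types=[...])` (matched by exception class name, e.g. `"ValueError"`) so they fail the activity immediately instead of burning the time budget.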