Is there a way to prevent Start-To-Close Timeout failures from being retried?
We have a resource-intensive Activity and we don’t want it to run indefinitely. We’ve configured a Start-To-Close Timeout, and also coded our Activity Worker to halt if the Timeout is exceeded. This works as expected; the Activity is halted, but it’s also queued for retry. These retries are undesirable as they tend to produce the same outcome (i.e. they also time out, which leads to retry after retry), wasting resources. Although we want retries in general, we would like to avoid retrying in this particular case.
I’m aware that Retry Policies can have Non-Retryable Errors, but that doesn’t appear to work for Start-To-Close Timeouts, unless I’m missing something?
@maxim, is there a way to retry on certain exceptions and not on others?
Example: my activity (Python, async) does long polling, periodically calling external services. I do want to retry on network failures in those external calls, but the start-to-close timeout should be honored and the activity should be stopped.
You can specify which exceptions should not be retried through RetryPolicy.non_retryable_error_types. Or you can throw a non-retryable ApplicationFailure (ApplicationError in the Python SDK) from the activity code. A StartToClose timeout is always retried, up to maxAttempts or until the ScheduleToClose timeout expires. I don’t understand the use case for disabling retries for the StartToClose timeout.
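To make those rules concrete, here is a minimal plain-Python sketch of the retry decision as described in this reply. It is only a model of the stated rules, not Temporal's server logic or SDK API: RetryPolicyModel, should_retry, and the failure_kind strings are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RetryPolicyModel:
    """Hypothetical stand-in for a retry policy; not the SDK class."""
    maximum_attempts: int = 0  # 0 = no limit, as in Temporal's RetryPolicy
    non_retryable_error_types: List[str] = field(default_factory=list)


def should_retry(
    policy: RetryPolicyModel,
    attempt: int,
    failure_kind: str,
    error_type: Optional[str] = None,
    non_retryable: bool = False,
    schedule_to_close_expired: bool = False,
) -> bool:
    # The overall ScheduleToClose budget and the attempt cap stop every retry.
    if schedule_to_close_expired:
        return False
    if policy.maximum_attempts and attempt >= policy.maximum_attempts:
        return False
    # Application errors can opt out, via the flag or the policy's type list.
    if failure_kind == "application_error":
        if non_retryable:
            return False
        return error_type not in policy.non_retryable_error_types
    # Anything else, including "start_to_close_timeout", is retried.
    return True


policy = RetryPolicyModel(non_retryable_error_types=["InvalidInput"])
print(should_retry(policy, 1, "start_to_close_timeout"))
print(should_retry(policy, 1, "application_error", error_type="InvalidInput"))
```

Under this model, the only ways to stop a timed-out attempt from being retried are the attempt cap or the ScheduleToClose budget. One possible workaround, assuming the activity tracks its own deadline, is to raise a non-retryable application error from the activity code shortly before the StartToClose timeout would fire.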
The intention is that the long poll can take anywhere between 5 minutes and 2 hours, so I want to wait a maximum of 2 hours, polling every 5 seconds. I want the activity to be restarted automatically in case of network failures (there are a few network calls to a few other services; I removed them above for brevity), but it should honor the schedule_to_close_timeout etc.
Does the above example look okay w.r.t. the requirements outlined?
I think I may have to set start_to_close_timeout lower so that worker failures etc. are detected quickly and the activity gets restarted?
Your code looks fine. Activity will be retried automatically on any failure.
I think I may have to set start_to_close_timeout lower so that worker failures etc. are detected quickly and the activity gets restarted?
The failure will be detected after 30 seconds due to the heartbeat timeout. BTW, if start_to_close_timeout is not specified, it defaults to schedule_to_close_timeout, so you don’t need to specify it in this case.
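As a worked example of those two points, the arithmetic can be sketched in plain Python using the numbers from this thread (a 2-hour schedule-to-close budget and a 30-second heartbeat timeout). The helper names are hypothetical and only model the defaulting and detection rules stated above; they are not SDK functions.

```python
from datetime import timedelta
from typing import Optional


def effective_start_to_close(
    schedule_to_close: timedelta,
    start_to_close: Optional[timedelta] = None,
) -> timedelta:
    # An unset start-to-close timeout falls back to the schedule-to-close value.
    return start_to_close if start_to_close is not None else schedule_to_close


def failure_detection_bound(
    start_to_close: timedelta,
    heartbeat_timeout: Optional[timedelta] = None,
) -> timedelta:
    # Without heartbeats, a dead worker is noticed only when the attempt times out.
    if heartbeat_timeout is None:
        return start_to_close
    # With heartbeats, a missed heartbeat is noticed after the heartbeat timeout.
    return min(heartbeat_timeout, start_to_close)


schedule_to_close = timedelta(hours=2)
heartbeat_timeout = timedelta(seconds=30)

start_to_close = effective_start_to_close(schedule_to_close)
detection = failure_detection_bound(start_to_close, heartbeat_timeout)
print(start_to_close, detection)
```

So in this scenario lowering start_to_close_timeout would not speed up detection of a crashed worker; the heartbeat timeout already bounds it, provided the activity heartbeats on each 5-second poll.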