Misuse (?) of activity retry logic

Greetings. There are many AWS action “pairs” where you issue a command and then poll for the response to detect completion. (Yes, yes, there are alternatives to polling: you could hook up to AWS cloud events and use the activity doNotComplete capability, but ignore that for now.)

Examples include:

  • EC2/EBS: AttachVolume, DescribeVolumes (e.g. until the volumes become Attached).
  • SSM: SendCommand, GetCommandInvocation (until the command becomes Completed).

Ideally, you want to poll for the response every N seconds, because you really don’t want a long-running activity sitting on the worker stack for no reason. The natural pattern is to set ActivityOptions with RetryOptions. In the activity, you check three states:

  1. Completed: just return.
  2. Not completed yet, i.e. still waiting: throw newFailure(..).
  3. Will never complete (e.g. failed, cancelled, etc.): throw newNonRetryableFailure(..).
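
The three states above can be sketched in plain Java as follows. This is only an illustration, not SDK code: StillWaiting and NeverCompletes are hypothetical stand-ins for the failures created by newFailure(..) and newNonRetryableFailure(..), the status strings are loosely modeled on SSM command statuses, and runWithRetries is a made-up driver that imitates the retry policy re-running the activity.

```java
import java.util.Iterator;
import java.util.List;

class PollStateSketch {
    /** Hypothetical stand-in for a retryable failure, i.e. newFailure(..). */
    static class StillWaiting extends RuntimeException {
        StillWaiting(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a non-retryable failure, i.e. newNonRetryableFailure(..). */
    static class NeverCompletes extends RuntimeException {
        NeverCompletes(String msg) { super(msg); }
    }

    /** One poll attempt: return on completion, throw to request a retry or to give up. */
    static void checkOnce(String status) {
        switch (status) {
            case "Success":
                return;                                   // state 1: completed
            case "Pending":
            case "InProgress":
                throw new StillWaiting("command still " + status);   // state 2: retry later
            default:
                throw new NeverCompletes("command ended as " + status); // state 3: stop
        }
    }

    /** Imitates the retry policy: re-runs checkOnce until it returns; yields attempts used. */
    static int runWithRetries(List<String> statuses, int maxAttempts) {
        Iterator<String> it = statuses.iterator();
        for (int attempt = 1; attempt <= maxAttempts && it.hasNext(); attempt++) {
            try {
                checkOnce(it.next());
                return attempt;
            } catch (StillWaiting e) {
                // Retryable: fall through to the next attempt (a real policy would wait N seconds).
            }
        }
        throw new NeverCompletes("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Two "not yet" polls, then completion on the third attempt.
        System.out.println(runWithRetries(List.of("Pending", "InProgress", "Success"), 10)); // prints 3
    }
}
```

To get a fixed N-second polling interval from a retry policy, the backoff coefficient would presumably be set to 1 so the interval does not grow between attempts.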

Except for the UI: whenever it sees a failed activity attempt, the failure gets plastered on the workflow.

So, is it possible to mark a failure exception to indicate “it’s not really a failure, we just want to retry”?

I realize it’s a slight misuse of exception handling, but it works well otherwise. :grinning:

Obviously, the alternative is to do the logic myself in the workflow using a timer, but then I have to code the retry logic too.
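
For comparison, here is a rough sketch of what that hand-written loop might look like, in plain Java. All names are hypothetical: pollOnce stands in for a short activity call that returns the current status, and sleeper stands in for the workflow timer (in the Temporal Java SDK that would presumably be Workflow.sleep). The status strings are the same illustrative ones as before.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

class TimerPollSketch {
    enum Outcome { COMPLETED, FAILED, TIMED_OUT }

    /**
     * Polls until the status is terminal or maxPolls is reached.
     * pollOnce: hypothetical stand-in for a short, non-failing status-check activity.
     * sleeper:  hypothetical stand-in for a durable workflow timer.
     */
    static Outcome pollUntilDone(Supplier<String> pollOnce, Consumer<Duration> sleeper,
                                 Duration interval, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            String status = pollOnce.get();
            if (status.equals("Success")) {
                return Outcome.COMPLETED;
            }
            if (!status.equals("Pending") && !status.equals("InProgress")) {
                return Outcome.FAILED;      // terminal, will never complete
            }
            sleeper.accept(interval);       // wait N seconds before the next poll
        }
        return Outcome.TIMED_OUT;           // the hand-coded "maximum attempts"
    }

    public static void main(String[] args) {
        Iterator<String> statuses = List.of("Pending", "Success").iterator();
        // No-op sleeper so the demo finishes immediately.
        Outcome outcome = pollUntilDone(statuses::next, d -> {}, Duration.ofSeconds(10), 5);
        System.out.println(outcome); // prints COMPLETED
    }
}
```

The trade-off is visible here: no activity ever reports a failure, so nothing alarming shows up in the UI, but the interval, attempt limit, and terminal-state handling all have to be written by hand.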

You are absolutely correct that this is not the best user experience. Here is the issue to get this fixed.