Handling logical retries as a dependency of multiple activities

See the PlantUML diagram.

It is difficult to put this eloquently, but I am looking for suggestions on how to handle a workflow that needs retries at a level above the atomic "activity" unit provided by Temporal. I also want to keep my activities modular.

My workflow is simplified in the diagram above, but essentially several tasks make up a DAG.

  1. We must first fetch an identifier for an Object (ID). [fetchId]
  2. With ID, fetch the Object status from an external dependency. [fetchStatus]
  3. We also want to determine the state of our Object on our database. [getRecord]
  4. If our Object state has changed, persist. [record]
  5. While Object status is NOT complete, GOTO 2.

Since step 5 is at a higher level of abstraction than the activities, I implemented my own retry logic in the workflow, which calls Workflow.sleep if the status is not complete. However, the resulting code was not clean. I would prefer to combine the entire repeatable block (steps 2-5) into a single activity.

The problem is that the retry policies for the individual activities and for the overall repeatable block need to differ substantially. Essentially, the block is a poller that needs much longer backoff periods, while the activities need only limited backoff for dependency failures and the like.

I thought about using a child workflow. Would the best approach be for my parent workflow to implement an activity that invokes the repeatable block as a child workflow and then throws if the status is not complete, so that the parent activity's retry options kick in?

We plan to allow an activity to specify dynamically when it should be retried.

Until then, I would recommend encapsulating all five steps in a child workflow that calls them in a loop and periodically calls continue-as-new to avoid unbounded event history growth. Note that, from the parent workflow's point of view, the whole chain of child executions is seen as a single child invocation. See the Child Workflows documentation for details.
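To make the shape of that loop concrete, here is a minimal plain-Java sketch of the control flow only. It deliberately does not use the Temporal SDK: the `Steps` interface, the `PollerSketch` class, the `"COMPLETE"` status value, and the `maxIterations` cap are all hypothetical names chosen for illustration. In a real child workflow, each `Steps` call would presumably be an activity invocation with its own short retry policy, the recorded sleep would be a `Workflow.sleep` call, and hitting the cap would be the point where the workflow continues as new.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the polling loop; no Temporal SDK is involved. */
final class PollerSketch {

    /** Hypothetical stand-ins for the activities named in the question. */
    interface Steps {
        String fetchStatus(String id);            // step 2
        String getRecord(String id);              // step 3
        void record(String id, String status);    // step 4
    }

    /** Outcome of one bounded run of the loop. */
    static final class Result {
        final boolean complete;        // true if the status reached "COMPLETE"
        final int iterations;          // number of loop passes executed
        final List<Duration> sleeps;   // sleeps a workflow would have performed

        Result(boolean complete, int iterations, List<Duration> sleeps) {
            this.complete = complete;
            this.iterations = iterations;
            this.sleeps = sleeps;
        }
    }

    /**
     * Runs steps 2-5 until the status is complete or maxIterations is reached.
     * The poll interval is independent of any per-step retry policy.
     */
    static Result pollUntilComplete(String id, Steps steps,
                                    Duration pollInterval, int maxIterations) {
        List<Duration> sleeps = new ArrayList<>();
        for (int i = 1; i <= maxIterations; i++) {
            String status = steps.fetchStatus(id);
            String stored = steps.getRecord(id);
            if (!status.equals(stored)) {
                steps.record(id, status);         // persist only on change
            }
            if ("COMPLETE".equals(status)) {
                return new Result(true, i, sleeps);
            }
            // A workflow would sleep here (e.g. Workflow.sleep) before polling again.
            sleeps.add(pollInterval);
        }
        // Cap reached: a workflow would continue as new here to reset its history.
        return new Result(false, maxIterations, sleeps);
    }
}
```

The point of the sketch is that the long poll interval lives in the loop, while short, dependency-oriented retries stay with the individual steps, so the two policies can be tuned separately.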

BTW, if you are using Java, you can use the Workflow.retry method to retry any part of the code.