Activity retries without exception

Hi,

I have a very open-ended question here. I am trying to model a step in Temporal that does the following:

  • Poll external service and get response
  • Persist the response in a database
  • Retry the above steps in sequence until the status field obtained in the response matches a value.

I was thinking of modeling an activity for this, but I have some open questions:

  1. How can I configure my activity to retry when there is no failure? E.g., I want to retry until the status field in the response is ‘xyz’.
  2. Should I model this so that both polling and persisting the response are part of the same activity, or should I ideally create multiple activities, one for polling and another for persisting the response?
  3. Can I use workflow sleep to retry, and just use activities for polling the response and persisting it in the DB?

  1. How can I configure my activity to retry when there is no failure? E.g., I want to retry until the status field in the response is ‘xyz’.

You fail the activity by throwing an exception (in the Java SDK at least) to force a retry. You specify which failures are not retryable by adding them to ActivityOptions.RetryOptions.DoNotRetry. Another option is to throw a non-retryable application failure created through ApplicationFailure.newNonRetryableFailure.
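
As a rough illustration of the throw-to-retry idea, here is a plain-Java model that does not use the Temporal SDK. Everything in it is hypothetical: `pollOnce` stands in for the activity, `runWithRetries` stands in for the retries the service would perform, and the two exception classes stand in for a retryable failure and a non-retryable one. The status names are the ones mentioned later in this thread.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class PollRetrySketch {

    /** Hypothetical retryable failure: the status is valid but not terminal yet. */
    static class NotReadyException extends RuntimeException {
        NotReadyException(String status) {
            super("status not terminal yet: " + status);
        }
    }

    /** Hypothetical non-retryable failure: retrying cannot help. */
    static class UnknownStatusException extends RuntimeException {
        UnknownStatusException(String status) {
            super("unknown status: " + status);
        }
    }

    static final Set<String> TERMINAL = Set.of("succeeded", "failed");
    static final Set<String> IN_FLIGHT = Set.of("submitted", "in_progress");

    /** Stand-in for the activity: one poll, which throws unless the status is terminal. */
    static String pollOnce(Iterator<String> service) {
        String status = service.next();
        if (TERMINAL.contains(status)) {
            return status;
        }
        if (IN_FLIGHT.contains(status)) {
            throw new NotReadyException(status); // expected case: ask for a retry
        }
        throw new UnknownStatusException(status); // not caught below, so no retry
    }

    /** Stand-in for the retry loop; a real retry policy would also wait between attempts. */
    static String runWithRetries(Iterator<String> service, int maxAttempts) {
        NotReadyException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return pollOnce(service);
            } catch (NotReadyException e) {
                last = e; // retryable failure: try again
            }
        }
        throw new IllegalStateException("retries exhausted", last);
    }

    public static void main(String[] args) {
        Iterator<String> service = List.of("submitted", "in_progress", "succeeded").iterator();
        System.out.println(runWithRetries(service, 10)); // prints "succeeded"
    }
}
```

In a real activity the retry loop would not exist in your code; only the body of `pollOnce` would, with the exception types mapped to retryable and non-retryable failures as described above.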

  2. Should I model this so that both polling and persisting the response are part of the same activity, or should I ideally create multiple activities, one for polling and another for persisting the response?

I would model it as two separate activities unless the payload is large. The main reason is that these activities might need different timeouts and retry options.

  3. Can I use workflow sleep to retry, and just use activities for polling the response and persisting it in the DB?

You can, but it is not recommended. The reason is that each retry adds events to the workflow history, which can get too large pretty fast. The built-in service-side retries do not add events to the history. There are cases when you want to retry a sequence of activities; in those cases, using workflow-side retries is reasonable.
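
To put rough numbers on the history growth, here is a back-of-the-envelope model in plain Java. The constants are assumptions, not values taken from this thread: about 11 events per sleep-and-poll iteration (3 for the activity, 2 for the timer, and 3 for each of the two workflow tasks that follow them), 3 events for a server-retried activity however many attempts it takes, and a history limit of 51,200 events. The exact counts depend on the SDK and on the shape of the workflow.

```java
class HistoryGrowthSketch {

    // Assumption: events added by one "run activity, then sleep" loop iteration.
    static final int EVENTS_PER_LOOP_ITERATION = 11;

    // Assumption: scheduled, started, and completed/failed are recorded once in total.
    static final int EVENTS_PER_SERVER_RETRIED_ACTIVITY = 3;

    // Assumption: per-workflow history limit, in events.
    static final int ASSUMED_HISTORY_LIMIT = 51_200;

    /** Events added when the workflow itself loops with a sleep between polls. */
    static long workflowLoopEvents(long polls) {
        return polls * EVENTS_PER_LOOP_ITERATION;
    }

    /** Events added when the service retries the activity; independent of attempts. */
    static long serverRetryEvents(long attempts) {
        return EVENTS_PER_SERVER_RETRIED_ACTIVITY;
    }

    /** How many loop iterations fit under the assumed limit. */
    static long pollsUntilLimit() {
        return ASSUMED_HISTORY_LIMIT / EVENTS_PER_LOOP_ITERATION;
    }

    public static void main(String[] args) {
        System.out.println(workflowLoopEvents(1_000)); // 11000
        System.out.println(serverRetryEvents(1_000));  // 3
        System.out.println(pollsUntilLimit());         // 4654
    }
}
```

Under these assumptions, a workflow-side loop that polls once a minute would reach the limit in a little over three days, while the server-retried activity stays at a constant size.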

This post has some additional ideas for polling.

Just to be clear, the external service goes through different states after submitting a request. Example: submitted → in_progress → failed/succeeded.

The only reason my activity needs to retry is that I want to exit only when the response is in a terminal state (failed/succeeded). Until then, I want to keep retrying to check the status.

So my question is: is it recommended to throw an exception in a Temporal activity just so that it retries polling?

So my question is: is it recommended to throw an exception in a Temporal activity just so that it retries polling?

Yes, it is recommended. The post I linked above contains some other options.

Hey Maxim,

A clarifying question about what we mean by “retry” / “polling” in this thread, as the words can vary wildly in meaning depending on how they are used.

My understanding is that non-deterministic business process logic should ideally be expressed in the Workflow. Sometimes, that logic involves checking an external resource and waiting for that resource to be in a certain state. The external system may have the resource available (in that it exists, and can be returned), but as far as the business process is concerned, the resource is not in a satisfactory state to move ahead with the next part of the workflow. Thus, we want to wait some number of hours and then check again on the state of the resource. (Ideally we would have a signal to help us drive this, but let’s say our external system doesn’t have such a notification mechanism, so we need to poll)

To me, the business process defined above belongs in the workflow, which would have a sleep-and-check-again loop, and the activity code would purely be about fetching the resource. In this example the retry would be infrequent (e.g. # of hours, with a maximum of N days). I am currently thinking of activity retries as more to deal with exceptional circumstances / failures, not as expected circumstances.

From a conceptual standpoint, is this in line with Temporal’s view on how the two pieces of logic should be separated, or should we be putting more of that “biz logic” in the activity itself, and modeling this “resource is not ready for us to move forward” concept away from the workflow?

Thank you!
Dan

My understanding is that non-deterministic business process logic should ideally be expressed in the Workflow.

A workflow has to express deterministic business logic only.

From a conceptual standpoint, is this in line with Temporal’s view on how the two pieces of logic should be separated, or should we be putting more of that “biz logic” in the activity itself, and modeling this “resource is not ready for us to move forward” concept away from the workflow?

From the conceptual standpoint, I would love to not have a concept of activities at all. The ideal model is one where an application developer writes any code they want inside the workflow and Temporal magically preserves its state. But we live in the real world, and Temporal has to make various compromises to deliver its “magic”. One of the compromises is that all external actions of a workflow have to reside in activities, and their invocations are recorded as events in a history. So keeping the history of each workflow as small as possible is necessary for the system to perform well. Using service-side activity retries to wait for a resource to reach a certain state is a compromise that works well and does not grow the history, regardless of the number of retries.
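
For the "check again every few hours" case, the spacing of service-side retries comes from the retry options: an initial interval, a backoff coefficient, and a maximum interval. The sketch below is a plain-Java model of that schedule, not SDK code; the class and method names are hypothetical, and it assumes the usual rule that each delay is the previous one multiplied by the coefficient, capped at the maximum. With a coefficient of 1 the delays stay fixed, which gives a steady polling cadence.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class RetryScheduleSketch {

    /**
     * Delays before each of the first `retries` retries, assuming
     * delay(n) = min(initial * coefficient^(n-1), max).
     */
    static List<Duration> retryDelays(Duration initial, double coefficient,
                                      Duration max, int retries) {
        List<Duration> delays = new ArrayList<>();
        double millis = initial.toMillis();
        for (int i = 0; i < retries; i++) {
            // Cap the current delay at the maximum interval.
            delays.add(Duration.ofMillis((long) Math.min(millis, max.toMillis())));
            millis *= coefficient; // grow (or keep) the delay for the next retry
        }
        return delays;
    }

    public static void main(String[] args) {
        // Fixed cadence: coefficient 1 keeps every delay at the initial interval.
        System.out.println(retryDelays(Duration.ofHours(2), 1.0, Duration.ofHours(2), 3));
        // Exponential backoff, capped at 5 seconds.
        System.out.println(retryDelays(Duration.ofSeconds(1), 2.0, Duration.ofSeconds(5), 4));
    }
}
```

The first call yields three delays of two hours each; the second yields 1 s, 2 s, 4 s, and then 5 s because of the cap. How long the retries continue overall is governed separately, for example by an overall timeout on the activity.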