How to rerun an activity when its async output produces an error downstream

Hello,

I’m a relatively new user, so please be patient. I’m building a workflow in which an activity calls out to a third-party service to kick off a job. That job can take days to complete, but when it does, we receive a webhook. A second activity then fetches the output of the completed job and stores it within our system for further processing. I currently have all of this modeled as follows:

Workflow
  1. startThirdPartyJobActivity
  2. await completionSignal
  3. downloadAndStoreActivity

Everything works fine when the code stays on the “happy path”. The problem I’m running into is that the third-party job can sometimes fail after it has been successfully started, and we can only detect this after the webhook (and completion signal) is sent. When this happens, I’d like to retry startThirdPartyJobActivity N times (in case the failure is transient) and then fail the workflow if the retries don’t resolve the issue.

I’m looking for advice on two points: 1) Am I modeling this interaction in a reasonable way? 2) What’s the best way to accomplish my goal? Should I be trying to use Async Completion, or should I just wrap it in a retry loop?

Thanks in advance for your help!

Hello @jcjmaven

Welcome!!

Should I be trying to use Async Completion?

It will work: you can set the maximum number of retries for startThirdPartyJobActivity. Since this is a long-running activity, you should consider setting a heartbeatTimeout and heartbeating using the ActivityCompletionClient.

The docs have recommendations on when to use, and when not to use, this approach.

Another approach is to put this logic in a child workflow and call setMaximumAttempts on the RetryOptions you pass in the ChildWorkflowOptions.
If the completion signal is not satisfactory, you can fail the child workflow (e.g. by throwing an ApplicationFailure), and it will be retried according to the retry policy you have set.
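To illustrate why retrying the child workflow reruns the job, here is a toy model in plain Java with no Temporal SDK; `ChildRetryModel`, `JobFailedException`, and the `jobSucceeds` predicate are hypothetical names. Every child attempt starts from the top, so startThirdPartyJobActivity runs again each time, and the loop in `runChildWithRetries` plays the role of a child retry policy with `setMaximumAttempts(n)` (backoff delays are ignored). In real workflow code the failure would be raised with `ApplicationFailure.newFailure(...)` rather than a custom exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model of "retry the whole child workflow"; no Temporal SDK involved.
final class ChildRetryModel {
    // Hypothetical failure raised when the completion signal reports a failed job
    // (in real workflow code this would be an ApplicationFailure).
    static final class JobFailedException extends RuntimeException {
        JobFailedException(String msg) { super(msg); }
    }

    // One child execution. Each attempt starts at the top, so the job is started again.
    // jobSucceeds stands in for the outcome reported by the completion signal.
    static String childWorkflow(int attempt, IntPredicate jobSucceeds, List<String> log) {
        log.add("startThirdPartyJob attempt " + attempt);
        if (!jobSucceeds.test(attempt)) {
            throw new JobFailedException("third-party job failed on attempt " + attempt);
        }
        log.add("downloadAndStore attempt " + attempt);
        return "stored";
    }

    // Parent side: a child retry policy with maximumAttempts = maxAttempts behaves
    // like this loop (ignoring backoff), re-running the child from scratch after each failure.
    static String runChildWithRetries(int maxAttempts, IntPredicate jobSucceeds, List<String> log) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        JobFailedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return childWorkflow(attempt, jobSucceeds, log);
            } catch (JobFailedException e) {
                last = e;
            }
        }
        throw last; // retries exhausted: the failure propagates to the parent
    }
}
```

For example, `runChildWithRetries(3, attempt -> attempt == 2, log)` starts the job twice and stores the output of the second attempt.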

  1. Am I modeling this interaction in a reasonable way?
  • What should happen if, after X days, the workflow does not receive the signal?
  • There is another approach you could use (I am not saying you have to): having an activity poll for the result every X hours (see the samples repo). This way you could detect whether the third-party service has failed or is down and handle the scenario accordingly.
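The polling alternative might look roughly like this, again as a plain-Java sketch rather than SDK code. `JobStatusSource` is a hypothetical client for the third-party status API, and the interval and poll limit are assumptions. A FAILED status is surfaced as soon as it is observed, and a job that never finishes fails after `maxPolls` checks, which also covers the "no signal after X days" case. In a real Temporal activity, the wait between polls would need heartbeating or could be delegated to the activity's own retry policy; see the samples repo for the variants.

```java
import java.time.Duration;
import java.util.function.Consumer;

final class JobPoller {
    enum Status { RUNNING, COMPLETED, FAILED }

    // Hypothetical client for the third-party service's job-status endpoint.
    interface JobStatusSource {
        Status status(String jobId);
    }

    static final class JobFailedException extends RuntimeException {
        JobFailedException(String msg) { super(msg); }
    }

    // Polls up to maxPolls times, waiting `interval` between checks. Returns on
    // COMPLETED; throws on FAILED or if the job is still RUNNING after the last poll.
    // The sleeper is injected so the sketch can be exercised without real waiting.
    static void awaitCompletion(String jobId, JobStatusSource source, Duration interval,
                                int maxPolls, Consumer<Duration> sleeper) {
        for (int poll = 1; poll <= maxPolls; poll++) {
            Status s = source.status(jobId);
            if (s == Status.COMPLETED) return;
            if (s == Status.FAILED) throw new JobFailedException("job " + jobId + " failed");
            if (poll < maxPolls) sleeper.accept(interval); // wait X hours before the next check
        }
        throw new JobFailedException("job " + jobId + " still running after " + maxPolls + " polls");
    }
}
```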

Antonio

Thanks for the quick response!

Can you elaborate a little more on how I could make the ActivityCompletionClient approach work? From what I understand, I’d be using the client from outside of my startThirdPartyJobActivity, but to do so I need either the taskToken or the Activity Id. Those only seem to be available from within the activity, so I don’t see how to make the connection.

I also don’t understand how the child workflow approach solves this problem; there’s likely a misunderstanding on my part. My understanding is that if a workflow fails and is retried, any activities that had already completed successfully won’t run again. So in my case, startThirdPartyJobActivity would have completed successfully, and I would fail the workflow after receiving the signal (and determining that the third party failed). When the workflow ran again, it would pick up after the signal had been received, and since nothing had changed, it would fail again. What am I missing?

Thank you for your help and patience.

Hi @jcjmaven

From what I understand I’d be using the client from outside of my startThirdPartyJobActivity but in order to do so I need to have either the taskToken or the Activity Id…

Correct, this only works if you control the external service: you pass the task token to the external service in the request, and when the job completes, the service uses it to complete the activity asynchronously. A much better approach for your use case is the one you mentioned: triggering the job and awaiting the signal in your workflow code.
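For reference, in case you do control the external service, the task-token handshake can be modeled in plain Java like this. `PendingActivities` is a hypothetical registry, and the token is a `String` here for simplicity. In the real Java SDK, as far as I know, the activity would call `Activity.getExecutionContext().getTaskToken()` (a `byte[]`) and `doNotCompleteOnReturn()`, and the webhook handler would call `complete(taskToken, result)` or `completeExceptionally(...)` on the `ActivityCompletionClient` obtained from `WorkflowClient.newActivityCompletionClient()`.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Toy model of asynchronous activity completion keyed by a task token.
final class PendingActivities {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Activity side: mint a token, register a pending result, and hand the token to the
    // external service with the job request. The activity then returns without a result.
    CompletableFuture<String> startJob(Consumer<String> sendTokenToService) {
        String token = UUID.randomUUID().toString();
        CompletableFuture<String> result = new CompletableFuture<>();
        pending.put(token, result);
        sendTokenToService.accept(token);
        return result;
    }

    // Webhook side, success: the service echoes the token back with the job output.
    boolean complete(String token, String output) {
        CompletableFuture<String> f = pending.remove(token);
        return f != null && f.complete(output);
    }

    // Webhook side, failure: failing the activity lets its retry policy start a new attempt.
    boolean fail(String token, String reason) {
        CompletableFuture<String> f = pending.remove(token);
        return f != null && f.completeExceptionally(new IllegalStateException(reason));
    }
}
```

The failure path is what makes this relevant to the original question: failing the activity through its token should let the activity's retry policy restart startThirdPartyJobActivity.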

You can either control the number of retries with a loop in your workflow code:

for ... {
    startThirdPartyJobActivity   // set maximumAttempts to 1, since you are handling retries in your workflow code
    await completionSignal       // set a timeout (it can vary between retries if needed) to handle not receiving a signal after X days
    if the job succeeded, exit the loop
}
if every attempt failed, fail the workflow

downloadAndStoreActivity
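Here is a runnable plain-Java model of that loop. `Activities` and `SignalWaiter` are hypothetical stand-ins: in a real workflow, `startThirdPartyJob` and `downloadAndStore` would be activity stubs (the first with `maximumAttempts` set to 1), and the wait would be a signal handler setting a field plus `Workflow.await(timeout, () -> completion != null)`, which returns `false` when the timeout expires.

```java
import java.time.Duration;
import java.util.Optional;

final class RetryLoopWorkflow {
    // Payload of the completion signal (hypothetical shape).
    static final class Completion {
        final boolean succeeded;
        final String outputRef;
        Completion(boolean succeeded, String outputRef) {
            this.succeeded = succeeded;
            this.outputRef = outputRef;
        }
    }

    // Hypothetical stand-ins for the two activity stubs.
    interface Activities {
        String startThirdPartyJob();             // real stub: retry policy with maximumAttempts = 1
        void downloadAndStore(String outputRef);
    }

    // Stand-in for waiting on the signal; empty means the timeout elapsed with no signal.
    interface SignalWaiter {
        Optional<Completion> await(String jobId, Duration timeout);
    }

    static final class WorkflowFailed extends RuntimeException {
        WorkflowFailed(String msg) { super(msg); }
    }

    static void run(Activities acts, SignalWaiter signals, int maxAttempts, Duration signalTimeout) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String jobId = acts.startThirdPartyJob();
            Optional<Completion> completion = signals.await(jobId, signalTimeout);
            if (completion.isPresent() && completion.get().succeeded) {
                acts.downloadAndStore(completion.get().outputRef);
                return; // success: leave the loop
            }
            // The job failed or no webhook arrived in time: start a fresh job next iteration.
        }
        throw new WorkflowFailed("third-party job did not succeed after " + maxAttempts + " attempts");
    }
}
```

Passing the job id to the wait is a deliberate choice: if each webhook carries the job id, a late signal from an earlier, abandoned attempt cannot be mistaken for the result of the current one.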

or use Workflow.retry.
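`Workflow.retry` wraps a block of workflow code and, when it throws, waits and runs it again according to a `RetryOptions` policy. The model below is plain Java and only an approximation of that behavior, not the SDK's implementation: it runs the body up to `maxAttempts` times and grows the wait by `backoffCoefficient`, capped at `maximumInterval`. In actual workflow code the call would look roughly like `Workflow.retry(options, Optional.empty(), () -> { ... })`, with the block starting the job, awaiting the signal, and throwing if the signal reports a failure.

```java
import java.time.Duration;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class RetryModel {
    // Runs body up to maxAttempts times. After each failure except the last, it sleeps,
    // then grows the interval by backoffCoefficient, capped at maximumInterval.
    static <R> R retry(int maxAttempts, Duration initialInterval, double backoffCoefficient,
                       Duration maximumInterval, Consumer<Duration> sleeper, Supplier<R> body) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Duration interval = initialInterval;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return body.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) break;
                sleeper.accept(interval); // in workflow code this would be a durable timer
                long next = (long) (interval.toMillis() * backoffCoefficient);
                interval = Duration.ofMillis(Math.min(next, maximumInterval.toMillis()));
            }
        }
        throw last; // attempts exhausted: surface the last failure
    }
}
```

With an initial interval of 1 second, a coefficient of 2.0, and a 3-second cap, three failures followed by a success produce waits of 1 s, 2 s, and 3 s.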

My understanding is that if a workflow fails and is retried any activities that had already been successfully completed won’t run again.

Workflows do not retry by default when they fail (I mean workflow status = failed), but you can configure them to retry with RetryOptions. Then, after the workflow is marked as failed, Temporal creates a new workflow execution that starts from the beginning.

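I think you mean replay. Workflow execution state is cached on the worker, but sometimes the worker has to replay the whole workflow history before continuing with the execution. Replay does not re-run activities that already completed; their results are read back from the history.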

I also don’t understand how the child workflow approach solves this problem

If the (child) workflow fails, its retry policy creates a new workflow execution that starts from the beginning, so startThirdPartyJobActivity runs again. That said, in most cases setting workflow retries is not recommended; it depends on your needs.

Let us know if it helps,

Antonio

Many thanks for the detailed response, Antonio! It looks like using Workflow.retry will accomplish the behavior I’m looking for.

Thanks again for your help.

Jeremy