Use case
Our application makes API calls. We use a workflow to prepare the request and process the response, and an activity to make the API call.
In some cases, the API returns HTTP 200 but the response says that the intended action couldn’t be completed for XYZ reason.
Once the activity completes, the workflow reviews the response. Because the intended action couldn’t be completed, this is a failure from a business perspective even though we got HTTP 200.
Currently, we throw a non-retryable exception to fail the workflow, so that the workflow status reflects the business status, which is a failure in this case.
Questions
What are the downsides of failing a workflow so that the workflow status reflects the business status?
Is there a better way to fail a workflow than throwing a non-retryable exception?
IMHO tying the workflow execution status to a domain-specific business “status” is not optimal. Your workflow can still complete successfully when an activity reports a business failure that your business logic does not need to retry. This lets you do things like compensation and business-level recovery from the failure if needed. The workflow’s final result(s) can then be the indicator of your domain-specific “failure”, while a failed workflow status indicates a technical failure, which can be treated as a real bug in your workflow code that needs to be fixed.
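As a rough illustration of reporting the business outcome through the workflow result instead of the workflow status, here is a minimal plain-Java sketch. It does not use the Temporal SDK, and `ApiResponse`, `ActionResult`, and `ResponseReviewer` are hypothetical names made up for this example; in a real workflow the `review` logic would sit in the workflow method after the activity returns.

```java
import java.util.Optional;

// Hypothetical stand-in for the parsed body of an HTTP 200 response.
final class ApiResponse {
    final boolean actionCompleted;
    final String reason; // explanation from the API when the action was not completed, else null

    ApiResponse(boolean actionCompleted, String reason) {
        this.actionCompleted = actionCompleted;
        this.reason = reason;
    }
}

// Hypothetical workflow result: the business outcome travels as data,
// so the workflow itself can complete normally.
final class ActionResult {
    private final boolean succeeded;
    private final String failureReason;

    private ActionResult(boolean succeeded, String failureReason) {
        this.succeeded = succeeded;
        this.failureReason = failureReason;
    }

    static ActionResult success() {
        return new ActionResult(true, null);
    }

    static ActionResult businessFailure(String reason) {
        return new ActionResult(false, reason);
    }

    boolean succeeded() {
        return succeeded;
    }

    Optional<String> failureReason() {
        return Optional.ofNullable(failureReason);
    }
}

// Hypothetical review step that a workflow could run once the activity has returned.
final class ResponseReviewer {
    static ActionResult review(ApiResponse response) {
        if (response.actionCompleted) {
            return ActionResult.success();
        }
        // Business-level recovery or compensation could be triggered here
        // before returning the outcome to the caller.
        return ActionResult.businessFailure(response.reason);
    }
}
```

A caller would then inspect `succeeded()` and `failureReason()` on the returned value, and a failed workflow status would be left to mean a technical problem.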
You can catch an exception and re-throw it as an ApplicationFailure, which always fails the workflow. Alternatively, you can add exception types to the list of exceptions that fail the workflow via WorkflowImplementationOptions.
Thanks @tihomir. In our case there are multiple API calls in our main workflow. Following the business logic, we have grouped them into child workflows. These child workflows can execute sequentially or in parallel, depending on business requirements. There are two scenarios in which a child workflow fails:
The API call failed because the end system was unavailable. In that case, once all retries are exhausted, we need to fail the child workflow and cancel all upcoming or running workflow executions. Once that is done, we need to compensate for all executed child workflows using the SAGA approach.
The API call was successful, but the response reported an error (for example, a required field was missing from the request). In that case as well, the child workflow needs to be failed and all upcoming or running workflow executions cancelled. Once that is done, we need to compensate for all executed child workflows using the SAGA approach.
For both scenarios we currently throw ApplicationFailure.newNonRetryableFailure/newFailure from the child workflow to mark it as failed. Is there any other way to mark the workflow as failed? If we mark the child workflow as completed, it will not be clear from the Temporal UI why the main workflow stopped executing other child workflows after the previous child workflow execution completed.
Please let us know if you think the above situations can be handled in a different way.
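The compensation flow in the two scenarios above can be sketched in plain Java as follows. This is only an illustration of the ordering (run steps, remember a compensation for each completed step, undo in reverse on failure), not the poster’s code. The Temporal Java SDK also ships a `Saga` helper class for this purpose; the sketch deliberately avoids the SDK so that it runs on its own. `SagaSketch`, `MainWorkflowSketch`, and `BusinessFailureException` are hypothetical names, and each lambda stands in for a child workflow call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical exception standing in for a failed child workflow.
final class BusinessFailureException extends RuntimeException {
    BusinessFailureException(String message) {
        super(message);
    }
}

// Minimal saga bookkeeping: compensations are kept on a stack so they run in reverse order.
final class SagaSketch {
    private final Deque<Runnable> compensations = new ArrayDeque<>();

    // Runs a step and registers its compensation only if the step succeeded.
    void run(Runnable step, Runnable compensation) {
        step.run();
        compensations.push(compensation);
    }

    // Undoes every completed step, most recent first.
    void compensate() {
        while (!compensations.isEmpty()) {
            compensations.pop().run();
        }
    }
}

final class MainWorkflowSketch {
    // Returns a log of what happened; secondStepFails simulates scenario 1 or 2 in child 2.
    static List<String> execute(boolean secondStepFails) {
        List<String> log = new ArrayList<>();
        SagaSketch saga = new SagaSketch();
        try {
            saga.run(() -> log.add("child-1 done"), () -> log.add("child-1 compensated"));
            saga.run(() -> {
                if (secondStepFails) {
                    throw new BusinessFailureException("required field missing");
                }
                log.add("child-2 done");
            }, () -> log.add("child-2 compensated"));
            saga.run(() -> log.add("child-3 done"), () -> log.add("child-3 compensated"));
        } catch (BusinessFailureException e) {
            // Later steps never start; completed steps are compensated.
            // A real parent workflow might rethrow here so that its own status shows the failure.
            log.add("failed: " + e.getMessage());
            saga.compensate();
        }
        return log;
    }
}
```

With `secondStepFails` set to `true`, the third step never starts and only the first step is compensated, because the second step never completed.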
Thanks for the detailed description. Failing the child workflows with your business error is fine. From the original question I was not aware of the details.
It looks like you are trying to recover from failures in your child workflows and, if you cannot recover, fail with a business error, so that’s fine.