How to perform compensation actions on different workflow failures

Hi Temporal Team,

We have a need to handle transactions involving various microservices with a high degree of data consistency.
For this, we model a workflow, with activities and compensations.
Something like this:

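import io.temporal.failure.TemporalFailure;
import io.temporal.workflow.Saga;
import io.temporal.workflow.Workflow;

public class MyWorkflowImpl implements MyWorkflow {
	private MyActivity myActivity = ...
	private ExternalActivity externalActivity = ...

	public void executeLongTx(Long orderId) {
		Saga saga = new Saga(new Saga.Options.Builder().setParallelCompensation(false).build());
		try {
			String externalTxId = externalActivity.execute(orderId);
			// Register the compensation only after the step has succeeded
			saga.addCompensation(externalActivity::cancel, externalTxId);

			saga.addCompensation(myActivity::cancel, orderId);
		} catch (TemporalFailure failure) {
			// Compensation code. Log the failure itself, since getCause() can be null.
			log.error("Error: " + failure.getMessage(), failure);
			// Detached scope so the compensations still run if the workflow was cancelled
			Workflow.newDetachedCancellationScope(() -> saga.compensate()).run();
		}
	}
}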

This works fine. When an activity fails, we catch the exception and run the compensation activities. The problem arises in other failure scenarios, for example when a workflow reaches a Terminated, TimedOut, or Failed state.

In these situations, the compensation code written in the workflow is not executed.
So what would be the best way to handle compensation in these cases?
Basically we need to know which activities managed to execute successfully, what were their responses, and then execute the compensatory activities.



Termination is a hard stop of your execution (performed by the server), so you cannot catch an exception for it in your workflow code. In your client code (or in a client interceptor), however, you can handle io.temporal.client.WorkflowFailedException, which includes io.temporal.failure.TerminatedFailure as its cause, if that helps. One thing you could also look into is preventing termination API requests via a custom authorizer, see sample here.

The same applies to a timed-out execution (the workflow execution timeout is enforced by the server): in your client code (or in a client interceptor) you can catch io.temporal.client.WorkflowFailedException, which includes io.temporal.failure.TimeoutFailure as its cause. If you are worried about this, it is best not to set WorkflowExecutionTimeout or WorkflowRunTimeout in your WorkflowOptions when starting the execution.
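
A minimal client-side sketch of the checks described above, assuming the Temporal Java SDK is on the classpath. The class name OrderTxClient, the method awaitOutcome, and the returned labels are hypothetical. What to do once a termination or timeout is detected is left to the caller, since nothing inside the workflow can react to it.

```java
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowFailedException;
import io.temporal.client.WorkflowStub;
import io.temporal.failure.CanceledFailure;
import io.temporal.failure.TerminatedFailure;
import io.temporal.failure.TimeoutFailure;

// Hypothetical helper, not part of the original post.
final class OrderTxClient {

    /** Blocks until the workflow closes and reports how it ended. */
    static String awaitOutcome(WorkflowClient client, String workflowId) {
        WorkflowStub stub = client.newUntypedWorkflowStub(workflowId);
        try {
            // executeLongTx returns void, so there is no result value to read
            stub.getResult(Void.class);
            return "COMPLETED";
        } catch (WorkflowFailedException e) {
            Throwable cause = e.getCause();
            if (cause instanceof TerminatedFailure) {
                return "TERMINATED"; // hard stop, workflow code never saw it
            }
            if (cause instanceof TimeoutFailure) {
                return "TIMED_OUT"; // workflow timeout enforced by the server
            }
            if (cause instanceof CanceledFailure) {
                return "CANCELED"; // workflow code had a chance to clean up
            }
            return "FAILED"; // e.g. an exception propagated out of the workflow
        }
    }
}
```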

For a failed workflow, I think you can handle exceptions much as shown in your code. In your example you could catch ActivityFailure (or ChildWorkflowFailure for child workflows), which is delivered to your workflow code after all retries are exhausted.
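
On the question of knowing which steps succeeded and what they returned: registering a compensation right after each successful step, with the step's response as its argument, is already that record. Here is a plain-Java sketch of the idea with no Temporal dependency. Every name in it (CompensationLogSketch, recordSuccess, the sample values) is made up for illustration, and it is not how the SDK's Saga class is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative stand-in for a saga's compensation list.
final class CompensationLogSketch {

    /** One completed step: its name, the response it returned, and how to undo it. */
    static final class CompletedStep {
        final String name;
        final String response;
        final Runnable undo;

        CompletedStep(String name, String response, Runnable undo) {
            this.name = name;
            this.response = response;
            this.undo = undo;
        }
    }

    private final Deque<CompletedStep> completed = new ArrayDeque<>();

    /** Call only after the step has returned successfully. */
    void recordSuccess(String name, String response, Runnable undo) {
        completed.push(new CompletedStep(name, response, undo));
    }

    /** Undoes the completed steps, most recent first. */
    void compensate() {
        while (!completed.isEmpty()) {
            completed.pop().undo.run();
        }
    }

    /** Two steps succeed, a third fails, and the first two are undone. */
    static List<String> demo() {
        CompensationLogSketch saga = new CompensationLogSketch();
        List<String> events = new ArrayList<>();
        try {
            String externalTxId = "tx-1"; // stand-in for externalActivity.execute(orderId)
            saga.recordSuccess("external", externalTxId,
                    () -> events.add("cancel external " + externalTxId));

            String orderId = "42"; // stand-in for a second successful step
            saga.recordSuccess("order", orderId,
                    () -> events.add("cancel order " + orderId));

            throw new IllegalStateException("third step failed");
        } catch (RuntimeException e) {
            saga.compensate();
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A step that never completed has no entry, so it is never compensated, and each undo action receives the response captured when its step succeeded.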

Ok Tihomir, thanks for the quick response.

I believe that within the execution of a workflow, we can catch a TemporalFailure and execute compensatory actions.

Outside of this, we might catch a WorkflowException. Can you think of a mechanism here to compensate the activities that have been executed successfully?

The answer is to not use Terminate (which by definition doesn’t support cleanup) but use cancellation instead. And do not use workflow timeout, which is, by definition, an automatic termination after some time.
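
Requesting cancellation from a client could look like the sketch below (Temporal Java SDK assumed on the classpath; the class and method names are hypothetical). Unlike termination, a cancellation request is delivered to the workflow code, so a catch block that runs saga.compensate() inside a detached cancellation scope, as in the original post, gets a chance to run.

```java
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowStub;

// Hypothetical helper, not part of the original post.
final class OrderTxCanceller {

    /** Asks the server to cancel the workflow instead of terminating it. */
    static void requestCancel(WorkflowClient client, String workflowId) {
        WorkflowStub stub = client.newUntypedWorkflowStub(workflowId);
        // This only requests cancellation; the workflow decides how to clean up.
        stub.cancel();
    }
}
```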

Ok, thanks so much!