Local activity is retried infinitely in case a workflow exception occurs directly after

Hi, I have a question. We observed that by default, a workflow will perform a replay if an exception occurs in the workflow code itself,
* unless the exception is created with ApplicationFailure.newNonRetryableFailure()
* or the exception class is listed in ‘FailWorkflowExceptionTypes’,
in which case the workflow fails instead.
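For reference, the second option in the list above is set when the workflow implementation is registered on a worker. Below is a minimal sketch of how that might look with the Java SDK, assuming a locally reachable Temporal server and the TestReplayWorkflowImpl class shown further down. The worker class name and the task queue name are placeholders made up for illustration.

```java
import io.temporal.client.WorkflowClient;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkflowImplementationOptions;

public class FailOnIllegalStateWorker {
    public static void main(String[] args) {
        // Assumption: a Temporal server is reachable on the default local address.
        WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
        WorkflowClient client = WorkflowClient.newInstance(service);
        WorkerFactory factory = WorkerFactory.newInstance(client);

        // Hypothetical task queue name, used only for this sketch.
        Worker worker = factory.newWorker("test-replay-task-queue");

        // Exceptions of the listed types fail the workflow execution
        // instead of failing (and retrying) the workflow task.
        WorkflowImplementationOptions options =
            WorkflowImplementationOptions.newBuilder()
                .setFailWorkflowExceptionTypes(IllegalStateException.class)
                .build();

        worker.registerWorkflowImplementationTypes(options, TestReplayWorkflowImpl.class);
        factory.start();
    }
}
```

With a setting like this, an IllegalStateException ends the workflow as failed, so it only helps for exception types that are known in advance.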

We also observed that when a LOCAL activity runs immediately before a workflow exception, the LOCAL activity is executed but NOT stored in the history at all. As a result, the workflow is replayed infinitely, and the LOCAL activity with it, so the activity sends an unbounded number of messages.

We know that one should avoid throwing exceptions in a workflow; however, in the rare case of a bug in our code, an IllegalStateException, for example, might still be thrown. This results in undesired behaviour, namely activities being executed over and over again.

We also checked that this does not happen for REMOTE activities, or in case we add a sleep of 1ms between the LOCAL activity and the IllegalStateException.

Question: is this behaviour as intended/expected, or might this be a bug in Temporal?
(Using Temporal server version 1.20.3)

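import static io.temporal.workflow.Workflow.newLocalActivityStub;

import io.temporal.workflow.Workflow;
import java.util.Optional;

public class TestReplayWorkflowImpl implements TestReplayWorkflow {

    // LOCAL_ACTIVITY_OPTIONS is defined elsewhere (not shown).
    private final TestReplayActivities testReplayLocalActivities = newLocalActivityStub(TestReplayActivities.class, LOCAL_ACTIVITY_OPTIONS);

    private Optional<OrderCreatedResponse> orderCreatedResponse = Optional.empty();

    // Called when the order-created response arrives; unblocks the await in execute().
    @Override
    public void orderCreatedResponseReceived(OrderCreatedResponse response) {
        this.orderCreatedResponse = Optional.of(response);
    }

    @Override
    public void execute(Request request) {

        testReplayLocalActivities.createOrder();
        // The workflow blocks here until the response has been received.
        Workflow.await(() -> orderCreatedResponse.isPresent());

        testReplayLocalActivities.sendMessage("Message from local activity!");

        // Thrown directly after the local activity, with nothing blocking in between.
        throw new IllegalStateException("Illegal state exception!");
    }
}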

This behavior is expected. We plan to add the ability to pause workflow task retries after a configured interval to avoid too many retries.

Ok, thanks. Could you explain why this is expected? My guess is that if you ‘kill’ the workflow fast enough by throwing an exception, Temporal does not have enough time to store the local activity in the history (because it happens in the same thread), while waiting for 1ms gives it just enough time. Am I right?

The main idea behind local activities is that their result is only recorded when the workflow task completes. This happens when a workflow is blocked and cannot progress without some external event like a timer, an activity completion, or a signal. An exception causes workflow task failure, meaning no result is recorded. The task is retried, which leads to the local activity re-execution. So it is not a matter of timing: a 1ms sleep creates a timer, the workflow blocks on it, the workflow task completes, and the local activity result is recorded before the exception is thrown.
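To make the mechanics above concrete, here is a small toy model in plain Java. It is not Temporal code and does not reflect the SDK's internals; every class, method, and event name in it is invented for illustration. It only mimics the rule described above: local activity results are buffered during a workflow task and written to history only if that task completes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all names here are hypothetical, nothing is a Temporal API.
public class LocalActivityReplayModel {

    /** Stands in for a bug in workflow code that fails the workflow task. */
    static class WorkflowBug extends RuntimeException {}

    private final List<String> history = new ArrayList<>(); // durable events
    private final List<String> pending = new ArrayList<>(); // buffered in the current task
    private int activityExecutions = 0;

    private void localActivity(String name) {
        String marker = "LocalActivityMarker:" + name;
        if (history.contains(marker)) {
            return; // replay: the recorded result is reused, nothing runs again
        }
        activityExecutions++; // the activity really runs
        pending.add(marker);  // the result is only buffered at this point
    }

    /** Returns true if the workflow has to block on the timer. */
    private boolean timer(String name) {
        String fired = "TimerFired:" + name;
        if (history.contains(fired)) {
            return false; // the timer already fired in an earlier task
        }
        pending.add(fired); // simplification: the timer fires right after the task completes
        return true;
    }

    /** Runs one workflow task attempt; returns true if the task completed. */
    private boolean runWorkflowTask(boolean sleepBeforeThrow) {
        pending.clear(); // anything buffered by a failed attempt is lost
        try {
            localActivity("sendMessage");
            if (sleepBeforeThrow && timer("sleep-1ms")) {
                history.addAll(pending); // workflow is blocked: task completes, buffer is recorded
                return true;
            }
            throw new WorkflowBug();
        } catch (WorkflowBug e) {
            return false; // task failed: nothing is recorded, the task will be retried
        }
    }

    /** Counts how often the local activity runs over the given number of task attempts. */
    public static int countExecutions(boolean sleepBeforeThrow, int taskAttempts) {
        LocalActivityReplayModel model = new LocalActivityReplayModel();
        for (int i = 0; i < taskAttempts; i++) {
            model.runWorkflowTask(sleepBeforeThrow);
        }
        return model.activityExecutions;
    }

    public static void main(String[] args) {
        System.out.println(countExecutions(false, 5)); // prints 5
        System.out.println(countExecutions(true, 5));  // prints 1
    }
}
```

In this model the activity runs once per attempt when the exception follows it directly, and only once in total when a timer sits in between, which matches what was observed with the 1ms sleep.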

Ok thank you very much for the clear explanation!