What are the different ways Temporal handles exceptions thrown in Workflows?

I’d like to confirm my understanding of how Temporal handles uncaught Exceptions thrown outside of Activities. I have written my current understanding below, and I have a few specific questions about it.

  1. Are these cases accurate?
  2. Am I missing any other expected/standard cases?
  3. Is there standard Temporal terminology for these 3 cases, for example “replay” vs “retry”?
  4. Is case 3 the same thing that happens when a worker crashes? If not, what’s the difference?

How I think it works:

  • Case 1: thrown Exception extends TemporalFailure

Expected Result: Workflow fails and closes with no retries.

  • Case 2: thrown Exception does not extend TemporalFailure and is specified in WorkflowImplementationOptions.setFailWorkflowExceptionTypes

Expected Result: Workflow retries from scratch, ignoring EventHistory and re-executing any activities as well as the special Workflow.random*() methods. Q: In the event of a full retry, is the previous EventHistory retained anywhere?

  • Case 3: thrown Exception does not extend TemporalFailure and is not specified in WorkflowImplementationOptions.setFailWorkflowExceptionTypes (like an NPE)

Expected Result: Workflow replays indefinitely or until hitting a timeout, using the Event History, waiting for a fix.

By default, workflows don’t have retry options and are not retried when they fail or time out.

  • Case 1: thrown Exception extends TemporalFailure

Expected Result: Workflow fails and closes with no retries.

Workflow fails and is retried (by executing from the beginning) only if retry options are specified.

  • Case 2: thrown Exception does not extend TemporalFailure and is specified in WorkflowImplementationOptions.setFailWorkflowExceptionTypes

Expected Result: Workflow retries from scratch, ignoring EventHistory and re-executing any activities as well as the special Workflow.random*() methods.

The same as Case 1. The workflow fails and is retried (by executing from the beginning) only if retry options are specified.

Q: In the event of a full retry, is the previous EventHistory retained anywhere?

The workflow retry is modeled as a new workflow run, so it gets a new runId. The previous run’s data, including its event history, remains available until the namespace retention period expires.

  • Case 3: thrown Exception does not extend TemporalFailure and is not specified in WorkflowImplementationOptions.setFailWorkflowExceptionTypes (like an NPE)

Expected Result: Workflow replays indefinitely or until hitting a timeout, using the Event History, waiting for a fix.

Correct.
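The answers above can be condensed into a small decision table. The following is a plain-Java sketch with no Temporal dependency; the class, enum and method names are hypothetical and only encode what this thread states, not how the SDK is implemented internally.

```java
// Hypothetical, dependency-free summary of the outcomes described in this thread.
// None of these names are Temporal SDK APIs.
public class WorkflowExceptionOutcomes {

    enum Outcome {
        FAIL_AND_CLOSE,            // the run fails and is not retried
        FAIL_AND_RETRY_AS_NEW_RUN, // the run fails; the retry is a new run with a new runId
        BLOCK_AND_RETRY_TASK       // stays Running; the workflow task is retried until fixed or timed out
    }

    static Outcome outcome(boolean extendsTemporalFailure,
                           boolean inFailWorkflowExceptionTypes,
                           boolean workflowHasRetryOptions) {
        if (!extendsTemporalFailure && !inFailWorkflowExceptionTypes) {
            // Case 3: treated as a bug, so workflow retry options (e.g. maximumAttempts) do not apply.
            return Outcome.BLOCK_AND_RETRY_TASK;
        }
        // Cases 1 and 2 behave the same: the workflow fails, and is retried only with retry options.
        return workflowHasRetryOptions ? Outcome.FAIL_AND_RETRY_AS_NEW_RUN : Outcome.FAIL_AND_CLOSE;
    }

    public static void main(String[] args) {
        System.out.println(outcome(true, false, false)); // Case 1, default options: FAIL_AND_CLOSE
        System.out.println(outcome(false, true, true));  // Case 2 with retry options: FAIL_AND_RETRY_AS_NEW_RUN
        System.out.println(outcome(false, false, true)); // Case 3 (e.g. an NPE): BLOCK_AND_RETRY_TASK
    }
}
```

Note that in this model retry options only matter once the workflow has actually failed; in Case 3 nothing fails, so they are never consulted.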

When my workflow fails (due to an exception thrown from the workflow context), its status in the UI stays ‘Running’ and the workflow keeps retrying, even though I set the retry options’ maximumAttempts to 1 when starting it from the client.
It seems that this setting applies to activity failures only.
Is this the intended behavior? What should I do to mark the workflow as failed and prevent it from rerunning when the workflow code throws an exception?

An exception that doesn’t extend TemporalFailure is treated as a bug and blocks workflow execution without failing it. It is done by design to avoid losing workflows on unexpected failures. You don’t want to lose the state of 100 million workflows due to a NullPointerException, for example.

If you want to fail a workflow, throw an ApplicationFailure (which extends TemporalFailure) instead.

Thanks for the reply.
So, if I understand correctly, if a non-TemporalFailure exception is thrown in the workflow context, the workflow shouldn’t run again, right?
I ask because when I run a workflow that just prints ‘1’ to the screen and then throws a regular exception, using the default client configuration, I see only one error on the worker’s console but multiple ‘1’s, printed at intervals that seem to follow the backoffCoefficient.
Is this the expected behavior?

if a non-TemporalFailure exception is thrown in the workflow context, it shouldn’t run again

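Yes, that’s the default behavior: the workflow is not failed and no new run is started. Instead, the workflow task in which the error was thrown is blocked and replayed while it waits for a fix. Because each attempt runs your workflow code again from the beginning, side effects such as printing ‘1’ repeat on every attempt.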
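To make the repeated ‘1’s concrete, here is a small plain-Java model of a workflow task that is retried after an unexpected exception. It has no Temporal dependency; `workflowCode` and `runWorkflowTask` are hypothetical names, and the attempt cap exists only to keep the model finite (a real worker keeps retrying until the code is fixed or a timeout is hit).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (no Temporal dependency): an unexpected exception fails the
// workflow *task*, not the workflow, and each retry of the task reruns the code.
public class WorkflowTaskRetryModel {

    // Hypothetical workflow body: a side effect followed by a bug.
    static void workflowCode(List<String> console) {
        console.add("1");                       // stands in for System.out.println("1")
        throw new IllegalStateException("bug"); // does not extend TemporalFailure
    }

    // Runs the task at most maxTaskAttempts times so that the model terminates.
    static String runWorkflowTask(List<String> console, int maxTaskAttempts) {
        for (int attempt = 1; attempt <= maxTaskAttempts; attempt++) {
            try {
                workflowCode(console);
                return "COMPLETED";
            } catch (RuntimeException e) {
                // Task failure: the workflow stays Running and the task is retried.
            }
        }
        return "RUNNING";
    }

    public static void main(String[] args) {
        List<String> console = new ArrayList<>();
        String status = runWorkflowTask(console, 3);
        System.out.println(status + " " + console); // RUNNING [1, 1, 1]
    }
}
```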

You can change this default behavior via WorkflowImplementationOptions.setFailWorkflowExceptionTypes, for example:

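```java
import io.temporal.worker.WorkflowImplementationOptions;

// Fail the workflow execution (instead of blocking it) when workflow code throws an NPE.
WorkflowImplementationOptions options =
    WorkflowImplementationOptions.newBuilder()
        .setFailWorkflowExceptionTypes(NullPointerException.class)
        .build();
// ...
worker.registerWorkflowImplementationTypes(options, MyWorkflowImpl.class);
```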

This tells the worker to fail the execution when an NPE is thrown in workflow code (for executions of that registered workflow type). If you want to fail on any exception in workflow code, you can set this to Throwable.class.
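Why Throwable.class acts as a catch-all can be sketched in plain Java, assuming the configured types are matched by subclass (which the Throwable.class remark implies). This is a hypothetical illustration with made-up names, not the SDK’s own code.

```java
// Hypothetical sketch: match a thrown exception against a list of "fail workflow"
// types, assuming subclass matching via Class.isAssignableFrom.
public class FailTypeMatching {

    static boolean failsWorkflow(Throwable thrown, Class<?>... failWorkflowExceptionTypes) {
        for (Class<?> type : failWorkflowExceptionTypes) {
            // true when thrown's class is the configured type or one of its subclasses
            if (type.isAssignableFrom(thrown.getClass())) {
                return true;
            }
        }
        return false; // not listed: the workflow task blocks and is retried instead
    }

    public static void main(String[] args) {
        Throwable npe = new NullPointerException();
        Throwable ise = new IllegalStateException();
        System.out.println(failsWorkflow(npe, NullPointerException.class)); // true
        System.out.println(failsWorkflow(ise, NullPointerException.class)); // false: blocks instead
        System.out.println(failsWorkflow(npe, RuntimeException.class));     // true: NPE is a RuntimeException
        System.out.println(failsWorkflow(ise, Throwable.class));            // true: every exception matches
    }
}
```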