Throwing Exception vs Failure in workflow

Hi all,

I have a question about the Failure hierarchy and what the best practice is for handling workflow errors. Are workflows supposed to throw only Failures, or is it acceptable to throw an exception (e.g. IllegalArgumentException) from workflow code?

I ask because in the unit tests, throwing a Failure seems to work fine, but throwing an exception seems to put the unit test into a retry loop with the message below. This example is from WorkflowTestingTest.testFailure(), altered to throw IllegalArgumentException instead of ApplicationFailure. The message is printed repeatedly every few seconds.

Thanks!
Scott

	09:15:43.431 [Workflow Executor taskQueue="test-workflow", namespace="default": 3] ERROR i.t.internal.worker.PollerOptions - uncaught exception
java.lang.RuntimeException: Failure processing workflow task. WorkflowId=c5ac197d-598b-4e8e-98f3-9aa6019380cd, RunId=d8585e55-106d-43fe-b5d4-fc436992f71f
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.wrapFailure(WorkflowWorker.java:337)
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.wrapFailure(WorkflowWorker.java:275)
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:79)
	at io.temporal.internal.worker.PollTaskExecutor$$Lambda$76/0000000000000000.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:823)
Caused by: io.temporal.internal.replay.InternalWorkflowTaskException: Failure handling event 17 of 'EVENT_TYPE_WORKFLOW_TASK_STARTED' type. IsReplaying=false, PreviousStartedEventId=17, workflowTaskStartedEventId=17, Currently Processing StartedEventId=17
	at io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:193)
	at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleEvent(ReplayWorkflowRunTaskHandler.java:140)
	at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:180)
	at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:150)
	at io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithEmbeddedQuery(ReplayWorkflowTaskHandler.java:199)
	at io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:114)
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:309)
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:275)
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)
	... 4 common frames omitted
Caused by: java.lang.RuntimeException: WorkflowTask: failure executing SCHEDULED->WORKFLOW_TASK_STARTED, transition history is [CREATED->WORKFLOW_TASK_SCHEDULED]
	at io.temporal.internal.statemachines.StateMachine.executeTransition(StateMachine.java:140)
	at io.temporal.internal.statemachines.StateMachine.handleHistoryEvent(StateMachine.java:91)
	at io.temporal.internal.statemachines.EntityStateMachineBase.handleEvent(EntityStateMachineBase.java:63)
	at io.temporal.internal.statemachines.WorkflowStateMachines.handleEventImpl(WorkflowStateMachines.java:210)
	at io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:178)
	... 12 common frames omitted
Caused by: java.lang.IllegalArgumentException: test
	at io.temporal.internal.testing.WorkflowTestingTest$FailingWorkflowImpl.workflow1(WorkflowTestingTest.java:135)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation$RootWorkflowInboundCallsInterceptor.execute(POJOWorkflowImplementationFactory.java:289)
	at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation.execute(POJOWorkflowImplementationFactory.java:253)
	at io.temporal.internal.sync.WorkflowExecuteRunnable.run(WorkflowExecuteRunnable.java:52)
	at io.temporal.internal.sync.SyncWorkflow.lambda$start$0(SyncWorkflow.java:121)
	at io.temporal.internal.sync.SyncWorkflow$$Lambda$93/0000000000000000.run(Unknown Source)
	at io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:104)
	at io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:111)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 common frames omitted

By default, an exception that doesn’t extend TemporalFailure doesn’t fail the workflow. Instead, the workflow task fails and is retried periodically, which blocks the workflow’s progress until a fix is deployed. This is done to avoid failing workflows due to unexpected bugs like an NPE. In most cases, users do not want a few million workflows to fail and require manual intervention because of a silly NPE bug introduced by a new deployment.

If you do want the workflow to fail on a specific exception, include its type in the failWorkflowExceptionTypes option of WorkflowImplementationOptions (set through setFailWorkflowExceptionTypes on the builder). For example, if you want your workflow to fail on any exception (which we very rarely recommend in production), set it to Throwable. The WorkflowImplementationOptions are passed to the Worker.registerWorkflowImplementationTypes call.
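
To make the rule above concrete, here is a small plain-Java model of the decision as described in this thread. It is only a sketch and does not use the Temporal SDK: ModelTemporalFailure, FailurePolicyModel, and Outcome are hypothetical names invented for illustration, and the real SDK may handle more cases than this.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the SDK's TemporalFailure base class (not the real class).
class ModelTemporalFailure extends RuntimeException {
    ModelTemporalFailure(String message) {
        super(message);
    }
}

// Hypothetical model of the rule described above; not part of the Temporal SDK.
class FailurePolicyModel {
    enum Outcome { FAIL_WORKFLOW, RETRY_WORKFLOW_TASK }

    // Exception types the user has opted in to failing the workflow on.
    private final List<Class<? extends Throwable>> failTypes;

    FailurePolicyModel(List<Class<? extends Throwable>> failTypes) {
        this.failTypes = failTypes;
    }

    Outcome decide(Throwable thrown) {
        // Failures from the failure hierarchy always fail the workflow.
        if (thrown instanceof ModelTemporalFailure) {
            return Outcome.FAIL_WORKFLOW;
        }
        // Any other exception fails the workflow only if its type was opted in.
        for (Class<? extends Throwable> type : failTypes) {
            if (type.isInstance(thrown)) {
                return Outcome.FAIL_WORKFLOW;
            }
        }
        // Otherwise it is treated as a bug: the workflow task is retried.
        return Outcome.RETRY_WORKFLOW_TASK;
    }

    public static void main(String[] args) {
        FailurePolicyModel defaults = new FailurePolicyModel(Collections.emptyList());
        FailurePolicyModel failOnAny = new FailurePolicyModel(Arrays.asList(Throwable.class));
        Throwable bug = new IllegalArgumentException("test");

        System.out.println(defaults.decide(bug));   // RETRY_WORKFLOW_TASK
        System.out.println(failOnAny.decide(bug));  // FAIL_WORKFLOW
        System.out.println(defaults.decide(new ModelTemporalFailure("business error"))); // FAIL_WORKFLOW
    }
}
```

With the default (empty) list, the IllegalArgumentException from the original question lands in the retry branch, which matches the repeated log message in the unit test.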

If a Workflow retries due to one of the NPE cases, will it also retry (not replay) any activities that have completed before? I’m wondering what happens if I make a non-idempotent API call in my workflow that is being retried while it waits for a fix.

No, activities will not be retried. I wasn’t precise earlier: the workflow itself doesn’t retry because of an NPE. Only the last workflow task is retried, not the whole workflow.
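
A rough way to picture why a completed activity is not run again: its result is already recorded in the workflow history, so a retried workflow task reads the recorded result instead of invoking the activity. The plain-Java sketch below models only that idea. WorkflowTaskRetryModel and its methods are hypothetical names, and the real SDK and server are far more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, heavily simplified model; not part of the Temporal SDK.
class WorkflowTaskRetryModel {
    // Stand-in for the event history: results of completed activities, in order.
    private final List<String> recordedResults = new ArrayList<>();
    private int activityInvocations = 0;
    private int nextIndex = 0; // position in the history during the current attempt

    private String executeActivity(Supplier<String> activity) {
        if (nextIndex < recordedResults.size()) {
            // Already completed in an earlier attempt: reuse the recorded result.
            return recordedResults.get(nextIndex++);
        }
        activityInvocations++;
        String result = activity.get();
        recordedResults.add(result);
        nextIndex++;
        return result;
    }

    // One attempt at the workflow task; throws while the bug is still deployed.
    private String attempt(boolean bugPresent) {
        nextIndex = 0;
        String receipt = executeActivity(() -> "charged"); // non-idempotent call
        if (bugPresent) {
            throw new NullPointerException("bug in workflow code");
        }
        return receipt + ":done";
    }

    // Retries the workflow task until the "fix" is deployed.
    String runUntilFixed(int failingAttempts) {
        int attemptNumber = 0;
        while (true) {
            try {
                return attempt(attemptNumber < failingAttempts);
            } catch (NullPointerException e) {
                attemptNumber++; // only the workflow task is retried
            }
        }
    }

    int activityInvocations() {
        return activityInvocations;
    }

    public static void main(String[] args) {
        WorkflowTaskRetryModel model = new WorkflowTaskRetryModel();
        System.out.println(model.runUntilFixed(3));      // charged:done
        System.out.println(model.activityInvocations()); // 1
    }
}
```

In this model the non-idempotent call happens once even though the task is attempted four times, which is the behavior described in the reply above.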

I want the workflow to fail for certain exceptions, so I will be using the failWorkflowExceptionTypes option of WorkflowImplementationOptions.

My workflows, however, are async and long-running and I execute them using submitWorkflow. How can I create an exceptionMapper/handler to handle these failed workflow executions?

More generally, if a workflow fails (due to a timeout, etc.), can we register a handler that is invoked to perform cleanup operations?

The recommended approach is for the workflow itself to perform the cleanup before it fails.

But is there a method we can override, like void onError(final Throwable e) or void onEnd(final Throwable e)? Alternatively, I can basically have a try/catch over the entire code if there is no better way.

Or you could use a workflow inbound interceptor to catch exceptions and handle them there. Note that execution timeouts are managed by the server, so you could not, for example, catch timed-out workflows this way; you would probably need to do that in a client interceptor.

I think try/catch over the entire code is the best way, as the catch handler has all the necessary data to perform the cleanup.
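
As a minimal sketch of that try/catch shape, written in plain Java without the Temporal SDK: CleanupOnFailureSketch, runWorkflow, and the reservation names are hypothetical, and in a real workflow the cleanup steps would more likely be compensating activity calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the try/catch cleanup pattern; not Temporal SDK code.
class CleanupOnFailureSketch {
    // Records what was cleaned up, so the effect is observable.
    final List<String> released = new ArrayList<>();

    void runWorkflow(boolean failMidway) {
        // Local state built up by the workflow body; visible to the catch block.
        List<String> acquired = new ArrayList<>();
        try {
            acquired.add("reservation-1");
            acquired.add("reservation-2");
            if (failMidway) {
                throw new IllegalStateException("step 3 failed");
            }
            acquired.add("reservation-3");
        } catch (RuntimeException e) {
            // Undo completed steps in reverse order, using the data gathered so far.
            for (int i = acquired.size() - 1; i >= 0; i--) {
                released.add(acquired.get(i));
            }
            // Rethrow so the caller still sees the failure after cleanup.
            throw e;
        }
    }

    public static void main(String[] args) {
        CleanupOnFailureSketch sketch = new CleanupOnFailureSketch();
        try {
            sketch.runWorkflow(true);
        } catch (IllegalStateException e) {
            System.out.println("failed after cleanup: " + e.getMessage());
        }
        System.out.println(sketch.released); // [reservation-2, reservation-1]
    }
}
```

The catch block can see everything the workflow body has done so far, which is the advantage mentioned above. As noted earlier in the thread, this does not cover timeouts enforced by the server.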