I am using the java-sdk and am trying to set a custom UncaughtExceptionHandler. I see I can do this in PollerOptions. However, I can't work out how to configure my worker to use my new PollerOptions. Can someone point me in the right direction?
For context, I am seeing an exception for:
io.temporal.internal.replay.InternalWorkflowTaskException: Failure handling event 16 of type 'EVENT_TYPE_WORKFLOW_TASK_COMPLETED' during replay. {PreviousStartedEventId=15, workflowTaskStartedEventId=84, Currently Processing StartedEventId=15}
I think this is caused by a version change in my workflow. For my scenario, it's less important that a workflow completes properly and more important that it gets restarted if it errors. What I think I want to do is catch the error and, if possible, pull some state from the failed workflow and rebuild it.
I believe PollerOptions is an internal class and should not be used.
Yes, it seems you have non-deterministic workflow code. This could be caused by your version updates, as you mentioned, but also check the constraints on workflow implementation code in the documentation.
For versioning, it's good practice to test using WorkflowReplayer so you can catch these kinds of issues before deploying version updates.
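A minimal sketch of such a replay check, assuming the temporal-testing artifact is on the classpath and that the history of a real run has been exported to JSON and saved as a classpath resource. The resource name my_workflow_history.json and the MyWorkflow / MyWorkflowImpl types are placeholders, not names from your project:

```java
import io.temporal.testing.WorkflowReplayer;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

public class ReplayCheck {

  // Placeholder workflow interface; substitute your own.
  @WorkflowInterface
  public interface MyWorkflow {
    @WorkflowMethod
    String run(String input);
  }

  // Placeholder implementation; this stands in for the NEW version of the code you want to verify.
  public static class MyWorkflowImpl implements MyWorkflow {
    @Override
    public String run(String input) {
      return "done: " + input;
    }
  }

  public static void main(String[] args) throws Exception {
    // Replays the recorded history against the current implementation.
    // An exception here means the new code is not compatible with that history.
    // "my_workflow_history.json" is an assumed resource name.
    WorkflowReplayer.replayWorkflowExecutionFromResource(
        "my_workflow_history.json", MyWorkflowImpl.class);
  }
}
```

In practice you would probably run the replay call from a unit test in your build, against histories produced by the version that is currently deployed.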
If I understand correctly, your execution does not fail in this case but stays blocked, waiting for a fix, until the configured workflow run/execution timeout is reached.
By default, a workflow does not fail on exceptions that don't extend TemporalFailure (InternalWorkflowTaskException does not). Instead, the workflow task keeps being retried while waiting for a fix.
If you wanted to fail the workflow on any exception type you could set:
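// Fail the workflow execution on any Throwable instead of retrying the workflow task.
WorkflowImplementationOptions implementationOptions = WorkflowImplementationOptions.newBuilder()
    .setFailWorkflowExceptionTypes(Throwable.class)
    .build();
// ...
// Pass the same options object that was built above.
worker.registerWorkflowImplementationTypes(implementationOptions, MyWorkflowImpl.class);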
By default, workflow executions are not retried (unless you explicitly set retry options in WorkflowOptions), and enabling workflow retries is in most cases not recommended. I think the best option is to fix the code that causes the issue and redeploy, if possible.
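If you do decide you want retries anyway, this is roughly what setting them in WorkflowOptions could look like. It is only a sketch: the task queue name, the interval and the attempt count are made-up values, and the class name is just for illustration:

```java
import io.temporal.client.WorkflowOptions;
import io.temporal.common.RetryOptions;
import java.time.Duration;

public class RetryingWorkflowOptions {

  // Builds WorkflowOptions that retry a failed workflow execution.
  // All concrete values below are illustrative assumptions.
  public static WorkflowOptions build() {
    RetryOptions retry =
        RetryOptions.newBuilder()
            .setInitialInterval(Duration.ofSeconds(10)) // wait before the first retry
            .setMaximumAttempts(3) // total attempts, including the first run
            .build();

    return WorkflowOptions.newBuilder()
        .setTaskQueue("my-task-queue") // assumed task queue name
        .setRetryOptions(retry)
        .build();
  }
}
```

You would pass the result to client.newWorkflowStub(MyWorkflow.class, RetryingWorkflowOptions.build()). Keep in mind that a retried execution starts from the beginning as a new run, so it does not carry over any state from the failed one.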
For failed executions, you could just start new ones, or use the reset feature to reset them to a particular point in the execution before the failure. You can reset using the API as well.
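For the API route, here is a rough sketch of building a reset request with the generated gRPC types that ship with the Java SDK. The helper class and its parameters are hypothetical; the namespace, workflow id, run id and event id all have to come from your own execution:

```java
import io.temporal.api.common.v1.WorkflowExecution;
import io.temporal.api.workflowservice.v1.ResetWorkflowExecutionRequest;
import java.util.UUID;

public class ResetRequestSketch {

  // Builds a request that resets the given run to a point before the failure.
  // finishEventId should be the id of a workflow task event in the history
  // (typically a WorkflowTaskCompleted event) that precedes the failure.
  public static ResetWorkflowExecutionRequest build(
      String namespace, String workflowId, String runId, long finishEventId) {
    WorkflowExecution execution =
        WorkflowExecution.newBuilder().setWorkflowId(workflowId).setRunId(runId).build();

    return ResetWorkflowExecutionRequest.newBuilder()
        .setNamespace(namespace)
        .setWorkflowExecution(execution)
        .setReason("reset after non-deterministic failure") // free-form text
        .setWorkflowTaskFinishEventId(finishEventId)
        .setRequestId(UUID.randomUUID().toString()) // idempotency key for the reset call
        .build();
  }
}
```

Assuming you already have a WorkflowServiceStubs instance called service, the request would then be sent with service.blockingStub().resetWorkflowExecution(request). The reset only helps once the worker is running code that is compatible with the history up to the reset point.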