Find non-determinism issues from the web-ui

Hi!

I recently found out that a few of my workflows had stopped running due to non-determinism errors but hadn't been flagged as Failed or Cancelled, so I didn't notice them for a while.

Since non-determinism errors don't cause workflows to fail, is there any way, other than clicking through all my running workflows in the web UI, to check whether I've hit non-determinism issues? I've looked through all the filters, and none of them seem related to this.

Quite new to this, so if there’s something fundamental I should be doing instead of checking the web-ui for this error, that’s also very helpful.

Thanks!

Also interested in this. Did you find a solution @John_Musson?

Hi,

This is not something Temporal provides out of the box.

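However, Temporal SDKs increment the metric https://docs.temporal.io/references/sdk-metrics#workflow_task_execution_failed when there is a non-determinism error (NDE). In Java, a failure_reason tag is attached to the metric, which lets you tell NDEs apart from other workflow task failures.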
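
Rather than browsing the web UI, you can alert on that metric. Below is a minimal sketch of the check, assuming your metrics backend already gives you snapshots of the counter keyed by the failure_reason tag value. The class NdeMetricCheck, the snapshot format, and the reason value you pass in are assumptions, not part of the Temporal SDK; check which failure_reason value your SDK version actually reports for NDEs.

```java
import java.util.Map;

/** Hypothetical helper: compares two snapshots of the workflow_task_execution_failed counter. */
final class NdeMetricCheck {
    private NdeMetricCheck() {}

    /**
     * Returns how much the counter grew for one failure_reason value between two snapshots.
     * Each snapshot maps a failure_reason tag value to the cumulative counter value (assumed format).
     */
    static long increase(Map<String, Long> before, Map<String, Long> after, String failureReason) {
        long prev = before.getOrDefault(failureReason, 0L);
        long curr = after.getOrDefault(failureReason, 0L);
        // A counter drops back when a worker restarts; treat a lower value as counting from zero.
        return curr >= prev ? curr - prev : curr;
    }

    /** True if any workflow task failed with the watched reason since the previous snapshot. */
    static boolean shouldAlert(Map<String, Long> before, Map<String, Long> after, String failureReason) {
        return increase(before, after, failureReason) > 0;
    }
}
```

In practice the same rule is usually written as an alert in your monitoring system, on the rate of workflow_task_execution_failed filtered by failure_reason; the sketch only shows the logic.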

> but hadn't flagged themselves as Failed or Cancelled

The workflow execution won't fail by default. Instead, the Temporal server retries the workflow task, so you can fix the code and redeploy it without losing the execution's state. Once the fix is deployed, the workflow execution continues from where it left off.

The SDKs also let you fail the workflow execution when an NDE occurs. For example, in Java:

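    import io.temporal.worker.NonDeterministicException;
    import io.temporal.worker.WorkflowImplementationOptions;

    // Fail the workflow execution on an NDE instead of retrying the workflow task
    worker.registerWorkflowImplementationTypes(
            WorkflowImplementationOptions.newBuilder()
                    .setFailWorkflowExceptionTypes(NonDeterministicException.class).build(),
            ...); // your workflow implementation classes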
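
A toy model of the two behaviors, purely for illustration: the names NdeOutcomeModel, Outcome, and FakeNonDeterministicException are hypothetical, and the instanceof-style matching of configured types is an assumption rather than a description of the SDK's internals.

```java
import java.util.List;

/** Toy model only: hypothetical names, not the Temporal SDK's internals. */
final class NdeOutcomeModel {
    enum Outcome { WORKFLOW_TASK_RETRIED, WORKFLOW_EXECUTION_FAILED }

    /** Stand-in for io.temporal.worker.NonDeterministicException in this self-contained sketch. */
    static final class FakeNonDeterministicException extends RuntimeException {
        FakeNonDeterministicException(String message) { super(message); }
    }

    private NdeOutcomeModel() {}

    /**
     * Decides what happens to an execution whose workflow task threw {@code error},
     * given the types configured via setFailWorkflowExceptionTypes.
     */
    static Outcome handle(Throwable error, List<Class<? extends Throwable>> failWorkflowExceptionTypes) {
        for (Class<? extends Throwable> type : failWorkflowExceptionTypes) {
            if (type.isInstance(error)) {
                // Listed type: the execution moves to Failed and shows up under that status filter.
                return Outcome.WORKFLOW_EXECUTION_FAILED;
            }
        }
        // Default: only the workflow task fails; it is retried and the execution stays Running.
        return Outcome.WORKFLOW_TASK_RETRIED;
    }
}
```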

You can also catch NDEs before deploying code changes to production by replay-testing the new code against existing workflow histories.
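
Replay testing works because an existing history pins down the sequence of commands the workflow code must produce. A toy sketch of that comparison, not Temporal's replayer; ReplayCheck and the command strings are hypothetical:

```java
import java.util.List;
import java.util.Optional;

/** Toy replay check: hypothetical, only illustrates why replay catches NDEs. */
final class ReplayCheck {
    private ReplayCheck() {}

    /**
     * Returns a description of the first mismatch between the commands recorded in a history
     * and the commands the new code produces when replayed, or empty if they agree.
     */
    static Optional<String> firstMismatch(List<String> recorded, List<String> replayed) {
        // Only the prefix covered by the history must match; new code may add commands after it.
        for (int i = 0; i < recorded.size(); i++) {
            if (i >= replayed.size()) {
                return Optional.of("history has command " + i + " (" + recorded.get(i)
                        + ") but new code stopped");
            }
            if (!recorded.get(i).equals(replayed.get(i))) {
                return Optional.of("command " + i + ": history has " + recorded.get(i)
                        + ", new code produced " + replayed.get(i));
            }
        }
        return Optional.empty();
    }
}
```

With the Java SDK, the real tool for this is the WorkflowReplayer class in io.temporal.testing, which replays a recorded history against your workflow implementation and throws if the code is non-deterministic.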

Antonio