Hi!
Found out recently that a few of my workflows had stopped running due to non-determinism errors, but hadn’t flagged themselves as Failed or Cancelled, so I didn’t see them for a while.
As non-determinism errors don’t cause workflows to fail, is there any way other than clicking through all my running workflows in the web-ui to see if I’ve had non-determinism issues? Looking through all the filters, none of them seem to be related to this.
Quite new to this, so if there’s something fundamental I should be doing instead of checking the web-ui for this error, that would also be very helpful to know.
Thanks!
Also interested in this. Did you find a solution @John_Musson?
Hi,
This is not something that Temporal provides out of the box.
The Temporal SDKs increment the workflow_task_execution_failed metric (https://docs.temporal.io/references/sdk-metrics#workflow_task_execution_failed) whenever there is a non-determinism error (NDE). In Java, a failure_reason tag is attached to the metric.
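If the worker metrics are exported in Prometheus text format, one way to check for NDEs without clicking through the web UI is to scan a scrape for that counter. Below is a minimal sketch using only the JDK. The exported metric name (temporal_workflow_task_execution_failed_total), the workflow_type values, and the failure_reason value "NonDeterminismError" are assumptions for illustration; check what your own worker actually emits before relying on them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NdeMetricCheck {
    // Matches one Prometheus sample line of the form: name{labels} value
    private static final Pattern SAMPLE =
        Pattern.compile("^([A-Za-z_:][A-Za-z0-9_:]*)\\{([^}]*)\\}\\s+([0-9.eE+-]+)$");

    /** Sums all samples of `metric` whose failure_reason label equals `reason`. */
    static double countFailures(String scrape, String metric, String reason) {
        double total = 0;
        for (String raw : scrape.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip HELP/TYPE comments
            Matcher m = SAMPLE.matcher(line);
            if (!m.matches() || !m.group(1).equals(metric)) continue;
            // Simple substring check on the label set; good enough for a sketch.
            if (m.group(2).contains("failure_reason=\"" + reason + "\"")) {
                total += Double.parseDouble(m.group(3));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical scrape output; metric and label values are assumed, not taken from a real worker.
        String scrape = String.join("\n",
            "# TYPE temporal_workflow_task_execution_failed_total counter",
            "temporal_workflow_task_execution_failed_total{failure_reason=\"NonDeterminismError\",workflow_type=\"OrderWorkflow\"} 3",
            "temporal_workflow_task_execution_failed_total{failure_reason=\"NonDeterminismError\",workflow_type=\"BillingWorkflow\"} 2",
            "temporal_workflow_task_execution_failed_total{failure_reason=\"WorkflowError\",workflow_type=\"OrderWorkflow\"} 5");

        double nde = countFailures(scrape,
            "temporal_workflow_task_execution_failed_total", "NonDeterminismError");
        if (nde > 0) {
            System.out.println("Non-determinism failures seen: " + nde);
        }
    }
}
```

In practice you would more likely put an alert on this counter in your monitoring system than parse the scrape by hand; the sketch only shows which metric and tag to look at.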
“but hadn’t flagged themselves as Failed or Cancelled”
The workflow execution won’t fail by default. The Temporal server re-schedules the workflow task so you can fix the code and redeploy it without losing the execution state/context. Once the fix is redeployed, the workflow execution continues from where it left off.
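The SDKs also let you fail the workflow on an NDE if you prefer. For example, in Java:
    // WorkflowImplementationOptions and NonDeterministicException are in io.temporal.worker
    worker.registerWorkflowImplementationTypes(
        WorkflowImplementationOptions.newBuilder()
            // fail the workflow execution instead of retrying the workflow task
            .setFailWorkflowExceptionTypes(NonDeterministicException.class)
            .build(),
        ...);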
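You can also catch NDEs before deploying code changes to production by replaying recorded workflow histories against the new code (for example, with WorkflowReplayer from the Java SDK's testing package).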
Antonio