I have implemented a workflow using the Temporal Go SDK, which is scheduled to run once a day via the Temporal scheduler. However, I’ve noticed something unusual: I am intermittently encountering a Non-Deterministic Execution (NDE) error, primarily after receiving a signal. The error consistently refers to the first activity ID.
What’s puzzling is that for the same signal message, the workflow completes successfully four out of five times, but on one occasion, it encounters the NDE. The code hasn’t been modified, and there are no conditional branches (e.g., if/else) that could cause different execution paths.
Since the execution path remains consistent, I’m unsure why this is happening. Has anyone experienced similar behavior or could provide insights on what might be causing this issue?
Worker is using the default options.
Any help would be greatly appreciated!
Have you run your code against the workflow check tool?
Do you have multiple workers, and are you sure they all have the same workflow code deployed?
The NDE happened on workflow task replay, before the signal was delivered to the worker that was processing your execution at that time, so I'm not sure that's the trigger. The trigger seems to be the first workflow task timeout, which caused the workflow task to be replayed and most likely dispatched to a different worker (check the “identity” fields of events 21 and 24 to confirm).
I would first check your code with the workflow check tool mentioned above, and also try running the event history of a completed execution of this workflow type through the workflow replayer (see test here if that helps). If you can share your workflow code, we can take a look as well.
the workflow completes successfully four out of five times,
Most likely, in 4 of those runs the workflow started and completed on the same worker and was never evicted from its cache, in which case the worker never had to replay the event history, so non-determinism was never checked.
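To make the cached-versus-replayed difference concrete, here is a toy model in plain Go. It does not use the Temporal SDK, and every name in it (`command`, `replay`, `badWorkflow`, `goodWorkflow`, `tickingClock`) is made up for illustration. The idea it sketches: on replay the worker re-runs the workflow code and compares what it produces with the recorded history, so code that depends on the wall clock only fails when a replay actually happens.

```go
package main

import (
	"fmt"
	"time"
)

// command is a simplified stand-in for what a workflow asks the server to do.
type command struct {
	ActivityID string
}

// workflowFn is a toy workflow: given a clock, it returns the commands it emits.
type workflowFn func(now func() time.Time) []command

// badWorkflow derives an activity ID from the clock, so every run differs.
func badWorkflow(now func() time.Time) []command {
	return []command{{ActivityID: fmt.Sprintf("charge-%d", now().UnixNano())}}
}

// goodWorkflow depends only on fixed inputs, so every run is identical.
func goodWorkflow(_ func() time.Time) []command {
	return []command{{ActivityID: "charge-1"}}
}

// tickingClock returns a fake clock that advances one second per call,
// standing in for a wall clock that never returns the same value twice.
func tickingClock() func() time.Time {
	t := time.Unix(0, 0)
	return func() time.Time {
		t = t.Add(time.Second)
		return t
	}
}

// replay re-runs the workflow and compares its commands with recorded history.
func replay(wf workflowFn, history []command, now func() time.Time) error {
	got := wf(now)
	if len(got) != len(history) {
		return fmt.Errorf("nondeterminism: history has %d commands, replay produced %d", len(history), len(got))
	}
	for i := range got {
		if got[i] != history[i] {
			return fmt.Errorf("nondeterminism at command %d: history has %q, replay produced %q",
				i, history[i].ActivityID, got[i].ActivityID)
		}
	}
	return nil
}

func main() {
	clock := tickingClock()
	// First execution: the emitted commands become the recorded history.
	badHistory := badWorkflow(clock)
	goodHistory := goodWorkflow(clock)
	// A worker that keeps the execution cached never calls replay, so the
	// problem stays hidden. A different worker has to replay:
	fmt.Println("bad workflow replay fails:", replay(badWorkflow, badHistory, clock) != nil)
	fmt.Println("good workflow replay fails:", replay(goodWorkflow, goodHistory, clock) != nil)
}
```

This prints `true` for the clock-dependent workflow and `false` for the fixed one, which mirrors why the same code can pass four times and fail on the fifth.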
Thank you for the reply! It was very helpful. I ran the workflow check tool, and it revealed some non-deterministic elements in my workflow that were causing the NDE errors. The issues were primarily related to the use of time.Now and the github.com/goccy/go-json package.
Thanks again for pointing me in the right direction!
@tihomir I have resolved all the Non-Deterministic Execution (NDE) issues identified by the Workflow Checker, but I’m still encountering an NDE error after the signal.
- Is there a way to enforce running all workflow tasks and activities on the same worker?
- Are there options to configure the worker cache, such as setting its size or TTL?
but I’m still encountering an NDE error after the signal
Is it the same one as before? I think the signal is the trigger for the worker checking non-determinism, but not the cause, IMHO. It would be helpful to see your workflow code and event history JSON to try to debug.
Is there a way to enforce running all workflow tasks and activities on the same worker?
Have a single worker on the task queue that these workflows and activities are scheduled on.
Are there options to configure the worker cache, such as setting its size or TTL?
Yes, you can set the cache size:
worker.SetStickyWorkflowCacheSize(N)
The default in the Go SDK is 10K.