runReplayHistory reports non-determinism for condition() with timeout

My CI runs runReplayHistory on every PR.

I have a workflow with an `await condition()`, and I want to add a timeout to it so the workflow doesn't run forever in case my “Done” signal never arrives.
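For context, a timeout changes what the wait can return. Below is a toy, deterministic model (plain TypeScript, not the Temporal SDK) of "wait for the Done signal, but give up after a timeout". `waitForDone` and the event shapes are hypothetical; in a real workflow the call is `condition(() => generationStatus !== null, timeout)` from `@temporalio/workflow`, which resolves to `true` if the predicate became true and `false` if the timeout elapsed first.

```typescript
// Toy, deterministic model of "wait for the Done signal, but give up after a timeout".
// This is NOT the Temporal SDK. In a real workflow the call is
//   await condition(() => generationStatus !== null, timeout)
// from '@temporalio/workflow', which resolves to true if the predicate became
// true and to false if the timeout elapsed first.

// Hypothetical event shapes, delivered in history order.
type ToyEvent =
  | { kind: "signal"; status: string | null } // e.g. the "Done" signal
  | { kind: "timerFired" };                   // the timeout timer firing

function waitForDone(events: ToyEvent[]): boolean {
  let generationStatus: string | null = null;
  for (const ev of events) {
    if (ev.kind === "timerFired") {
      return false; // timeout elapsed before the predicate held
    }
    generationStatus = ev.status; // signal handler updates workflow state
    if (generationStatus !== null) {
      return true; // predicate re-checked after each event
    }
  }
  throw new Error("history ended while still waiting");
}

console.log(waitForDone([{ kind: "signal", status: "Done" }])); // true
console.log(waitForDone([{ kind: "signal", status: null }, { kind: "timerFired" }])); // false
```

The practical takeaway is that the workflow should branch on the returned boolean rather than assume the signal arrived.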

The workflow (and the replay histories) were working just fine. But as soon as I add a timeout, CI fails because runReplayHistory reports non-deterministic behavior. This is the error I get:

```
2024-12-06T21:45:15.170900Z  WARN temporal_sdk_core::worker::workflow: Failing workflow task run_id=37fdda9b-d084-4cdc-9b8d-d1460dde5c01 failure=Failure { failure: Some(Failure { message: "[TMPRL1100] Nondeterminism error: 
Timer machine does not handle this event: HistoryEvent(id: 21, Ok(ActivityTaskScheduled))", source: "", stack_trace: "", encoded_attributes: None, cause: None, failure_info: Some(ApplicationFailureInfo(ApplicationFailureInfo { r#type: "", non_retryable: false, details: None, next_retry_delay: None })) }), force_cause: NonDeterministicError }

Unhandled rejection { runId: undefined } DeterminismViolationError: Replay failed with a nondeterminism error. This means that the workflow code as written is not compatible with the history that was fed in. Details: Workflow activation completion failed: Failure { failure: Some(Failure { message: "[TMPRL1100] Nondeterminism error: 
Timer machine does not handle this event: HistoryEvent(id: 21, Ok(ActivityTaskScheduled))", source: "", stack_trace: "", encoded_attributes: None, cause: None, failure_info: Some(ApplicationFailureInfo(ApplicationFailureInfo { r#type: "", non_retryable: false, details: None, next_retry_delay: None })) }), force_cause: NonDeterministicError }
```

See the diff: the left side is the working version, and the right side is the version that fails with the non-determinism error.

Any ideas why these changes make my workflow non-deterministic now? At first I thought it’d be the timeout, but now I wonder if it’s the if() block that checks for generationStatus (line 46).

Thanks in advance.

Can you share the full JSON event history, please?

```typescript
await condition(() => generationStatus !== null, timeout);
```
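Adding a timeout is going to create non-determinism: your worker now creates a command asking the service to start a timer (if the timeout duration is > 0). However, your event history so far has no TimerStarted event at that point, so your worker detects the mismatch and reports it as a non-determinism error (NDE). That is what your log says: the timer state machine created by the new code was handed event 21, which is an ActivityTaskScheduled rather than a TimerStarted.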
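To make the mismatch concrete, here is a deliberately simplified model of the replay check. It is not Temporal's implementation: real histories also contain workflow-task, signal and other events, matching is done by per-command state machines in the SDK core, and this sketch only checks commands against recorded events in order. `replay` and the history shapes are hypothetical; the point is that the same new code replays cleanly against a history it produced itself but fails against a history recorded by the old code.

```typescript
// Simplified, hypothetical model of replay checking (not Temporal's implementation).
type Command = "ScheduleActivityTask" | "StartTimer";
type HistoryEvent = "ActivityTaskScheduled" | "TimerStarted";

// The history event each command is expected to line up with.
const expectedEvent: Record<Command, HistoryEvent> = {
  ScheduleActivityTask: "ActivityTaskScheduled",
  StartTimer: "TimerStarted",
};

// Returns an error message on the first mismatch, or null if replay succeeds.
function replay(commands: Command[], history: HistoryEvent[]): string | null {
  for (let i = 0; i < commands.length; i++) {
    if (i >= history.length) {
      return null; // past the recorded history: remaining commands are new work
    }
    if (expectedEvent[commands[i]] !== history[i]) {
      return `Nondeterminism: code produced ${commands[i]} but history has ${history[i]}`;
    }
  }
  return null;
}

// Hypothetical shapes: the old code (no timeout) scheduled an activity, waited
// for the signal, then scheduled another activity.
const historyFromOldCode: HistoryEvent[] = ["ActivityTaskScheduled", "ActivityTaskScheduled"];
// The new code starts a timer for the timeout before that second activity.
const commandsFromNewCode: Command[] = ["ScheduleActivityTask", "StartTimer", "ScheduleActivityTask"];
const historyFromNewCode: HistoryEvent[] = ["ActivityTaskScheduled", "TimerStarted", "ActivityTaskScheduled"];

console.log(replay(commandsFromNewCode, historyFromOldCode));
// Nondeterminism: code produced StartTimer but history has ActivityTaskScheduled
console.log(replay(commandsFromNewCode, historyFromNewCode)); // null
```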

Looking at the full history would still let us check whether non-determinism is happening somewhere before this change, just in case.

Thanks, @tihomir.

The reason for my confusion: I thought replay-history would just run my new code (as if it ran the new code several times to make sure it was deterministic).

After your answer, and after talking to a mate from my team, I realized replay-history actually checks the new code (with timeout) against the history recorded by the old code (without timeout). As you said, it makes sense that they're no longer compatible, since the new code now starts a timer. Pretty much what’s said here.

So yeah, basically I’d need versioning, or, in my particular case, since it hasn’t been rolled out to production yet, I can just skip the replay-history check and enforce it again once we’re ready for production.
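For the versioning route, the TypeScript SDK's patching API would look roughly like `if (patched('add-condition-timeout')) { await condition(() => generationStatus !== null, timeout); } else { await condition(() => generationStatus !== null); }`, where the patch id is just an example. The sketch below is a toy model of why that keeps old histories replayable; `isPatched`, `RunContext` and `workflowBody` are hypothetical stand-ins, not SDK code. A fresh execution takes the new branch and records a marker, while a replay takes it only if that marker is in the history.

```typescript
// Toy model of patch-based versioning. The hypothetical isPatched() stands in
// for patched('...') from '@temporalio/workflow'.
type Command = "ScheduleActivityTask" | "RecordMarker" | "StartTimer";

interface RunContext {
  replaying: boolean;           // true when re-running code against an existing history
  recordedMarkers: Set<string>; // patch markers present in that history
  commands: Command[];          // commands the code produces in this run
}

function isPatched(ctx: RunContext, patchId: string): boolean {
  const takeNewBranch = !ctx.replaying || ctx.recordedMarkers.has(patchId);
  if (takeNewBranch) {
    ctx.commands.push("RecordMarker"); // matched against the marker event on replay
  }
  return takeNewBranch;
}

// Hypothetical workflow body guarding the timeout behind a patch id.
function workflowBody(ctx: RunContext): Command[] {
  ctx.commands.push("ScheduleActivityTask");
  if (isPatched(ctx, "add-condition-timeout")) {
    ctx.commands.push("StartTimer"); // new path: condition(fn, timeout)
  } // old path: condition(fn) without a timeout produces no command
  ctx.commands.push("ScheduleActivityTask");
  return ctx.commands;
}

const run = (replaying: boolean, markers: string[]) =>
  workflowBody({ replaying, recordedMarkers: new Set(markers), commands: [] });

console.log(run(true, []));                        // old history: no timer, replays as before
console.log(run(false, []));                       // new execution: marker + timer
console.log(run(true, ["add-condition-timeout"])); // replay of a new history: same as above
```

Once no open workflows still depend on the old branch, the SDK's documented next steps are `deprecatePatch('add-condition-timeout')` and, later, removing the patch call altogether.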