Workflow Determinism

Temporal Workflows must be deterministic. Stated using our terminology:

For a given Workflow Type, its Workflow Definition (implementation) must produce the same sequence of Commands given the same History.

Less formally, this means a Workflow must always do the same thing given the same inputs.

This requirement is what allows Temporal to ensure that the state of your Workflow is not lost
if, for example, the Worker running your Workflow crashes.

As your Workflow executes, Commands are generated and appended to the History of the Workflow along with other Events. If a Workflow’s state needs to be re-created on a new Worker, the entire History for that Workflow Execution is sent to the Worker and replayed from the beginning. If the Worker’s implementation of the Workflow emits the same sequence of Commands as exists in the History, Temporal knows that the Workflow Definition has re-created the same state it had when executed by whichever Worker last completed a Workflow Task.

If this Worker’s Workflow Definition were to emit a different Command than the one in the History
at any given point during replay, the SDK would be faced with an impossible choice. Because the implementation has just done something different from what exists in History, the rest of that History cannot apply! Stated differently: the code we are executing is trying to do something different from what it did when previously executed, so we are in some unknown state which can’t be the same state we should’ve reached.

Encountering this situation causes our SDKs to emit a nondeterminism error.
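
The replay check described above can be sketched with a toy model. This is not how any Temporal SDK is implemented: Command, order_workflow, replay, and NondeterminismError are hypothetical names made up for illustration, and the History is simplified to a list of (Command, result) pairs.

```python
from dataclasses import dataclass


class NondeterminismError(Exception):
    """Raised when replayed code emits a Command that differs from History."""


@dataclass(frozen=True)
class Command:
    kind: str  # e.g. "StartTimer" or "ScheduleActivityTask"
    name: str  # hypothetical identifier for the timer or activity


def order_workflow():
    """Toy Workflow Definition: yields Commands and receives their results."""
    yield Command("StartTimer", "cooldown")
    receipt = yield Command("ScheduleActivityTask", "charge_card")
    return f"done:{receipt}"


def replay(workflow_fn, history):
    """Re-run workflow_fn against recorded (Command, result) pairs.

    Returns the workflow's final value if the History covers the whole run,
    or None if the workflow would still be in progress after replay.
    """
    gen = workflow_fn()
    try:
        emitted = next(gen)
        for recorded, recorded_result in history:
            if emitted != recorded:
                raise NondeterminismError(
                    f"history has {recorded}, code emitted {emitted}"
                )
            # Feed the recorded result back in instead of redoing the work.
            emitted = gen.send(recorded_result)
    except StopIteration as done:
        return done.value
    return None


history = [
    (Command("StartTimer", "cooldown"), None),
    (Command("ScheduleActivityTask", "charge_card"), "receipt-1"),
]
final_state = replay(order_workflow, history)  # state rebuilt purely from History
```

In this sketch, the workflow’s local state (receipt) is rebuilt from the recorded results alone, which is why a mismatching Command leaves no sensible way to continue.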

Sources of Nondeterminism

There are two reasons your Workflow could be doing something different from what was expected during replay.

Code changes

The Workflow implementation for a particular Workflow Type can change only in limited ways
while remaining compatible with existing Histories for that type.

Any change to the Workflow code which would cause it to emit Commands in a different sequence given
the same input is not a backwards-compatible change, and thus cannot be deployed to the same Task
Queue without potentially breaking outstanding Workflow Executions.

For example, if you have a Workflow which does the following:

1. Start a timer/sleep
2. Start and wait on an activity
3. Complete

Which you then change to:

1. Start and wait on an activity
2. Start a timer/sleep
3. Complete

If you then deploy a Worker with that new implementation, any incomplete Workflows may be picked
up by that Worker, and those that are will fail with nondeterminism errors.
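
As a rough illustration of the reordering example, the two versions can be modeled as plain lists of Command types and compared against what an in-flight execution has already recorded. The functions workflow_v1, workflow_v2, and first_divergence are hypothetical helpers for this sketch, not SDK APIs.

```python
def workflow_v1():
    """Commands emitted by the original implementation, in order."""
    return ["StartTimer", "ScheduleActivityTask", "CompleteWorkflowExecution"]


def workflow_v2():
    """Commands emitted after swapping the timer and the activity."""
    return ["ScheduleActivityTask", "StartTimer", "CompleteWorkflowExecution"]


def first_divergence(recorded, emitted):
    """Index of the first replayed Command that disagrees with History, or None."""
    for index, (old, new) in enumerate(zip(recorded, emitted)):
        if old != new:
            return index
    return None


# An execution that started its timer under v1 and is then replayed by v2.
in_flight = workflow_v1()[:1]
mismatch_at = first_divergence(in_flight, workflow_v2())

# An execution that had not emitted any Commands yet has nothing to contradict.
not_started = []
no_mismatch = first_divergence(not_started, workflow_v2())
```

Here mismatch_at is 0, because the very first recorded Command is a timer while the new code asks for an activity; no_mismatch is None, since there is no recorded Command to disagree with.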

The reason to fail in this situation is to ensure that your Workflows reach a consistent state
given the same history. Ensuring the same command sequence is Temporal’s best way to do so. We
cannot actually compare state, as that would require snapshotting memory at the end of every Workflow
Task, which is impractical for a number of reasons.

You can in fact make changes to Workflow code which will result in different internal state (different variable values, etc.) without necessarily breaking determinism. However, if you were to change the order or type of Commands based on that state, then determinism would again be broken. In effect, such “internal-only” changes produce no externally visible difference (Queries aside), and are thus acceptable (though not recommended without good reason).

Additionally, we make some practical allowances that permit some minor changes.
You may, without breaking determinism:

  • Change the duration of a timer
  • Change the arguments to:
    • Activity (local or nonlocal)
    • Child workflow
    • Signal to external workflow

Such changes will not take effect when replaying a History which already contains those Commands, but
Workflow Executions that reach the code emitting them for the first time will emit them with the new values.
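
One way to picture these allowances is a toy matcher that compares only the kind of each Command and lets the recorded values win during replay. This is a simplified sketch under that assumption; Command, matches, and resolve are hypothetical names, and real SDKs may compare more fields than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    kind: str       # e.g. "StartTimer"
    detail: object  # timer duration in seconds, or call arguments


def matches(recorded, emitted):
    """Assumption for this sketch: only the Command kind is compared."""
    return recorded.kind == emitted.kind


def resolve(history, position, emitted):
    """Return the Command that actually takes effect at this position."""
    if position < len(history):
        recorded = history[position]
        if not matches(recorded, emitted):
            raise RuntimeError("nondeterminism: Command kinds differ")
        return recorded  # replaying: the recorded values stay in force
    return emitted       # first time here: the new values are used


old_history = [Command("StartTimer", 30)]
new_code_timer = Command("StartTimer", 60)

during_replay = resolve(old_history, 0, new_code_timer)
first_execution = resolve([], 0, new_code_timer)
```

In this model, during_replay keeps the recorded 30-second duration, while first_execution picks up the new 60-second value.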

In order to make changes safely, you can deploy new versions of Workflows to new Task Queues, or
you can use the appropriate versioning API for your chosen language. See help for:

Intrinsic

Intrinsic nondeterminism is when a Workflow, without any code change, does different things when
executed with the same input.

An obvious example is branching based on a random number (using a Temporal SDK’s randomness
API is fine, since the randomness is deterministic as a function of the workflow’s Run Id).
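
The idea that randomness can be deterministic can be illustrated with an ordinary seeded generator. The sketch below is only an analogy using Python’s standard library: workflow_random and choose_branch are hypothetical helpers, and the actual seeding scheme used by the SDKs is not shown here.

```python
import random


def workflow_random(run_id: str) -> random.Random:
    """Hypothetical helper: a PRNG whose only seed is the Run Id."""
    return random.Random(run_id)


def choose_branch(rng: random.Random) -> str:
    """Branch on a random draw, as a workflow might."""
    return "fast_path" if rng.random() < 0.5 else "slow_path"


# The original execution and a later replay share the same Run Id,
# so both see the same sequence of "random" values.
original_branch = choose_branch(workflow_random("run-123"))
replayed_branch = choose_branch(workflow_random("run-123"))
```

Because the seed is the only input, the replayed branch always equals the original one, whereas an unseeded generator would give no such guarantee.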

A perhaps slightly less obvious example would be branching based on wall-clock time. For example
(pseudocode):

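// Anti-pattern: the branch depends on wall-clock time, so replay can take the other path.
fn my_workflow() {
    if system_clock().is_before("12pm") {
        // Executed before noon: emits a timer Command
        await workflow.sleep(duration_until("12pm"))
    } else {
        // Executed (or replayed) after noon: emits an activity Command
        await my_afternoon_activity()
    }
}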

Here, system_clock() is a function returning wall-clock time, rather than Workflow-defined time.
If you do this, a Workflow which executed before 12pm will have produced a different sequence of
Commands than that same Workflow replayed at 1pm. Some language SDKs (just TypeScript for now) will
prevent you from even importing/using disallowed APIs. In others, you must use Temporal-provided
alternatives. For example, the right method to use in Java would be Workflow.currentTimeMillis().
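
The effect of the pseudocode above can be reproduced in a small sketch by passing the clock value in explicitly. The names first_command, recorded_time, and replay_wall_clock are hypothetical, and the sketch assumes that Workflow-defined time comes from a timestamp fixed at original execution rather than from the machine doing the replay.

```python
from datetime import datetime

NOON_HOUR = 12


def first_command(now: datetime) -> str:
    """The branch from the pseudocode, with the current time as a parameter."""
    return "StartTimer" if now.hour < NOON_HOUR else "ScheduleActivityTask"


# Time observed when the Workflow originally ran (assumed to be recorded).
recorded_time = datetime(2024, 1, 1, 11, 0)
# Time on the Worker's machine when the Workflow is later replayed.
replay_wall_clock = datetime(2024, 1, 1, 13, 0)

original = first_command(recorded_time)
replay_with_wall_clock = first_command(replay_wall_clock)  # diverges
replay_with_workflow_time = first_command(recorded_time)   # agrees
```

Replaying with the wall clock produces an activity Command where the History holds a timer, while replaying with the recorded time reproduces the original Command.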

Generally speaking, any function which performs side effects should not be called from workflow code.
All operations that do not purely mutate Workflow state should occur through Temporal SDK APIs. This
ensures that your Workflow will always reach the same state with the same history.
