Unexpected NonDeterministicWorkflowError for simple workflow

Hey there! I’m trying to better understand why my workflow is hitting a NonDeterministicWorkflowError and how I can update the definition to avoid it.

My workflow is defined using the coinbase temporal-ruby SDK.

The workflow code looks like this:

class RecoverWorkflow
  def execute(id, sharding_num)
    shard_num = sharding_num || 1
    status_futures = (0..shard_num - 1).map { |index| RecoverActivity.execute(id, index) }
    statuses = status_futures.map(&:get)
    all_success =
      statuses
        .map
        .with_index do |s, i|
          success = s == true
          $logger.info("recovery job #{id} failed on shard index #{i}: #{s}") unless success
          success
        end
        .all?
    RecoverTimestampActivity.execute(id, all_success).get
    return all_success
  end
end

Error:

"type": "Temporal::NonDeterministicWorkflowError",
    "nonRetryable": false,
    "details": {
      "payloads": [
        "Unexpected command.  The replaying code is issuing: workflow (17), but the history of previous executions recorded: activity (17). Likely, either you have made a version-unsafe change to your workflow or have non-deterministic behavior in your workflow.  See https://docs.temporal.io/docs/java/versioning/#introduction-to-versioning."
      ]

Workflow Execution History:

From the history, my understanding is that the first RecoverActivity execution failed because it hit our StartToClose timeout of 10 minutes, and that it succeeded on the second attempt.

Is the non-determinism here caused by the fact that the input to RecoverTimestampActivity can change depending on whether RecoverActivity succeeds, or by something else?

I think the error means that, during event replay on the worker, your workflow code after OrgScimRecoverTimeStampActivity does not invoke OrgScimRecoverActivity, even though that activity was already invoked during the original execution and is recorded in the event history for this execution.

I would look at what your workflow code does after the OrgScimRecoverTimeStampActivity invocation.
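To make that mismatch concrete, here is a toy model of command replay. It is only a sketch and not how temporal-ruby is implemented: ToyContext, NonDeterminismError, workflow_v_a and workflow_v_b are all made-up names. It records the commands from a first run, then replays two code versions against that history.

```ruby
# Toy model of command replay. This is NOT the temporal-ruby implementation;
# every class and method name below is hypothetical.
class NonDeterminismError < StandardError; end

# On a first execution the context records each command the code issues.
# On replay it compares each issued command with the recorded one, in order.
class ToyContext
  attr_reader :history

  def initialize(history = nil)
    @replaying = !history.nil?
    @history = history || []
    @position = 0
  end

  def issue(type)
    if @replaying
      recorded = @history[@position]
      if recorded != type
        raise NonDeterminismError,
              "replaying code is issuing: #{type}, " \
              "but history recorded: #{recorded.inspect}"
      end
    else
      @history << type
    end
    @position += 1
  end
end

# Version A: one activity per shard, one follow-up activity, then completion.
def workflow_v_a(ctx, shards)
  shards.times { ctx.issue(:activity) }
  ctx.issue(:activity) # stands in for a follow-up activity
  ctx.issue(:workflow) # stands in for completing the workflow
end

# Version B (for example, different code on another worker) skips the
# follow-up activity and completes one step earlier.
def workflow_v_b(ctx, shards)
  shards.times { ctx.issue(:activity) }
  ctx.issue(:workflow)
end

first_run = ToyContext.new
workflow_v_a(first_run, 2)

# Replaying the same code against its own history raises nothing.
workflow_v_a(ToyContext.new(first_run.history), 2)

# Replaying different code fails at the first command that differs.
mismatch = nil
begin
  workflow_v_b(ToyContext.new(first_run.history), 2)
rescue NonDeterminismError => e
  mismatch = e.message
end
puts mismatch
```

In this model the error appears at the first position where the replaying code and the history disagree, which is why it helps to compare the code path around that point with the recorded events.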

I would also make sure all your workers have the same workflow code registered, just in case.
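One low-tech way to check this, as a sketch only, is to log a fingerprint of the workflow's source file when each worker boots and compare the values across workers. The helper source_fingerprint and the class ExampleWorkflow are hypothetical, not part of temporal-ruby, and the approach assumes the workflow is loaded from a file on disk.

```ruby
require "digest/sha2"

# Hypothetical helper: returns a short SHA-256 fingerprint of the file that
# defines the given instance method, or nil if no source file is available.
def source_fingerprint(klass, method_name)
  path, _line = klass.instance_method(method_name).source_location
  return nil unless path && File.file?(path)

  Digest::SHA256.file(path).hexdigest[0, 12]
end

# Stand-in for a real workflow class.
class ExampleWorkflow
  def execute; end
end

fingerprint = source_fingerprint(ExampleWorkflow, :execute)
puts "ExampleWorkflow source fingerprint: #{fingerprint.inspect}"
```

If two workers print different fingerprints for the same workflow class, they are running different code, and a replay on one of them can diverge from history written by the other.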