In one of our workflows we are getting this non-determinism error that I don’t understand:
{
"message": "Nondeterminism(\"No command scheduled for event HistoryEvent(id: 1204, Some(StartChildWorkflowExecutionInitiated))\")",
"source": "",
"stackTrace": "",
"encodedAttributes": null,
"cause": null,
"applicationFailureInfo": {
"type": "",
"nonRetryable": false,
"details": null
}
}
What does it mean that there is no command scheduled for an event?
The workflow ran for less than 30 minutes, and no incompatible workflow code change was deployed during that period.
Here are the relevant events from the event history, including the event with the ID 1204 and other events referencing that ID:
{
"eventId": "1204",
"eventTime": "2023-07-02T11:37:45.351883423Z",
"eventType": "StartChildWorkflowExecutionInitiated",
"version": "0",
"taskId": "20987162",
"workerMayIgnore": false,
"startChildWorkflowExecutionInitiatedEventAttributes": {
"namespace": "default",
"namespaceId": "f843e2f7-30fc-495e-ae28-7a0c1ff72ea2",
"workflowId": "inspection-core-run_5ec3d230-f0f9-46ea-bfca-839faeb343d7_0457713d-1807-43b3-b72c-f8ad3fa502d7_bf71a66d-6d94-4bb5-be0f-45ad073b8d3f/e12c1207-d7f8-4144-b188-628422e88317_d53bf124-6dc6-4c27-8484-dab71cb57cd2",
"workflowType": {
"name": "inspect"
},
"taskQueue": {
"name": "inspection-workflow/us-ea-srq9",
"kind": "Normal"
},
"input": {
"payloads": [ ]
},
"workflowExecutionTimeout": null,
"workflowRunTimeout": "0s",
"workflowTaskTimeout": "120s",
"parentClosePolicy": "Terminate",
"control": "",
"workflowTaskCompletedEventId": "1190",
"workflowIdReusePolicy": "AllowDuplicate",
"retryPolicy": null,
"cronSchedule": "",
"header": {
"fields": {}
},
"memo": {
"fields": {}
},
"searchAttributes": {
"indexedFields": {}
}
}
},
{
"eventId": "1246",
"eventTime": "2023-07-02T11:37:45.661675656Z",
"eventType": "ChildWorkflowExecutionStarted",
"version": "0",
"taskId": "20987233",
"workerMayIgnore": false,
"childWorkflowExecutionStartedEventAttributes": {
"namespace": "default",
"namespaceId": "f843e2f7-30fc-495e-ae28-7a0c1ff72ea2",
"initiatedEventId": "1204",
"workflowExecution": {
"workflowId": "inspection-core-run_5ec3d230-f0f9-46ea-bfca-839faeb343d7_0457713d-1807-43b3-b72c-f8ad3fa502d7_bf71a66d-6d94-4bb5-be0f-45ad073b8d3f/e12c1207-d7f8-4144-b188-628422e88317_d53bf124-6dc6-4c27-8484-dab71cb57cd2",
"runId": "1714286f-630f-4686-91c9-cc656eaceb5c"
},
"workflowType": {
"name": "inspect"
},
"header": {
"fields": {}
}
}
},
{
"eventId": "1404",
"eventTime": "2023-07-02T11:41:00.943727941Z",
"eventType": "ChildWorkflowExecutionCompleted",
"version": "0",
"taskId": "20987637",
"workerMayIgnore": false,
"childWorkflowExecutionCompletedEventAttributes": {
"result": {
"payloads": [ ]
},
"namespace": "default",
"namespaceId": "f843e2f7-30fc-495e-ae28-7a0c1ff72ea2",
"workflowExecution": {
"workflowId": "inspection-core-run_5ec3d230-f0f9-46ea-bfca-839faeb343d7_0457713d-1807-43b3-b72c-f8ad3fa502d7_bf71a66d-6d94-4bb5-be0f-45ad073b8d3f/e12c1207-d7f8-4144-b188-628422e88317_d53bf124-6dc6-4c27-8484-dab71cb57cd2",
"runId": "1714286f-630f-4686-91c9-cc656eaceb5c"
},
"workflowType": {
"name": "inspect"
},
"initiatedEventId": "1204",
"startedEventId": "1246"
}
},
{
"eventId": "1556",
"eventTime": "2023-07-02T11:45:48.174443934Z",
"eventType": "WorkflowTaskFailed",
"version": "0",
"taskId": "20988052",
"workerMayIgnore": false,
"workflowTaskFailedEventAttributes": {
"scheduledEventId": "1554",
"startedEventId": "1555",
"cause": "NonDeterministicError",
"failure": {
"message": "Nondeterminism(\"No command scheduled for event HistoryEvent(id: 1204, Some(StartChildWorkflowExecutionInitiated))\")",
"source": "",
"stackTrace": "",
"encodedAttributes": null,
"cause": null,
"applicationFailureInfo": {
"type": "",
"nonRetryable": false,
"details": null
}
},
"identity": "8@inspection-workflow-698b5df6d4-kq4jg",
"baseRunId": "",
"newRunId": "",
"forkEventVersion": "0",
"binaryChecksum": "@temporalio/worker@1.4.4+215cf040ff09e11dc911b5abd1971ab145947d006965a0bbcaf9dfe3610c64a5"
}
},
In total, the history contains 15 WorkflowTaskFailed events, all referencing event ID 1204.
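For anyone who wants to repeat this filtering on an exported history, here is a rough TypeScript sketch. It is not part of our workflow code and does not use any Temporal API. The `HistoryEvent` shape and the helper names `referencesEvent`, `eventsReferencing`, and `countFailedTasksFor` are made up for illustration. It assumes the history is available as a JSON array of events shaped like the excerpts above, heavily trimmed here.

```typescript
// Hypothetical minimal shape of an exported history event (only the fields used here).
interface HistoryEvent {
  eventId: string;
  eventType: string;
  [key: string]: unknown;
}

// True if `value` refers to event `id`, either through a field whose name ends in
// "EventId" (eventId, initiatedEventId, ...) or through a failure message that
// contains "id: <id>,". `key` is the name of the field currently being inspected.
function referencesEvent(value: unknown, id: string, key: string = ""): boolean {
  if (typeof value === "string") {
    const isIdField = /eventid$/i.test(key) && value === id;
    const inMessage = key === "message" && value.indexOf(`id: ${id},`) !== -1;
    return isIdField || inMessage;
  }
  if (Array.isArray(value)) {
    return value.some((v) => referencesEvent(v, id, key));
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return Object.keys(obj).some((k) => referencesEvent(obj[k], id, k));
  }
  return false;
}

// All events that are, or point at, the event with the given ID.
function eventsReferencing(events: HistoryEvent[], id: string): HistoryEvent[] {
  return events.filter((e) => referencesEvent(e, id));
}

// Number of WorkflowTaskFailed events whose attributes mention the given event ID.
function countFailedTasksFor(events: HistoryEvent[], id: string): number {
  return eventsReferencing(events, id).filter(
    (e) => e.eventType === "WorkflowTaskFailed"
  ).length;
}

// The four events quoted above, trimmed to the fields that matter for the lookup.
const sample: HistoryEvent[] = [
  { eventId: "1204", eventType: "StartChildWorkflowExecutionInitiated" },
  {
    eventId: "1246",
    eventType: "ChildWorkflowExecutionStarted",
    childWorkflowExecutionStartedEventAttributes: { initiatedEventId: "1204" },
  },
  {
    eventId: "1404",
    eventType: "ChildWorkflowExecutionCompleted",
    childWorkflowExecutionCompletedEventAttributes: {
      initiatedEventId: "1204",
      startedEventId: "1246",
    },
  },
  {
    eventId: "1556",
    eventType: "WorkflowTaskFailed",
    workflowTaskFailedEventAttributes: {
      scheduledEventId: "1554",
      startedEventId: "1555",
      cause: "NonDeterministicError",
      failure: {
        message:
          'Nondeterminism("No command scheduled for event HistoryEvent(id: 1204, Some(StartChildWorkflowExecutionInitiated))")',
      },
    },
  },
];

console.log(eventsReferencing(sample, "1204").map((e) => e.eventId));
console.log(countFailedTasksFor(sample, "1204"));
```

On the full history, `countFailedTasksFor(events, "1204")` is the call that would give the count mentioned above. The `sample` array only holds one of those failures.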
I didn’t find any documentation on this specific non-determinism error. I found that it can be raised by the handle_command_event function in the server, but I don’t understand how a None value would get pushed to the queue.