Nondeterministic workflow: history event is UpsertWorkflowSearchAttributes

We’ve been seeing some of our workflows failing with “nondeterministic workflow: history event is UpsertWorkflowSearchAttributes”. It’s entirely possible that we’ve missed something and our workflow does have some non-determinism, but we can’t find it and if it does then it’s very rare - this has happened on about 30 of 30000 over the past 2 weeks. We can go for several days without seeing any then we see a few over a few hours and then nothing again and we can’t work out what’s causing it.

In the example history below:

  • Everything runs fine up to event 33, which is where we use UpsertSearchAttributes to publish some status information and block the workflow using a selector with an AddReceive(channel).
  • A few minutes later the signal is received (event 34), a workflow task is scheduled but then fails

The really strange thing is that the ExportSpecificationBuilder activity referenced in the error is one that should only happen later in the workflow, after it has blocked on another selector/signal.

We’ve tried replaying the history by downloading it with “tctl wf show -w --output_filename=” then doing ReplayWorkflowHistoryFromJSONFile() on that file, but that runs without error. We can’t find any detailed help on diagnosing “nondeterministic” workflow errors.

We’re running 1.15. Does this correspond to any known issue, or do you have any suggestions for how we could investigate further?

1 2023-01-18T16:29:26Z WorkflowExecutionStarted {WorkflowType:{Name:ProcessStream.v2}, ParentInitiatedEventId:0, TaskQueue:{Name:ACTIVITY_TASK_QUEUE, Kind:Normal},
Input:[{“RequestId”:“workflows-frontend-86c8c7c5f4-57hsg/ZyMQ9jL6T1-114914”,“DocumentId”:“”,“AccountId”:“ouMUsX7BQZOZZLn5yBjTM
w”,“UserEmail”:“”,“ApiClientId”:“123473de15074d8dac3d8e0a61de4200”,“Operation”:“”,“MaxDocumentsCount”:30,“JobId”:“”,“Tas kId”:“”},
… Name":“default-file-system”}],“StreamTimeout”:2419200000000000},“StreamId”:“HBAzI_3rSMSlUfzpNDPwnA”,“StreamName”:“S-prd-
54731692.pdf”,“StreamCreatedAt”:“2023-01-18 16:29:26.98194809 +0000 UTC m=+5850.327798154”,“StreamTimeout”:2419200000000000},
nil], WorkflowExecutionTimeout:0s, WorkflowRunTimeout:0s, WorkflowTaskTimeout:10s, Initiator:Unspecified,
OriginalExecutionRunId:c547add3-fe6d-44d2-83f4-e08922543178, Identity:24@workflows-frontend-86c8c7c5f4-57hsg@,
FirstExecutionRunId:c547add3-fe6d-44d2-83f4-e08922543178, Attempt:1, FirstWorkflowTaskBackoff:0s,
SearchAttributes:{IndexedFields:map{AccountId:“ouMUsX7BQZOZZLn5yBjTMw”, ClientAction:“”, CollectionId:“DJXW6XurT3ixhkHwYO9Btw”,
CreatedTime:“2023-01-18T16:29:26.98194809Z”, DocumentsCount:0, ProjectId:“B_8_Yy5CQdyftFz1CexoBA”, ProjectName:“Atlanticare (v2)”,
State:“running”, StreamName:“S-prd-54731692.pdf”}}}
… (activities running normally)
24 2023-01-18T16:29:34Z WorkflowTaskStarted {ScheduledEventId:23,
Identity:22@workflows-activities-5fcbdbbf6f-zwcm2@,
RequestId:22ad525e-44b5-4235-85a0-ee1a49c19a2a}
25 2023-01-18T16:29:34Z WorkflowTaskCompleted {ScheduledEventId:23, StartedEventId:24,
Identity:22@workflows-activities-5fcbdbbf6f-zwcm2@,
BinaryChecksum:28cf57af1ba1acbd08bc283cb43f2e16}
26 2023-01-18T16:29:34Z UpsertWorkflowSearchAttributes {WorkflowTaskCompletedEventId:25,
SearchAttributes:{IndexedFields:map{DocumentsCount:1}}}
27 2023-01-18T16:29:34Z ActivityTaskScheduled {ActivityId:27, ActivityType:{Name:ExtractData}, TaskQueue:{Name:ACTIVITY_TASK_QUEUE, Kind:Normal},
Input:[{“VDocs”:[{“SrcBlobId”:“ujUG5ZofROqqIB80euybIQ”,“BlobId”:“C-lok3ZETsmWipQeouRQ3w”,“ReadResultBlobId”:“AAAAAAAAAAAAAAAAA
AAAAA”,“PdfWithReadBlobId”:“AAAAAAAAAAAAAAAAAAAAAA”,“FirstPage”:1,“LastPage”:6,“PageCount”:0,“Confidence”:0,“DocType”:“D ocument”,"
… playAllDocuments":false},“Triggers”:[{“ID”:“AAAAAAAAAAAAAAAAAAAAAA”,“Type”:“file-system”,“Name”:“default-file-system”}],
“Exporters”:[{“ID”:“AAAAAAAAAAAAAAAAAAAAAA”,“Type”:“file-system”,“Name”:“default-file-system”}],“StreamTimeout”:24192000 00000000}],
ScheduleToCloseTimeout:0s, ScheduleToStartTimeout:0s, StartToCloseTimeout:1m0s, HeartbeatTimeout:0s, WorkflowTaskCompletedEventId:25,
RetryPolicy:{InitialInterval:1s, BackoffCoefficient:2, MaximumInterval:1m40s, MaximumAttempts:0, NonRetryableErrorTypes:}}
28 2023-01-18T16:29:34Z ActivityTaskStarted {ScheduledEventId:27,
Identity:22@workflows-activities-5fcbdbbf6f-zwcm2@,
RequestId:cc03287f-a91b-4e42-8fe4-fd61c2606159,
Attempt:1}
28 2023-01-18T16:29:34Z ActivityTaskStarted {ScheduledEventId:27,
Identity:22@workflows-activities-5fcbdbbf6f-zwcm2@,
RequestId:cc03287f-a91b-4e42-8fe4-fd61c2606159,
Attempt:1}
29 2023-01-18T16:29:39Z ActivityTaskCompleted {Result:[[{“DocumentType”:{“Id”:“VoDA1u-pTUi_a5YmVvoEtw”,“Name”:“Document”,“ExtractorName”:“Document”,“Fields”:[{“Name”:“Date”,"
Display":0,“Confirm”:0,“Label”:“Date”,“Type”:1,“TextFieldSettings”:{“Placeholder”:“”,“Validations”:{“Required”:false,"Re quiredMess
… l}},{“FieldName”:“AutoDate”,“Result”:{“Text”:“9/15/2017”,“JsonValue”:“2017-09-15T00:00:00”,“RelativeConfidence”:1,“Areas
“:[{“Top”:14.940202,“Left”:189,“Bottom”:21.5866,“Right”:211.40598,“PageNumber”:2}],“Subtype”:null}}]},“IsDocTypeConfiden t”:true}]],
ScheduledEventId:27, StartedEventId:28, Identity:22@workflows-activities-5fcbdbbf6f-zwcm2@}
30 2023-01-18T16:29:39Z WorkflowTaskScheduled {TaskQueue:{Name:workflows-activities-5fcbdbbf6f-zwcm2:084d9822-76a4-42a1-874b-b761e178dde9,
Kind:Sticky}, StartToCloseTimeout:10s, Attempt:1}
31 2023-01-18T16:29:39Z WorkflowTaskStarted {ScheduledEventId:30,
Identity:22@workflows-activities-5fcbdbbf6f-zwcm2@,
RequestId:2513613e-662e-4f41-9f0f-46d39eef7641}
32 2023-01-18T16:29:39Z WorkflowTaskCompleted {ScheduledEventId:30, StartedEventId:31,
Identity:22@workflows-activities-5fcbdbbf6f-zwcm2@,
BinaryChecksum:28cf57af1ba1acbd08bc283cb43f2e16}
33 2023-01-18T16:29:39Z UpsertWorkflowSearchAttributes {WorkflowTaskCompletedEventId:32, **
** SearchAttributes:{IndexedFields:map{ClientAction:“validation”, **
** State:“pending:start_client_action”}}}

34 2023-01-18T16:52:09Z WorkflowExecutionSignaled {SignalName:start_client_action:validation,
Input:[{“Timeout”:1800000000000,“Language”:””}],
Identity:22@workflows-frontend-754b44d478-hfx47@}
35 2023-01-18T16:52:09Z WorkflowTaskScheduled {TaskQueue:{Name:workflows-activities-5fcbdbbf6f-zwcm2:084d9822-76a4-42a1-874b-b761e178dde9,
Kind:Sticky}, StartToCloseTimeout:10s, Attempt:1}
36 2023-01-18T16:52:09Z WorkflowTaskTimedOut {ScheduledEventId:35,
StartedEventId:0,
TimeoutType:ScheduleToStart}
37 2023-01-18T16:52:09Z WorkflowTaskScheduled {TaskQueue:{Name:ACTIVITY_TASK_QUEUE,
Kind:Normal},
StartToCloseTimeout:10s, Attempt:1}
38 2023-01-18T16:52:09Z WorkflowTaskStarted {ScheduledEventId:37,
Identity:22@workflows-frontend-754b44d478-9675t@,
RequestId:63924a3a-a2f4-462b-b104-0bd70c7d6221}
39 2023-01-18T16:52:09Z WorkflowTaskFailed {ScheduledEventId:37, StartedEventId:38,
Cause:WorkflowWorkerUnhandledFailure,
Failure:{Message:nondeterministic workflow: history event
is UpsertWorkflowSearchAttributes: (EventId:33, EventTime:(),
EventType:UpsertWorkflowSearchAttributes, Version:0, TaskId:73417801,
Attributes:&HistoryEvent_UpsertWorkflowSearchAttributesEventAttributes{Upsert
… , ActivityType:(Name:ExportSpecificationBuilder),
TaskQueue:(Name:ACTIVITY_TASK_QUEUE, Kind:Normal), Header:(Fields:map),
Input:(Payloads:[len=5]), ScheduleToCloseTimeout:0s,
ScheduleToStartTimeout:0s, StartToCloseTimeout:1m0s, HeartbeatTimeout:0s),
Source:GoSDK, FailureInfo:{ApplicationFailureInfo:{NonRetryable:false}}},
Identity:22@workflows-frontend-754b44d478-9675t@, ForkEventVersion:0,
BinaryChecksum:e1c61185a8a1ad1e4cc5d5b145d817b0}

Can you please share the json event history of this execution?

tctl wf show -w <wfid> -r <runid> --of myhistory.json

(and share contents of myhistory.json)
What you pasted is really difficult to read