Hi All
I would like to ask more regarding nondeterministic error that was being throw after temporal server restart (possibly because of the replay function), that makes the services print spam log, we need to terminate the workflow, if we don’t want to be spam with the error log.
For the introduction
This is our current flow in diagram
I would explain the chronology in here:
- Workflow ID start to run at Aug 25th 11:47:51 am
- Workflow continue to run the flow :
a. Signal 1
b. Signal 2
c. Activity 1
→ Status is NOT OK
d. Activity 2
Workflow then continue to wait for Signal 1 again.
When timer for signal 1 started it’s at Aug 25th 13:19pm, those if we add 24 hr,
timer action should be triggered on Aug 26th 13:19 pm.
if within 24 hr no signal being received we gonna end this workflow. - All of the workflow is being limit using timer mechanism, so that all of this need to be finish within 24 hour.
- At step number 2 while we wait Signal 1, there was an accident on our temporal server that it being shutdown at Aug 26th 09:00 am - 13:31 pm
When the temporal server being up, temporal seem to fire all the timer that actually need to be executed at 13:19 pm, and it’s actually continue to end the workflow and continue next activity
We have some miss configuration in here, but we don’t know if this cause non deterministic
scheduleToCloseTimeout 5s
scheduleToStartTimeout 5s
startToCloseTimeout 5s
and we think it seem to cause the activity time out,
when getting this activity error we actually have handle by checking the error and do some other activity which we wait for signal for another action
At 6.49 pm while we try to restart the server again, it caused nondeterministic error
PanicError: unknown command CommandType: Timer, ID: 25, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition
process event for QUEUE [panic]:
go.temporal.io/sdk/internal.panicIllegalState(...)
/build/vendor/go.temporal.io/sdk/internal/internal_decision_state_machine.go:409
go.temporal.io/sdk/internal.(*commandsHelper).getCommand(0xc041719f90, 0xc000000004, 0xc04261104a, 0x2, 0x0, 0x0)
/build/vendor/go.temporal.io/sdk/internal/internal_decision_state_machine.go:881 +0x185
go.temporal.io/sdk/internal.(*commandsHelper).handleTimerClosed(0xc041719f90, 0xc04261104a, 0x2, 0xc0422ae180, 0xc042898d80)
/build/vendor/go.temporal.io/sdk/internal/internal_decision_state_machine.go:1298 +0x4b
go.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).handleTimerFired(0xc00054a078, 0xc041df64c0)
/build/vendor/go.temporal.io/sdk/internal/internal_event_handlers.go:1097 +0x7d
go.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc00054a078, 0xc041df64c0, 0x0, 0x0, 0x0)
/build/vendor/go.temporal.io/sdk/internal/internal_event_handlers.go:862 +0x648
go.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc04223a600, 0xc042384a50, 0x1b06d80, 0xc00026e000, 0xc04223a600, 0x0)
/build/vendor/go.temporal.io/sdk/internal/internal_task_handlers.go:902 +0x7bf
go.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc00012e370, 0xc042384a50, 0xc042384ed0, 0x0, 0x0, 0x0, 0x0)
/build/vendor/go.temporal.io/sdk/internal/internal_task_handlers.go:749 +0x6d9
go.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc000494820, 0xc042384a50, 0x0, 0x0)
/build/vendor/go.temporal.io/sdk/internal/internal_task_pollers.go:284 +0x4ae
go.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc000494820, 0x109c560, 0xc042384a50, 0xc0006ab798, 0xfec75e)
/build/vendor/go.temporal.io/sdk/internal/internal_task_pollers.go:255 +0x85
go.temporal.io/sdk/internal.(*baseWorker).processTask(0xc000198780, 0x109c120, 0xc0003c5ce0)
/build/vendor/go.temporal.io/sdk/internal/internal_worker_base.go:400 +0x115
created by go.temporal.io/sdk/internal.(*baseWorker).runTaskDispatcher
/build/vendor/go.temporal.io/sdk/internal/internal_worker_base.go:305 +0xff
i will post example code in bellow thread
once again thank you