We're trying to onboard Temporal at our company, and it looks like there may be a bug in the Temporal state machine. From reading the code, it seems that a child workflow completion event is being processed by a function that is not meant to handle child workflow events. This appears to happen because a final event had already been processed and the EntityStateMachine was emptied.
Here is how it happens:
I run several child workflows like this:
Promise<Void> result =
    Async.procedure(processor::processRecord, workflowTransferObject, items);
childrenStartedByThisRun.add(result);
And then wait for them: Promise.allOf(childrenStartedByThisRun).get();
After some of the children have been processed, the parent workflow starts failing with:
"Unexpected event:event_id: 162\nevent_time {\n seconds: 1743047329\n nanos: 204815571\n}\nevent_type: EVENT_TYPE_CHILD_WORKFLOW_EXECUTION_COMPLETED\ntask_id:
Stack trace:
io.temporal.internal.statemachines.WorkflowStateMachines.handleNonStatefulEvent(WorkflowStateMachines.java:761)
io.temporal.internal.statemachines.WorkflowStateMachines.handleSingleEvent(WorkflowStateMachines.java:479)
io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:331)
io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:292)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.applyServerHistory(ReplayWorkflowRunTaskHandler.java:249)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:231)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:165)
io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:135)
io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:100)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handleTask(WorkflowWorker.java:446)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:337)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:278)
io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105)
dev.keyval.ebpfJava.utils.RunnableWithAncestors.run(RunnableWithAncestors.java:59)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base/java.lang.Thread.run(Unknown Source)
"encodedAttributes": null,
Processing of the child completion event goes into handleNonStatefulEvent(), which doesn't have a case for EVENT_TYPE_CHILD_WORKFLOW_EXECUTION_COMPLETED, so it fails. It looks like this event should never reach that function.
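The suspected sequence can be modeled with a small, self-contained sketch. This is not the SDK's actual code: DispatchSketch, Machine, register and handle are hypothetical names, and keying machines by the initiating event ID is an assumption made only for this model. It shows how a second terminal event for an entity whose machine was already removed would fall through to a non-stateful handler and be rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of event dispatch; all names here are hypothetical.
final class DispatchSketch {
    /** Hypothetical stand-in for a per-entity state machine. */
    static final class Machine {
        private boolean finalState;

        void handle(String eventType) {
            // In this model, a terminal child event moves the machine to its final state.
            finalState = eventType.endsWith("_COMPLETED") || eventType.endsWith("_FAILED");
        }

        boolean isFinalState() {
            return finalState;
        }
    }

    // Assumption: machines are looked up by the ID of the event that initiated the entity.
    private final Map<Long, Machine> machines = new HashMap<>();

    void register(long initiatedEventId) {
        machines.put(initiatedEventId, new Machine());
    }

    String handle(long initiatedEventId, String eventType) {
        Machine machine = machines.get(initiatedEventId);
        if (machine == null) {
            // No machine is registered any more, so the event is treated as non-stateful.
            return handleNonStateful(eventType);
        }
        machine.handle(eventType);
        if (machine.isFinalState()) {
            // The machine is dropped once it reaches its final state.
            machines.remove(initiatedEventId);
        }
        return "stateful";
    }

    private String handleNonStateful(String eventType) {
        switch (eventType) {
            case "EVENT_TYPE_WORKFLOW_EXECUTION_STARTED":
                return "non-stateful";
            default:
                // Child workflow events have no case here, so they are rejected.
                throw new IllegalArgumentException("Unexpected event:" + eventType);
        }
    }

    public static void main(String[] args) {
        DispatchSketch sketch = new DispatchSketch();
        sketch.register(10L);
        System.out.println(sketch.handle(10L, "EVENT_TYPE_CHILD_WORKFLOW_EXECUTION_STARTED"));
        System.out.println(sketch.handle(10L, "EVENT_TYPE_CHILD_WORKFLOW_EXECUTION_COMPLETED"));
        try {
            // A second terminal event for the same child no longer finds a machine.
            sketch.handle(10L, "EVENT_TYPE_CHILD_WORKFLOW_EXECUTION_COMPLETED");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real dispatch works along these lines, the error would point to a completion event arriving for a child whose state machine is no longer registered. Whether that matches what the SDK does in this case is exactly the open question.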
Is this a bug?