Hi all,
I’m trying to investigate some weird workflow failures that our temporal worker encountered – we have a long-running activity that routinely signals the workflow, and for much of the day, it waits, sending messages every 30 minutes that it is in await mode. This is not done from the activity code but in a python program called by the activity. In testing this has worked perfectly fine, but for some reason, the last time it was ran, it ran into workflow timeout then failures of this kind each time it was signaled:
i {
"eventTime": "2021-03-15T19:22:09.000Z",
"eventType": "WorkflowTaskFailed",
"eventId": "33",
"details": {
"scheduledEventId": "30",
"startedEventId": "32",
"cause": "WORKFLOW_TASK_FAILED_CAUSE_UNSPECIFIED",
"failure": {
"message": "Failure handling event 5 of 'EVENT_TYPE_ACTIVITY_TASK_SCHEDULED' type. IsReplayng=true, PreviousStartedEventId=3, workflowTaskStartedEventId=32, Currently Processing StartedEventId=3",
"source": "JavaSDK",
"stackTrace": "io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:193)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleEvent(ReplayWorkflowRunTaskHandler.java:140)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:180)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:150)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithEmbeddedQuery(ReplayWorkflowTaskHandler.java:202)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:112)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:309)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:275)\nio.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)\njava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\njava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\njava.lang.Thread.run(Thread.java:748)\n",
"cause": {
"message": "Event 5 of EVENT_TYPE_ACTIVITY_TASK_SCHEDULED does not match command COMMAND_TYPE_COMPLETE_WORKFLOW_EXECUTION",
"source": "JavaSDK",
"stackTrace": "io.temporal.internal.statemachines.WorkflowStateMachines.handleCommandEvent(WorkflowStateMachines.java:263)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEventImpl(WorkflowStateMachines.java:199)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:178)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleEvent(ReplayWorkflowRunTaskHandler.java:140)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:180)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:150)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithEmbeddedQuery(ReplayWorkflowTaskHandler.java:202)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:112)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:309)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:275)\nio.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)\njava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\njava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\njava.lang.Thread.run(Thread.java:748)\n",
"cause": null,
"applicationFailureInfo": {
"type": "java.lang.IllegalStateException",
"nonRetryable": false,
"details": null
},
"failureInfo": "applicationFailureInfo"
},
"applicationFailureInfo": {
"type": "io.temporal.internal.replay.InternalWorkflowTaskException",
"nonRetryable": false,
"details": null
},
"failureInfo": "applicationFailureInfo"
},
"identity": "1@singer-worker-dep-577444955c-zfx6d",
"baseRunId": "",
"newRunId": "",
"forkEventVersion": "0",
"binaryChecksum": "",
"eventId": "33",
"eventType": "WorkflowTaskFailed",
"kvps": [
{
"key": "eventTime",
"value": "Mar 15th 3:22:09 pm"
},
{
"key": "eventId",
"value": "33"
},
{
"key": "scheduledEventId",
"value": "30"
},
{
"key": "startedEventId",
"value": "32"
},
{
"key": "cause",
"value": "WORKFLOW_TASK_FAILED_CAUSE_UNSPECIFIED"
},
{
"key": "failure",
"value": "io.temporal.internal.replay.InternalWorkflowTaskException: Failure handling event 5 of 'EVENT_TYPE_ACTIVITY_TASK_SCHEDULED' type. IsReplayng=true, PreviousStartedEventId=3, workflowTaskStartedEventId=32, Currently Processing StartedEventId=3 \nio.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:193)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleEvent(ReplayWorkflowRunTaskHandler.java:140)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:180)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:150)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithEmbeddedQuery(ReplayWorkflowTaskHandler.java:202)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:112)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:309)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:275)\nio.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)\njava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\njava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\njava.lang.Thread.run(Thread.java:748)\n\nCaused By: java.lang.IllegalStateException: Event 5 of EVENT_TYPE_ACTIVITY_TASK_SCHEDULED does not match command COMMAND_TYPE_COMPLETE_WORKFLOW_EXECUTION \nio.temporal.internal.statemachines.WorkflowStateMachines.handleCommandEvent(WorkflowStateMachines.java:263)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEventImpl(WorkflowStateMachines.java:199)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:178)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleEvent(ReplayWorkflowRunTaskHandler.java:140)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:180)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:150)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithEmbeddedQuery(ReplayWorkflowTaskHandler.java:202)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:112)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:309)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:275)\nio.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)\njava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\njava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\njava.lang.Thread.run(Thread.java:748)\n"
},
{
"key": "identity",
"value": "1@singer-worker-dep-577444955c-zfx6d"
},
{
"key": "baseRunId",
"value": ""
},
{
"key": "newRunId",
"value": ""
},
{
"key": "forkEventVersion",
"value": "0"
},
{
"key": "binaryChecksum",
"value": ""
}
],
"eventTime": "Mar 15th 3:22:09 pm"
},
When I try to run the workflow again, the signal is successfully received and there are no failures or timeouts. I’m not entirely clear on what is the cause of failure here so any guidance is appreciated. Thanks!