Exception thrown while querying a workflow that has been terminated?

Hi,

I was wondering is it expected for a exception to be thrown when i try to query a workflow that has been terminated? Shouldn’t the replay still work in terms of getting status.

Here is the grpc exception that is being thrown.

io.grpc.StatusRuntimeException: INVALID_ARGUMENT: io.temporal.internal.replay.InternalWorkflowTaskException: Failure handling event 10 of 'EVENT_TYPE_WORKFLOW_EXECUTION_TERMINATED' type. IsReplaying=true, PreviousStartedEventId=8, workflowTaskStartedEventId=0, Currently Processing StartedEventId=8
	at io.temporal.internal.statemachines.WorkflowStateMachines.createEventProcessingException(WorkflowStateMachines.java:221)
	at io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:201)
	at io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:175)
	at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:177)
	at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleQueryWorkflowTask(ReplayWorkflowRunTaskHandler.java:251)
	at io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleQueryOnlyWorkflowTask(ReplayWorkflowTaskHandler.java:257)
	at io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:112)
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:319)
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:279)
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Unexpected event:event_id: 10
event_time {
  seconds: 1642567913
  nanos: 753142552
}
event_type: EVENT_TYPE_WORKFLOW_EXECUTION_TERMINATED
task_id: 95996084
workflow_execution_terminated_event_attributes {
  details {
    payloads {
      metadata {
        key: "encoding"
        value: "binary/null"
      }
    }
  }
  identity: "23763@C02C71F6MD6V@"
}

	at io.temporal.internal.statemachines.WorkflowStateMachines.handleNonStatefulEvent(WorkflowStateMachines.java:486)
	at io.temporal.internal.statemachines.WorkflowStateMachines.handleSingleEvent(WorkflowStateMachines.java:238)
	at io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:199)
	... 11 more

	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.queryWorkflow(WorkflowServiceGrpc.java:2983)
	at io.temporal.internal.external.GenericWorkflowClientExternalImpl.lambda$query$5(GenericWorkflowClientExternalImpl.java:216)
	at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:61)
	at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:51)
	at io.temporal.internal.external.GenericWorkflowClientExternalImpl.query(GenericWorkflowClientExternalImpl.java:209)
	at io.temporal.internal.client.RootWorkflowClientInvoker.query(RootWorkflowClientInvoker.java:141)
	at io.temporal.internal.sync.WorkflowStubImpl.query(WorkflowStubImpl.java:342)
	... 83 common frames omitted
Wrapped by: io.temporal.client.WorkflowQueryException: workflowId='6dadae4b-8c48-42f8-8002-e6e8e6583e5b', runId='619db0e4-a526-45c5-83cf-2c39619a3211', workflowType='CoolWorkflow'}
	at io.temporal.internal.sync.WorkflowStubImpl.query(WorkflowStubImpl.java:349)
	at io.temporal.internal.sync.WorkflowInvocationHandler$SyncWorkflowInvocationHandler.queryWorkflow(WorkflowInvocationHandler.java:309)
	at io.temporal.internal.sync.WorkflowInvocationHandler$SyncWorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:272)

Example of code

final TempWorkflow workflow =
        workflowClient.newWorkflowStub(
            TempWorkflow.class, workflowId.toString(), Optional.of(runId.toString()));
workflow.getStatus();

In this case, when i trigger the workflow query method getStatus this will throw the exception. The workflow was terminated. Here is a snapshot of what the history looks like

image

Thanks,
Derek

Looks like the open issue: Query failure on terminated workflow · Issue #391 · temporalio/sdk-java · GitHub

We are seeing this error in regular workflow execution as well. From the stack trace, it seems to be coming from same io.temporal.internal.statemachines.WorkflowStateMachines.handleNonStatefulEvent

SDK version: 1.8.0

Trace:

io.temporal.internal.replay.InternalWorkflowTaskException: Failure handling event 1399 of type 'EVENT_TYPE_WORKFLOW_TASK_COMPLETED' during execution. {PreviousStartedEventId=1398, workflowTaskStartedEventId=1403, Currently Processing StartedEventId=1398} 
io.temporal.internal.statemachines.WorkflowStateMachines.createEventProcessingException(WorkflowStateMachines.java:222)
io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:201)
io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:175)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:176)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:145)
io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:122)
io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:97)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:241)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:199)
io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base/java.lang.Thread.run(Unknown Source)

Caused By: java.lang.IllegalArgumentException: Unexpected event:event_id: 1399
event_time {
  seconds: 1649094400
  nanos: 532032321
}
event_type: EVENT_TYPE_WORKFLOW_TASK_COMPLETED
task_id: 177212170
workflow_task_completed_event_attributes {
  scheduled_event_id: 1397
  started_event_id: 1398
  identity: "<redacted>"
}
 
io.temporal.internal.statemachines.WorkflowStateMachines.handleNonStatefulEvent(WorkflowStateMachines.java:494)
io.temporal.internal.statemachines.WorkflowStateMachines.handleSingleEvent(WorkflowStateMachines.java:240)
io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:199)
io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:175)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:176)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:145)
io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:122)
io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:97)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:241)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:199)
io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base/java.lang.Thread.run(Unknown Source)

Likely cause is non-determistic workflow code. Would help if you could share so we can take a look. One thing to look at is use of system time in workflow code to rule that out.

The workflow code has some complex code. I am trying to reproduce with simple setup, but it is only happening in production at random. Error is not consistent.

I have seen non deterministic errors which will be clear something like below

Event 5 of EVENT_TYPE_ACTIVITY_TASK_SCHEDULED does not match command COMMAND_TYPE_COMPLETE_WORKFLOW_EXECUTION

But this error does not look like them. What it appears to me from event ids is, it is trying to process a previously completed WORKFLOW_TASK_COMPLETED event again and it is causing issues.

Did you try replaying your workflow history using WorkflowReplayer to see if you can reproduce this error and try to debug?