QueryWorkflow cannot query workflow for failed state?

I’m getting the following error when I query the workflow state:

Cannot query workflow due to Workflow Task in failed state

However, I’m still able to see the workflow state in the Temporal UI. I thought the UI used the same method, querying the workflow by query type through the QueryWorkflow API.

Thanks,
Derek

How do you make the query request? The request contains a QueryRejectCondition field. Make sure that you specify NONE.

Just to add, I would also look at the event history of this execution. The error you mention can happen when a workflow task keeps ending up in a failed state (check whether you can find WorkflowTaskFailed or WorkflowTaskTimedOut events in this execution’s event history).
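The history check described above can be sketched in Go. This is a minimal illustration, not SDK code: historyEvent is a hypothetical stand-in for a real history event, and the event type strings are the names as they appear in the event history. In a real program the events would come from the SDK (for example by iterating the result of the client’s GetWorkflowHistory call) rather than from a hand-built slice.

```go
package main

import "fmt"

// historyEvent is a simplified, hypothetical stand-in for a Temporal
// history event. Only the fields needed for this check are modeled.
type historyEvent struct {
	ID   int64
	Type string
}

// lastWorkflowTaskOutcome returns the type of the most recent workflow
// task outcome event (completed, failed, or timed out), or "" if the
// history contains no such event.
func lastWorkflowTaskOutcome(events []historyEvent) string {
	outcome := ""
	for _, e := range events {
		switch e.Type {
		case "WorkflowTaskCompleted", "WorkflowTaskFailed", "WorkflowTaskTimedOut":
			outcome = e.Type
		}
	}
	return outcome
}

// workflowTaskStuck reports whether the latest workflow task outcome is a
// failure or a timeout, which is the situation described in the reply above.
func workflowTaskStuck(events []historyEvent) bool {
	switch lastWorkflowTaskOutcome(events) {
	case "WorkflowTaskFailed", "WorkflowTaskTimedOut":
		return true
	}
	return false
}

func main() {
	events := []historyEvent{
		{1, "WorkflowExecutionStarted"},
		{2, "WorkflowTaskScheduled"},
		{3, "WorkflowTaskStarted"},
		{4, "WorkflowTaskFailed"},
	}
	fmt.Println("workflow task stuck:", workflowTaskStuck(events))
}
```

A later WorkflowTaskCompleted event clears the flag, so the check only fires while the most recent workflow task outcome is a failure or timeout.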

Thanks, I’ll look into using QueryWorkflowWithOptions:

queryResponse, queryErr := s.wfClient.Temporal.QueryWorkflowWithOptions(ctx, &client.QueryWorkflowWithOptionsRequest{
	WorkflowID:           "ebca4a66-d32c-4b44-a414-001dc670923e",
	RunID:                runId,
	QueryType:            workflow.QueryMetadataKey,
	QueryRejectCondition: enums.QUERY_REJECT_CONDITION_NONE,
})

@tihomir @maxim

Current Use Case:
Currently, I call ListWorkflow to get the list of workflows, then query each one for its workflow state and aggregate the data back to the user. I’ve run into an issue where calls to my API time out because the queries take so long. This happens especially with workflows that are left in a bad state (for example, non-deterministic ones) or that are broken by workflow code changes: the query eventually times out on the caller side.

The only way I’ve been able to get around this problem is by putting a StartTime filter on my ListWorkflow query to bypass the affected workflows, but this is not ideal.

Code:

queryResponse, queryErr := workflow.QueryWorkflowWithOptions(ctx, &client.QueryWorkflowWithOptionsRequest{
	WorkflowID: workflowId,
	RunID:      runId,
	QueryType:  workflow.QueryMetadataKey,
})

Note: there is also QUERY_REJECT_CONDITION_NOT_COMPLETED_CLEANLY that I could use, but this also captures workflows that completed but are broken due to workflow code changes or non-deterministic workflow code (such as generating a UUID).

  1. From my understanding, versioning is one mechanism to solve this problem.
    • It’s not clear how you would use versioning to address an issue that occurs when you query a workflow. Is it possible to ListWorkflow by version in the query string? It wasn’t clear how this would be done in the query string.
    • How do you specify a version when starting the workflow with ExecuteWorkflow? I’m not seeing anything for it in the StartWorkflowOptions request of the API.
  2. However, it’s not clear what I can do after the fact, since the broken code is already part of the workflow history. One option that could work, from what I’ve been told, is resetting the workflow, but I haven’t been able to get that to work. Maybe I’m misunderstanding its usage.
    • tctl --ns "default" workflow reset -w e823235b-b0a3-4c7a-86dd-5c85c5eecc0e --event_id 3 --reason "test"
    • I tried running different variations of the command above, specifying different event_id values, and in each case there is an error:
      • Code = InvalidArgument desc = StateRebuilder unable to Rebuild mutable state to event ID: 4, version: 0x401fdb0f70, this event must be at the boundary
        (‘export TEMPORAL_CLI_SHOW_STACKS=1’ to see stack trace)
      • Or the reset would go through and I would get: The system failed this Workflow Task. If a reset for this Workflow was requested check the progress on the new Workflow, otherwise reset this Workflow.

Thanks,
Derek

I would start by making sure that you never have broken workflows in production.