I’m working with Temporal workflows using the Java SDK and Temporal version 1.17.0, and I’ve encountered a StatusRuntimeException with the message "NOT_FOUND: Workflow task not found". It is thrown while the worker reports workflow progress to the server. Here is the relevant part of the stack trace:
[worker] [Workflow Executor taskQueue="task-queue", namespace="namespace"] [] i.t.internal.worker.WorkflowWorker: Failure while reporting workflow progress to the server. If seen continuously the workflow might be stuck. WorkflowId=workflow-id, RunId=run-id, startedEventId=34
io.grpc.StatusRuntimeException: NOT_FOUND: Workflow task not found.
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165)
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondWorkflowTaskCompleted(WorkflowServiceGrpc.java:3764)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.lambda$sendTaskCompleted$0(WorkflowWorker.java:369)
at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:67)
at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:60)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.sendTaskCompleted(WorkflowWorker.java:364)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:259)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:188)
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:842)
I would like to record this exception as a metric for monitoring purposes, without interrupting workflow execution.
What is the best practice for catching this exception so that it is counted as a metric while the workflow keeps executing normally?
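A minimal, dependency-free sketch of the "record, then let the error continue" pattern the question describes. The log line above comes from the SDK's WorkflowWorker, which suggests the SDK itself catches the exception and logs it; recording a metric and then rethrowing leaves that existing handling unchanged instead of swallowing the error. FailureMetrics, observe, and statusCodeOf are hypothetical names. statusCodeOf parses the message prefix only as a stand-in for io.grpc.Status.fromThrowable(e).getCode(), which you would use when grpc-java is on the classpath. In a real worker the call to observe would presumably live in a gRPC ClientInterceptor registered on the service stubs (an assumption here, e.g. via WorkflowServiceStubsOptions.Builder#setGrpcClientInterceptors); that wiring is not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper: counts failures per (operation, status code) and rethrows them,
// so whoever called the operation (here, the SDK's own retry/logging code) sees no change.
final class FailureMetrics {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Runs the call; on failure records a metric, then rethrows the original exception.
    <T> T observe(String operation, Supplier<T> call) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            try {
                increment(operation, statusCodeOf(e.getMessage()));
            } catch (RuntimeException metricsFailure) {
                // A broken metrics backend must never mask the original error.
            }
            throw e;
        }
    }

    void increment(String operation, String statusCode) {
        counters.computeIfAbsent(key(operation, statusCode), k -> new LongAdder()).increment();
    }

    long count(String operation, String statusCode) {
        LongAdder adder = counters.get(key(operation, statusCode));
        return adder == null ? 0L : adder.sum();
    }

    private static String key(String operation, String statusCode) {
        return operation + "/" + statusCode;
    }

    // Stand-in for io.grpc.Status.fromThrowable(e).getCode(): takes the text before the
    // first ':' in messages such as "NOT_FOUND: Workflow task not found."
    static String statusCodeOf(String message) {
        if (message == null) return "UNKNOWN";
        int colon = message.indexOf(':');
        return colon > 0 ? message.substring(0, colon).trim() : "UNKNOWN";
    }
}
```

Rethrowing rather than returning a fallback is the deliberate choice: the metric is a side observation, and the SDK keeps deciding how to retry or log the failed task report.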
Any insights or examples from your experience would be greatly appreciated!
Thank you!