Exceptions in Client Poller

I am noticing a couple of transient exceptions (NOT_FOUND and INVALID_ARGUMENT) in the java-sdk poller on my client. My client is running temporal-sdk:1.0.3.

Pasting the stack traces for both:

io.temporal.internal.worker.PollerOptions: uncaught exception java.lang.RuntimeException: Failure processing activity task. WorkflowId=8f776722-eb12-4b78-a3ce-6c14ade72e66, RunId=3fcfab37-7139-4950-a2d0-7da1cdfc595f, ActivityType=StartIngestion, ActivityId=b7bd742b-fbb5-3ddc-8846-b9e791e07966
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.wrapFailure(ActivityWorker.java:254) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.wrapFailure(ActivityWorker.java:154) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:79) ~[product-catalog-service-1.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: io.grpc.StatusRuntimeException: NOT_FOUND: invalid activityID or activity already timed out or invoking workflow is completed
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) ~[product-catalog-service-1.0.jar:?]
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243) ~[product-catalog-service-1.0.jar:?]
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondActivityTaskCompleted(WorkflowServiceGrpc.java:2746) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.lambda$sendReply$0(ActivityWorker.java:278) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:109) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.retryWithResult(GrpcRetryer.java:127) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.retry(GrpcRetryer.java:106) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.sendReply(ActivityWorker.java:272) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:198) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:154) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73) ~[product-catalog-service-1.0.jar:?]
	... 3 more

Second Exception

io.temporal.internal.worker.PollerOptions: uncaught exception java.lang.RuntimeException: Failure processing workflow task. WorkflowId=a14cd152-ae1a-4b35-9e64-e03049db1603, RunId=6be12137-0016-4b1c-a14b-ba66a2ad01b2
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.wrapFailure(WorkflowWorker.java:337) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.wrapFailure(WorkflowWorker.java:275) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:79) ~[product-catalog-service-1.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: UnhandledCommand
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) ~[product-catalog-service-1.0.jar:?]
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243) ~[product-catalog-service-1.0.jar:?]
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondWorkflowTaskCompleted(WorkflowServiceGrpc.java:2673) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.lambda$sendReply$0(WorkflowWorker.java:369) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:109) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.retryWithResult(GrpcRetryer.java:127) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.retry(GrpcRetryer.java:106) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.sendReply(WorkflowWorker.java:362) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:313) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:275) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73) ~[product-catalog-service-1.0.jar:?]
	... 3 more

The first exception happens when an activity reports completion after it has timed out or after the workflow has already closed. If it happens frequently, it usually indicates that the activity timeout is too short.
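To make the timeout case concrete, here is a minimal toy model in plain Java. It is not Temporal server code: the class `PendingActivities` and its methods are hypothetical names invented for illustration, and the real service tracks far more state. It only sketches the idea that a completion arriving after the deadline no longer matches a pending activity.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical names): not how the Temporal service is implemented.
class PendingActivities {
    // activityId -> deadline in milliseconds, on a clock supplied by the caller
    private final Map<String, Long> deadlines = new HashMap<>();

    // Register an activity that must complete within timeoutMs of nowMs.
    void schedule(String activityId, long nowMs, long timeoutMs) {
        deadlines.put(activityId, nowMs + timeoutMs);
    }

    // Returns "OK" if the completion is accepted, or "NOT_FOUND" if the
    // activity is unknown, already completed, or past its deadline.
    String complete(String activityId, long nowMs) {
        Long deadline = deadlines.remove(activityId);
        if (deadline == null || nowMs > deadline) {
            return "NOT_FOUND";
        }
        return "OK";
    }
}
```

In this sketch, an activity scheduled with a 1000 ms timeout that reports completion at 1500 ms is rejected, while the same work under a 5000 ms timeout is accepted. In the Java SDK the corresponding knob is the timeout set on `ActivityOptions` (for example the start-to-close timeout); which timeout is too short in your case is something to check against your own activity durations.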

The second exception is benign, and the SDK should log it at the DEBUG level instead. It means that the service received a new event while the workflow task was trying to complete the workflow. To handle this race condition without data loss, the workflow task completion fails and the task is retried. See this discussion, which explains how this mechanism ensures that signals are not lost in the presence of such race conditions.
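The race can be sketched with a small toy model in plain Java. This is an illustration under assumptions, not the service's actual logic: `ToyWorkflowHistory` and all of its methods are hypothetical names. The worker remembers how many events it saw when the workflow task started, and an attempt to complete the workflow is rejected if more events have arrived since then.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical names): not how the Temporal service is implemented.
class ToyWorkflowHistory {
    private final List<String> events = new ArrayList<>();
    private boolean closed = false;

    // A worker picks up a workflow task and notes how many events it has seen.
    int startWorkflowTask() {
        return events.size();
    }

    // A new event (for example a signal) arrives while the task is in flight.
    void signal(String name) {
        events.add("signal:" + name);
    }

    // Completing the workflow is rejected if events arrived after the task
    // started; this plays the role of the UnhandledCommand rejection.
    boolean tryCompleteWorkflow(int eventsSeen) {
        if (events.size() != eventsSeen) {
            return false; // the task must be retried so the new event is processed
        }
        closed = true;
        return true;
    }

    boolean isClosed() {
        return closed;
    }
}
```

If a signal lands between `startWorkflowTask()` and `tryCompleteWorkflow(...)`, the first attempt returns `false` and the workflow stays open; the retried task sees the signal and can then complete. That is why the logged exception does not indicate lost data.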
