Exceptions in Client Poller

I am noticing couple of transient exceptions for (NOT_FOUND and INVALID_ARGUMENT) in my java-sdk poller on my client. My client is running on temporal-sdk:1.0.3

Pasting the stack trace for both

io.temporal.internal.worker.PollerOptions: uncaught exception java.lang.RuntimeException: Failure processing activity task. WorkflowId=8f776722-eb12-4b78-a3ce-6c14ade72e66, RunId=3fcfab37-7139-4950-a2d0-7da1cdfc595f, ActivityType=StartIngestion, ActivityId=b7bd742b-fbb5-3ddc-8846-b9e791e07966
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.wrapFailure(ActivityWorker.java:254) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.wrapFailure(ActivityWorker.java:154) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:79) ~[product-catalog-service-1.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: io.grpc.StatusRuntimeException: NOT_FOUND: invalid activityID or activity already timed out or invoking workflow is completed
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) ~[product-catalog-service-1.0.jar:?]
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243) ~[product-catalog-service-1.0.jar:?]
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondActivityTaskCompleted(WorkflowServiceGrpc.java:2746) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.lambda$sendReply$0(ActivityWorker.java:278) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:109) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.retryWithResult(GrpcRetryer.java:127) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.retry(GrpcRetryer.java:106) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.sendReply(ActivityWorker.java:272) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:198) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:154) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73) ~[product-catalog-service-1.0.jar:?]
	... 3 more

Second Exception

io.temporal.internal.worker.PollerOptions: uncaught exception java.lang.RuntimeException: Failure processing workflow task. WorkflowId=a14cd152-ae1a-4b35-9e64-e03049db1603, RunId=6be12137-0016-4b1c-a14b-ba66a2ad01b2
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.wrapFailure(WorkflowWorker.java:337) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.wrapFailure(WorkflowWorker.java:275) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:79) ~[product-catalog-service-1.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: UnhandledCommand
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) ~[product-catalog-service-1.0.jar:?]
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243) ~[product-catalog-service-1.0.jar:?]
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondWorkflowTaskCompleted(WorkflowServiceGrpc.java:2673) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.lambda$sendReply$0(WorkflowWorker.java:369) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:109) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.retryWithResult(GrpcRetryer.java:127) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.common.GrpcRetryer.retry(GrpcRetryer.java:106) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.sendReply(WorkflowWorker.java:362) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:313) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:275) ~[product-catalog-service-1.0.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73) ~[product-catalog-service-1.0.jar:?]
	... 3 more

The first exception happens when an activity reports completion after it timed out or workflow already closed. If it happens frequently It usually indicates that activity timeout is too small.

The second exception is benign and should be changed to the DEBUG level. It means that a new event was received by the service while the workflow task tries to complete the workflow. To handle this race condition without data loss the workflow task completion fails and the task is retried. See this discussion that explains how this mechanism is used to ensure that signals are not lost in the presence of such race conditions.

1 Like

Hi,
I am also facing similar issue as mentioned in first exception.

I can able to start the workflow using Java SDK Client, the workflow request visible in temporal-web. It show the Running status and getting Terminated after couple of minutes.
How to resolve this issue?

14:29:53.103 [grpc-nio-worker-ELG-1-4] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0x6daaa4e9, L:/172.31.90.150:54421 - R:temporal-server.net/169.23.5.210:443] INBOUND HEADERS: streamId=21 headers=GrpcHttp2ResponseHeaders[:status: 200, content-type: application/grpc, grpc-status: 14, grpc-message: upstream connect error or disconnect/reset before headers. reset reason: connection failure, date: Fri, 13 Aug 2021 08:59:52 GMT, server: envoy] padding=0 endStream=true
14:29:53.104 [Workflow Executor taskQueue=“local”, namespace=“nohaosb-dev”: 2] WARN io.temporal.internal.common.GrpcRetryer - Retrying after failure
io.grpc.StatusRuntimeException: UNAVAILABLE: upstream connect error or disconnect/reset before headers. reset reason: connection failure
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondWorkflowTaskCompleted(WorkflowServiceGrpc.java:2654)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.lambda$sendReply$0(WorkflowWorker.java:383)
at io.temporal.internal.common.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:79)
at io.temporal.internal.common.GrpcRetryer.retryWithResult(GrpcRetryer.java:97)
at io.temporal.internal.common.GrpcRetryer.retry(GrpcRetryer.java:76)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.sendReply(WorkflowWorker.java:376)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:323)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:279)
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
14:29:53.104 [grpc-nio-worker-ELG-1-4] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0x6daaa4e9, L:/172.31.90.150:54421 - R:temporal-server.net/169.23.5.210:443] OUTBOUND RST_STREAM: streamId=21 errorCode=8
14:29:53.126 [grpc-nio-worker-ELG-1-3] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0xd4536e47, L:/172.31.90.150:54420 - R:temporal-server.net/169.23.5.52:443] OUTBOUND HEADERS: streamId=47 headers=GrpcHttp2OutboundHeaders[:authority: temporal-server.net, :path: /temporal.api.workflowservice.v1.WorkflowService/RespondWorkflowTaskCompleted, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.36.1, client-version: 1.0.7, supported-server-versions: >=0.31.0 <2.0.0, client-name: temporal-java, grpc-accept-encoding: gzip, grpc-timeout: 9999547u] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
14:29:53.126 [grpc-nio-worker-ELG-1-3] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0xd4536e47, L:/172.31.90.150:54420 - R:temporal-server.net/169.23.5.52:443] OUTBOUND DATA: streamId=47 padding=0 endStream=true length=389 bytes=00000001800a6e0a2436643338623161612d303938612d343236362d396430622d393766376261386534386665121c467269204175672031332031343a32393a…
14:29:53.200 [grpc-nio-worker-ELG-1-4] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0x6daaa4e9, L:/172.31.90.150:54421 - R:temporal-server.net/169.23.5.210:443] OUTBOUND HEADERS: streamId=23 headers=GrpcHttp2OutboundHeaders[:authority: temporal-server.net, :path: /temporal.api.workflowservice.v1.WorkflowService/PollWorkflowTaskQueue, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.36.1, client-version: 1.0.7, supported-server-versions: >=0.31.0 <2.0.0, client-name: temporal-java, grpc-accept-encoding: gzip, grpc-timeout: 69999899u] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
14:29:53.200 [grpc-nio-worker-ELG-1-4] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0x6daaa4e9, L:/172.31.90.150:54421 - R:temporal-server.net/169.23.5.210:443] OUTBOUND DATA: streamId=23 padding=0 endStream=true length=44 bytes=00000000270a0b6e6f68616f73622d64657612070a056c6f63616c1a0f323134393640575347313932303036
14:29:53.718 [grpc-nio-worker-ELG-1-3] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0xd4536e47, L:/172.31.90.150:54420 - R:temporal-server.net/169.23.5.52:443] INBOUND HEADERS: streamId=47 headers=GrpcHttp2ResponseHeaders[:status: 200, content-type: application/grpc, grpc-status: 5, grpc-message: Workflow task not found., grpc-status-details-bin: CAUSGFdvcmtmbG93IHRhc2sgbm90IGZvdW5kLhpCCkB0eXBlLmdvb2dsZWFwaXMuY29tL3RlbXBvcmFsLmFwaS5lcnJvcmRldGFpbHMudjEuTm90Rm91bmRGYWlsdXJl, x-envoy-upstream-service-time: 341, date: Fri, 13 Aug 2021 08:59:53 GMT, server: envoy] padding=0 endStream=true
14:29:53.719 [grpc-nio-worker-ELG-1-3] DEBUG io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler - [id: 0xd4536e47, L:/172.31.90.150:54420 - R:temporal-server.net/169.23.5.52:443] OUTBOUND RST_STREAM: streamId=47 errorCode=8
14:29:53.719 [Workflow Executor taskQueue=“local”, namespace=“nohaosb-dev”: 2] ERROR io.temporal.internal.worker.PollerOptions - uncaught exception
java.lang.RuntimeException: Failure processing workflow task. WorkflowId=Fri Aug 13 14:29:31 IST 2021, RunId=ceb15a42-4c21-4a65-8943-0c001a05ba33, Attempt=2
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.wrapFailure(WorkflowWorker.java:349)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.wrapFailure(WorkflowWorker.java:279)
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:79)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: io.grpc.StatusRuntimeException: NOT_FOUND: Workflow task not found.
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondWorkflowTaskCompleted(WorkflowServiceGrpc.java:2654)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.lambda$sendReply$0(WorkflowWorker.java:383)
at io.temporal.internal.common.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:79)
at io.temporal.internal.common.GrpcRetryer.retryWithResult(GrpcRetryer.java:97)
at io.temporal.internal.common.GrpcRetryer.retry(GrpcRetryer.java:76)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.sendReply(WorkflowWorker.java:376)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:323)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:279)
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)
… 3 common frames omitted