io.grpc.StatusRuntimeException: UNKNOWN: shard closed

Vikas_NS · January 19, 2022, 2:25pm

Whenever we run performance tests, we observe the error below.

Questions

Could you please help us understand the cause of this error?
What should we scale to fix this?

Unable to start the workflow:
io.temporal.client.WorkflowServiceException: workflowId=TEST', runId='', workflowType='InstanceWorkflow'}
	at io.temporal.internal.sync.WorkflowStubImpl.wrapStartException(WorkflowStubImpl.java:184)
	at io.temporal.internal.sync.WorkflowStubImpl.startWithOptions(WorkflowStubImpl.java:120)
	at io.temporal.internal.sync.WorkflowStubImpl.start(WorkflowStubImpl.java:138)
	at io.temporal.internal.sync.WorkflowInvocationHandler$StartWorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:242)
	at io.temporal.internal.sync.WorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:178)
	at com.sun.proxy.$Proxy176.execute(Unknown Source)
	at io.temporal.internal.sync.WorkflowClientInternal.lambda$start$4ed02937$1(WorkflowClientInternal.java:308)
	at io.temporal.internal.sync.WorkflowClientInternal.start(WorkflowClientInternal.java:256)
	at io.temporal.internal.sync.WorkflowClientInternal.start(WorkflowClientInternal.java:299)
	at io.temporal.internal.sync.WorkflowClientInternal.start(WorkflowClientInternal.java:308)
	at io.temporal.client.WorkflowClient.start(WorkflowClient.java:382)
	at com..coordinator.workflow.IAPVWorkflowExecutor.executeWorkflow(IAPVWorkflowExecutor.java:38)
	at com.coordinator.service.workflow.WorkflowStartServiceImpl.start(WorkflowStartServiceImpl.java:30)
	at com..ECWorkflowServiceGrpc$MethodHandlers.invoke(ECWorkflowServiceGrpc.java:217)
	at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
	at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
	at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
	at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
	at io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:340)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: **io.grpc.StatusRuntimeException: UNKNOWN: shard closed**
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.startWorkflowExecution(WorkflowServiceGrpc.java:2631)
	at io.temporal.internal.external.GenericWorkflowClientExternalImpl.lambda$start$0(GenericWorkflowClientExternalImpl.java:88)
	at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:61)
	at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:51)
	at io.temporal.internal.external.GenericWorkflowClientExternalImpl.start(GenericWorkflowClientExternalImpl.java:81)
	at io.temporal.internal.client.RootWorkflowClientInvoker.start(RootWorkflowClientInvoker.java:55)
	at io.temporal.internal.sync.WorkflowStubImpl.startWithOptions(WorkflowStubImpl.java:113)
	... 24 common frames omitted

Log fields: "Kubernetes.namespace":"test", "timestamp":"2022-01-14T21:49:32.546Z", "@version":"1", "s_sourcetype":"bifrost"

Vikas_NS · January 20, 2022, 1:42pm

@tihomir @maxim

Any help here?

tihomir · January 20, 2022, 3:49pm

This error is returned to the client when a request reaches a history node while that node is shutting down (for instance, during a deployment or redeployment). It should not be exposed to the client; the request should simply be retried.

What server version are you using? I believe this was fixed in 1.14.0 via this commit.
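For servers older than the fix mentioned above, one client-side workaround is to retry the start call when this particular error surfaces. The following is a minimal JDK-only sketch, not Temporal SDK API: `retryOn`, `isShardClosed`, and the simulated start call are hypothetical names. A real client would walk the cause chain looking for an `io.grpc.StatusRuntimeException` with status code `UNKNOWN` and description `shard closed`, rather than matching message text as this sketch does. It also assumes every attempt reuses the same workflow ID, so that a start which did reach the server is reported as already started instead of being duplicated.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class ShardClosedRetry {

    // Hypothetical check: walks the cause chain and matches the message text.
    // A real client would instead test for io.grpc.StatusRuntimeException with
    // status code UNKNOWN and description "shard closed".
    public static boolean isShardClosed(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            if (msg != null && msg.contains("shard closed")) {
                return true;
            }
        }
        return false;
    }

    // Calls `call` up to maxAttempts times, sleeping delayMillis between attempts,
    // as long as `retryable` accepts the failure. Other failures are rethrown at once.
    public static <T> T retryOn(Callable<T> call, Predicate<Throwable> retryable,
                                int maxAttempts, long delayMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delayMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Hypothetical stand-in for the workflow start call: it fails twice with
        // the error seen in the log above, then succeeds.
        Callable<String> start = () -> {
            calls[0]++;
            if (calls[0] <= 2) {
                throw new RuntimeException("Unable to start the workflow",
                        new RuntimeException("UNKNOWN: shard closed"));
            }
            return "started";
        };
        String result = retryOn(start, ShardClosedRetry::isShardClosed, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `started after 3 attempts`. A fixed delay keeps the sketch short; a growing delay is usually a better fit when the node may take a while to come back.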

Vikas_NS · January 21, 2022, 2:38pm

We are on 1.12.0.

That makes sense; we have seen our history nodes being recycled frequently during load testing.

Is the intent of retrying to route the request to a live history node?

tihomir · January 21, 2022, 8:14pm

Yes, or to keep retrying until the deployed/redeployed node is up and running.
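Retrying until a redeployed history node is back usually calls for growing delays with an overall time limit, rather than a tight loop. The sketch below computes such a schedule in plain Java: exponential growth, a per-delay cap, and a total budget. The class name and the values (100 ms initial delay, doubling, 2 s cap, 10 s budget) are illustrative assumptions, not Temporal defaults; a suitable budget depends on how long a history node takes to restart in a given deployment.

```java
import java.util.ArrayList;
import java.util.List;

public class BackoffSchedule {

    // Returns the sleep intervals (ms) between attempts: each delay grows by
    // `multiplier`, is capped at maxMillis, and the schedule stops before the
    // running total would exceed budgetMillis.
    public static List<Long> delays(long initialMillis, double multiplier,
                                    long maxMillis, long budgetMillis) {
        if (initialMillis <= 0 || multiplier < 1.0 || maxMillis <= 0) {
            throw new IllegalArgumentException("delays must be positive and non-shrinking");
        }
        List<Long> out = new ArrayList<>();
        long total = 0;
        double next = initialMillis;
        while (true) {
            long d = Math.min((long) next, maxMillis);
            if (total + d > budgetMillis) {
                break; // the next sleep would overrun the overall time budget
            }
            out.add(d);
            total += d;
            next = next * multiplier;
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative values only: 100 ms initial, doubling, 2 s cap, 10 s budget.
        System.out.println(delays(100, 2.0, 2_000, 10_000));
    }
}
```

With these values it prints `[100, 200, 400, 800, 1600, 2000, 2000, 2000]`, 9.1 s of waiting in total. Production retry loops usually add random jitter to each delay so that many clients do not retry in lockstep; it is left out here to keep the output deterministic.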

Topic · Category · Replies · Views · Last activity
GRPC connection error with temporal when i try running jar while starting Temporal worker · Community Support, java-sdk · 3 · 460 · May 29, 2024
Unable to run temporal workflow from jar · Community Support, java-sdk · 16 · 3241 · February 20, 2025
Error testing with TestWorkflowEnvironment in java · Community Support · 5 · 1322 · November 24, 2020
Randomly Start workflow operation fails with internal error · Community Support · 1 · 29 · December 2, 2024
Issue running workflow - always remaining in "RUNNING" status · Community Support, java-sdk · 1 · 518 · September 28, 2021
