We have an endpoint that triggers Temporal workflows via the WorkflowClient, and we have made sure that all requests go through the same WorkflowClient instance.
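For reference, this is roughly how the shared client is created (a simplified sketch; the target, namespace, and variable names below are placeholders rather than our exact setup). Every request handler then builds a workflow stub from this single client and starts the workflow:

import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowClientOptions;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

// Built once at application startup and reused for every request.
WorkflowServiceStubs service = WorkflowServiceStubs.newInstance(
    WorkflowServiceStubsOptions.newBuilder()
        .setTarget("temporal-frontend:7233") // placeholder address
        .build());

WorkflowClient workflowClient = WorkflowClient.newInstance(
    service,
    WorkflowClientOptions.newBuilder()
        .setNamespace("default") // placeholder namespace
        .build());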
We have found that if the endpoint has not received a request for a period of time (e.g. overnight), the next call fails with the following error:
Caused by: io.grpc.StatusRuntimeException: INTERNAL: Panic! This is a bug!
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.startWorkflowExecution(WorkflowServiceGrpc.java:2627)
at io.temporal.internal.external.GenericWorkflowClientExternalImpl.lambda$start$0(GenericWorkflowClientExternalImpl.java:88)
at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:61)
at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:51)
at io.temporal.internal.external.GenericWorkflowClientExternalImpl.start(GenericWorkflowClientExternalImpl.java:81)
at io.temporal.internal.client.RootWorkflowClientInvoker.start(RootWorkflowClientInvoker.java:55)
at io.temporal.internal.sync.WorkflowStubImpl.startWithOptions(WorkflowStubImpl.java:113)
... 39 more
Caused by: java.lang.IllegalStateException: nameResolver is not started
at com.google.common.base.Preconditions.checkState(Preconditions.java:502)
at io.grpc.internal.ManagedChannelImpl.shutdownNameResolverAndLoadBalancer(ManagedChannelImpl.java:360)
at io.grpc.internal.ManagedChannelImpl.enterIdleMode(ManagedChannelImpl.java:422)
at io.grpc.internal.ManagedChannelImpl.access$900(ManagedChannelImpl.java:118)
at io.grpc.internal.ManagedChannelImpl$IdleModeTimer.run(ManagedChannelImpl.java:352)
at io.grpc.internal.Rescheduler$ChannelFutureRunnable.run(Rescheduler.java:103)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.internal.Rescheduler$FutureRunnable.run(Rescheduler.java:80)
at io.grpc.netty.shaded.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at io.grpc.netty.shaded.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
at io.grpc.netty.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
Based on the stack trace, it looks as though the underlying gRPC channel is entering idle mode and failing to recover. The error is not recoverable at runtime; the only way we have found to clear it is to restart the Kubernetes pod.
Is this a known issue, and are there configuration settings we can change to prevent it from occurring?
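In case it helps, the only configuration candidates we have found so far are the keep-alive options on WorkflowServiceStubsOptions and supplying our own gRPC channel with a longer idle timeout. The sketch below is what we are considering, not something we have verified fixes the problem, and the target address is a placeholder:

import java.time.Duration;
import java.util.concurrent.TimeUnit;

import io.grpc.ManagedChannel;
import io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

// Option 1: enable gRPC keep-alive pings so the connection is exercised
// even when no workflows are started for a long period.
WorkflowServiceStubs keepAliveStubs = WorkflowServiceStubs.newInstance(
    WorkflowServiceStubsOptions.newBuilder()
        .setTarget("temporal-frontend:7233") // placeholder address
        .setEnableKeepAlive(true)
        .setKeepAliveTime(Duration.ofSeconds(30))
        .setKeepAliveTimeout(Duration.ofSeconds(10))
        .setKeepAlivePermitWithoutStream(true)
        .build());

// Option 2: build the channel ourselves with a much longer idle timeout
// (grpc-java treats values of 30 days or more as "never go idle")
// and pass it to the service stubs.
ManagedChannel channel = NettyChannelBuilder
    .forTarget("temporal-frontend:7233") // placeholder address
    .usePlaintext()
    .idleTimeout(30, TimeUnit.DAYS)
    .build();

WorkflowServiceStubs ownChannelStubs = WorkflowServiceStubs.newInstance(
    WorkflowServiceStubsOptions.newBuilder()
        .setChannel(channel)
        .build());

Would either of these be the recommended approach, or is there something else we should be changing?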