We have an endpoint that triggers Temporal workflows via the WorkflowClient, and we have made sure that all requests go through the same WorkflowClient instance.
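For reference, this is roughly how the shared client is created (a simplified sketch; the target, namespace, and variable names below are placeholders rather than our exact setup). Every request handler then builds a workflow stub from this single client and starts the workflow:

import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowClientOptions;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

// Built once at application startup and reused for every request.
WorkflowServiceStubs service = WorkflowServiceStubs.newInstance(
    WorkflowServiceStubsOptions.newBuilder()
        .setTarget("temporal-frontend:7233") // placeholder address
        .build());

WorkflowClient workflowClient = WorkflowClient.newInstance(
    service,
    WorkflowClientOptions.newBuilder()
        .setNamespace("default") // placeholder namespace
        .build());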
We have found that if the endpoint has not received a request for a period of time (e.g. overnight), the next call fails with the following error:
Caused by: io.grpc.StatusRuntimeException: INTERNAL: Panic! This is a bug!
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.startWorkflowExecution(WorkflowServiceGrpc.java:2627)
at io.temporal.internal.external.GenericWorkflowClientExternalImpl.lambda$start$0(GenericWorkflowClientExternalImpl.java:88)
at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:61)
at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:51)
at io.temporal.internal.external.GenericWorkflowClientExternalImpl.start(GenericWorkflowClientExternalImpl.java:81)
at io.temporal.internal.client.RootWorkflowClientInvoker.start(RootWorkflowClientInvoker.java:55)
at io.temporal.internal.sync.WorkflowStubImpl.startWithOptions(WorkflowStubImpl.java:113)
... 39 more
Caused by: java.lang.IllegalStateException: nameResolver is not started
at com.google.common.base.Preconditions.checkState(Preconditions.java:502)
at io.grpc.internal.ManagedChannelImpl.shutdownNameResolverAndLoadBalancer(ManagedChannelImpl.java:360)
at io.grpc.internal.ManagedChannelImpl.enterIdleMode(ManagedChannelImpl.java:422)
at io.grpc.internal.ManagedChannelImpl.access$900(ManagedChannelImpl.java:118)
at io.grpc.internal.ManagedChannelImpl$IdleModeTimer.run(ManagedChannelImpl.java:352)
at io.grpc.internal.Rescheduler$ChannelFutureRunnable.run(Rescheduler.java:103)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.internal.Rescheduler$FutureRunnable.run(Rescheduler.java:80)
at io.grpc.netty.shaded.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at io.grpc.netty.shaded.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
at io.grpc.netty.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
Based on the stack trace, it looks as though the underlying gRPC channel is entering idle mode and failing to recover. The error is not recoverable at runtime; the only way we have found to clear it is to restart the Kubernetes pod.
Is this a known issue, and are there configuration settings we can change to prevent it from occurring?
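In case it helps, the only configuration candidates we have found so far are the keep-alive options on WorkflowServiceStubsOptions and supplying our own gRPC channel with a longer idle timeout. The sketch below is what we are considering, not something we have verified fixes the problem, and the target address is a placeholder:

import java.time.Duration;
import java.util.concurrent.TimeUnit;

import io.grpc.ManagedChannel;
import io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

// Option 1: enable gRPC keep-alive pings so the connection is exercised
// even when no workflows are started for a long period.
WorkflowServiceStubs keepAliveStubs = WorkflowServiceStubs.newInstance(
    WorkflowServiceStubsOptions.newBuilder()
        .setTarget("temporal-frontend:7233") // placeholder address
        .setEnableKeepAlive(true)
        .setKeepAliveTime(Duration.ofSeconds(30))
        .setKeepAliveTimeout(Duration.ofSeconds(10))
        .setKeepAlivePermitWithoutStream(true)
        .build());

// Option 2: build the channel ourselves with a much longer idle timeout
// (grpc-java treats values of 30 days or more as "never go idle")
// and pass it to the service stubs.
ManagedChannel channel = NettyChannelBuilder
    .forTarget("temporal-frontend:7233") // placeholder address
    .usePlaintext()
    .idleTimeout(30, TimeUnit.DAYS)
    .build();

WorkflowServiceStubs ownChannelStubs = WorkflowServiceStubs.newInstance(
    WorkflowServiceStubsOptions.newBuilder()
        .setChannel(channel)
        .build());

Would either of these be the recommended approach, or is there something else we should be changing?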