I have been experiencing problem with the following call:
WorkflowClient.start(workflow::myworkflowfunction, myArgument)
I am using the java SDK (kotlin spring boot application, with grpc server serving requests).
The temporalio version is 1.20.3.
My temporal java SDK version is 1.22.3.
I am also using io.temporal:temporal-spring-boot-autoconfigure-alpha:1.22.3.
My application structured like:
GRPC request → WorkflowClient.start(workflow::myworkflowfunction, myArgument) → return workflowID
The issue is quite flaky and not occur consistently. Sometimes when I start the application, the grpc server can handle requests normally. But sometimes when I start the application, the first grpc request will get stuck and wait indefinitely.
I use debugger and pause the execution, and find out that the thread stuck at the following point:
"grpc-nio-worker-ELG-1-3 @coroutine#1@12911" daemon prio=5 tid=0x35 nid=NA waiting
java.lang.Thread.State: WAITING
at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at io.grpc.stub.ClientCalls$ThreadlessExecutor.waitAndDrain(ClientCalls.java:752)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:161)
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.startWorkflowExecution(WorkflowServiceGrpc.java:3873)
at io.temporal.internal.client.external.GenericWorkflowClientImpl.lambda$start$1(GenericWorkflowClientImpl.java:74)
at io.temporal.internal.client.external.GenericWorkflowClientImpl$$Lambda$1433.267659044.apply(Unknown Source:-1)
at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:69)
at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:60)
at io.temporal.internal.client.external.GenericWorkflowClientImpl.start(GenericWorkflowClientImpl.java:69)
at io.temporal.internal.client.RootWorkflowClientInvoker.start(RootWorkflowClientInvoker.java:101)
at io.temporal.client.WorkflowStubImpl.startWithOptions(WorkflowStubImpl.java:104)
at io.temporal.client.WorkflowStubImpl.start(WorkflowStubImpl.java:129)
at io.temporal.client.WorkflowInvocationHandler$StartWorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:239)
at io.temporal.client.WorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:175)
at com.sun.proxy.$Proxy146.recordPrincipalPosted(Unknown Source:-1)
at com.airwallex.lending.transactionengine.service.workflow.recordPrincipalPosted.RecordPrincipalPostedWorkflowStarter$$Lambda$1429.793875220.apply(Unknown Source:-1)
at io.temporal.client.WorkflowClientInternalImpl.lambda$start$431e99ec$1(WorkflowClientInternalImpl.java:367)
at io.temporal.client.WorkflowClientInternalImpl$$Lambda$1432.1687751410.apply(Unknown Source:-1)
at io.temporal.client.WorkflowClientInternalImpl.start(WorkflowClientInternalImpl.java:359)
at io.temporal.client.WorkflowClientInternalImpl.start(WorkflowClientInternalImpl.java:367)
at io.temporal.client.WorkflowClient.start(WorkflowClient.java:369)
at com.airwallex.lending.transactionengine.service.workflow.recordPrincipalPosted.RecordPrincipalPostedWorkflowStarter.registerWorkflow(RecordPrincipalPostedWorkflowStarter.kt:77)
at com.airwallex.lending.transactionengine.rpc.LendingTransactionRpc.recordPrincipalPosted$suspendImpl(LendingTransactionRpc.kt:49)
at com.airwallex.lending.transactionengine.rpc.LendingTransactionRpc.recordPrincipalPosted(Unknown Source:-1)
at com.airwallex.lending.transactionengine.grpc.generated.LendingTransactionServiceRpcServerImpl.recordPrincipalPosted(LendingTransactionServiceRpcServerImpl.kt:24)
at com.airwallex.lending.transactionengine.grpc.generated.LendingTransactionServiceGrpcKt$LendingTransactionServiceCoroutineImplBase$bindService$3.invoke(LendingTransactionServiceOuterClassGrpcKt.kt:245)
at com.airwallex.lending.transactionengine.grpc.generated.LendingTransactionServiceGrpcKt$LendingTransactionServiceCoroutineImplBase$bindService$3.invoke(LendingTransactionServiceOuterClassGrpcKt.kt:245)
at io.grpc.kotlin.ServerCalls$unaryServerMethodDefinition$2$invoke$$inlined$map$1$2.emit(ServerCalls.kt:80)
at kotlinx.coroutines.flow.internal.SafeCollectorKt$emitFun$1.invoke(SafeCollector.kt:15)
at kotlinx.coroutines.flow.internal.SafeCollectorKt$emitFun$1.invoke(SafeCollector.kt:15)
at kotlinx.coroutines.flow.internal.SafeCollector.emit(SafeCollector.kt:87)
at kotlinx.coroutines.flow.internal.SafeCollector.emit(SafeCollector.kt:66)
at io.grpc.kotlin.HelpersKt$singleOrStatusFlow$1$1.emit(Helpers.kt:65)
at kotlinx.coroutines.flow.internal.SafeCollectorKt$emitFun$1.invoke(SafeCollector.kt:15)
at kotlinx.coroutines.flow.internal.SafeCollectorKt$emitFun$1.invoke(SafeCollector.kt:15)
at kotlinx.coroutines.flow.internal.SafeCollector.emit(SafeCollector.kt:87)
at kotlinx.coroutines.flow.internal.SafeCollector.emit(SafeCollector.kt:66)
at io.grpc.kotlin.ServerCalls$serverCallListener$requests$1.invokeSuspend(ServerCalls.kt:228)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTaskKt.resume(DispatchedContinuation.kt:256)
at kotlinx.coroutines.DispatchedTaskKt.resumeUnconfined(DispatchedTask.kt:190)
at kotlinx.coroutines.DispatchedTaskKt.dispatch(DispatchedTask.kt:161)
at kotlinx.coroutines.CancellableContinuationImpl.dispatchResume(CancellableContinuationImpl.kt:397)
at kotlinx.coroutines.CancellableContinuationImpl.completeResume(CancellableContinuationImpl.kt:513)
at kotlinx.coroutines.channels.AbstractChannel$ReceiveHasNext.completeResumeReceive(AbstractChannel.kt:947)
at kotlinx.coroutines.channels.ArrayChannel.offerInternal(ArrayChannel.kt:83)
at kotlinx.coroutines.channels.AbstractSendChannel.trySend-JP2dKIU(AbstractChannel.kt:155)
at io.grpc.kotlin.ServerCalls$serverCallListener$1.onMessage(ServerCalls.kt:280)
at io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
at com.airwallex.grpc.error.GrpcServerExceptionInterceptor$ExceptionAwareServerCallListener.access$onMessage$s-1054185244(GrpcServerExceptionInterceptor.kt:79)
at com.airwallex.grpc.error.GrpcServerExceptionInterceptor$ExceptionAwareServerCallListener$onMessage$1.invoke(GrpcServerExceptionInterceptor.kt:86)
at com.airwallex.grpc.error.GrpcServerExceptionInterceptor$ExceptionAwareServerCallListener$onMessage$1.invoke(GrpcServerExceptionInterceptor.kt:85)
at com.airwallex.grpc.error.GrpcServerExceptionInterceptor$ExceptionAwareServerCall.closeOnError(GrpcServerExceptionInterceptor.kt:73)
at com.airwallex.grpc.error.GrpcServerExceptionInterceptor$ExceptionAwareServerCallListener.onMessage(GrpcServerExceptionInterceptor.kt:85)
at io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
at io.micrometer.core.instrument.binder.grpc.MetricCollectingServerCallListener.onMessage(MetricCollectingServerCallListener.java:63)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:324)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:309)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:833)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener.messagesAvailable(ServerImpl.java:841)
at io.grpc.internal.AbstractStream$TransportState.messagesAvailable(AbstractStream.java:183)
at io.grpc.internal.MessageDeframer.processBody(MessageDeframer.java:413)
at io.grpc.internal.MessageDeframer.deliver(MessageDeframer.java:276)
at io.grpc.internal.MessageDeframer.deframe(MessageDeframer.java:178)
at io.grpc.internal.AbstractStream$TransportState.deframe(AbstractStream.java:211)
at io.grpc.internal.AbstractServerStream$TransportState.inboundDataReceived(AbstractServerStream.java:262)
at io.grpc.netty.shaded.io.grpc.netty.NettyServerStream$TransportState.inboundDataReceived(NettyServerStream.java:210)
at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.onDataRead(NettyServerHandler.java:517)
at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.access$900(NettyServerHandler.java:111)
at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler$FrameListener.onDataRead(NettyServerHandler.java:835)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onDataRead(DefaultHttp2ConnectionDecoder.java:307)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$PrefaceFrameListener.onDataRead(DefaultHttp2ConnectionDecoder.java:691)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onDataRead(Http2InboundFrameLogger.java:48)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readDataFrame(DefaultHttp2FrameReader.java:415)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:250)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:393)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:250)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:453)
at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:829)
Some other observation:
- when it gets stuck, my other grpc endpoints also not responding and stuck
- No logs found on my temporal server. And the status is SERVING.
- The timeout is set base on the debugger (see screenshot), however, the call just stuck forever, no more grpc request (even health check) can be processed. But tomcat can still process request (I can getHealth by hitting the tomcat endpoint).
How can I further debug?
Can I somehow interrupt the execute without terminating the entire spring application? (like a timeout for this WorkflowClient.start() call?)
Thanks!