Not receiving signals to the workflow

Sending a signal to the workflow fails with the error below:

java.util.concurrent.CancellationException: The gRPC request was cancelled
	at i.t.i.retryer.GrpcRetryerUtils.createFinalExceptionIfNotRetryable(GrpcRetryerUtils.java:59)
	at i.t.i.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:77)
	at i.t.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:60)
	at i.t.internal.retryer.GrpcRetryer.retry(GrpcRetryer.java:50)
	at i.t.i.c.e.GenericWorkflowClientImpl.signal(GenericWorkflowClientImpl.java:85)
	at i.t.i.c.RootWorkflowClientInvoker.signal(RootWorkflowClientInvoker.java:139)
	at i.t.client.WorkflowStubImpl.signal(WorkflowStubImpl.java:91)

Can you provide any details on how to fix this, or tell us whether something is missing in our signal implementation? TIA!

Team,

Is there anything you can help me with here?

@Sowmya
Sorry for the late reply.

Could you describe your environment or provide a reproducer?

Does this only happen with signals? Is the same client used for signals and for starting workflow executions?

Thanks,

Unfortunately, this is not reproducible in our local environment. To give some context on the design: we have one worker polling one task queue, with two workflows registered. One of the two is used by a Schedule and the other is a normal workflow. The scheduled one is not live yet, but the other one is, and that is the one where we are seeing the issue.

It’s happening only with signals. And yes, the same client is used for signals and for starting workflows.

From my analysis, this has happened multiple times in the last 30 days, but not for every workflow.

The code below shows how we send signals. This happens from outside the workflow:

WorkflowInterfaceClass wf = client.newWorkflowStub(WorkflowInterfaceClass.class, workflowId);
wf.signalMethod();

And the workflow interface class:

@WorkflowInterface
public interface WorkflowInterfaceClass {
    // start method & query methods

    @SignalMethod
    void signalMethod();
}

Workflow implementation class:

public class testImpl implements WorkflowInterfaceClass {
    private boolean var = false;

    // Overrides other methods

    @Override
    public void signalMethod() {
        var = true;
    }
}

Is there something we are missing here? We seem to be following the Java SDK developer's guide (Features | Temporal Documentation), and I see there that the workflow is started before the signal is called. Is this needed all the time?

@Sowmya

I don’t see anything wrong with the code you have shared.

If the workflow were not running, the error would be different: “WorkflowNotFoundException”.

“I see here that workflow is started before signal’s called. Is this needed all the time?”

Yes. If you are not sure whether your workflow is running, you can use signalWithStart instead:
https://www.javadoc.io/doc/io.temporal/temporal-sdk/latest/io/temporal/client/WorkflowStub.html

Is the error you provided the full error log?

Here’s the full error log:

java.util.concurrent.CancellationException: The gRPC request was cancelled
	at i.t.i.retryer.GrpcRetryerUtils.createFinalExceptionIfNotRetryable(GrpcRetryerUtils.java:59)
	at i.t.i.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:77)
	at i.t.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:60)
	at i.t.internal.retryer.GrpcRetryer.retry(GrpcRetryer.java:50)
	at i.t.i.c.e.GenericWorkflowClientImpl.signal(GenericWorkflowClientImpl.java:85)
	at i.t.i.c.RootWorkflowClientInvoker.signal(RootWorkflowClientInvoker.java:139)
	at i.t.client.WorkflowStubImpl.signal(WorkflowStubImpl.java:91)
	... 54 common frames omitted
Wrapped by: io.temporal.client.WorkflowServiceException: workflowId='scheduled-release-workflow-2023-06-28T21:07:26.414234-4096-67829980', runId='', workflowType='ScheduledReleaseWorkflow'}
	at i.t.client.WorkflowStubImpl.signal(WorkflowStubImpl.java:96)
	at i.t.c.WorkflowInvocationHandler$SyncWorkflowInvocationHandler.signalWorkflow(WorkflowInvocationHandler.java:294)
	at i.t.c.WorkflowInvocationHandler$SyncWorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:271)
	at i.t.c.WorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:175)
	at com.sun.proxy.$Proxy307.releaseRequestCompleted(Unknown Source)
	at c.s.g.s.ScheduledReleasesWorker.notifyReleaseRequestCompleted(ScheduledReleasesWorker.java:254)
	at c.s.g.s.ScheduledReleasesWorker$$FastClassBySpringCGLIB$$9d196f2d.invoke(<generated>)
	at o.s.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at o.s.aop.framework.CglibAopProxy.invokeMethod(CglibAopProxy.java:386)
	at o.s.aop.framework.CglibAopProxy.access$000(CglibAopProxy.java:85)
	at o.s.a.f.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:704)
	at c.s.g.s.ScheduledReleasesWorker$$EnhancerBySpringCGLIB$$9a4813a.notifyReleaseRequestCompleted(<generated>)
	at c.s.g.s.ScheduledReleasesWorker$$FastClassBySpringCGLIB$$9d196f2d.invoke(<generated>)
	at o.s.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at o.s.aop.framework.CglibAopProxy.invokeMethod(CglibAopProxy.java:386)
	at o.s.aop.framework.CglibAopProxy.access$000(CglibAopProxy.java:85)
	at o.s.a.f.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:704)
	at c.s.g.s.ScheduledReleasesWorker$$EnhancerBySpringCGLIB$$35f06758.notifyReleaseRequestCompleted(<generated>)
	at c.s.g.r.g.i.ReleaseServiceGrpcImpl.notifyReleaseRequestCompleted(ReleaseServiceGrpcImpl.java:125)
	at c.s.g.r.g.i.ReleaseServiceGrpcImpl.setReleaseRequestStatus(ReleaseServiceGrpcImpl.java:108)
	at c.s.g.r.g.i.ReleaseServiceGrpcImpl$$FastClassBySpringCGLIB$$4edcfd60.invoke(<generated>)
	at o.s.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at o.s.a.f.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)
	at o.s.a.f.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at o.s.a.f.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
	at o.s.a.a.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:89)
	at c.s.gdc.metrics.MetricsAspects.profile(MetricsAspects.java:82)
	at c.s.gdc.metrics.MetricsAspects.profileGrpc(MetricsAspects.java:64)
	at j.i.r.GeneratedMethodAccessor430.invoke(Unknown Source)
	at j.b.i.r.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at j.base/java.lang.reflect.Method.invoke(Method.java:566)
	at o.s.a.a.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:634)
	at o.s.a.a.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:624)
	at o.s.a.a.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:72)
	at o.s.a.f.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at o.s.a.f.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
	at o.s.a.i.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
	at o.s.a.f.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at o.s.a.f.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
	at o.s.a.f.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708)
	at c.s.g.r.g.i.ReleaseServiceGrpcImpl$$EnhancerBySpringCGLIB$$e0ea9a11.setReleaseRequestStatus(<generated>)
	at c.s.g.r.g.ReleaseGrpc$MethodHandlers.invoke(ReleaseGrpc.java:760)
	at i.g.s.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
	at b.g.TracingServerInterceptor$TracingServerCallListener.onHalfClose(TracingServerInterceptor.java:153)
	at i.g.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
	at i.g.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
	at i.g.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
	at i.g.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
	at i.g.i.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:355)
	at i.g.i.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:867)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at i.g.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at j.b.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at j.b.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

A few questions:

  • Which type of Temporal Server are you using?
  • If Self Hosted or Temporal CLI, which version?
  • Is there any gRPC proxy between your client and the server?

This is definitely not an issue on the Workflow side.

The SignalWorkflow API call made by the client completes once the server accepts that signal (i.e. it has atomically checked that the Workflow ID exists and is still running, and has either written the signal to that Workflow’s history, or has buffered it if a Workflow Task is already in progress). The client does not wait for the signal to be delivered to a Workflow Worker, so any exception thrown on the Workflow side will not affect the client call to SignalWorkflow.
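To make the acceptance semantics described above concrete, here is a toy model in plain Java. It is only a sketch: ToyServer, its fields, and its methods are hypothetical names invented for illustration and do not correspond to the Temporal Server or SDK APIs. It shows the client-facing call returning as soon as the signal is recorded or buffered, and a failing worker-side handler having no way to affect that call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SignalAcceptanceModel {

    // Hypothetical stand-in for the server's state for one workflow.
    static class ToyServer {
        final List<String> history = new ArrayList<>();
        final List<String> buffered = new ArrayList<>();
        boolean workflowRunning = true;
        boolean workflowTaskInProgress = false;

        // Client-facing call: returns once the signal is recorded or buffered.
        // No worker is involved at this point.
        void signalWorkflow(String signalName) {
            if (!workflowRunning) {
                throw new IllegalStateException("workflow is not running");
            }
            if (workflowTaskInProgress) {
                buffered.add(signalName);
            } else {
                history.add(signalName);
            }
        }

        // Happens later, independently of the client call: buffered signals
        // are moved into the history when the workflow task completes.
        void completeWorkflowTask() {
            history.addAll(buffered);
            buffered.clear();
            workflowTaskInProgress = false;
        }

        // Worker-side delivery. A failing handler is counted here; it cannot
        // reach the client because signalWorkflow has already returned.
        int deliverTo(Consumer<String> handler) {
            int failures = 0;
            for (String signal : history) {
                try {
                    handler.accept(signal);
                } catch (RuntimeException e) {
                    failures++;
                }
            }
            return failures;
        }
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        server.signalWorkflow("first");          // written to history
        server.workflowTaskInProgress = true;
        server.signalWorkflow("second");         // buffered
        System.out.println(server.history + " " + server.buffered);

        server.completeWorkflowTask();
        int failures = server.deliverTo(s -> {
            throw new RuntimeException("handler bug");
        });
        System.out.println(server.history + " failures=" + failures);
    }
}
```

In this model both signalWorkflow calls return normally even though every handler invocation later fails, which is why a client-side CancellationException points away from the workflow code.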

Generally speaking, gRPC’s CANCELLED error code indicates that the client itself has cancelled the gRPC call. It is not impossible that a gRPC server might also send that error code in other situations, but I’m not aware of any such case in Temporal Server. Is your client going through a gRPC proxy?

It really looks like, for some reason, something on the client side is cancelling the SignalWorkflow gRPC call. This could notably happen if the current thread gets interrupted.

Your full stack trace shows that the SignalWorkflow API call is itself executed in the context of another gRPC call (on what appears to be a distinct, server-side gRPC handler). Could it be that this parent gRPC call exceeded its deadline? I’m not sure about this, but I doubt that context deadline exceeded errors propagate from one gRPC connection context to another; if I’m right on that, then it would make sense that the parent gRPC connection interrupts the current thread when the allocated time expires, which causes the child gRPC request to get cancelled.
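The hypothesis above (a parent deadline interrupting the handler thread, which in turn cancels the nested blocking call) can be simulated with the JDK alone. This is a minimal sketch under stated assumptions: it uses no gRPC or Temporal API, and blockingChildCall and runUnderParentDeadline are hypothetical names that only stand in for the nested signal call and the parent handler.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class ParentDeadlineDemo {

    // Hypothetical stand-in for a blocking client call such as sending a signal.
    // It waits for a "server reply" and reports an interrupt as a cancellation.
    static String blockingChildCall(CountDownLatch serverReply) {
        try {
            serverReply.await();
            return "OK";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new CancellationException("The request was cancelled");
        }
    }

    // Runs the child call on a handler thread that is bound by a parent deadline.
    // Returns what the child call observed: "OK", "CANCELLED", or "PENDING".
    static String runUnderParentDeadline(long deadlineMillis) {
        ExecutorService handlerPool = Executors.newSingleThreadExecutor();
        CountDownLatch serverReply = new CountDownLatch(1); // never released: slow server
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch childDone = new CountDownLatch(1);
        AtomicReference<String> childOutcome = new AtomicReference<>("PENDING");
        try {
            Future<?> handler = handlerPool.submit(() -> {
                started.countDown();
                try {
                    childOutcome.set(blockingChildCall(serverReply));
                } catch (CancellationException e) {
                    childOutcome.set("CANCELLED");
                } finally {
                    childDone.countDown();
                }
            });
            started.await(); // make sure the handler is running before timing it
            try {
                handler.get(deadlineMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException deadlineExceeded) {
                handler.cancel(true); // parent deadline expired: interrupt the handler thread
            }
            childDone.await(5, TimeUnit.SECONDS);
            return childOutcome.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            handlerPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runUnderParentDeadline(200));
    }
}
```

In this simulation the child call never fails on its own; it only reports a cancellation because the surrounding handler ran out of time. If the real setup behaves the same way, the signal error would be a symptom of the parent call's timeout rather than of the signal itself.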

@jwatkins, apologies for the delayed response, and thanks for your update; it helped.

You are right, the signal is sent from inside another gRPC call. I also observed that another exception is raised for the parent gRPC call, which says the upstream request timed out. This may have been the reason for the cancellation exception we are seeing.