Workflow Cancellation not being applied

Greetings. I’ve been experimenting to figure out the interaction between (i) cancellation scopes and (ii) external cancellation of the workflow (i.e. with untypedWorkflowStub(…).cancel()), and I’ve come across something odd. The following is with server 1.3.2 and Java SDK 1.0.3.

I have a parent-workflow that creates a child-workflow. The child workflow is coded to not complete (i.e. await(() -> false)), waiting for a cancellation, just so I can see what happens.

Once the parent/child workflows have been instantiated, I externally issue a cancel, based on the workflowId. If I look at the history of the parent-workflow instance, I see the following sequence of events:

  • WorkflowExecutionCancelRequested
  • WorkflowTaskScheduled
  • WorkflowTaskStarted
  • WorkflowTaskCompleted

Except that the parent-workflow entrypoint is never called (checked with a debugger), so the workflow is not actually cancelled and stays running. I did notice that the cancellation event is sent to a sticky task queue.

Any idea why this is happening? Many thanks!

Sean



Hey, might be a dumb question but when you say:

So have a parent-workflow that creates a child-workflow. The child workflow is coded to not complete (i.e. await(() → false

Do you mean literally with the code await(() -> false)? If so, there is no way the child will ever be notified. At least for our Go SDK, the cancellation event comes through the ctx object via its Done() channel, and receiving from that channel blocks until a cancellation is received.

Hey,

Maxim did some digging, and I was wrong in my original response. It should be possible to use await(() -> false), assuming you’re calling Workflow.await.

Is there any chance you can share your code?

Here is the (simple) code for the child-workflow. If you want all files, should I email them over?

Many thanks!

Sean

[Screenshot of the child-workflow code: Screen Shot 2020-12-03 at 1.38.06 PM]

ryland@temporal.io, I would love the code

Hey, I actually noticed another issue which may not be causing a problem here but definitely could. I see that you are generating a UUID for the child you create in the cancellation scope. The language’s native UUID generator is not a safe API within a workflow context, because it is not deterministic and will yield different results when the workflow code is run multiple times (for example, on replay).
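
To make the determinism concern concrete, here is a minimal, SDK-free sketch. The class name ReplaySafeIds and the seed value are hypothetical and exist only for illustration. In an actual workflow, the Java SDK's Workflow.randomUUID() is the replay-safe call to reach for in place of UUID.randomUUID().

```java
import java.util.Random;
import java.util.UUID;

// Hypothetical illustration (not Temporal SDK code): an ID source driven by a
// fixed seed yields the same sequence on every "replay", whereas
// UUID.randomUUID() yields a fresh value on every call.
public class ReplaySafeIds {
    private final Random random;

    public ReplaySafeIds(long seed) {
        this.random = new Random(seed);
    }

    // Builds a UUID from two seeded longs; same seed => same sequence of IDs.
    public UUID next() {
        return new UUID(random.nextLong(), random.nextLong());
    }

    public static void main(String[] args) {
        // Two "executions" of the same workflow code with the same seed.
        UUID firstRun = new ReplaySafeIds(42L).next();
        UUID replay = new ReplaySafeIds(42L).next();
        System.out.println("seeded IDs match on replay: " + firstRun.equals(replay));

        // The JDK generator differs between calls, so a replay would compute
        // a different child workflow ID than the original run did.
        System.out.println("UUID.randomUUID() matches: "
                + UUID.randomUUID().equals(UUID.randomUUID()));
    }
}
```

The point of the sketch is only that the ID must be reproducible when the same workflow code runs again; how the SDK seeds its own generator is not shown here.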

I’ve debugged this issue for a while and I think I understand what’s going on.
When we create a child workflow in the cancellation scope, the promise that is supposed to handle cancellation requests gets attached to that new scope. That is okay, BUT when the child scope’s run method completes, we also remove the child scope and the cancellation promise from the chain, thus removing the cancellation handler along with it. As a result, all further cancellation requests are ignored. We need to discuss internally what the fix should be; for now I can confirm that this is a bug in the SDK.
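
The mechanism described above can be sketched with a toy model. This is not the SDK's internal code: ScopeChainSketch, Scope, detach and simulate are all hypothetical names, and the "handler on the root scope" variant is just one conceivable remedy within the toy model, not necessarily what the actual fix does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: names are hypothetical and do not mirror the
// Temporal Java SDK internals.
public class ScopeChainSketch {

    static class Scope {
        final List<Scope> children = new ArrayList<>();
        final List<Runnable> cancelHandlers = new ArrayList<>();

        Scope newChild() {
            Scope child = new Scope();
            children.add(child);
            return child;
        }

        // Models "the scope's run method completed": the child is unlinked,
        // and any handlers registered on it leave the chain with it.
        void detach(Scope child) {
            children.remove(child);
        }

        // Delivers a cancel request to every handler still reachable from
        // this scope and returns how many handlers were invoked.
        int cancel() {
            int delivered = cancelHandlers.size();
            cancelHandlers.forEach(Runnable::run);
            for (Scope child : children) {
                delivered += child.cancel();
            }
            return delivered;
        }
    }

    // Returns how many handlers saw a cancel request that arrives after the
    // child scope has finished running.
    public static int simulate(boolean attachHandlerToRoot) {
        Scope root = new Scope();
        Scope child = root.newChild();
        Runnable workflowCancelHandler = () -> { };
        (attachHandlerToRoot ? root : child).cancelHandlers.add(workflowCancelHandler);
        root.detach(child);   // child scope's run method completes
        return root.cancel(); // external cancel arrives afterwards
    }

    public static void main(String[] args) {
        System.out.println("handler on child scope: " + simulate(false) + " delivered");
        System.out.println("handler on root scope:  " + simulate(true) + " delivered");
    }
}
```

In this model, a handler that lives on the child scope is lost as soon as the child is detached, so a later cancel reaches nothing, which matches the symptom of the cancel request showing up in history while the workflow keeps running.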

This change (https://github.com/temporalio/sdk-java/pull/269) should address the problem for now. We should be able to push it out in the upcoming 1.0.4 release soon.

Thank you, I will certainly try it!

Just tried it. Appears to work as advertised, thank you!

For completeness, I updated the try/catch from the sample to see what the exceptions look like.

[Screenshot of the resulting exceptions: Screen Shot 2020-12-16 at 3.39.26 PM]
