Weird cancellation-scope behaviour (Java SDK)

Greetings. I have come across a weird case using a cancellation-scope in the Java SDK. I was hoping someone could comment? Imagine a workflow which:

  1. In a cancellation-scope, launch an async child workflow (and keep the promise). The child workflow is a long-running operation.
  2. Run the cancellation-scope.
  3. Wait N seconds.
  4. Cancel the cancellation-scope.
  5. Join to the child workflow’s promise.

My expectation is that (a) the child workflow would receive a CanceledFailure exception, which (b) according to the documentation it should re-throw when it exits, and then (c) would be thrown out of the join.

But that’s not what happens. The workflow history shows the parent workflow recording the cancellation request sent to the child, but the child workflow doesn’t start until a later event, so it never receives the CanceledFailure exception.

I think this is a bug, but my understanding might be incorrect.

public static class MigrateCancellationWorkflowTest implements MigrateCancellationWorkflow {

    @Override
    public void migrate(final DatabaseMapping databaseMapping) {

        final AtomicReference<Promise<Void>> reference = new AtomicReference<>();

        final CancellationScope scope =
            Workflow.newCancellationScope(() -> reference.set(
                Async.procedure(
                    Workflow.newChildWorkflowStub(
                        MigrateBackupFilesWorkflow.class,
                        ChildWorkflowOptions.newBuilder()
                            .setCancellationType(WAIT_CANCELLATION_COMPLETED)
                            .setNamespace(DCP_NAMESPACE)
                            .setTaskQueue(DCP_PRIMITIVES_TASK_QUEUE)
                            .setWorkflowExecutionTimeout(Duration.ofHours(12))
                            .build())::migrateBackupFiles,
                    databaseMapping, "s3://bucket/key", true)));

        scope.run();
        Workflow.sleep(Duration.ofSeconds(10));
        scope.cancel();

        // Expect the following to throw `CanceledFailure`.
        reference.get().get();
    }
}

Regards,

Sean

This looks like the same issue discussed in this recent post. Also see the open issue here.

The workaround discussed there was to wait for the child workflow execution to start inside the cancellation scope.

Greetings, and thank you, that was a super fast reply.

I tried the fix (waiting for the child to start) and it works. However, after step 4 (i.e. once the cancel is issued), the child workflow still doesn’t receive a CanceledFailure. Could that be related to the open issue?

S

Can you show your child workflow history?
Note that in the parent, a cancelled child workflow execution always throws ChildWorkflowFailure with CanceledFailure as its cause.
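
To check for that in the parent, one option is to walk the exception's cause chain instead of expecting CanceledFailure directly. Below is a minimal sketch using only the JDK. `CauseChain` and `findCause` are hypothetical names for illustration and are not part of the Temporal SDK. In a workflow you would presumably pass the caught ChildWorkflowFailure and `CanceledFailure.class`.

```java
import java.util.Optional;

// Hypothetical helper (not part of the Temporal SDK): searches an exception's
// cause chain for the first throwable of a given type.
final class CauseChain {
    private CauseChain() {}

    static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> type) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 64; depth++) {
            if (type.isInstance(t)) {
                return Optional.of(type.cast(t));
            }
            t = t.getCause();
        }
        return Optional.empty();
    }
}
```

For example, wrapping the join in a try/catch for ChildWorkflowFailure and testing `CauseChain.findCause(e, CanceledFailure.class).isPresent()` should distinguish a cancelled child from other child failures.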

I’ve created a small sample app and extracted the two workflows (parent + child). The events look correct (to me, at least).

The child workflow simulates a long-running operation with Workflow.sleep(Duration.ofHours). You can see the timer getting cancelled in the history. In Java, I expect it to throw CanceledFailure out of the sleep – but it doesn’t happen. My understanding is that it should be propagated, and that’s the way a child gets notified that it’s been cancelled.

Update: I might have figured it out. To get the history I switched from TestWorkflowEnvironment to docker. Now it works. Maybe TestWorkflowEnvironment has a bug? If I can find a way to extract the history, I could forward that along.

It could be. Will try to reproduce using TestWorkflowEnvironment and report back.

I’ve got the code, just not sure how to prove it, e.g. can I dump out history as json from an executed test workflow?

You could use something like this for example:

private String getWorkflowExecutionHistoryAsJson(
      WorkflowExecution workflowExecution, TestWorkflowRule testWorkflowRule) {
    GetWorkflowExecutionHistoryRequest request =
        GetWorkflowExecutionHistoryRequest.newBuilder()
            .setNamespace(testWorkflowRule.getTestEnvironment().getNamespace())
            .setExecution(workflowExecution)
            .build();

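    // Note: toString() on a protobuf message yields its text format,
    // which is readable but not strict JSON.
    return testWorkflowRule
        .getTestEnvironment()
        .getWorkflowService()
        .blockingStub()
        .getWorkflowExecutionHistory(request)
        .getHistory()
        .toString();
}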

Then, in your test, something like:

 WorkflowStub untyped = WorkflowStub.fromTyped(workflow);
 WorkflowExecution workflowExecution = untyped.start("World");

 String result = untyped.getResult(String.class);
 // ....

 String history = getWorkflowExecutionHistoryAsJson(workflowExecution, testWorkflowRule);