When a Workflow is cancelled, I assume the worker keeps executing until it reaches a safe point to cancel. My question is: how is this handled?
Does the workflow get cancelled when the workflow code interacts with the Workflow API? Or only when the method returns?
How is cancellation propagated to child workflows? Does the parent workflow only get cancelled once all child workflows have been successfully cancelled? If yes, does it also depend on activity completion? Or are the child workflows cancelled asynchronously (i.e., can we have, at some point, a cancelled parent workflow with a child workflow still running)?
Is there a way for an Activity to know that the workflow which called it has been cancelled, so that the Activity can stop at the next safe point? Or does the Activity always have to complete anyway?
A cancellation request returns OK as soon as the service accepts it. It is delivered to and handled by the workflow asynchronously, because the cleanup code can potentially take a long time.
The way workflow code handles cancellation is SDK-specific.
Go
The Go SDK uses the Context.Done channel to notify workflow code that a cancellation request was received. Most Temporal Go SDK APIs fail immediately with CanceledError when called with an already canceled context.
For an activity that is already executing, the behavior depends on the WaitForCancellation property of ActivityOptions. If it is set to true, the activity Future becomes ready only after the activity completes or reports the cancellation. For an activity to be canceled it must heartbeat, because heartbeating is currently the only way to send information to an already executing activity. If WaitForCancellation is set to false, the cancellation is still sent to the activity, but the activity result Future becomes ready with CanceledError immediately.
The same applies to a child workflow, where the ChildWorkflowOptions.WaitForCancellation property is used. The cancellation of a child workflow follows the same approach as that of any other workflow. Once the workflow is canceled, a context derived from the root workflow context can no longer be used to call SDK functions, so cleanup code has to use a disconnected context. Use workflow.NewDisconnectedContext to create one.
Java
Java doesn’t have a commonly used way to cancel computation besides a very unfriendly Thread.interrupt. So the Temporal framework uses its own CancellationScope abstraction. The code inside a CancellationScope is canceled when cancel is called on the surrounding scope. The main workflow method is always invoked in the context of a root cancellation scope. And this root scope is canceled when a workflow is canceled.
The behavior of an activity invocation is controlled through the ActivityOptions.cancellationType property. It can have three values: WAIT_CANCELLATION_COMPLETED, TRY_CANCEL, and ABANDON. WAIT_CANCELLATION_COMPLETED blocks the activity invocation until the activity is canceled or completed. TRY_CANCEL sends the cancellation to the activity but immediately fails its invocation in the workflow code. ABANDON doesn't send the cancellation and immediately fails the invocation in the workflow code. For an activity to be canceled it must heartbeat, because heartbeating is currently the only way to send information to an already executing activity.
A child workflow has a similar ChildWorkflowOptions.cancellationType property with similar behavior.
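The three values differ along two questions only: is the cancellation sent to the activity, and does the invocation wait for the activity to respond? The sketch below encodes just that description as a small lookup, written in Go for consistency with the other examples. It is a model of the text above, not Java SDK code; CancellationType, outcome, and onCancel are hypothetical names.

```go
package main

import "fmt"

// CancellationType mirrors the three values described above.
type CancellationType int

const (
	WaitCancellationCompleted CancellationType = iota // WAIT_CANCELLATION_COMPLETED
	TryCancel                                         // TRY_CANCEL
	Abandon                                           // ABANDON
)

// outcome describes what happens when the enclosing scope is canceled while
// the activity is still running.
type outcome struct {
	cancelSentToActivity bool // is the cancellation delivered to the activity?
	waitsForActivity     bool // does the invocation block until the activity reports back?
}

// onCancel returns the behavior for a given cancellation type.
func onCancel(t CancellationType) outcome {
	switch t {
	case WaitCancellationCompleted:
		return outcome{cancelSentToActivity: true, waitsForActivity: true}
	case TryCancel:
		return outcome{cancelSentToActivity: true, waitsForActivity: false}
	default: // Abandon
		return outcome{cancelSentToActivity: false, waitsForActivity: false}
	}
}

func main() {
	names := []string{"WAIT_CANCELLATION_COMPLETED", "TRY_CANCEL", "ABANDON"}
	for i, name := range names {
		o := onCancel(CancellationType(i))
		fmt.Printf("%-27s sent=%t waits=%t\n", name, o.cancelSentToActivity, o.waitsForActivity)
	}
}
```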
Thank you for the detailed explanation!
Just a few more clarifications:
I don’t fully understand this comment: “blocks the activity invocation until the activity is canceled or completed”. Taking the example of WAIT_CANCELLATION_COMPLETED, this means the Workflow will only be effectively cancelled when Activity finishes, right? What do you mean exactly by “block”?
What’s the status of the Workflow if cancellation has been requested but has not completed yet? Will it still be in the “running” state, or does it go to the “cancelled” state immediately? Is there a way to see this status through the Temporal Service API?
Does the Heartbeat call from the Activity automatically fail with a specific exception when the workflow the activity belongs to has already been cancelled?
It blocks in the sense of delaying the result of the activity until it is canceled. The way the result is delivered depends on the SDK and on how the activity was invoked. For example, in the Go SDK the workflow can block on Future.Get or use a Selector to wait on multiple activity results. In Java, a synchronous activity invocation blocks the calling thread. But when an activity in Java is invoked asynchronously, the result is delivered through a Promise, which can be used either to block a thread or to process the result or failure in a callback.
It is in the running state, as a workflow can ignore a cancellation request and continue executing for a long time. Currently, the only way to see that a workflow has received a cancellation request is to look at its execution history. I filed an issue to get this information added to DescribeWorkflowExecutionResponse.
In Java, the heartbeat call fails with ActivityCancelledException if the activity was canceled. It fails with ActivityNotExistsException if the workflow that invoked it has closed for any reason. In Go, the heartbeat call returns no error; instead, the activity's context is set to Done.
Is it possible to run compensation actions in case of failure using the Python SDK?
Are there any samples covering rollback and compensation activities?