I have a small design pattern question. Suppose I have a workflow W1 which can be cancelled. The cancellation has a cleanup which needs to be propagated to multiple systems and I would like the cancellation to be robust, similar to the workflow. One way to go about this is to create a disconnected context and let the cancellation be a long-running activity. However, if the workflow W1 crashes and continues on restart at this point, will this cancellation activity be reissued? (I can create a cancellation child workflow and start it if that’s not the case but I want to know what happens.)
The cancellation can be as complex as needed. You can call more than one activity from the workflow using a disconnected context, or even start child workflows.
I think there is a misunderstanding about “workflow W1 crashes”. Workflows don’t crash unless their code returns an error. A crash of a worker process doesn’t crash the workflows it was running, so you don’t need any extra code to recover from it. Worker crashes do fail the activities running on them, so give your activities appropriate retry policies and reasonable timeouts so that they get retried.
Thanks Maxim. By workflow crashing, I meant some systemic outage such as an OOM on the Cadence pods, networking errors, or something else outside the control of the code itself. I could not think of a better term at that point.
I see. So such issues should not crash workflows, they might only delay their execution. So my original answer that the cleanup logic is going to complete without any additional code was correct.
Thanks a lot for the answer!
If you are looking for a workflow-level construct or pattern, did you try out Saga?
Temporal supports the saga pattern, which can do the necessary compensation/cleanup.
Here is an example: