We are looking at migrating a long-running job from a legacy Java application to a Temporal workflow. The job can be thought of as a large batch update: it takes a list of items from the user as input and may take anywhere from a few minutes to a few hours to complete, depending on the input.
It has the following phases of processing:
1. The application determines additional impacted items by analyzing the dependencies of the user-specified items.
2. The application then processes the user-specified and impacted items, updating them in memory.
3. Only if all items were updated successfully are the changes committed to the legacy DB; otherwise, the in-memory changes are cleared.
4. The application builds a report of the results.
5. The application sends a completion notification.
Phases 1, 4, and 5 would make sense as distinct activities. Because phase 2 keeps its changes in memory, phases 2 and 3 would both need to run in a single activity, based on my understanding of Temporal.
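A minimal sketch of the activity split described above, written as plain Java so it runs without the Temporal SDK. `BatchUpdateActivities` and `BatchUpdateFlow` are hypothetical names; in a real Temporal project the interface would be an activity interface and `run` would be the workflow method calling it through an activity stub.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical activity contract. In the Temporal Java SDK this interface would
// be annotated with @ActivityInterface and called through an activity stub.
interface BatchUpdateActivities {
    List<String> findImpactedItems(List<String> userItems); // phase 1
    int processAndCommit(List<String> items);               // phases 2 + 3 in one activity
    String buildReport(int committedCount);                  // phase 4
    void sendNotification(String report);                    // phase 5
}

// Hypothetical orchestration: the order a workflow method would invoke the activities.
final class BatchUpdateFlow {
    private BatchUpdateFlow() {}

    static String run(BatchUpdateActivities activities, List<String> userItems) {
        List<String> allItems = new ArrayList<>(userItems);
        allItems.addAll(activities.findImpactedItems(userItems));
        // The in-memory updates and the DB commit share one activity, so the
        // in-memory state never has to cross an activity boundary.
        int committed = activities.processAndCommit(allItems);
        String report = activities.buildReport(committed);
        activities.sendNotification(report);
        return report;
    }
}
```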
My dilemma is that activity cancellation depends on heartbeats, and we want cancellation during phase 2 to be responsive. I can sprinkle heartbeats into phase 2 so they occur as frequently as desired. However, the activity's heartbeat timeout would need to be no less than the longest anticipated DB commit, since no heartbeats are sent while the commit is running. Due to heartbeat throttling, cancellation would therefore be delivered roughly no faster than the configured heartbeat timeout.
It seems to me that if I want timely cancellation, phases 2 and 3 really need to be distinct activities, which would mean redesigning phase 2 to persist the intermediate changes instead of keeping them in memory. That would require more work on the legacy system than we can afford.
One alternative I found in a forum post ("Best practices for long-running activities") was to create a component that simply issues heartbeats on a timer. That way I could heartbeat frequently during phase 2 and issue timer-based heartbeats during the commit. Would that be a reasonable approach?
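A minimal sketch of such a timer-based heartbeat component, using only `java.util.concurrent`. `TimerHeartbeater` is a hypothetical name, and the `Runnable` callback stands in for the real heartbeat call so the example runs without a Temporal worker.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: fires a heartbeat callback on a fixed period from a
// background thread, e.g. while a blocking DB commit runs on the activity thread.
final class TimerHeartbeater implements AutoCloseable {
    private final ScheduledExecutorService scheduler;
    // First exception thrown by the callback (in Temporal, heartbeat() throws
    // ActivityCompletionException once cancellation has been delivered).
    private volatile RuntimeException failure;

    TimerHeartbeater(Runnable heartbeat, long periodMillis) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "timer-heartbeater");
            t.setDaemon(true); // never keep the JVM alive on our account
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                heartbeat.run();
            } catch (RuntimeException e) {
                failure = e;
                throw e; // rethrowing suppresses all later executions
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Returns the callback's failure, or null if every heartbeat succeeded.
    RuntimeException failure() {
        return failure;
    }

    @Override
    public void close() {
        scheduler.shutdownNow(); // stop heartbeating once the guarded work is done
    }
}
```

Inside an activity, one would capture the context on the activity thread first (`ActivityExecutionContext ctx = Activity.getExecutionContext();`), wrap the commit in `try (TimerHeartbeater hb = new TimerHeartbeater(() -> ctx.heartbeat(null), periodMillis)) { ... }`, and check `hb.failure()` afterwards; this assumes the heartbeat call may be made from a background thread. These heartbeats only prove the process is alive: a cancellation that arrives mid-commit is noticed when the commit returns, not while it is blocked.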
I am new to Temporal, but I have worked my way through the documentation, some of the samples, and the courses, so I apologize if I am missing something basic.
You can route activities to hosts or even specific processes. So you can ensure that the activities access the same in-memory cache if needed.
You can also implement your own "cancel" activity that is invoked on the same host. This way, heartbeating wouldn't be required to deliver cancellation, although it would still be required to detect worker process failure.
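A minimal sketch of how such a custom cancel activity could work, assuming both activities are routed to the same worker process: the cancel activity raises a per-job flag, and the long-running activity checks it at key points. `CancelRegistry`, `JobCanceledException`, and the job-ID key are hypothetical; this is plain Java, not a Temporal API.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Thrown when a job's cancel flag is seen (hypothetical type).
final class JobCanceledException extends RuntimeException {
    JobCanceledException(String jobId) {
        super("job " + jobId + " was canceled");
    }
}

// Hypothetical per-process registry of cancel flags, keyed by job ID. It only
// works if the cancel activity and the long-running activity share a process.
final class CancelRegistry {
    private static final Set<String> CANCELED = ConcurrentHashMap.newKeySet();

    private CancelRegistry() {}

    // Body of the custom "cancel" activity: raise the flag for this job.
    static void requestCancel(String jobId) {
        CANCELED.add(jobId);
    }

    // Called at key points in the long-running activity.
    static void checkCanceled(String jobId) {
        if (CANCELED.contains(jobId)) {
            throw new JobCanceledException(jobId);
        }
    }

    // Drop the flag when the job ends so the set does not grow without bound.
    static void clear(String jobId) {
        CANCELED.remove(jobId);
    }

    // Hypothetical phase-2 loop: checks the flag before updating each item and
    // returns how many items were updated.
    static int processItems(String jobId, List<String> items, Consumer<String> update) {
        int updated = 0;
        for (String item : items) {
            checkCanceled(jobId);
            update.accept(item);
            updated++;
        }
        return updated;
    }
}
```

Heartbeating still has a role in this design because a flag in process memory cannot reveal that the worker process itself has died.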
Thank you for the suggested approaches! I have a few follow-up questions on the first approach and am still digesting the second.
“… route activities to hosts or even specific processes.”
To confirm my understanding of this approach, the key is to have both 2 & 3 use the same worker. In an ideal world, this would happen due to the sticky worker preference of Temporal. But given timeouts, worker eviction, etc., a sticky worker can’t be assumed and thus needs to be guaranteed through task queues.
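A minimal sketch of one way to give each worker process its own task queue, in the spirit of the fileprocessing sample: the name combines the host name with a random suffix generated once per JVM. `HostTaskQueue` and the `batch-update` prefix are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;

// Hypothetical helper that names a task queue served only by this JVM.
final class HostTaskQueue {
    // Computed once per process: a restarted worker gets a fresh queue name,
    // so nothing is routed to a process that lost the in-memory state.
    private static final String NAME = buildName("batch-update");

    private HostTaskQueue() {}

    static String name() {
        return NAME;
    }

    static String buildName(String prefix) {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            host = "unknown-host"; // fall back rather than fail worker startup
        }
        return prefix + "-" + host + "-" + UUID.randomUUID();
    }
}
```

The worker would poll this queue in addition to the shared one. The first activity could return `HostTaskQueue.name()` to the workflow, which would then create the stub for phases 2 and 3 with `ActivityOptions.newBuilder().setTaskQueue(...)`, so both phases are guaranteed to land in the same process rather than relying on stickiness.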
Where I am confused is determinism. If activity 2 (in-memory) completes, doesn’t it need to reflect the in-memory changes into the event history so that the workflow is deterministic if activity 3 fails? Is there a means to make sure that 2 would be re-run if 3 fails?
Activities don't need to be deterministic; only workflow code does. In the case of caching, if the process fails, the whole sequence of activities needs to be re-executed in a different process. This is also shown in the fileprocessing sample.
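A minimal sketch of re-running the whole sequence, written as plain Java so it runs on its own: phase 2 is recomputed on every attempt, so a failed commit never reuses stale in-memory state. `SequenceRetry` is hypothetical; in a workflow, the loop body would call the host-bound activities and catch `ActivityFailure`, picking a fresh host-specific task queue on each attempt.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: retries phase 2 + phase 3 together as one unit.
final class SequenceRetry {
    private SequenceRetry() {}

    // Runs phase 2 then phase 3; if either throws, starts over from phase 2.
    static <S, R> R runWithRetry(int maxAttempts, Supplier<S> processInMemory, Function<S, R> commit) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                S state = processInMemory.get(); // phase 2, recomputed every attempt
                return commit.apply(state);      // phase 3, using that attempt's state
            } catch (RuntimeException e) {
                last = e; // remember the failure and retry the whole sequence
            }
        }
        throw last;
    }
}
```

Because the workflow only records the activities' inputs and results, the in-memory state never needs to appear in the event history; determinism is preserved as long as the retry decision itself is made in workflow code.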
To confirm my understanding of the second approach, assuming routing to the same host, I would add a "cancel" Temporal activity to my workflow. This activity would set a flag that the other activities inspect at key points in their logic to abort processing. If I have that right, I have two more follow-up questions:
What exception should my activities throw when responding to the cancel to inform the workflow properly? Would it be ActivityCompletionException?
I am thinking that I will use a signal on the workflow to call the “cancel” activity, since my workflow implementation will already have the activity stub available to directly call it.
As this is not a built-in Temporal cancellation, either throw an ApplicationFailure or complete the activity successfully and add this information to the result.
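A minimal sketch of the second option, returning the cancellation as data. `BatchUpdateResult` and its fields are hypothetical; with the Temporal Java SDK the class would also need to be serializable by the configured data converter, which is why it uses the no-arg constructor plus public getters/setters shape common for JSON mapping.

```java
// Hypothetical result of the combined phase-2/3 activity. A user-requested
// abort is reported in the result instead of being thrown as a failure.
final class BatchUpdateResult {
    private boolean canceled;
    private int committedItems;

    public BatchUpdateResult() {} // no-arg constructor for JSON-style converters

    static BatchUpdateResult committed(int items) {
        BatchUpdateResult r = new BatchUpdateResult();
        r.setCommittedItems(items);
        return r;
    }

    static BatchUpdateResult canceledBeforeCommit() {
        BatchUpdateResult r = new BatchUpdateResult();
        r.setCanceled(true); // in-memory changes were discarded, nothing committed
        return r;
    }

    public boolean isCanceled() { return canceled; }
    public void setCanceled(boolean canceled) { this.canceled = canceled; }
    public int getCommittedItems() { return committedItems; }
    public void setCommittedItems(int committedItems) { this.committedItems = committedItems; }
}
```

The workflow can then branch on `isCanceled()` when building the report and notification (phases 4 and 5). Throwing `ApplicationFailure.newNonRetryableFailure(message, type)` instead would reach the workflow as an activity failure.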
My apologies for the continued questions… Using ApplicationFailure surfaces this as a Failed workflow rather than a Canceled workflow. I would like the workflow to be reflected as Canceled. Is that possible?