When do LocalActivity markers become fully durable?

I’ve read through the documentation and forum posts on LocalActivities, and made some anecdotal observations from event histories. My conclusion is that a LocalActivity marker is not, on its own, durable beyond the local worker instance; instead it rides on top of other events, such as the completion of the current workflow task. Is this accurate?

To put it more concretely, say I have a simple Workflow with 2 sequential activities where the second depends on the output of the first. If these are implemented as normal activities, the result of activity 1 is naturally fully durable in the Temporal backend before the task for activity 2 is scheduled.

However, if they are LocalActivities, the markers for activities 1 and 2 are queued up within the worker and are not sent to Temporal until, in this case, the workflow completes. Some failure scenarios may therefore re-invoke the activities during replay, for instance if the worker fails after the local activities complete but before it completes the workflow task.
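The scenario above can be illustrated with a toy model. This is not the Temporal SDK: `run_workflow_task`, `WorkerCrash`, and the `history` list are hypothetical names invented for this sketch, and the only behavior modeled is the assumption that local-activity markers sit in worker memory until the workflow task completes.

```python
class WorkerCrash(Exception):
    """Simulates the worker dying before it completes the workflow task."""


calls = []  # every real invocation of an activity function is logged here


def activity_1():
    calls.append("activity_1")
    return 10


def activity_2(x):
    calls.append("activity_2")
    return x + 1


def run_workflow_task(history, crash_before_completion=False):
    """history is a list of (name, result) markers that are already durable."""
    recorded = dict(history)  # durable results are replayed, not re-run
    pending = []              # markers buffered in worker memory only

    def local_activity(name, fn, *args):
        if name in recorded:
            return recorded[name]
        result = fn(*args)
        pending.append((name, result))
        return result

    a = local_activity("activity_1", activity_1)
    b = local_activity("activity_2", activity_2, a)

    if crash_before_completion:
        # pending is lost together with the worker process
        raise WorkerCrash("worker died before completing the workflow task")

    history.extend(pending)   # task completion: markers become durable
    return b
```

In this model, a first attempt with `crash_before_completion=True` leaves `history` empty, so a second attempt invokes both activity functions again. Only after a task completes do later replays reuse the recorded results.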

Do I have this right?

That is correct. When you schedule a local activity, the SDK immediately tries to run it and schedules a local timer for a little less than the workflow task timeout. It sends any commands generated while processing the workflow task as soon as there are no more local activities to run or the timer elapses. Note that incomplete local activities are not recorded in history.
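As a rough illustration of that rule, here is a toy timeline calculation. It is not SDK code: `flush_points` and its parameters are made up for this sketch, and `timer_seconds` stands in for "a little less than the workflow task timeout" (for example, 8 for a 10-second timeout, which is an assumed value). Local activities are assumed to run one after another.

```python
def flush_points(activities, timer_seconds):
    """Return a list of (time, completed_activity_names), one entry per flush.

    activities: list of (name, duration_seconds), run sequentially.
    timer_seconds: hypothetical local timer, set a little below the
                   workflow task timeout.
    """
    flushes = []
    buffered = []                  # completed but not yet reported
    now = 0
    next_deadline = timer_seconds
    for name, duration in activities:
        end = now + duration
        # The timer may fire (possibly several times) while this
        # activity is still running; only completed work is reported.
        while next_deadline < end:
            flushes.append((next_deadline, list(buffered)))
            buffered.clear()
            next_deadline += timer_seconds
        now = end
        buffered.append(name)
    # No more local activities to run: report whatever is left.
    flushes.append((now, list(buffered)))
    return flushes
```

With `flush_points([("a", 3), ("b", 6), ("c", 2)], 8)`, the model reports `a` when the timer fires at time 8 (while `b` is still running, so `b` is not included) and reports `b` and `c` together at time 11, when nothing is left to run.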

Thanks for confirming @bergundy. At least for my use cases, having the activity marker be durable before return is important for correctness, so I’ll switch back to regular activities. It turns out that I was initially fooled into thinking normal activities performed materially worse for my high-bandwidth use case, but I think what I was actually seeing was the difference in timeout/retry behavior. Once I adjusted my timeouts for regular activities, the high-bandwidth use case performs fine with them, and I gain the important durability property.

It does make me wonder how useful LocalActivities are in the face of the durability limitations. I’m sure I am just lacking imagination outside of my own use, but here are a few ways I can think of to enhance them at least for use cases similar to mine:

  1. Adding an optional flag to the existing LocalActivity interface for recording the marker before the function returns.
  2. Introducing a new semantic for invoking local-like activities but with durability guarantees.
  3. Introducing a barrier function that allows the workflow to explicitly indicate durability points (such as after invoking 1 or more local activities).
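To make option 3 concrete, here is a toy sketch of what a barrier could mean. Nothing here is an existing Temporal API: `ToyWorkflowTask`, `flush_barrier`, and `WorkerCrash` are hypothetical names, and the model simply assumes that a barrier pushes all buffered local-activity markers to durable history before the workflow continues.

```python
class WorkerCrash(Exception):
    """Simulates the worker dying partway through a workflow task."""


class ToyWorkflowTask:
    def __init__(self, durable_history):
        self.durable = durable_history          # list shared with the "backend"
        self.recorded = dict(durable_history)   # results available for replay
        self.pending = []                       # markers held in worker memory
        self.invocations = []                   # activities actually executed

    def local_activity(self, name, fn, *args):
        if name in self.recorded:
            return self.recorded[name]          # replayed, not re-run
        self.invocations.append(name)
        result = fn(*args)
        self.pending.append((name, result))
        return result

    def flush_barrier(self):
        # Hypothetical durability point: everything buffered so far
        # is written to durable history before returning.
        self.durable.extend(self.pending)
        self.pending.clear()


def workflow(task, crash_after_barrier=False):
    a = task.local_activity("step_1", lambda: 10)
    task.flush_barrier()
    if crash_after_barrier:
        raise WorkerCrash("worker died after the first barrier")
    b = task.local_activity("step_2", lambda x: x + 1, a)
    task.flush_barrier()
    return b
```

In this model, a crash after the first barrier leaves the `step_1` marker in durable history, so a retry on a fresh `ToyWorkflowTask` executes only `step_2`. The cost is an extra round trip to the backend at every barrier, which is the latency the local activity was meant to avoid.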

Any of these options would preserve the benefits of LocalActivities when I know my local node has the activity code available and the activity is short-lived. That is, I could avoid the scheduling/data-movement penalty yet still get the Temporal durability that makes the platform so powerful.

For (3), I wonder if using the existing await(true) would work?

The idea of letting users control durability for local SDK work (currently only activities) has come up in internal discussions before. It’s really a tradeoff between latency/throughput and durability.
Since Temporal is a general-purpose execution platform, I think we have no choice but to give this control to the user eventually. I’m not sure what the API would look like, though.