Our use case: We have very long-running jobs (1-12 hours) that we run as part of a larger workflow. The code for these jobs is entirely owned by us, so we have flexibility in how we manage the jobs.

We are thinking of using a very quick activity to trigger/launch the job, which is handled by a totally different process. The job itself will then signal the workflow when it is done. I’ve seen this pattern mentioned in a few other places, so I think this is a good approach (but please correct me if I am wrong). [EDIT: after some more reading, maybe this is a prime use case for Async Activity Completion and I don’t need signals at all? It looks like I could use RecordActivityHeartbeat() to heartbeat from the spawned process.]

The complication: It is possible that this async job will crash, in which case we would want the workflow to relaunch the job. So, I want the workflow to be able to detect when/if the job has crashed.

I was thinking we could have the job’s code use signals as a sort of “heartbeat”. Basically, the job would say “I’m alive” once every 2 minutes or so. The workflow could select on the signal channel alongside a timer; if no signal arrives within, e.g., 3 minutes, the timer fires, the workflow assumes the job has crashed, and it launches a new job (which will start signaling again).

Is this a valid approach? Any potential issues that I should look out for? I looked around for examples of using Signals as a kind of heartbeat but failed to find this use case (apologies if I missed documentation). Note that this would be simpler if the job were a long-running activity (we could just use Temporal’s heartbeating), but I’ve seen advice to use quick activities plus signaling where possible.

If the external system cannot guarantee that the job will run to completion, then async activity completion with heartbeating is the way to go.

Will MaxConcurrentActivityExecutionSize still be respected with async activities? Example: if I set my worker to have MaxConcurrentActivityExecutionSize=1, and it starts an async activity and the completion will be done by another process entirely, will the worker wait until the async activity is completed by the other process before it takes on another activity?

The limit is supported only by the Java SDK, and only within a single Java process. See ActivityExecutionContext.useLocalManualCompletion().

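There is no way to enforce the limit when task-token-based completion is used. The worker’s slot is released as soon as the activity function returns, so the worker will pick up another activity while the external process is still running.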

