Long Running Activities Right Practices Questions

Hello, I have some questions about long running activities and the right practices.

First of all, regarding startToCloseTimeout. Suppose an activity has a 2-hour startToCloseTimeout, and the worker that was executing the workflow dies while it is running this activity. What happens when a new worker comes up and wants to resume the workflow? I believe the workflow will stay idle, and only when the activity times out (after 2 hours) will it be replayed by the new worker. Is my understanding right? If so, what if the startToCloseTimeout were 4 days, for example? How would we make sure the workflow is not idle for all that time?

This leads to my second question. I guess activity heartbeats might help prevent the workflow from being idle all that time when a new worker is up. But how does this actually help? As I understand it, heartbeats are called from within the activity method, like this in Java:

Activity.getExecutionContext().heartbeat(details);

But if the workflow is idle, how would this activity heartbeat?

Also, what are the right practices for using heartbeats? Should I call Activity.getExecutionContext().heartbeat() every few lines in my code? Or can I make it happen at a fixed frequency, every minute for example?

Also, regarding long-running activities: when exactly should they be used, and when should we divide a large activity into smaller ones?
We have this example in our code and are not sure if it is a right practice. We have one large activity that executes an algorithm. This algorithm fetches data from the database on different occasions. It also communicates with external services (once through REST and a few other times through RabbitMQ). Is this something Temporal recommends, or should we divide this activity into several activities, each containing only the code needed to fetch data or communicate with one service?

Lastly, regarding the long-running activity I talked about: we realized that our Java application's memory consumption keeps increasing until the application fails with an out-of-memory error. After investigating, the issue seems to be caused by this long-running activity. What could be a possible reason for that? We tried setting the workflow cache size, but it didn’t help.

Thank you in advance!

If so, what if the startToCloseTimeout were 4 days, for example? How would we make sure the workflow is not idle for all that time?

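The workflow is idle only if it is blocked waiting for the activity to complete. It can do other things in the meantime if needed.
As you pointed out, the best way to deal with a long-running activity is to specify a short heartbeat timeout and heartbeat from the activity implementation. If the worker dies, the heartbeats stop, the heartbeat timeout fires, and the activity can be retried on another worker without waiting for the full startToCloseTimeout.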
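As a rough illustration, the activity options for such a case might look like the fragment below. The class name and the 30-second value are assumptions picked for the example, not recommendations; the fragment needs the Temporal Java SDK on the classpath.

```java
import io.temporal.activity.ActivityOptions;
import java.time.Duration;

// Hypothetical holder class; only the builder calls come from the Temporal Java SDK.
class LongActivityOptions {
    static ActivityOptions build() {
        return ActivityOptions.newBuilder()
            // Upper bound for a single attempt of the activity.
            .setStartToCloseTimeout(Duration.ofDays(4))
            // Illustrative value: the attempt times out if no heartbeat
            // is recorded within this window.
            .setHeartbeatTimeout(Duration.ofSeconds(30))
            .build();
    }
}
```

With options like these, a dead worker would be noticed after about 30 seconds instead of after 4 days, and the retry policy decides whether the activity is scheduled again.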

This leads to my second question. I guess activity heartbeats might help prevent the workflow from being idle all that time when a new worker is up. But how does this actually help? As I understand it, heartbeats are called from within the activity method, like this in Java:

Activity.getExecutionContext().heartbeat(details);

But if the workflow is idle, how would this activity heartbeat?

Workflows are idle only if their logic tells them to wait for something, such as activity completion. Activity heartbeats are needed to detect activity failures faster than the StartToClose timeout. I don’t understand why you think that an activity is not able to heartbeat while a workflow is blocked waiting for it to complete. The activity code runs on a worker independently of the workflow code.
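To make this concrete, here is a minimal plain-Java sketch of an activity body that heartbeats its progress from inside its own loop. ChunkedActivity and its parameters are hypothetical names, and the IntConsumer is a stand-in for the SDK: in a real activity the callback would be a call to Activity.getExecutionContext().heartbeat(details).

```java
import java.util.function.IntConsumer;

// Hypothetical stand-in for a long-running activity body (not Temporal SDK code).
class ChunkedActivity {
    // Processes chunks resumeFrom..totalChunks-1 and reports progress after each one.
    // Returns the number of chunks completed in total.
    static int run(int resumeFrom, int totalChunks, IntConsumer heartbeat) {
        int done = resumeFrom;
        while (done < totalChunks) {
            // Placeholder for one unit of real work (a query, a computation step, ...).
            done++;
            // Heartbeat details = number of chunks completed so far.
            heartbeat.accept(done);
        }
        return done;
    }
}
```

The loop never touches the workflow, so it keeps heartbeating whether or not the workflow is blocked. A retry could pass the last recorded details as resumeFrom to skip work that was already done.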

Also, what are the right practices for using heartbeats? Should I call Activity.getExecutionContext().heartbeat() every few lines in my code? Or can I make it happen at a fixed frequency, every minute for example?

You can heartbeat as frequently as makes sense for the activity implementation. How frequently heartbeat calls are actually made to the Temporal service depends on the configured heartbeat timeout. By default, a heartbeat is sent to the service (assuming that the activity called heartbeat) once every 0.8 * heartbeatTimeout.
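The throttling rule can be pictured with a small model. This is a simplified sketch in plain Java, not the SDK's implementation; HeartbeatThrottle is a hypothetical class, and it assumes the first heartbeat is forwarded immediately.

```java
import java.time.Duration;

// Simplified model of client-side heartbeat throttling.
// Assumption: the first heartbeat is forwarded immediately. This is NOT the SDK's code.
class HeartbeatThrottle {
    private final long intervalMillis;
    private boolean sentBefore = false;
    private long lastSentMillis;

    HeartbeatThrottle(Duration heartbeatTimeout) {
        // 0.8 * heartbeatTimeout, computed with integer math to avoid rounding surprises.
        this.intervalMillis = heartbeatTimeout.toMillis() * 4 / 5;
    }

    long intervalMillis() {
        return intervalMillis;
    }

    // Called whenever the activity code heartbeats; returns true if this call
    // would actually be forwarded to the service in this model.
    boolean record(long nowMillis) {
        if (sentBefore && nowMillis - lastSentMillis < intervalMillis) {
            return false; // too soon after the last forwarded heartbeat
        }
        sentBefore = true;
        lastSentMillis = nowMillis;
        return true;
    }
}
```

In this model, with a 10-second heartbeat timeout the interval is 8 seconds, so an activity that heartbeats every second would forward roughly one call in eight. Calling heartbeat often in the activity code is therefore cheap.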

Also, regarding long-running activities: when exactly should they be used, and when should we divide a large activity into smaller ones?

We recommend activities that perform a single well-defined function. This gives you more control over failure scenarios, better visibility into the state of the process, and finer-grained recovery. You can route activities to the same host or even the same process if they need to access locally cached data.
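A minimal plain-Java sketch of what such a split could look like is shown below. All names are hypothetical. In a Temporal application the interface would be annotated as an activity interface and called through an activity stub from workflow code; here it is an ordinary interface so the shape stays visible without the SDK.

```java
// Hypothetical single-purpose steps; each method would become its own activity.
interface AlgorithmActivities {
    String fetchInput(String jobId);        // one database read
    String callRestService(String input);   // one REST call
    void publishResult(String result);      // one message publish
}

// Stand-in for the workflow logic that sequences the steps.
class AlgorithmFlow {
    static void run(String jobId, AlgorithmActivities activities) {
        String input = activities.fetchInput(jobId);
        String enriched = activities.callRestService(input);
        activities.publishResult(enriched);
    }
}
```

Each step can then get its own timeout and retry behavior, and a failure in the REST call does not force the database read to run again.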

It also communicates with external services (once through REST and a few other times through RabbitMQ).

This certainly should belong to multiple activities. The messages received through RabbitMQ should be delivered to a workflow as signals.

After investigating, the issue seems to be caused by this long-running activity. What could be a possible reason for that? We tried setting the workflow cache size, but it didn’t help.

Workflow cache size is not related to how many parallel activities a worker can execute. Use WorkerOptions.maxConcurrentActivityExecutionSize to limit the number of parallel activities. If you still run out of memory after limiting this number, then you have to troubleshoot your activity implementation.
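In the Java SDK this limit is set through the WorkerOptions builder, roughly as in the fragment below. The class name, the task queue name, and the limit of 4 are placeholders chosen for the example; the fragment needs the Temporal Java SDK on the classpath.

```java
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerOptions;

// Hypothetical helper; the task queue name and the limit are illustrative only.
class BoundedActivityWorker {
    static Worker create(WorkerFactory factory) {
        WorkerOptions options = WorkerOptions.newBuilder()
            // At most 4 activities run in parallel on this worker.
            .setMaxConcurrentActivityExecutionSize(4)
            .build();
        return factory.newWorker("long-running-activities", options);
    }
}
```

A sensible limit depends on how much memory one execution of the activity needs, which is something only profiling the activity can tell you.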

If you describe your use case in more detail, I can provide more detailed help.