Daemon / long-term tasks

Hi everyone,

I hope you’re all doing well!

I’m currently working on a project where I need to implement a daemon or a long-term process using Temporal. Specifically, I need this process to run indefinitely, or for extended periods, and be able to handle events or tasks that may occur at irregular intervals.

I’ve explored the Temporal documentation and have a basic understanding of workflows and activities, but I’m unsure about the best practices for setting up something that needs to run continuously or for very long durations.

Here are a few questions I have:

  1. Long-Running Workflows: How can I create a workflow that runs indefinitely or for a very long time? Are there specific configurations or patterns that I should follow?
  2. Activity Heartbeating: For activities that might take a long time to complete, how can I use heartbeating effectively to ensure that they don’t time out prematurely?
  3. Event Handling: What’s the best way to design the workflow so that it can handle events or tasks that might occur at irregular intervals? Is there a way to “wake up” the workflow when such events happen?
  4. Resource Management: How can I ensure that my long-term processes are resource-efficient and do not lead to memory leaks or other resource management issues?
  5. Failure Recovery: What are the best practices for handling failures or crashes in long-term processes to ensure they can recover gracefully and continue running?

I would greatly appreciate any guidance, examples, or references to documentation that could help me get started on the right path. If anyone has experience with similar implementations or can share some insights, that would be fantastic!

Thank you in advance for your help and support!

Best regards

  1. Long-Running Workflows: How can I create a workflow that runs indefinitely or for a very long time? Are there specific configurations or patterns that I should follow?

There is no hard limit on a workflow’s duration. However, a single workflow execution is bounded in the number of history events it can accumulate, and therefore in the number of activities and child workflows it can run. To work around this, an always-running workflow is expected to call continue-as-new periodically to reset its history size to zero. This also lets it pick up newly deployed workflow code in a timely manner.
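For illustration, here is a minimal sketch of that pattern with the Go SDK (DaemonWorkflow, PollSource, the interval, and the 1,000-iteration threshold are all placeholders, not anything from your setup):

```go
package daemon

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// DaemonWorkflow performs one unit of work on a fixed interval, forever.
// It bounds the number of iterations per run and then calls continue-as-new
// so the history never grows without limit; the threshold is arbitrary.
func DaemonWorkflow(ctx workflow.Context) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})

	for i := 0; i < 1000; i++ {
		// PollSource is a placeholder activity doing one unit of daemon work.
		if err := workflow.ExecuteActivity(ctx, PollSource).Get(ctx, nil); err != nil {
			workflow.GetLogger(ctx).Error("PollSource failed", "error", err)
		}
		// Durable timer: survives worker restarts and costs no thread while waiting.
		if err := workflow.Sleep(ctx, 30*time.Second); err != nil {
			return err
		}
	}
	// Start a fresh run with empty history and the latest registered code.
	return workflow.NewContinueAsNewError(ctx, DaemonWorkflow)
}

// PollSource is a stub for the real activity implementation.
func PollSource(ctx context.Context) error { return nil }
```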

  2. Activity Heartbeating: For activities that might take a long time to complete, how can I use heartbeating effectively to ensure that they don’t time out prematurely?

Make sure that you set the heartbeat timeout for the activity. Heartbeat calls are throttled by the SDK to roughly 4/5 of the heartbeat timeout, so if that timeout is not set, the heartbeat calls might never reach the service.
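For example, a heartbeating activity in the Go SDK might look like the sketch below, assuming a hypothetical per-item processing loop; the heartbeat details let a retried attempt resume where the previous one left off:

```go
package daemon

import (
	"context"

	"go.temporal.io/sdk/activity"
)

// ProcessItems is a long-running activity. It heartbeats after every item and
// records the index as heartbeat details so a retried attempt can resume.
func ProcessItems(ctx context.Context, items []string) error {
	start := 0
	// If a previous attempt heartbeated, pick up from its last reported index.
	if activity.HasHeartbeatDetails(ctx) {
		var lastIndex int
		if err := activity.GetHeartbeatDetails(ctx, &lastIndex); err == nil {
			start = lastIndex + 1
		}
	}

	for i := start; i < len(items); i++ {
		if err := handleItem(ctx, items[i]); err != nil { // placeholder work
			return err
		}
		// Tell the service we are alive; the SDK throttles the actual calls.
		activity.RecordHeartbeat(ctx, i)

		// Stop promptly if the activity is cancelled or has timed out.
		if ctx.Err() != nil {
			return ctx.Err()
		}
	}
	return nil
}

// handleItem is a placeholder for the real per-item work.
func handleItem(ctx context.Context, item string) error { return nil }
```

On the workflow side, the corresponding ActivityOptions would set HeartbeatTimeout (and typically a RetryPolicy) so the service can detect a dead attempt and retry it.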

  3. Event Handling: What’s the best way to design the workflow so that it can handle events or tasks that might occur at irregular intervals? Is there a way to “wake up” the workflow when such events happen?

Send a signal to the workflow for each event. The workflow is automatically woken up when a signal is received.
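For example, with the Go SDK the workflow blocks on a signal channel and wakes up as soon as a signal arrives; a selector also lets it wake on a timer, whichever happens first (the “event” signal name and the string payload are assumptions):

```go
package daemon

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// WaitForEventOrTimeout blocks until an "event" signal is delivered or the
// timeout fires, whichever comes first. The worker uses no thread while the
// workflow waits; it simply wakes up when something happens.
func WaitForEventOrTimeout(ctx workflow.Context, timeout time.Duration) (string, bool) {
	events := workflow.GetSignalChannel(ctx, "event")

	var payload string
	received := false

	selector := workflow.NewSelector(ctx)
	selector.AddReceive(events, func(c workflow.ReceiveChannel, more bool) {
		c.Receive(ctx, &payload)
		received = true
	})
	selector.AddFuture(workflow.NewTimer(ctx, timeout), func(f workflow.Future) {
		// Timer fired before any signal arrived.
	})
	selector.Select(ctx) // blocks until a signal or the timer wakes the workflow

	return payload, received
}
```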

  4. Resource Management: How can I ensure that my long-term processes are resource-efficient and do not lead to memory leaks or other resource management issues?

If you call continue-as-new periodically, this shouldn’t be an issue.

  5. Failure Recovery: What are the best practices for handling failures or crashes in long-term processes to ensure they can recover gracefully and continue running?

Don’t throw exceptions from these workflows, and Temporal will take care of the rest.
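To make that concrete: activity failures are retried by the service according to the retry policy, and the workflow itself can catch an activity error instead of rethrowing it, so the loop keeps running. A sketch with the Go SDK (the policy values are illustrative only):

```go
package daemon

import (
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// withRetries returns a context whose activities are retried automatically.
// The numbers below are illustrative, not recommendations.
func withRetries(ctx workflow.Context) workflow.Context {
	return workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
		HeartbeatTimeout:    30 * time.Second,
		RetryPolicy: &temporal.RetryPolicy{
			InitialInterval:    time.Second,
			BackoffCoefficient: 2.0,
			MaximumInterval:    time.Minute,
			MaximumAttempts:    0, // 0 means retry indefinitely
		},
	})
}
```

If an activity still fails after all retries, the workflow can log the error and carry on instead of returning it, as the loops above do; and if a worker crashes, the workflow is simply replayed on another worker from its event history.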

Hi Maxim,

Thank you for your detailed response! Your insights on managing long-running workflows, activity heartbeating, and failure recovery are very helpful.

For implementing daemon processes, I see three potential approaches:

  1. Trigger an Activity in a Separate Thread:
  • This approach involves starting an activity that runs the long-term process in a separate thread. However, this method depends on another activity or mechanism to keep the workflow active and manage the lifecycle of the daemon process.
  2. Run a while(true) Loop in an Activity:
  • Here, the activity itself contains a while(true) loop that continuously performs the desired operation. Proper heartbeating is crucial to prevent timeouts, and this method requires careful resource management within the activity to avoid issues.
  3. Chain Activities in a Recursive Loop:
  • This method involves chaining activities in a recursive manner where each activity invocation triggers the next one. This approach can effectively manage resource usage and workflow history size by periodically using continue-as-new.

Each of these approaches has its own use cases and considerations. For example, the first approach provides flexibility in managing the daemon process lifecycle, but it requires an additional activity to keep the workflow active. The second approach is straightforward but needs careful handling of long-running operations and heartbeating. The third approach balances resource management and history size by leveraging continue-as-new.
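To make the third approach concrete, here is a rough, untested sketch of what I have in mind (Go SDK; PipelineWorkflow, FetchBatch, ProcessBatch, and the iteration bound are all placeholder names and values):

```go
package daemon

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// PipelineWorkflow chains activities: the result of FetchBatch feeds
// ProcessBatch, and the loop repeats until it is time to continue-as-new.
func PipelineWorkflow(ctx workflow.Context, cursor string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})

	for i := 0; i < 500; i++ { // arbitrary bound before continue-as-new
		var batch []string
		if err := workflow.ExecuteActivity(ctx, FetchBatch, cursor).Get(ctx, &batch); err != nil {
			return err
		}
		// ProcessBatch returns the cursor for the next fetch.
		if err := workflow.ExecuteActivity(ctx, ProcessBatch, batch).Get(ctx, &cursor); err != nil {
			return err
		}
	}
	// Carry the cursor over so the next run picks up where this one stopped.
	return workflow.NewContinueAsNewError(ctx, PipelineWorkflow, cursor)
}

// FetchBatch and ProcessBatch are placeholder activities.
func FetchBatch(ctx context.Context, cursor string) ([]string, error)  { return nil, nil }
func ProcessBatch(ctx context.Context, batch []string) (string, error) { return "", nil }
```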

I’m leaning towards the third approach for its balance of efficiency and simplicity, but I’d love to hear your thoughts or recommendations on these methods.

Thank you again for your guidance!

All the best

For implementing a consumer of signals, I see three approaches:

  1. Create a Signal Handler that Spins Up an Activity for Each Signal:
  • A signal handler within the workflow triggers an activity whenever a signal is received. Each signal is handled promptly, and this can scale with the number of incoming signals.
  2. Create a Recursive Activity that Consumes the Signal Queue Sequentially:
  • An activity processes the pending signals one by one in a recursive manner, ensuring that signals are handled sequentially and order and consistency are maintained.
  3. Use a while(true) Loop in a Started Activity:
  • A long-running activity consumes signals continuously, implemented either in a separate thread, as a while(true) loop, or recursively.

Each of these approaches for signal handling has its benefits. The first approach ensures prompt and parallel handling of signals, the second maintains sequential processing, and the third provides continuous and flexible signal consumption.

I’m leaning towards using the recursive activity for its balance of efficiency and simplicity but would love to hear your thoughts or any recommendations on these methods.
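For reference, here is roughly how I picture the sequential consumption working if the waiting stays in the workflow and each signal is handed to a small activity (untested Go sketch; the signal name, the Consume activity, and the bounds are placeholders):

```go
package daemon

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// ConsumerWorkflow receives "event" signals and processes them strictly in
// order, one activity call per event. Before continue-as-new it drains any
// signals that arrived while it was busy so nothing is lost.
func ConsumerWorkflow(ctx workflow.Context) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})
	events := workflow.GetSignalChannel(ctx, "event")

	handle := func(payload string) {
		if err := workflow.ExecuteActivity(ctx, Consume, payload).Get(ctx, nil); err != nil {
			workflow.GetLogger(ctx).Error("Consume failed", "error", err)
		}
	}

	for i := 0; i < 500; i++ { // arbitrary bound per run
		var payload string
		events.Receive(ctx, &payload)
		handle(payload)
	}

	// Drain buffered signals before restarting with fresh history.
	for {
		var payload string
		if ok := events.ReceiveAsync(&payload); !ok {
			break
		}
		handle(payload)
	}
	return workflow.NewContinueAsNewError(ctx, ConsumerWorkflow)
}

// Consume is a placeholder activity that does one unit of work per event.
func Consume(ctx context.Context, payload string) error { return nil }
```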

A starting point is to put the logic of your application in the workflow, and to keep activities straightforward: each one does a single task and returns.

Then, if you run into a particular situation where that isn’t practical, look for a solution to that specific issue.

For example, workflows are limited in the number of events they can process per second. If you hit that limit, one option is to push some of the work out into activities. But it may also turn out that the better fix is to split the work into multiple workflows that run in parallel (see the sketch below).
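As a sketch of that last idea (Go SDK; all of the names and the ID scheme are made up), a router workflow can fan work out to child workflows that run in parallel, so no single history has to absorb every event:

```go
package daemon

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// FanOutWorkflow starts one child workflow per partition and waits for all
// of them. Each child owns its own history, so event load is spread out.
func FanOutWorkflow(ctx workflow.Context, partitions []string) error {
	futures := make([]workflow.ChildWorkflowFuture, 0, len(partitions))
	for _, p := range partitions {
		cctx := workflow.WithChildOptions(ctx, workflow.ChildWorkflowOptions{
			WorkflowID: "consumer-" + p, // hypothetical ID scheme
		})
		futures = append(futures, workflow.ExecuteChildWorkflow(cctx, PartitionConsumer, p))
	}
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}

// PartitionConsumer is a placeholder for a per-partition workflow, for
// example the kind of signal/continue-as-new loop discussed above.
func PartitionConsumer(ctx workflow.Context, partition string) error {
	return workflow.Sleep(ctx, time.Second)
}
```

In practice the per-partition workflows might be started independently rather than from a parent, but the sketch shows the general shape.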