Open long-running workflows waiting on Workflow.await()

Suppose we have a million workflows waiting on a signal() / user input. In this case, especially in Java where threads are costly, we end up with a million workflow instances/runIDs waiting on a signal.
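A minimal sketch of one such workflow, assuming a hypothetical UserInputWorkflow interface (the names are illustrative, not from an actual codebase), where the workflow method parks on Workflow.await() until a signal delivers the input:

```java
import io.temporal.workflow.SignalMethod;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

// Hypothetical workflow: waits (possibly for days) until a user submits input via a signal.
@WorkflowInterface
public interface UserInputWorkflow {

  @WorkflowMethod
  String waitForInput();

  @SignalMethod
  void submitInput(String input);
}

class UserInputWorkflowImpl implements UserInputWorkflow {

  private String input;

  @Override
  public String waitForInput() {
    // Blocks this workflow until the signal handler below sets `input`.
    Workflow.await(() -> input != null);
    return input;
  }

  @Override
  public void submitInput(String input) {
    this.input = input;
  }
}
```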

  1. Does this mean we will have all these million workflow instances/runIDs as threads in a suspended state and in memory?
  2. WorkerFactoryOptions.Builder setWorkflowCacheSize(int workflowCacheSize) and
    WorkerFactoryOptions.Builder setMaxWorkflowThreadCount(int maxWorkflowThreadCount) help with caching and avoiding re-creation of workflows through replay, but with millions of workflows in mind, what would be acceptable values?
  3. If for some reason we hit the thread limits and workflows are purged/killed, and the signals arrive after a few days, will the workflow still be restarted and execution guaranteed? If yes, how is this achieved?

Hello @Abhijith_K

  1. Does this mean we will have all these million workflow instances/runIDs as threads in a suspended state and in memory?

No. Workers created from the same worker factory share up to MaxWorkflowThreadCount workflow threads. A worker can process many more workflow executions than that, because cached executions can be evicted and their threads reclaimed.

  2. WorkerFactoryOptions.Builder setWorkflowCacheSize(int workflowCacheSize) and
    WorkerFactoryOptions.Builder setMaxWorkflowThreadCount(int maxWorkflowThreadCount) help with caching and avoiding re-creation of workflows through replay, but with millions of workflows in mind, what would be acceptable values?

Those two settings matter, but so do the number of pollers, the maximum concurrent workflow tasks, and the maximum concurrent activity tasks, among others (please see Developer's guide - Worker performance | Temporal Documentation).

The right configuration depends on the nature of your workflows: how many of them will be running concurrently, and how many will mostly be sleeping/awaiting a signal? The number of worker replicas matters as well.
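A rough sketch of where these options are set (the values and the task queue name below are placeholders for illustration, not recommendations):

```java
import io.temporal.client.WorkflowClient;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerFactoryOptions;
import io.temporal.worker.WorkerOptions;

public class WorkerTuningExample {
  public static void main(String[] args) {
    WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
    WorkflowClient client = WorkflowClient.newInstance(service);

    // Options shared by all workers created from this factory.
    WorkerFactoryOptions factoryOptions = WorkerFactoryOptions.newBuilder()
        .setWorkflowCacheSize(4000)        // sticky cache: executions kept in memory
        .setMaxWorkflowThreadCount(2000)   // cap on workflow threads across all workers
        .build();
    WorkerFactory factory = WorkerFactory.newInstance(client, factoryOptions);

    // Per-worker concurrency and poller settings (see the Worker performance guide).
    WorkerOptions workerOptions = WorkerOptions.newBuilder()
        .setMaxConcurrentWorkflowTaskExecutionSize(200)
        .setMaxConcurrentActivityExecutionSize(200)
        .setMaxConcurrentWorkflowTaskPollers(10)
        .setMaxConcurrentActivityTaskPollers(10)
        .build();

    Worker worker = factory.newWorker("user-input-task-queue", workerOptions);
    // worker.registerWorkflowImplementationTypes(UserInputWorkflowImpl.class);

    factory.start();
  }
}
```

Values like these need to be validated against your memory budget and the worker metrics under realistic load.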

  3. If for some reason we hit the thread limits and workflows are purged/killed, and the signals arrive after a few days, will the workflow still be restarted and execution guaranteed? If yes, how is this achieved?

The workflow history is persisted in the database. If a workflow execution is evicted from the cache (or your worker crashes), the same or another worker will pick up its next task; if it cannot find the execution in its cache (metric sticky_cache_miss), the worker fetches the workflow history and replays it to recover the execution state, then continues from there.

Let me know if it helps,

Thanks @antonio.perez for the help.