Hi Team -
I have a use case with millions of workflows that call Workflow.sleep(). Several threads mention that sleeping workflows are lightweight and that this scales to millions, as long as the DB has space to store them.
Is it the worker process that polls the DB to check when it is time to fire these tasks? I am trying to use this Temporal feature as an event scheduling service, so I want to run a load test to see whether the system scales to millions of sleeping workflows. I am not sure whether the workers poll the timers stored in the DB, or whether some other component (the Temporal service) does.
Would be great if you could help me get some clarity.
Is it the worker process that polls the DB to check when it is time to fire these tasks?
No, the service creates and manages user timers durably. When a timer fires, the service needs to deliver the timer-fired event to your worker so that it can continue your workflow execution. It does that by creating a workflow task and dispatching it to your worker.
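To make that division of labor concrete, here is a minimal toy model in plain Java. It is not Temporal's implementation and does not use the Temporal SDK; every class, field, and method name in it is hypothetical. It only illustrates the point above: the service owns the timer queue and turns due timers into workflow tasks, while the worker polls a task queue and never looks at the timers.

```java
import java.util.PriorityQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of the flow described above. This is NOT Temporal's implementation;
// every class, field, and method name here is hypothetical.
final class TimerFlowSketch {

    // A timer the service has persisted: which workflow to wake, and when.
    static final class TimerTask {
        final String workflowId;
        final long fireAtMillis;

        TimerTask(String workflowId, long fireAtMillis) {
            this.workflowId = workflowId;
            this.fireAtMillis = fireAtMillis;
        }
    }

    // What a worker receives: a workflow task carrying the timer-fired event.
    static final class WorkflowTask {
        final String workflowId;
        final String event;

        WorkflowTask(String workflowId, String event) {
            this.workflowId = workflowId;
            this.event = event;
        }
    }

    // Service side: pending timers ordered by fire time (stands in for the DB-backed queue).
    private final PriorityQueue<TimerTask> timerQueue =
            new PriorityQueue<>((a, b) -> Long.compare(a.fireAtMillis, b.fireAtMillis));

    // Task queue the worker polls. The worker never reads timerQueue.
    final BlockingQueue<WorkflowTask> taskQueue = new LinkedBlockingQueue<>();

    // Conceptually what happens when a workflow calls Workflow.sleep().
    void startTimer(String workflowId, long fireAtMillis) {
        timerQueue.add(new TimerTask(workflowId, fireAtMillis));
    }

    // Service-side step: turn every due timer into a workflow task.
    int fireDueTimers(long nowMillis) {
        int fired = 0;
        while (!timerQueue.isEmpty() && timerQueue.peek().fireAtMillis <= nowMillis) {
            TimerTask due = timerQueue.poll();
            taskQueue.add(new WorkflowTask(due.workflowId, "TimerFired"));
            fired++;
        }
        return fired;
    }

    public static void main(String[] args) {
        TimerFlowSketch service = new TimerFlowSketch();
        service.startTimer("wf-a", 1_000L);
        service.startTimer("wf-b", 5_000L);

        // At t=2000 only wf-a is due, so the service dispatches one workflow task.
        int fired = service.fireDueTimers(2_000L);
        WorkflowTask task = service.taskQueue.poll(); // worker side: polls the task queue only
        System.out.println(fired + " timer fired; worker resumes " + task.workflowId);
    }
}
```

In this sketch a sleeping workflow costs the worker nothing until its task arrives, which matches the claim that sleeping workflows are lightweight on the worker side.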
I see a column called shard_id in the timer_tasks table, so I assume there are multiple processes or threads monitoring these timers. Is it correct to assume that this is horizontally scalable? I am running Temporal in Docker locally, so I am not sure how to check which component of the Temporal service should be scaled out to support millions of timers, apart from the DB.
This is what the docs say, and it does not tell me which component apart from the DB to scale out: "Temporal has two main parts: 1. Your application 2. The Temporal Service (a set of services and components)"
Yes, it is horizontally scalable. Each shard has its own queues, including a durable timer queue. Note that the number of shards cannot be changed after a cluster has been initialized.
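As a rough illustration of per-shard timer queues, here is a toy model in plain Java. It is not Temporal's implementation: the class, its methods, and the hash function are all hypothetical, and the real server uses its own hashing scheme. For context, in Temporal the shards are hosted by the History service, so that is the service whose instances share the shards (and their timer queues) between them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of per-shard timer queues. NOT Temporal's implementation;
// the class, its methods, and the hash function are hypothetical.
final class ShardedTimersSketch {
    private final int numShards; // fixed for the lifetime of the "cluster"
    private final List<PriorityQueue<Long>> timerQueues = new ArrayList<>();

    ShardedTimersSketch(int numShards) {
        this.numShards = numShards;
        for (int i = 0; i < numShards; i++) {
            timerQueues.add(new PriorityQueue<>()); // one timer queue per shard
        }
    }

    // Map a workflow to a shard. The real server uses its own hash and key.
    int shardFor(String workflowId) {
        return Math.floorMod(workflowId.hashCode(), numShards);
    }

    // Store the timer's fire time in the queue of the shard that owns the workflow.
    void addTimer(String workflowId, long fireAtMillis) {
        timerQueues.get(shardFor(workflowId)).add(fireAtMillis);
    }

    int pendingTimers(int shardId) {
        return timerQueues.get(shardId).size();
    }

    public static void main(String[] args) {
        ShardedTimersSketch cluster = new ShardedTimersSketch(4);
        for (int i = 0; i < 1000; i++) {
            cluster.addTimer("wf-" + i, 60_000L + i);
        }
        // Each shard's queue can be processed independently of the others.
        for (int shard = 0; shard < 4; shard++) {
            System.out.println("shard " + shard + ": " + cluster.pendingTimers(shard) + " timers");
        }
    }
}
```

In this model the shard is derived from the workflow ID modulo the shard count, so changing the count would remap existing workflows to different shards. That is one intuition for why the number of shards is fixed at cluster initialization, and why it is worth choosing it with the target load in mind before running the load test.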