Understanding Workflow.sleep behavior and internals

I’m trying to understand how Workflow.sleep is implemented internally, and how Temporal wakes up a sleeping workflow at the scheduled time (or after a given sleep period). It seems to me that Temporal has some sort of built-in scheduler or tick system to achieve that?

Say we have thousands or millions of sleeping workflows that are all scheduled to wake up at the same time. Can Temporal handle such cases? For example, a simple workflow:

function workflow() {
    // Pseudocode: sleep until the target timestamp, then continue.
    Workflow.sleep('2022-10-09T22:09:59Z' - current_time)
    doSth()
}

I’d also like to hear your opinion on using Temporal versus a common distributed scheduler for such simple scheduling tasks.

Internally, Temporal relies on a durable timer queue abstraction. Each of these sleeping workflows has a timer task scheduled for delivery at that timestamp. When the time comes, the task is delivered, the corresponding workflow is updated with a “TimerFired” event, and a workflow task is put into the workflow task queue. Your workflow worker then picks it up, recovers the workflow to its last state, and executes the doSth() operation.
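
To make that flow concrete, here is a toy in-memory model of it in Python. This is only a sketch of the idea, not Temporal's actual implementation: the class name TimerQueueSketch, its methods, and the in-memory heap are all hypothetical, and a real timer queue is persisted in the database rather than held in process memory.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class TimerTask:
    fire_at: float                           # epoch seconds at which the timer is due
    workflow_id: str = field(compare=False)  # excluded from heap ordering


class TimerQueueSketch:
    """Hypothetical in-memory stand-in for a durable timer queue."""

    def __init__(self):
        self._heap = []                # min-heap ordered by fire_at
        self.histories = {}            # workflow_id -> list of event names
        self.workflow_task_queue = []  # workflow ids waiting for a worker

    def schedule(self, workflow_id, fire_at):
        # Record the timer in the workflow history and queue its task.
        self.histories.setdefault(workflow_id, []).append("TimerStarted")
        heapq.heappush(self._heap, TimerTask(fire_at, workflow_id))

    def tick(self, now):
        """Deliver every timer task that is due at `now`; return how many fired."""
        fired = 0
        while self._heap and self._heap[0].fire_at <= now:
            task = heapq.heappop(self._heap)
            self.histories[task.workflow_id].append("TimerFired")
            self.workflow_task_queue.append(task.workflow_id)
            fired += 1
        return fired
```

A sleeping workflow costs nothing in this model until its task reaches the front of the heap. Calling tick(now) with a time past the deadline appends “TimerFired” to the history and hands the workflow to the task queue, where a worker would pick it up and replay it.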

The only problem with scheduling a very large number of workflows to wake up simultaneously is that other namespaces hosted by the same cluster might experience a slowdown in task processing while these millions of timers fire. In that case, we recommend jittering them over some reasonable period of time (let’s say, so that they fire within a 15-minute window). If your cluster is not multi-tenant, then I don’t see any issues.
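
One way to apply that jitter is to add a random offset to each sleep duration. The Python helper below is a minimal sketch under a few assumptions: the name jittered_sleep is made up, and the window is assumed to start at the target time rather than be centered on it. Inside real workflow code the offset would need to come from a replay-safe source of randomness, or be computed by the caller and passed in as workflow input.

```python
import random
from datetime import datetime, timedelta, timezone


def jittered_sleep(target, now, window=timedelta(minutes=15), rng=random):
    """Return a sleep duration that lands within [target, target + window]."""
    # Never sleep a negative amount if the target is already in the past.
    base = max(target - now, timedelta(0))
    # Uniform random offset so wake-ups spread across the window.
    offset = timedelta(seconds=rng.uniform(0, window.total_seconds()))
    return base + offset


target = datetime(2022, 10, 9, 22, 9, 59, tzinfo=timezone.utc)
now = target - timedelta(hours=1)
delay = jittered_sleep(target, now)  # between 1h and 1h15m
```

With a uniform offset, a million timers aimed at the same instant would fire at an average of roughly 1,100 per second over the 15 minutes instead of all at once.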

Thanks for explaining all the details. Just curious, are there any concerns (scaling issues) if those durable timers last for months or even years?

It is not about how many timers there are, but about the disk space that all the open workflows consume. So if the DB disk is provisioned correctly, there shouldn’t be an issue.