Sync and Async Activity

TL;DR: A blocked workflow thread doesn’t consume a thread in the worker process, so you can have hundreds of millions of blocked workflows and process them with workers that have only hundreds of threads in their thread pools.

Workflow Task

A workflow executes as a series of workflow tasks. While workflow code executes, the external requests it makes (like scheduling activities, child workflows, and timers) are not sent directly to the service but are accumulated at the workflow worker as commands. Only when all workflow code threads are blocked is the workflow task completed, by sending all the accumulated commands to the service.
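To make the command accumulation concrete, here is a minimal sketch in plain Java. It is a toy model, not the Temporal SDK: `FakeService`, `WorkflowTaskContext`, and the command strings are hypothetical names invented for illustration. It only shows the shape of the idea: requests are buffered at the worker and sent in one call when the task completes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical and are not part of any Temporal SDK.
class CommandBufferSketch {

    /** Stand-in for the service; it only records the batches it receives. */
    static class FakeService {
        final List<List<String>> batches = new ArrayList<>();

        void completeWorkflowTask(List<String> commands) {
            batches.add(new ArrayList<>(commands)); // keep a copy of this batch
        }
    }

    /** Worker-side buffer: requests accumulate here instead of going to the service. */
    static class WorkflowTaskContext {
        private final List<String> pending = new ArrayList<>();

        void scheduleActivity(String name) {
            pending.add("ScheduleActivity:" + name);
        }

        void startTimer(String duration) {
            pending.add("StartTimer:" + duration);
        }

        /** Called once every workflow thread is blocked: one call carries all commands. */
        void complete(FakeService service) {
            service.completeWorkflowTask(pending);
            pending.clear();
        }
    }

    static FakeService runOneTask() {
        FakeService service = new FakeService();
        WorkflowTaskContext task = new WorkflowTaskContext();
        task.scheduleActivity("chargeCard"); // buffered, nothing sent yet
        task.startTimer("PT30S");            // buffered, nothing sent yet
        task.complete(service);              // both commands go out together
        return service;
    }

    public static void main(String[] args) {
        System.out.println(runOneTask().batches);
    }
}
```

In this model the service sees a single batch per workflow task, no matter how many requests the workflow code made while it was running.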

After the workflow task completes, the full state of the workflow, including all the threads it is using, can be released back to the process.

Later, when some event (like an activity completion or a timer firing) happens, the service schedules a new workflow task. A workflow worker (possibly a different one than the original) picks it up from the workflow task queue, restores the workflow state to where it was when the previous workflow task completed, and delivers the new event to the workflow. The new event unblocks some of the workflow threads, causing new requests, which in turn become commands, and the whole process repeats.
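The post does not spell out how the state is restored. The sketch below assumes it is rebuilt by replaying the recorded events through deterministic workflow code, which is why any worker can pick up the task. Everything here (`WorkflowState`, the event strings, `restore`) is a hypothetical toy, not SDK code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical types, not the Temporal SDK.
class ReplaySketch {

    /** Deterministic workflow state: it waits for two activity results. */
    static class WorkflowState {
        final List<String> results = new ArrayList<>();

        void apply(String event) {
            String prefix = "ActivityCompleted:";
            if (event.startsWith(prefix)) {
                results.add(event.substring(prefix.length()));
            }
        }

        boolean isBlocked() {
            return results.size() < 2;
        }
    }

    /** What a worker without a cached copy does: rebuild state from recorded events. */
    static WorkflowState restore(List<String> history) {
        WorkflowState state = new WorkflowState();
        for (String event : history) {
            state.apply(event);
        }
        return state;
    }

    static WorkflowState demo() {
        List<String> history = new ArrayList<>();
        history.add("WorkflowStarted");
        history.add("ActivityCompleted:reserved");

        // Worker A handles the first task; the workflow is still blocked afterwards,
        // so worker A is free to drop this state.
        WorkflowState onWorkerA = restore(history);
        if (!onWorkerA.isBlocked()) {
            throw new IllegalStateException("expected the workflow to be blocked");
        }

        // Later a new event arrives and a different worker picks up the task.
        history.add("ActivityCompleted:charged");
        return restore(history);
    }

    public static void main(String[] args) {
        WorkflowState onWorkerB = demo();
        System.out.println(onWorkerB.results + " blocked=" + onWorkerB.isBlocked());
    }
}
```

Because the second worker reaches the same state from the history alone, nothing has to be kept on the first worker between workflow tasks.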

Note that a workflow consumes threads only while it is making progress, and each workflow task is usually very short, on the order of milliseconds. This allows a practically unlimited number of blocked workflows to be served by a limited number of workflow worker threads.

Workflow Caching

Recreating the state of a workflow from scratch on every workflow task is pretty resource-intensive. So, as an optimization, workflows are cached at workflow workers. A cached workflow still holds all the threads its code is blocked on. That’s why, if you take a thread dump of a workflow worker, you will see a bunch of blocked workflow threads.

Every time a workflow that is executing its workflow task needs to create a thread, it gets one from an internal thread pool. If the pool doesn’t have any free threads, some workflow is evicted from the cache and its resources are released. The released threads are returned to the thread pool, and the new thread creation succeeds.
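The eviction rule can be sketched as a cache bounded by a thread budget rather than by entry count. This is a toy model under stated assumptions: the class and method names are hypothetical, and picking the least recently used workflow as the victim is my assumption (the post only says "some workflow" is evicted).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: hypothetical names, not the Temporal SDK. LRU eviction is an assumption.
class ThreadBoundCacheSketch {
    private final int maxThreads;
    private int threadsInUse = 0;

    // Access-ordered map: iteration starts at the least recently used workflow.
    private final LinkedHashMap<String, Integer> threadsByWorkflow =
            new LinkedHashMap<>(16, 0.75f, true);

    ThreadBoundCacheSketch(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    /** A workflow executing its workflow task asks for one more thread. */
    void acquireThread(String workflowId) {
        while (threadsInUse >= maxThreads) {
            evictOneOtherThan(workflowId); // frees that workflow's threads
        }
        threadsByWorkflow.merge(workflowId, 1, Integer::sum);
        threadsInUse++;
    }

    /** Evicts the least recently used workflow that is not the requester. */
    private void evictOneOtherThan(String requester) {
        Iterator<Map.Entry<String, Integer>> it = threadsByWorkflow.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (!entry.getKey().equals(requester)) {
                threadsInUse -= entry.getValue(); // threads go back to the pool
                it.remove();                      // workflow leaves the cache
                return;
            }
        }
        throw new IllegalStateException("no other workflow to evict");
    }

    boolean isCached(String workflowId) {
        return threadsByWorkflow.containsKey(workflowId);
    }

    int threadsInUse() {
        return threadsInUse;
    }

    static ThreadBoundCacheSketch demo() {
        ThreadBoundCacheSketch cache = new ThreadBoundCacheSketch(3);
        cache.acquireThread("wf-A");
        cache.acquireThread("wf-A"); // wf-A now holds two threads
        cache.acquireThread("wf-B"); // the pool is full
        cache.acquireThread("wf-C"); // forces an eviction of wf-A
        return cache;
    }

    public static void main(String[] args) {
        ThreadBoundCacheSketch cache = demo();
        System.out.println("wf-A cached: " + cache.isCached("wf-A"));
        System.out.println("threads in use: " + cache.threadsInUse());
    }
}
```

In the demo, evicting `wf-A` frees two threads at once, which is the sense in which the cache size follows the thread budget rather than a fixed number of entries.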

One consequence of this design is that the number of cached workflows in Java is limited not by memory but by the number of threads the process can sustain.
