What’s the workflow_task_executor in the Python worker configuration?
My understanding is that Python workers have the following components for executing code:
- An event loop for executing workflow tasks
- An event loop for executing tasks from async activities plus the rest of Temporal, such as communicating with the server
- An executor for executing tasks from synchronous activities
But the worker configuration also has a workflow_task_executor, so I don’t know where that fits into this picture.
I also read these two forum posts #1 and #2, but I don’t think they quite have the answer
An event loop for executing workflow tasks
This is not exactly the case. Every workflow run gets its own deterministic event loop, and of course the worker itself runs in an event loop. But workflow tasks are CPU bound and need to be timed out for deadlock detection, so they are run in threads where they don’t block other workflow tasks. workflow_task_executor is the thread pool these tasks are run on. A well-behaved workflow should execute the workflow parts of a workflow task in milliseconds, but it may not, and we can’t lock up the whole system if it doesn’t (this is the same in all SDKs).
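To make the idea concrete, here is a minimal standard-library sketch of running blocking work on a thread pool from an event loop with a deadline on each task. It is not the SDK's implementation: `run_workflow_task`, `run_with_deadline`, and the timings are hypothetical, and `time.sleep` stands in for the CPU-bound part of a workflow task.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def run_workflow_task(task_id: int, work_seconds: float) -> str:
    # Hypothetical stand-in for the blocking part of a workflow task.
    time.sleep(work_seconds)
    return f"task {task_id} done"


async def run_with_deadline(
    executor: ThreadPoolExecutor,
    task_id: int,
    work_seconds: float,
    deadline_seconds: float,
) -> str:
    loop = asyncio.get_running_loop()
    # The blocking call runs on a pool thread, so the event loop stays free.
    future = loop.run_in_executor(executor, run_workflow_task, task_id, work_seconds)
    try:
        return await asyncio.wait_for(future, timeout=deadline_seconds)
    except asyncio.TimeoutError:
        # The thread itself keeps running; we only stop waiting for it.
        return f"task {task_id} exceeded its deadline"


async def main() -> list:
    with ThreadPoolExecutor(max_workers=4) as executor:
        return await asyncio.gather(
            run_with_deadline(executor, 1, 0.01, 0.5),
            run_with_deadline(executor, 2, 0.3, 0.05),  # the slow one
            run_with_deadline(executor, 3, 0.01, 0.5),
        )


print(asyncio.run(main()))
```

The slow task is reported as overdue while the other two still complete, which is the property described above: one misbehaving task does not hold up the rest.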
We just opened a PR that will change the default for this value from a thread pool with max workers as max(os.cpu_count(), 4) to a thread pool with max workers as max_concurrent_workflow_tasks or 500: Improve workflow task deadlock and eviction by cretz · Pull Request #806 · temporalio/sdk-python · GitHub
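As a rough illustration of the two sizing rules, the sketch below computes both values. The helper names are hypothetical, reading "or 500" as Python's `or` fallback for an unset value is an assumption, and the extra `or 4` guard is added here only because `os.cpu_count()` can return `None`.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def old_default_max_workers() -> int:
    # At least 4 threads, more on machines with more cores.
    # The "or 4" guard is an addition for the case cpu_count() is None.
    return max(os.cpu_count() or 4, 4)


def new_default_max_workers(max_concurrent_workflow_tasks: Optional[int]) -> int:
    # Assumed reading: use the configured concurrency, else fall back to 500.
    return max_concurrent_workflow_tasks or 500


# A pool sized with the newer rule when no concurrency limit is configured.
executor = ThreadPoolExecutor(max_workers=new_default_max_workers(None))
executor.shutdown()
```

Under this reading the pool is sized to the number of workflow tasks that may run at once rather than to the core count, so a stuck task occupies one thread out of many.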
Thanks as always, Chad! Here’s my updated understanding. I’m curious to hear what you think!
Activities
- Async activities and the Temporal worker SDK code both run in whatever event loop the user gives the worker
- Sync activities run in the activity executor
Workflows
Even though they’re defined with async, each workflow gets its own “workflow event loop” as described in the blog (which isn’t constantly looping; it just gets cycled through to see if any futures resolve during a workflow task execution). So the async is referring to each workflow’s individual event loop; it is not referring to the global event loop. So in some sense, this workflow code can be thought of as not being async, but more like sync, and that’s why it’s run in a thread pool executor.
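A toy version of a loop that only advances when it is cycled can be sketched with plain coroutines. This is an illustration of the idea, not how the SDK implements it; `Signal`, `MiniWorkflowLoop`, and `order_workflow` are hypothetical names.

```python
from collections import deque


class Signal:
    """Awaitable that resolves once set() has been called."""

    def __init__(self):
        self.done = False
        self.value = None

    def set(self, value):
        self.done = True
        self.value = value

    def __await__(self):
        while not self.done:
            yield self  # not resolved yet: hand control back to the loop
        return self.value


class MiniWorkflowLoop:
    """Advances coroutines only when run_once() is called."""

    def __init__(self):
        self._pending = deque()
        self.results = {}

    def start(self, name, coro):
        self._pending.append((name, coro))

    def run_once(self):
        # One pass: resume each coroutine until it blocks or finishes.
        for _ in range(len(self._pending)):
            name, coro = self._pending.popleft()
            try:
                coro.send(None)
            except StopIteration as stop:
                self.results[name] = stop.value
            else:
                self._pending.append((name, coro))


async def order_workflow(approved, shipped):
    approver = await approved
    carrier = await shipped
    return f"approved by {approver}, shipped via {carrier}"


loop = MiniWorkflowLoop()
approved, shipped = Signal(), Signal()
loop.start("order", order_workflow(approved, shipped))

loop.run_once()            # blocks at the first await
approved.set("alice")
loop.run_once()            # passes the first await, blocks at the second
shipped.set("courier")
loop.run_once()            # runs to completion
print(loop.results["order"])
```

Each `run_once()` call is an ordinary blocking function call, which is why, seen from the outside, driving such a loop looks like sync code that can be handed to a thread pool.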
Side notes
And the only ways to use more than one CPU core in a python worker (considering the GIL) are:
- run the sync activities in a process pool executor
- run more than one worker process
Also, I really like the PR. I think things will be much less likely to get clogged because the OS can preemptively switch between the threads/workflow tasks, and one bad task is less likely to have as adverse an effect.
Yes, this is all basically correct.
Just to clarify, the single-core limitation from the GIL is the case with Python in general and not related to Temporal. The options you listed, effectively “use multiprocessing or use multiple processes”, are the common ones.
We often discourage running sync activities in a process pool executor because people often don’t take into account the struggles of multiprocess Python (pickling, lack of cancellation support, etc.).
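The pickling issue can be seen without any worker at all: anything sent to another process has to survive `pickle`. The sketch below uses a hypothetical `ReportJob` input and a hypothetical `can_cross_process_boundary` helper to check that up front.

```python
import pickle
import threading


class ReportJob:
    """Hypothetical activity input that happens to carry a lock."""

    def __init__(self, rows):
        self.rows = rows
        self.lock = threading.Lock()


def can_cross_process_boundary(value) -> bool:
    # Process pools pickle arguments and results; this checks that up front.
    try:
        pickle.dumps(value)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


print(can_cross_process_boundary({"rows": [1, 2, 3]}))    # plain data
print(can_cross_process_boundary(ReportJob([1, 2, 3])))   # holds a lock
print(can_cross_process_boundary(lambda x: x + 1))        # anonymous function
```

Plain data passes, while objects holding locks and anonymous functions do not, so activity inputs and outputs need to be designed with this in mind before switching to a process pool.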