Long CPU-intensive activities in single-threaded languages (Python or JavaScript)

Background:

In our workflow we have:

  1. Short running activities
  2. Long running CPU intensive activities
  3. Queries, Signals, etc.

In addition, we are using the Python SDK.

What I would expect:

A worker that is busy processing a long-running activity will not pick up any other activities (or queries or signals) until it has finished processing it.

What is actually happening:

The moment the intensive activity is picked up, many other short-running activities or queries start timing out. I assume they are being picked up by the busy worker, whose process is occupied with the long-running activity.

What is the recommended configuration / solution?

Is there any configuration that should be set specifically for single-threaded languages such as Python or JavaScript?

I assume defining workers for CPU-intensive activities separately from the workers for short-running activities is one solution, but is there a way to avoid this operational overhead?

Thanks for the help.

You can use the max_concurrent_activities setting when creating a worker to limit the number of activities that worker can run at a time. It defaults to 100. It is common to set it to a low number (or even 1) for resource-intensive activities.
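
For example, a minimal sketch with the Python SDK (the task queue name and the `crunch_numbers` activity are illustrative placeholders, not something from this thread):

```python
# Minimal sketch: a worker that runs at most one activity at a time.
# `crunch_numbers` is a hypothetical activity decorated with @activity.defn elsewhere.
import asyncio

from temporalio.client import Client
from temporalio.worker import Worker

from my_activities import crunch_numbers  # hypothetical module


async def main() -> None:
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="cpu-intensive-queue",  # illustrative task queue name
        activities=[crunch_numbers],
        max_concurrent_activities=1,       # run only one activity at a time
        # If the activity is a synchronous (non-async) function, also pass an
        # activity_executor such as a concurrent.futures.ThreadPoolExecutor.
    )
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())
```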

Thanks for the quick reply.
Would this setting limit only concurrent activities? Or would it also prevent the worker from running queries, signals, or workflow tasks while it is running an activity?

Just concurrent activities. But you can create activity-only and/or workflow-only workers. If a worker only has activities registered and no workflows, it never polls for workflow work. Similarly, if a worker only has workflows and no activities (or has activities only for local-activity use with no_remote_activities=True set), it never polls for activity work. So you can run activity-only and workflow-only workers on the same task queue in completely separate areas/processes.
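
A rough sketch of that split, assuming both workers share one task queue (`crunch_numbers` and `MyWorkflow` are placeholder names):

```python
# Sketch only: activity-only and workflow-only workers on the same task queue.
from temporalio.client import Client
from temporalio.worker import Worker


async def run_activity_only_worker(client: Client) -> None:
    # No workflows registered, so this worker never polls for workflow tasks.
    worker = Worker(
        client,
        task_queue="my-task-queue",
        activities=[crunch_numbers],      # placeholder CPU-intensive activity
        max_concurrent_activities=1,
    )
    await worker.run()


async def run_workflow_only_worker(client: Client) -> None:
    # No remote activities, so this worker never polls for activity tasks.
    # (If the workflows use local activities, register them here and pass
    # no_remote_activities=True.)
    worker = Worker(
        client,
        task_queue="my-task-queue",
        workflows=[MyWorkflow],           # placeholder workflow class
    )
    await worker.run()
```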

OK, and I understand this is the common practice?
(If so, would it make sense to file a feature request to limit all types of concurrent tasks on a worker, not just activities? It would be especially useful for single-threaded languages to prevent these issues.)
And one more clarification: are query tasks considered "workflow work"?

Yes, especially in Python, where AI companies want to run activities on limited GPU resources. So they start max_concurrent_activities=1 activity-only workers on each GPU-based resource and use Temporal to distribute the work.

No, because splitting into two workers is basically the same thing. A combined worker is really just an activity worker and a workflow worker in one, and there is not very much overlap between them (local activities notwithstanding). They are combined for user simplicity but can be split just as easily.

Yes, workflow work is anything applied to the workflow from the server. While a query may not technically produce a "workflow task" in the event history, it is commonly still treated as a task to accomplish (and so is governed by max_concurrent_workflow_tasks). Granted, workflow tasks should take just milliseconds to complete (local activities notwithstanding), so there is usually no need to limit their concurrency to a low number. However, they do use memory when cached, so the cache size may be worth tuning.
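
If tuning does become necessary, the relevant knobs on the Python Worker look roughly like this (names are the SDK parameters; the values and workflow class shown are illustrative only):

```python
# Hedged sketch of the workflow-side tuning knobs mentioned above.
from temporalio.worker import Worker

worker = Worker(
    client,                             # an existing temporalio.client.Client
    task_queue="my-task-queue",         # placeholder task queue
    workflows=[MyWorkflow],             # placeholder workflow class
    max_concurrent_workflow_tasks=100,  # bounds concurrent workflow tasks (queries included)
    max_cached_workflows=1000,          # sticky cache size; cached workflows hold memory
)
```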