Maximum number of Worker Entities per Worker Process

Hello,
We are considering rewriting several of our jobs (hundreds of them) on Temporal.
Some of them are lightweight and used infrequently, while others are really long-running and heavy.

I have read that the ideal number of Worker Entities per Worker Process is one.

We are thinking of a separate Task Queue per Workflow. All Activities belonging to a Workflow would also use the same Task Queue.

Now, if we put one Worker Entity per Worker Process, we will end up with multiple Worker Processes and Worker Process Clusters. This isn’t ideal.

What would be the maximum number of Worker Entities per Worker Process on a 16 GB server?
I know this will depend on several other factors; however, any guidance you may have will be helpful.

I have gone through many resources but can’t find any answers. I can’t provide all the links because of the limitation of only two links per post.

Workers

What would be the maximum number of Worker Entities per Worker Process on a 16 GB server?

I don’t think a concrete answer for this exists; it is going to be highly dependent on the workers’ cache size (the number of workflow executions they keep in memory).
Depending on the SDK used (for Java, for example), you would also need to test the total number of threads created.

On the SDK metrics side, you can watch
temporal_sticky_cache_size and temporal_workflow_active_thread_count when load-testing your workers on the server.
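
If it helps, here is a rough Java sketch (untested; names and values are placeholders) of how those metrics can be exported through Micrometer/Prometheus and how the workflow cache and thread count can be capped per worker factory, i.e. per worker process:

```java
import com.uber.m3.tally.RootScopeBuilder;
import com.uber.m3.tally.Scope;
import com.uber.m3.util.Duration;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.temporal.client.WorkflowClient;
import io.temporal.common.reporter.MicrometerClientStatsReporter;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerFactoryOptions;

public class MetricsAndCacheSetup {
  public static void main(String[] args) {
    // Export SDK metrics (including temporal_sticky_cache_size and
    // temporal_workflow_active_thread_count) via Micrometer/Prometheus.
    // Exposing the registry over HTTP for scraping is omitted here.
    PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
    Scope metricsScope =
        new RootScopeBuilder()
            .reporter(new MicrometerClientStatsReporter(registry))
            .reportEvery(Duration.ofSeconds(10));

    WorkflowServiceStubs service =
        WorkflowServiceStubs.newServiceStubs(
            WorkflowServiceStubsOptions.newBuilder().setMetricsScope(metricsScope).build());
    WorkflowClient client = WorkflowClient.newInstance(service);

    // Cache and thread limits are set per worker factory (per worker process),
    // so they bound memory use no matter how many Worker entities you create.
    WorkerFactoryOptions factoryOptions =
        WorkerFactoryOptions.newBuilder()
            .setWorkflowCacheSize(600)      // sticky cache: workflow executions kept in memory
            .setMaxWorkflowThreadCount(600) // upper bound on cached workflow threads
            .build();
    WorkerFactory factory = WorkerFactory.newInstance(client, factoryOptions);
    // ... create workers from this factory and call factory.start()
  }
}
```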

Hope this helps.

What are you trying to achieve? I would start by running them all on the same task queue. This way, host resources will be much better utilized.

Thanks @maxim for your input. If we take this approach, can you tell us how we would route them to different workflows and activities?

Let’s say we have 100 jobs in the current legacy system. They are all heterogeneous. The idea that was running in my mind was:
100 jobs → 100 workflows
In a single “worker process cluster”, run several of these jobs based on configuration and depending on how many poller threads we can run.

Would love to hear more about, and expand on, your idea of running everything on the same Task Queue.

The second question I have is:
We have a schema-based multi-tenant system on the database side. When a DB/tenant is down for maintenance/upgrade, how can we gracefully stop all the jobs running for one particular tenant?

[Update] @maxim I followed your approach and tested this with two different workflows on the same task queue. I was able to run them successfully.
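
For anyone following along, here is roughly what I tested, a minimal Java sketch with made-up workflow names: one worker, one task queue, two workflow types:

```java
import io.temporal.client.WorkflowClient;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

public class SharedTaskQueueWorker {
  public static final String TASK_QUEUE = "jobs-task-queue"; // hypothetical name

  // Two unrelated "job" workflows, both served from the same task queue.
  @WorkflowInterface
  public interface ReportJobWorkflow {
    @WorkflowMethod
    void run(String tenant);
  }

  @WorkflowInterface
  public interface CleanupJobWorkflow {
    @WorkflowMethod
    void run(String tenant);
  }

  public static class ReportJobWorkflowImpl implements ReportJobWorkflow {
    @Override
    public void run(String tenant) {
      Workflow.getLogger(ReportJobWorkflowImpl.class).info("report job for " + tenant);
    }
  }

  public static class CleanupJobWorkflowImpl implements CleanupJobWorkflow {
    @Override
    public void run(String tenant) {
      Workflow.getLogger(CleanupJobWorkflowImpl.class).info("cleanup job for " + tenant);
    }
  }

  public static void main(String[] args) {
    WorkflowClient client =
        WorkflowClient.newInstance(WorkflowServiceStubs.newLocalServiceStubs());
    WorkerFactory factory = WorkerFactory.newInstance(client);

    // One Worker entity polling one task queue can host many workflow types;
    // the server dispatches each workflow task to the matching registered type.
    Worker worker = factory.newWorker(TASK_QUEUE);
    worker.registerWorkflowImplementationTypes(
        ReportJobWorkflowImpl.class, CleanupJobWorkflowImpl.class);

    factory.start();
  }
}
```

The routing question from earlier answered itself: the client just creates a typed stub (e.g. ReportJobWorkflow) with WorkflowOptions pointing at this task queue, and the matching registered implementation runs; nothing else is needed to route between the two workflows.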

Do you see any other scalability issues with this approach?
Should we be using different WorkerOptions than the defaults to get better scalability?
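
For reference, by WorkerOptions I mean the per-worker concurrency knobs, something along these lines (placeholder values, not a recommendation), passed to factory.newWorker(taskQueue, options):

```java
import io.temporal.worker.WorkerOptions;

public class TunedWorkerOptions {
  // Placeholder values for illustration only.
  public static WorkerOptions options() {
    return WorkerOptions.newBuilder()
        .setMaxConcurrentWorkflowTaskExecutionSize(200) // parallel workflow tasks per worker
        .setMaxConcurrentActivityExecutionSize(200)     // parallel activities per worker
        .setMaxConcurrentLocalActivityExecutionSize(200)
        .build();
  }
}
```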

A job is not just a workflow. It is a workflow + activities.

Workflows don’t usually consume many resources; they also consume resources in short bursts (aka workflow tasks) when new events need to be processed. So allocating many workflow types to the same worker is fine. A single task queue scales very well. You might need to add more worker processes when the load grows, but there is no need to break them into separate workers/task queues in the majority of cases.

Activities are different. Some of them can be very resource-intensive. In that case, separating such activities into a separate pool of processes (by scheduling them on a different task queue) might make sense. If your activities perform short DB updates or RPC calls, such separation is not needed.
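
A rough Java sketch of what that separation could look like (the names and the “heavy-jobs” queue are hypothetical): the workflow keeps lightweight activities on its own task queue and points the heavy ones at a dedicated queue via ActivityOptions, so they are picked up by a separate pool of worker processes:

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

public class MixedActivityRouting {

  @ActivityInterface
  public interface LightActivities {
    void prepare(String tenant); // e.g. a short DB update or RPC call
  }

  @ActivityInterface
  public interface HeavyActivities {
    void crunch(String tenant); // e.g. a long, CPU/memory-heavy computation
  }

  @WorkflowInterface
  public interface MixedJobWorkflow {
    @WorkflowMethod
    void run(String tenant);
  }

  public static class MixedJobWorkflowImpl implements MixedJobWorkflow {

    // Lightweight activities stay on the workflow's own task queue (the default).
    private final LightActivities light =
        Workflow.newActivityStub(
            LightActivities.class,
            ActivityOptions.newBuilder()
                .setStartToCloseTimeout(Duration.ofSeconds(30))
                .build());

    // Heavy activities are routed to a separate pool of worker processes
    // simply by scheduling them on a different task queue.
    private final HeavyActivities heavy =
        Workflow.newActivityStub(
            HeavyActivities.class,
            ActivityOptions.newBuilder()
                .setTaskQueue("heavy-jobs") // hypothetical queue, polled by dedicated workers
                .setStartToCloseTimeout(Duration.ofHours(2))
                .build());

    @Override
    public void run(String tenant) {
      light.prepare(tenant);
      heavy.crunch(tenant); // executed by whichever worker polls "heavy-jobs"
    }
  }
}
```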

When a DB/tenant is down for maintenance/upgrade, how can we gracefully stop all the jobs running for one particular tenant?

There is no need to stop those jobs. Activities use exponential retries and can be retried for as long as needed. As soon as the maintenance is done, the activities will succeed on the next retry and the workflow will continue executing without any manual intervention.
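
As a rough Java illustration (values are placeholders, not a recommendation): the retry policy on the activity options is what carries a workflow through a maintenance window; setting maximum attempts to 0 means retries are unlimited, so the activity just keeps retrying with exponential backoff until the tenant DB is back.

```java
import io.temporal.activity.ActivityOptions;
import io.temporal.common.RetryOptions;
import java.time.Duration;

public class TenantDbActivityOptions {
  // Exponential retry for activities that hit a tenant DB. The key point is
  // setMaximumAttempts(0), i.e. unlimited attempts, so the activity succeeds
  // on its first retry after the DB comes back from maintenance.
  public static ActivityOptions defaults() {
    return ActivityOptions.newBuilder()
        .setStartToCloseTimeout(Duration.ofMinutes(5))
        .setRetryOptions(
            RetryOptions.newBuilder()
                .setInitialInterval(Duration.ofSeconds(1))
                .setBackoffCoefficient(2.0)
                .setMaximumInterval(Duration.ofMinutes(10)) // cap on backoff growth
                .setMaximumAttempts(0)                      // 0 = retry indefinitely
                .build())
        .build();
  }
}
```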