In Temporal, we don’t schedule workflows to specific workers. We route activities instead, as they are the actual consumers of CPUs, GPUs, and other resources.
hard constraints - the CPU and GPU of the worker must match those specified for the workflow
You use a separate activity task queue per GPU type.
priority - higher priority workflows are scheduled first
Temporal doesn’t support priorities natively yet. Depending on your use case, different workarounds are possible. What is the maximum rate of activities per second you need to schedule across all priorities?
soft constraints - a worker with a matching value, if one exists, is scheduled first. Otherwise, any worker that matches the hard constraints can be used.
Depending on the rate of execution, different solutions are possible.
You use a separate activity task queue per GPU type.
We would literally have to create tens of thousands of queues, one for each combination of the hard constraints. Some constraints are lists, for example a list of OSes and a list of test programs.
The rate of execution is roughly 500K jobs per day on 11K machines. Typically the submission rate is a few jobs per second at peak hours. Thanks
I see. In this case, I would move the constraint matching outside of Temporal. Implement an activity that, given the constraints, returns the hostname to run the job on. Then use a host-specific task queue to run the actual computing activity on that host.
The only “filtering” happens at the task queue level. All activities scheduled to a given task queue are delivered to the workers that listen on it, without any further filtering.
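To make the suggestion above more concrete, here is a minimal sketch of the matching logic that such a host-selecting activity could contain. It is plain Python with no Temporal SDK calls, and everything in it is an assumption for illustration: the `Host` and `Job` types, the field names, the way list constraints are interpreted (the host's OS must be in the job's allowed list, and the host must have all required test programs), and the `host-<hostname>` queue naming.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Host:
    # Hypothetical inventory record for one machine.
    hostname: str
    cpu: str
    gpu: str
    os: str
    programs: frozenset = frozenset()  # test programs installed on the host
    labels: frozenset = frozenset()    # values that soft constraints can match


@dataclass(frozen=True)
class Job:
    # Hypothetical job description carrying hard and soft constraints.
    cpu: str
    gpu: str
    allowed_oses: frozenset
    required_programs: frozenset = frozenset()
    preferred_labels: frozenset = frozenset()  # soft constraints


def satisfies_hard(job: Job, host: Host) -> bool:
    """True only if the host meets every hard constraint of the job."""
    return (
        host.cpu == job.cpu
        and host.gpu == job.gpu
        and host.os in job.allowed_oses
        and job.required_programs <= host.programs
    )


def pick_host(job: Job, hosts: Iterable[Host]) -> Optional[str]:
    """Return a hostname that meets the hard constraints, preferring the
    host that matches the most soft constraints; None if nothing fits."""
    candidates = [h for h in hosts if satisfies_hard(job, h)]
    if not candidates:
        return None
    # max() keeps the first candidate on ties, so inventory order breaks ties.
    best = max(candidates, key=lambda h: len(h.labels & job.preferred_labels))
    return best.hostname


def task_queue_for(hostname: str) -> str:
    """Assumed naming scheme for the host-specific task queue."""
    return f"host-{hostname}"
```

In this sketch, the workflow would first run an activity whose body calls `pick_host`, then schedule the compute activity on the queue returned by `task_queue_for`, with the worker on each machine listening only on its own queue. This needs one queue per host (about 11K here) rather than one per combination of constraints. Tracking which hosts are currently busy is left out and would have to live in whatever inventory the matching activity reads from.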