I have a use case in which I know ahead of time the number of CPU cores and RAM consumed by each activity in my temporal workflow. I want to ensure efficient utilization of CPU cores and memory across my compute resources using some basic heuristic like greedy bin packing. I have considered using Resource-Based Slot Suppliers, but my concern is that this does not allow the user to provide estimates of CPU / memory usage of a given activity ahead of time; this strikes me as a big issue for my use case, where a single activity can consume nearly all the RAM on a given machine.
I have also looked into Custom Slot Suppliers, but I do not think these are sufficiently flexible for my use case; specifically, the context given to the reserve_slot
procedure gives information identifying the worker and the task queue but not the pending activities within the task queue. If I understand correctly, this would allow me to prevent workers from executing activities from a particular task queue but not to control which activities they execute to begin with.
Is there an idiomatic way to implement this sort of thing in Temporal?
Hey @jean_lannes ,
Custom slot suppliers are probably your best bet here, though optimizing across the fleet is going to be somewhat challenging.
You correctly identify that the interface doesn’t tell you which activity types are pending within the queue, and that’s because it can’t: there is no reasonably efficient way for the Temporal Server to surface that information. It could be done in principle, but it would be far too expensive to compute every time a task is matched.
One option you could consider is simply constraining each activity type to a task queue dedicated to that type. That way you always know a priori that tasks delivered on a certain queue will be of a certain type. This doesn’t necessarily allow you to bin-pack optimally across workers, but it’s probably going to be pretty good.
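To make the queue-per-type idea concrete, here is a minimal Python sketch (all queue names and resource figures here are made up for illustration, not part of any Temporal API): a registry of per-activity-type footprints, plus a helper a worker could use to decide which dedicated queues fit its free capacity.

```python
# Hypothetical registry: one dedicated task queue per activity type, with the
# resource footprint known ahead of time for that type. Purely illustrative.
ACTIVITY_QUEUES = {
    "transcode_video": {"queue": "q-transcode-video", "cpu": 8, "ram_gb": 28},
    "extract_thumbs":  {"queue": "q-extract-thumbs", "cpu": 2, "ram_gb": 4},
    "index_metadata":  {"queue": "q-index-metadata", "cpu": 1, "ram_gb": 1},
}

def pollable_queues(free_cpu: int, free_ram_gb: int) -> list[str]:
    """Queues whose (known) per-task footprint fits this worker's free capacity."""
    return [
        spec["queue"]
        for spec in ACTIVITY_QUEUES.values()
        if spec["cpu"] <= free_cpu and spec["ram_gb"] <= free_ram_gb
    ]
```

A worker with 4 free cores and 8 GB of free RAM would then poll only the thumbnail and metadata queues, since the transcode footprint doesn’t fit.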
If you truly need to perform this packing in a near-optimal way, you’ll likely have to write your own service, and your custom slot supplier implementation can then communicate with that service.
Do let me know how it works out and what you end up doing.
Hello @Spencer_Judge,
Thanks, this is very helpful.
Could you point me to any resources / documentation on constructing custom services for job placement? (Or codebases implementing this pattern.) Are there particular interfaces for this in the SDKs, or does it just involve using the standard task-routing interface?
Unfortunately not. Custom slot suppliers are quite new, so there’s not much out there yet, and I don’t know of any public examples of a supplier interacting with other services.
There’s nothing built into the SDK for it either, and I’m not sure what such an interface would look like without being so generic as to be nearly useless.
What I can say is that the part of your custom slot supplier implementation that communicates with the external service should probably run in a background thread or coroutine. The actual reservation methods should, ideally, be able to hand out a slot quickly when one is known to be ready, rather than performing network calls on every reservation. In other words: try to “preload” the knowledge about slot readiness, if that makes sense.
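As a concrete illustration of that preloading idea, here is a plain-Python sketch (this is not the actual SDK slot supplier interface; `fetch_budget`, the class, and all names are hypothetical): a background thread periodically asks an external placement service how many tasks this worker may take, so the reservation hot path is a purely local check with no network round-trip.

```python
import threading
import time

class PreloadedSlotBudget:
    """Sketch of the 'preload slot readiness' pattern: a background thread
    refreshes this worker's task budget from a (hypothetical) external
    placement service, so try_reserve() never touches the network."""

    def __init__(self, fetch_budget, refresh_secs: float = 1.0):
        self._fetch_budget = fetch_budget  # callable doing the network round-trip
        self._lock = threading.Lock()
        self._budget = 0                   # latest budget reported by the service
        self._in_use = 0                   # slots currently reserved locally
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._refresh_loop, args=(refresh_secs,), daemon=True
        )
        self._thread.start()

    def _refresh_loop(self, refresh_secs: float) -> None:
        while not self._stop.is_set():
            budget = self._fetch_budget()  # e.g. an HTTP/gRPC call in practice
            with self._lock:
                self._budget = budget
            self._stop.wait(refresh_secs)

    def try_reserve(self) -> bool:
        """Non-blocking reserve; this is what the slot supplier's hot path calls."""
        with self._lock:
            if self._in_use < self._budget:
                self._in_use += 1
                return True
            return False

    def release(self) -> None:
        with self._lock:
            self._in_use -= 1

    def close(self) -> None:
        self._stop.set()
        self._thread.join()
```

The key design choice is that the service reports a total budget while the worker tracks its own outstanding reservations, so a slow refresh can never hand back slots that are still in use.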
Thanks, I think a custom slot supplier communicating with an external service could work. The one case that might be difficult is where resource requirements vary between tasks of the same activity type. (The actual use case is file processing, where RAM requirements depend on the size of the file, and there is no good way to know this distribution ahead of time.) Can you think of any way I could accommodate this pattern using an external service?
I can think of a few possible options:
If your worker fleet is relatively small or static, you might want a coordinating workflow (or workflows) whose job is to look at the file sizes involved in processing a batch of files, decide how the work should be packed, and then use a task queue per worker to distribute that work.
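For the packing step in that coordinating workflow, a simple first-fit-decreasing greedy heuristic is one option. Here is a sketch, simplified to a single RAM dimension (the function and all names are illustrative, not anything from the SDK):

```python
# First-fit-decreasing greedy bin packing: assign each file (by RAM need) to
# the first worker (by RAM capacity) with enough room, largest files first.
def pack_files(
    file_ram_gb: dict[str, float], worker_ram_gb: dict[str, float]
) -> dict[str, list[str]]:
    """Returns worker -> list of assigned files. Files that fit nowhere are
    simply left unassigned, to be handled in a later packing round."""
    remaining = dict(worker_ram_gb)
    assignment: dict[str, list[str]] = {w: [] for w in worker_ram_gb}
    for fname, need in sorted(file_ram_gb.items(), key=lambda kv: -kv[1]):
        for worker, free in remaining.items():
            if need <= free:
                assignment[worker].append(fname)
                remaining[worker] = free - need
                break
    return assignment
```

The coordinating workflow would then push each worker’s assigned files onto that worker’s dedicated task queue.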
If that doesn’t really work because the number of workers is dynamic, or chunking ahead of time isn’t feasible, another option is to have one activity that determines the size of the file. You could then combine that with the previous suggestion, using a set of task queues where each queue represents a bin of file sizes (e.g. 100–400 MB). That way you have a fairly accurate idea of whether or not a worker can take a task from a given queue.
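A small sketch of how the sizing activity’s result could be mapped to a binned task queue (the bin edges and the naming scheme here are assumptions for illustration, not a Temporal convention):

```python
# Illustrative file-size bins (upper bounds in MB) mapped to task queue names.
SIZE_BINS_MB = [100, 400, 1600, 6400]

def queue_for_file(size_mb: float) -> str:
    """Pick the task queue for a file based on which size bin it falls into.
    A sizing activity would call this after stat-ing the file, and the
    workflow would then schedule the heavy processing on the returned queue."""
    lower = 0
    for upper in SIZE_BINS_MB:
        if size_mb <= upper:
            return f"process-file-{lower}-{upper}mb"
        lower = upper
    return f"process-file-over-{SIZE_BINS_MB[-1]}mb"  # oversize catch-all
```

Each worker then polls only the queues whose upper bound fits its available RAM, which keeps the per-queue resource estimate conservative.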
Thanks, I think the multi-queue approach is probably my best bet. My only remaining concern is that supporting multiple dimensions may result in an inordinately large number of queues. For example, if I want to optimize based on the amount of RAM, the number of CPU cores, and the amount of disk space used, and assume 5 values for each dimension, then I already need 125 queues to accommodate all of this. Is this a bad idea? Do I risk significantly slowing my system by polling so many queues?
Thanks again for the advice.