Scaling Strategy For Workers With Rate-Limited Task Queues

My team is required to rate-limit requests to our dependencies to RPS values agreed upon with the dependency owners. We thought the easiest way to implement this rate-limiting would be to create a dedicated task queue for each activity type and use setMaxTaskQueueActivitiesPerSecond.

We are using the default of 200 for the max concurrent workflow/activity execution sizes.
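For context, the setup described above (one worker per dedicated task queue, with a queue-wide dispatch limit) looks roughly like this in the Java SDK. This is a configuration sketch, not a drop-in implementation: the queue name, rate value, and the local service stubs are placeholders, and it needs a running Temporal service plus registered activity implementations to do anything.

```java
import io.temporal.client.WorkflowClient;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerOptions;

public class RateLimitedWorker {
    public static void main(String[] args) {
        WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
        WorkflowClient client = WorkflowClient.newInstance(service);
        WorkerFactory factory = WorkerFactory.newInstance(client);

        WorkerOptions options = WorkerOptions.newBuilder()
            // Queue-wide limit enforced by the Temporal service: at most
            // this many activity tasks per second are dispatched across ALL
            // workers polling this queue. The 5.0 here is a placeholder for
            // whatever RPS you agreed with the dependency owner.
            .setMaxTaskQueueActivitiesPerSecond(5.0)
            // The defaults mentioned above, shown explicitly for clarity.
            .setMaxConcurrentActivityExecutionSize(200)
            .setMaxConcurrentWorkflowTaskExecutionSize(200)
            .build();

        // One dedicated queue per activity type, per the strategy above;
        // "payment-api-activities" is a made-up example name.
        Worker worker = factory.newWorker("payment-api-activities", options);
        // worker.registerActivitiesImplementations(...);
        factory.start();
    }
}
```

Because setMaxTaskQueueActivitiesPerSecond is enforced server-side across the whole task queue, adding worker replicas does not raise the effective dependency RPS, which is what makes the scaling question below tricky.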

I’m seeking advice on how best to scale our workers. We are scaling based on Prometheus metrics.

  • The activity schedule-to-start latency metric would normally be a good gauge of whether workers are becoming backlogged with tasks, but it is no longer meaningful for us once the task queues are rate-limited to values lower than the max concurrent workflow/activity execution sizes.
  • If we scale based on memory/CPU only, then the workers won’t necessarily scale out when all of their slots are filled with workflow/activity executions, if those executions are not very resource-intensive.
  • It looks like task queues have an ApproximateBacklogSize, but we would need to expose it for every task queue and scrape it to use it as a Prometheus metric.

The third option seems like the “most correct” solution, but I’m not sure we would have time to implement and test it. So please let me know if I’m missing anything; any advice is appreciated.

  • The activity schedule-to-start latency metric would normally be a good gauge of whether workers are becoming backlogged with tasks, but it is no longer meaningful for us once the task queues are rate-limited to values lower than the max concurrent workflow/activity execution sizes.

Correct: by setting task queue dispatch limits you are creating a natural backlog of activity tasks, so schedule-to-start latency will grow even when your workers are perfectly healthy.

  • If we scale based on memory/CPU only, then the workers won’t necessarily scale out when all of their slots are filled with workflow/activity executions, if those executions are not very resource-intensive.

I believe that’s also correct. The question is finding the maximum number of activity task slots a single worker can process before hitting your CPU limits. It’s something you could load-test while monitoring your temporal_worker_task_slots_available metric for worker_type=ActivityWorker. If all 200 slots are in use and you are only at around 10% CPU, I think you should consider increasing your worker’s activity task slots.
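As a rough illustration of that tuning step, here is a self-contained sketch (the method name, numbers, and the linear assumption are all made up for illustration) of estimating a slot count from an observed CPU utilization:

```java
public class SlotSizing {
    // Naive linear estimate: if currentSlots busy slots produce observedCpu
    // utilization, assume CPU scales roughly linearly with slot count up to
    // targetCpu. Real activities are rarely perfectly linear, so treat this
    // as a starting point for a load test, not a formula.
    static int estimateMaxSlots(int currentSlots, double observedCpu, double targetCpu) {
        if (observedCpu <= 0) {
            return currentSlots; // no signal; leave the setting alone
        }
        return (int) Math.floor(currentSlots * (targetCpu / observedCpu));
    }

    public static void main(String[] args) {
        // 200 busy slots at ~10% CPU, targeting ~70% CPU:
        System.out.println(estimateMaxSlots(200, 0.10, 0.70)); // prints 1400
    }
}
```

You would then load-test around that estimate and pick the largest slot count that keeps CPU under your limit, rather than trusting the linear extrapolation directly.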

  • It looks like task queues have an ApproximateBacklogSize, but we would need to expose it for every task queue and scrape it to use it as a Prometheus metric.

Yeah, there is a sample here if it helps.

You could also consider scaling on temporal_worker_task_slots_available for worker_type=ActivityWorker, e.g. scaling out when it stays at 0 for a chosen duration of time.
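A Prometheus-style rule for that last idea might look like the following sketch. The rule name, threshold, duration, and severity label are placeholders, and the exact metric labels depend on how your SDK's metrics tags are configured:

```yaml
# Hypothetical alerting/scaling rule: fire when the activity worker has had
# zero free task slots for 2 minutes, i.e. it is saturated.
groups:
  - name: temporal-worker-scaling
    rules:
      - alert: ActivityWorkerSlotsExhausted
        expr: temporal_worker_task_slots_available{worker_type="ActivityWorker"} == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Activity worker out of task slots; consider scaling out"
```

Note the caveat from your first bullet still applies in reverse here: with a queue-wide rate limit, exhausted slots mean the workers are saturated, but free slots do not necessarily mean the backlog is drained.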