My team is required to rate-limit requests to our dependencies to RPS values that we agreed upon with the dependency owners. We thought the easiest way to implement this rate-limiting would be to create a dedicated task queue for each activity type and use setMaxTaskQueueActivitiesPerSecond.
We are using the SDK defaults of 200 for the max concurrent workflow/activity execution sizes.
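For context, here is a minimal sketch of how we configure each dedicated worker (Java SDK). The queue name and the 50 RPS value are placeholders, and the activity registration is elided:

```java
import io.temporal.client.WorkflowClient;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerOptions;

public class RateLimitedWorker {
    public static void main(String[] args) {
        WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
        WorkflowClient client = WorkflowClient.newInstance(service);
        WorkerFactory factory = WorkerFactory.newInstance(client);

        WorkerOptions options = WorkerOptions.newBuilder()
                // Server-side dispatch rate for this task queue, shared across
                // all workers polling it (the RPS agreed with the dependency owner;
                // 50.0 is a placeholder value).
                .setMaxTaskQueueActivitiesPerSecond(50.0)
                // The SDK defaults we currently rely on, stated explicitly.
                .setMaxConcurrentActivityExecutionSize(200)
                .setMaxConcurrentWorkflowTaskExecutionSize(200)
                .build();

        // One dedicated queue per activity type
        // ("payments-activities" is a placeholder name).
        Worker worker = factory.newWorker("payments-activities", options);
        // worker.registerActivitiesImplementations(new PaymentsActivitiesImpl());
        factory.start();
    }
}
```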
I’m seeking advice on how best to scale our workers. We are scaling based on Prometheus metrics.
- Activity schedule-to-start latency would normally be a good metric for gauging whether workers are becoming backlogged with tasks, but it is no longer meaningful for us: since the task queues are rate-limited to values well below the max concurrent execution sizes, that latency rises because the server is throttling dispatch, not because the workers are under-provisioned.
- If we scale based on memory/CPU alone, the workers won’t necessarily scale out when all of their execution slots are filled, since workflow/activity executions that are not resource-intensive barely move those metrics.
- Task queues do report an ApproximateBacklogSize, but we would need to query it for every task queue and export it ourselves as a Prometheus metric.
The third option seems like the “most correct” solution, but I’m not sure we’d have time to implement and test it. Please let me know if I’m missing anything; any advice is appreciated.
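For the third option, this is roughly what I had in mind: poll DescribeTaskQueue in enhanced mode for each of our queues and export the backlog as a Prometheus gauge. This is a sketch, not a tested implementation; it assumes a server/SDK new enough to expose enhanced-mode task queue stats, and the generated proto accessors (e.g. `getApproximateBacklogCount`) are from my reading of the API and may differ in your SDK version. `BacklogExporter` and the metric name are hypothetical:

```java
import io.prometheus.client.Gauge;
import io.temporal.api.enums.v1.DescribeTaskQueueMode;
import io.temporal.api.taskqueue.v1.TaskQueue;
import io.temporal.api.workflowservice.v1.DescribeTaskQueueRequest;
import io.temporal.api.workflowservice.v1.DescribeTaskQueueResponse;
import io.temporal.serviceclient.WorkflowServiceStubs;

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BacklogExporter {
    private static final Gauge BACKLOG = Gauge.build()
            .name("temporal_task_queue_backlog")   // hypothetical metric name
            .help("Approximate task backlog per task queue, from DescribeTaskQueue")
            .labelNames("task_queue")
            .register();

    public static void start(WorkflowServiceStubs service,
                             String namespace,
                             List<String> taskQueues) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        // Scrape interval of 30s is arbitrary; pick whatever matches your
        // Prometheus scrape/HPA cadence.
        scheduler.scheduleAtFixedRate(() -> {
            for (String queue : taskQueues) {
                DescribeTaskQueueRequest req = DescribeTaskQueueRequest.newBuilder()
                        .setNamespace(namespace)
                        .setTaskQueue(TaskQueue.newBuilder().setName(queue))
                        // Enhanced mode reports per-type stats, incl. the backlog.
                        .setApiMode(DescribeTaskQueueMode.DESCRIBE_TASK_QUEUE_MODE_ENHANCED)
                        .setReportStats(true)
                        .build();
                DescribeTaskQueueResponse resp =
                        service.blockingStub().describeTaskQueue(req);
                // Sum the approximate backlog across build-id versions and
                // task types (workflow + activity) for this queue.
                long backlog = resp.getVersionsInfoMap().values().stream()
                        .flatMap(v -> v.getTypesInfoMap().values().stream())
                        .mapToLong(t -> t.getStats().getApproximateBacklogCount())
                        .sum();
                BACKLOG.labels(queue).set(backlog);
            }
        }, 0, 30, TimeUnit.SECONDS);
    }
}
```

The attraction is that the backlog keeps growing under the rate limit whenever the queue is genuinely behind, so it stays a usable scaling signal where schedule-to-start latency does not.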