Thank you very much for providing such a fantastic product! I'm just beginning to explore Temporal, and the learning curve feels a bit steep.
I have a question regarding the optimal architecture for solving the following task:
Our service sends requests to N (let's assume 30) different LLM models, and we need to manage these requests so that we never exceed the rate limits, which vary per model (e.g., max 50, 100, or 200 requests per minute).
I see two potential approaches:
- Create a separate task queue and worker for each LLM model. However, this seems expensive, since we would have to maintain a large number of independent workers.
- Use a single queue for all model types and rely on a retry policy. However, this may leave a large number of failed requests waiting for the throttling window to reset.
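For context, the per-model throttling I need could be sketched as a simple in-process token bucket (the model names and limits below are made up for illustration). This is exactly what I don't know how to express idiomatically in Temporal, especially once requests are spread across multiple workers:

```python
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; one token = one request."""
    capacity: float
    rate: float  # tokens per second
    tokens: float = 0.0
    last: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical per-model limits in requests per minute.
LIMITS_RPM = {"model-a": 50, "model-b": 100, "model-c": 200}
buckets = {m: TokenBucket(capacity=rpm, rate=rpm / 60.0) for m, rpm in LIMITS_RPM.items()}


def can_send(model: str) -> bool:
    """True if a request to `model` may be sent right now."""
    return buckets[model].try_acquire()
```

An in-memory limiter like this obviously breaks down as soon as more than one worker process sends requests to the same model, which is why I'm looking for the right Temporal-native way to model it.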
Could you please provide guidance on the best approach to take in this scenario?