Rate limit configuration and best practices

Hi! I’m investigating all the possible configurations for rate limiting, and so far I’ve found that we can rate limit at the Namespace level, the Task queue/Worker level, and the DB/persistence level.

Namespace:

  • frontend.namespacerps

Task queue/worker:

  • maxTaskQueueActivitiesPerSecond - global limit for the task queue, across all workers
  • maxWorkerActivitiesPerSecond - per-worker limit

Persistence:

  • frontend.persistenceMaxQPS
  • matching.persistenceMaxQPS
  • history.persistenceMaxQPS

Our main bottleneck is the DB here, and I want to better understand how the rate limiting works at each level:
(1) Will Temporal handle and control the rate limit internally, or throw an exception to the client when the limit is exceeded? I believe for the Namespace rate limit the client will get an exception; what about the task queue and persistence rate limits?
(2) Will there be DB reads/writes to enforce rate limiting that add extra load on the DB we need to worry about?
(3) Are there any best practices for setting these configurations? If we want to configure all of them (so we have 3 layers of rate limiting), any tips on how to set/tune the numbers?

There are also rate limits at the service-instance level, which apply across all namespaces:
frontend.rps, matching.rps, history.rps

  1. For the maxWorkerActivitiesPerSecond limit, the SDK will only poll at the allowed rate. For maxTaskQueueActivitiesPerSecond, the Temporal server slows down task dispatch instead of returning an error, so your workers’ long polls will wait longer before the server returns tasks to them (see the Java SDK sketch after this list for where these two options are set). For all other rate limits, if the limit is exceeded the server returns ResourceExhaustedError.

  2. If you rate limit your task queue and the task rate is higher than what is allowed, tasks will go to the database and later be read back and dispatched to workers, so that is some extra load on the DB.

  3. Set persistenceMaxQPS, especially history.persistenceMaxQPS, below your database capacity so your persistence layer is protected. Set frontend.namespacerps to meet your workload during peak business hours. Set the task queue limits to protect your downstream dependencies so you don’t overwhelm them.
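
For reference, here is a minimal Java SDK sketch showing where the two worker-side limits from point 1 are configured. The task queue name is made up, and the specific values are placeholders you would tune for your own load:

```java
import io.temporal.client.WorkflowClient;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerOptions;

public class RateLimitedWorker {
  public static void main(String[] args) {
    WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
    WorkflowClient client = WorkflowClient.newInstance(service);
    WorkerFactory factory = WorkerFactory.newInstance(client);

    WorkerOptions options = WorkerOptions.newBuilder()
        // Server-side limit for the whole task queue, across all workers.
        // The server enforces it by slowing down dispatch, not by returning errors.
        .setMaxTaskQueueActivitiesPerSecond(100)
        // Worker-side limit: this particular worker only runs activities at this rate.
        .setMaxWorkerActivitiesPerSecond(20)
        .build();

    Worker worker = factory.newWorker("my-task-queue", options); // hypothetical queue name
    // worker.registerActivitiesImplementations(new MyActivitiesImpl());
    factory.start();
  }
}
```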


Hi @Yimin_Chen, just want to double check: for persistenceMaxQPS, I assume it is checked before writing to the DB (not after writing to the DB and before task dispatch). Is my understanding correct? Will there be any extra DB operations needed to honor this throttle limit that we need to be aware of? Just want to make sure that we can confidently use this value to protect our persistence layer. Thank you!

That is right. It is checked before calls to persistence.

Hi @Yimin_Chen, sorry that I have follow-up questions about when the different persistenceMaxQPS settings are evaluated and how to handle the rate-limit exception.

For example, if we call WorkflowStub.start() to start a workflow execution, will all of the frontend, history, and matching services’ persistenceMaxQPS limits be evaluated synchronously, and will an exception be thrown to the Temporal client if any of them exceeds its rate limit? I’m not sure whether the history and matching services talk to the DB in the start workflow call?

Another question: in the middle of a workflow execution, let’s say the history service is hitting the rate limit talking to the persistence layer, how does the Temporal client get notified? Any example code you could share showing where to catch the ResourceExhaustedError you mentioned above?

Hi, could someone please help take a look at my follow-up question above? Thanks!

Yes, those persistence rate limits are evaluated synchronously. The frontend will retry those errors, so if they are transient they won’t be visible to the client. But if they persist, the ResourceExhaustedError will be returned to the client.
For a start workflow request, only history needs to use persistence for the call.
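
A minimal sketch of how that might be caught on the client side with the Java SDK. The workflow type, task queue, and workflow ID are made-up placeholders, and depending on the SDK version the gRPC error may surface wrapped in another exception, so this walks the cause chain rather than assuming a specific exception type:

```java
import io.grpc.Status;
import io.grpc.StatusRuntimeException;
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowOptions;
import io.temporal.client.WorkflowStub;
import io.temporal.serviceclient.WorkflowServiceStubs;

public class StartWorkflowExample {
  public static void main(String[] args) {
    WorkflowClient client =
        WorkflowClient.newInstance(WorkflowServiceStubs.newLocalServiceStubs());

    WorkflowStub stub = client.newUntypedWorkflowStub(
        "MyWorkflowType",                  // hypothetical workflow type
        WorkflowOptions.newBuilder()
            .setTaskQueue("my-task-queue") // hypothetical task queue
            .setWorkflowId("order-123")    // hypothetical workflow ID
            .build());

    try {
      stub.start("some-input");
    } catch (RuntimeException e) {
      // The RESOURCE_EXHAUSTED status may be on the exception itself or on a
      // nested cause, so inspect the whole chain.
      for (Throwable t = e; t != null; t = t.getCause()) {
        if (t instanceof StatusRuntimeException
            && ((StatusRuntimeException) t).getStatus().getCode()
                == Status.Code.RESOURCE_EXHAUSTED) {
          // Server-side rate limit exceeded and frontend retries were exhausted:
          // back off and retry the start later, or shed load upstream.
          System.err.println("Rate limited by the server: " + t.getMessage());
          return;
        }
      }
      throw e; // some other failure
    }
  }
}
```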