Hi,
We’re considering replacing our home-grown system with Temporal,
but I’m stuck trying to figure out rate limiting.
We have tens of thousands of users and need to carefully rate limit their activities to avoid noisy neighbors. The usual answer for rate limiting seems to be to create a workflow for each user that implements the limits. However, that would immediately exceed Temporal’s cap on the total number of running workflows.
One idea I had was to hold a “currently running” counter in Redis and query it from an activity. If it’s too high, the activity would fail and retry later. However, this seems to interact poorly with the way Temporal is designed. Specifically, I’m not sure there’s a way to ensure the counter is always eventually decreased, e.g. when a worker dies before it can decrement.
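To make the counter idea more concrete, here is a rough sketch of what I have in mind. It is not real Temporal or Redis code: a plain dict stands in for Redis, and `LeaseLimiter`, `try_acquire`, and `release` are names I made up. The one twist is that each slot is a lease with an expiry instead of a bare counter, so a slot that is never released still frees itself eventually. Whether that is the right fix is exactly what I’m unsure about.

```python
import time
import uuid


class LeaseLimiter:
    """Per-user concurrency limiter. A dict stands in for Redis (assumption)."""

    def __init__(self, max_running, lease_seconds, clock=time.monotonic):
        self.max_running = max_running      # max concurrent tasks per user
        self.lease_seconds = lease_seconds  # how long a slot lives if never released
        self.clock = clock                  # injectable so expiry can be tested
        self._leases = {}                   # user_id -> {lease_id: expiry_time}

    def _prune(self, user_id):
        """Drop expired leases for this user and return the live ones."""
        now = self.clock()
        leases = self._leases.setdefault(user_id, {})
        for lease_id in [k for k, expiry in leases.items() if expiry <= now]:
            del leases[lease_id]
        return leases

    def try_acquire(self, user_id):
        """Return a lease id if the user is under the limit, else None."""
        leases = self._prune(user_id)
        if len(leases) >= self.max_running:
            return None  # the calling activity would fail here and be retried
        lease_id = uuid.uuid4().hex
        leases[lease_id] = self.clock() + self.lease_seconds
        return lease_id

    def release(self, user_id, lease_id):
        """Free a slot early; harmless if the lease already expired."""
        self._leases.get(user_id, {}).pop(lease_id, None)

    def running(self, user_id):
        """Number of live (unexpired) leases for this user."""
        return len(self._prune(user_id))
```

In actual Redis I imagine this as one sorted set per user, scored by expiry time (`ZADD`, `ZREMRANGEBYSCORE`, `ZCARD`), with the check-and-add done atomically in a Lua script or transaction. The lease would have to outlive the longest possible activity run, or be renewed while the activity is still working.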
A third option might be a single workflow that counts running tasks for all users through a channel, but I suspect that would quickly become a bottleneck?
None of these options really offers full protection against a simple denial of service, because even though a task would fail due to the custom limiter, it would still sit in the Temporal queue, which eventually fills up.