Working with queues

Hi all.
We are designing an asynchronous system. In it, incoming requests will be processed by workflows, delivered to them as signals. All of this will happen at a certain rate for each client, which depends on the capacity of the cluster and the consumption settings of the queues. There is at least one separate queue for each client.
But there can be A LOT of requests, more than the client is allowed to process. Our first thought was to put a broker (Kafka) in front of the cluster to absorb the pressure. But a signal is essentially a message placed on a Temporal queue, just as it would be placed on a Kafka queue, so an intermediate broker may be redundant. However, Kafka is tried and tested and can withstand heavy loads with relatively few resources. What will happen to Temporal if it has a lot of queues? Is there any way to manage unprocessed messages in queues (observe, destroy, etc.)?
For example, perhaps we can get a metric on the number of messages before sending a signal?
My respect for your work!

All of this will happen at a certain rate for each client, which depends on the capacity of the cluster and the consumption settings of the queues.

The Temporal service also has rate limits that you can set via dynamic config for the frontend, history, and matching services across all namespaces:
frontend.rps
history.rps
matching.rps

And per namespace per frontend role:
frontend.namespaceRPS

What will happen to Temporal if it has a lot of queues?

By design, there is no limit on the number of task queues.

perhaps we can get a metric on the number of messages before sending a signal?

I think you might be asking for a task queue backlog count, which is not currently supported.
For task queue monitoring, the best way is to watch the SDK metrics, especially workflow_task_schedule_to_start_latency and activity_schedule_to_start_latency.
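As a rough illustration of how such a latency metric can stand in for a backlog count, here is a minimal sketch in plain Python. It assumes you have already pulled recent schedule-to-start latency samples (in milliseconds) from your metrics backend. The function names and the 1000 ms threshold are hypothetical and not part of any Temporal API.

```python
from statistics import quantiles


def p95(samples_ms):
    """Return the 95th percentile of latency samples (milliseconds)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples_ms, n=20)[-1]


def backlog_suspected(samples_ms, threshold_ms=1000.0):
    """Flag a task queue whose tasks wait too long before a worker picks them up.

    threshold_ms is an assumed value; tune it to your own latency targets.
    """
    return p95(samples_ms) > threshold_ms
```

A rising schedule-to-start latency means tasks sit in the task queue waiting for a worker, which is the practical symptom of a growing backlog.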

In it, incoming requests will be processed by workflows, delivered to them as signals.

Can you provide more info on your particular use case, please? It would help us give a better response on best practices with Temporal.

What is the max rate of these signals per workflow instance for your use case?

Sorry for my English. I will try to describe the case more accurately.
Let’s imagine that we have 1k clients. Each client has from 1 to 1k processes. Each client has its own queue, so that clients do not compete with each other. However, the processes of a single client can still compete among themselves.
As far as I understand, in such a scenario we will need a dedicated worker for each client?

To expand on the description: our task is to accept all the Signal-With-Start requests of each client and process them gradually, as fast as the cluster allows. This requires 2 queues per client, which means at least 2 workers. See picture.

What is the maximum rate of task creation per client?

It is difficult to say for sure; the current estimate is a peak of up to 300-500 RPS.
And over time there may be more than 1k clients.

Maybe I got excited :). The total RPS is expected to be around 15k.
Here is the question: a client can send 100-300 RPS for a short time, and we must accept those requests and put them in the queue for processing. We can then process them at a lower rate.
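To make the numbers concrete, here is a small back-of-the-envelope sketch in Python. It estimates how many requests pile up during a burst and how long they take to drain afterwards. The function names, the 100 RPS processing rate, and the 60 second burst length are illustrative assumptions, not measurements.

```python
def backlog_after_burst(arrival_rps, processing_rps, burst_seconds):
    """Requests still waiting in the queue when the burst ends."""
    return max(0.0, (arrival_rps - processing_rps) * burst_seconds)


def drain_seconds(backlog, processing_rps, steady_arrival_rps=0.0):
    """Time needed to work off the backlog once the burst is over."""
    spare = processing_rps - steady_arrival_rps
    if spare <= 0:
        raise ValueError("backlog never drains: arrivals >= processing rate")
    return backlog / spare


# Assumed scenario: a 300 RPS burst for 60 s, processed at 100 RPS.
waiting = backlog_after_burst(300, 100, 60)   # 12000 requests queued
print(waiting, drain_seconds(waiting, 100))   # drains in 120 s if no new arrivals
```

The queue only has to hold the difference between the arrival rate and the processing rate for the length of the burst.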

You can have a worker with a corresponding task queue per client. The complexity will be in managing the resources across all these workers.

The need to provide a fair load distribution across many clients in the same namespace is a pretty common use case. So we certainly will try to come up with a general solution in the future.
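To illustrate what fair distribution across clients means here, the following is a minimal round-robin sketch in plain Python. It is not how Temporal schedules tasks and it is not the planned general solution. The per-client queues and the `round_robin` function are hypothetical and only show the idea that a busy client should not starve a quiet one.

```python
from collections import deque


def round_robin(queues):
    """Yield (client, task) pairs, taking one task per client per pass.

    `queues` maps a client id to a deque of that client's pending tasks.
    """
    # Only clients that currently have work take part in the rotation.
    active = deque(client for client, q in queues.items() if q)
    while active:
        client = active.popleft()
        yield client, queues[client].popleft()
        if queues[client]:
            # The client still has work, so it goes to the back of the line.
            active.append(client)


pending = {"a": deque([1, 2, 3]), "b": deque([10])}
print(list(round_robin(pending)))
```

Client "b" is served after a single task from "a", even though "a" has more tasks waiting.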

Just to put the other responses in perspective, however … we would not consider the request rates you mention (if I understood correctly: 15k RPS total, 500 RPS per client) “high”. You can easily achieve this kind of throughput (even at steady state) without any intermediate queuing (such as Kafka).