We are evaluating Temporal for a webhook service that processes change events (add/remove) received from a GCP Pub/Sub subscription. At peak load, the service must handle 50-100k events per second. We are interested in Temporal because we have delivery state to manage (success/failed/pending, etc.).
My primary concern is whether Temporal can scale to this level of throughput. Is Temporal a suitable choice for this use case, or would a custom service for Pub/Sub subscriptions, retries, and state management be more appropriate?
Temporal scales to millions of concurrently executing workflows, but the number of events per second that a single workflow instance can handle is limited (roughly a few per second, far below your target peak).
So one question is whether the processing of each change event is independent (i.e., not relying on data received in previous change events). Could each change event, for example, be processed by a different workflow? If so, the fan-out could look like the sketch below.
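A minimal Go sketch of that fan-out, assuming the Temporal Go SDK and the GCP Pub/Sub client; the project, subscription, task queue, and the ProcessChangeEvent workflow name are placeholders, not from the original question:

```go
package main

import (
	"context"
	"errors"
	"log"

	"cloud.google.com/go/pubsub"
	"go.temporal.io/api/serviceerror"
	"go.temporal.io/sdk/client"
)

func main() {
	ctx := context.Background()

	temporal, err := client.Dial(client.Options{}) // localhost:7233 by default
	if err != nil {
		log.Fatal(err)
	}
	defer temporal.Close()

	ps, err := pubsub.NewClient(ctx, "my-project")
	if err != nil {
		log.Fatal(err)
	}
	sub := ps.Subscription("change-events")

	// Receive dispatches messages on a pool of goroutines; each message
	// becomes its own workflow execution.
	err = sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		_, err := temporal.ExecuteWorkflow(ctx, client.StartWorkflowOptions{
			// Reusing the message ID rejects duplicate starts while a run is open.
			ID:        "change-event-" + m.ID,
			TaskQueue: "webhook-delivery",
		}, "ProcessChangeEvent", m.Data)

		var started *serviceerror.WorkflowExecutionAlreadyStarted
		switch {
		case err == nil, errors.As(err, &started):
			m.Ack() // accepted, or a duplicate delivery of an accepted message
		default:
			m.Nack() // let Pub/Sub redeliver
		}
	})
	if err != nil {
		log.Fatal(err)
	}
}
```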
Note, however, that once you handle events in different workflows, there's no guarantee that the events will be processed in any particular order. For example, if a change event {"a": 1} is followed by {"a": 2} (with the intent that the final result be that a is 2), and different workflows deliver those events, then {"a": 1} might be delivered last, leaving a final result of a being 1.
Thus, if the change events for a particular object need to be processed in order, you'd need one workflow instance per object, with each event routed to its object's workflow (see the sketch below).
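Signal-with-start is the usual Temporal pattern for that routing: each event is sent as a signal to a workflow whose ID is derived from the object, and the workflow is created on first use. A sketch, where ChangeEvent, the signal name, and ObjectEventWorkflow are assumed names:

```go
package main

import (
	"context"

	"go.temporal.io/sdk/client"
)

// ChangeEvent is an assumed shape for events on the subscription.
type ChangeEvent struct {
	ObjectID string
	Op       string // "add" or "remove"
	Payload  []byte
}

// routeEvent delivers an event to the workflow that owns its object,
// starting that workflow if it isn't already running. SignalWithStart
// is atomic, so no event is lost between "start" and "signal".
func routeEvent(ctx context.Context, temporal client.Client, ev ChangeEvent) error {
	_, err := temporal.SignalWithStartWorkflow(
		ctx,
		"object-"+ev.ObjectID, // one workflow instance per object
		"change-event",        // signal name the workflow listens on
		ev,                    // delivered to the workflow in signal order
		client.StartWorkflowOptions{TaskQueue: "webhook-delivery"},
		"ObjectEventWorkflow", // started only if this ID has no open run
		ev.ObjectID,
	)
	return err
}
```

Because all events for a given object land on a single workflow, Temporal delivers them in the order the signals were received.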
Then you'd need to consider whether the volume of events for a particular object might be too high for a single workflow to handle: a single workflow can only absorb a modest signal rate, and the server caps its event history, so a long-lived per-object workflow has to continue-as-new periodically.
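A sketch of such a per-object workflow in Go; the batch size of 1000 (chosen to stay well under the default 51,200-event history limit) and the DeliverWebhook activity are illustrative assumptions:

```go
package main

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// ChangeEvent mirrors the assumed event shape from the routing sketch.
type ChangeEvent struct {
	ObjectID string
	Op       string // "add" or "remove"
	Payload  []byte
}

// ObjectEventWorkflow processes events for one object in signal order,
// then continues-as-new to keep the event history bounded.
func ObjectEventWorkflow(ctx workflow.Context, objectID string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 30 * time.Second,
	})
	ch := workflow.GetSignalChannel(ctx, "change-event")

	// Process a bounded batch per run; each signal and activity adds
	// several events to the history, which the server caps.
	for i := 0; i < 1000; i++ {
		var ev ChangeEvent
		ch.Receive(ctx, &ev) // blocks until the next event for this object

		// DeliverWebhook is a hypothetical activity; Temporal retries it
		// per its retry policy and persists the delivery state.
		if err := workflow.ExecuteActivity(ctx, "DeliverWebhook", ev).Get(ctx, nil); err != nil {
			return err
		}
	}

	// In production you'd also drain any signals buffered during this run
	// (ch.ReceiveAsync) and hand them to the next run before restarting.
	return workflow.NewContinueAsNewError(ctx, ObjectEventWorkflow, objectID)
}
```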