I have a fairly straightforward problem, and I’d like to find a solution that is idiomatic and doesn’t put too much stress on Temporal.
Basically, I’d like to have something similar to AWS SNS, but implemented via Workflows. I have an SQS queue receiving lightweight events (~500 bytes each) at a high frequency (I’d say around 1000/sec, possibly growing 10-100x as the business scales).
To be very flexible about event processing, let’s say we have various workflows that subscribe to these events via filters (either by specified IDs or by other attributes of the incoming events).
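To make the filter idea concrete, here is a minimal sketch in plain Python of what I mean by matching an event against subscriptions. It does not use the Temporal SDK at all, and every name in it (`Subscription`, `route`, the `id` and `type` event fields) is hypothetical and only there for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription record; not a Temporal type."""

    subscriber_id: str  # e.g. the ID of the subscribing workflow
    ids: frozenset[str] = frozenset()  # if non-empty, event["id"] must be in it
    attributes: tuple[tuple[str, Any], ...] = ()  # required key/value pairs

    def matches(self, event: dict[str, Any]) -> bool:
        # A subscription with no ids and no attributes matches every event.
        if self.ids and event.get("id") not in self.ids:
            return False
        return all(event.get(key) == value for key, value in self.attributes)


def route(event: dict[str, Any], subscriptions: list[Subscription]) -> list[str]:
    """Return the subscriber IDs whose filters match the event, in order."""
    return [s.subscriber_id for s in subscriptions if s.matches(event)]


if __name__ == "__main__":
    subs = [
        Subscription("wf-a", ids=frozenset({"42"})),
        Subscription("wf-b", attributes=(("type", "order"),)),
    ]
    print(route({"id": "42", "type": "order"}, subs))
```

The linear scan is only for readability; at the event rates mentioned above, a real demultiplexer would presumably need an index keyed by ID or attribute.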
One option is to implement a long-running workflow that polls the SQS queue, receives subscriptions, and distributes events according to the active subscriptions.
The polling part (fetching events from SQS) could be performed ergonomically from a long-running workflow, but the demultiplexing part is problematic. One possible way to implement subscriptions is via Temporal Signals.
- If subscriptions are dynamic (e.g. short-lived, one-off, callback-like constructs), the workflow receiving subscriptions will see a lot of Signals and will be forced to perform a Continue-As-New frequently (at the current scale that is 30 subs/sec, but this could also go up 10-100x in the future). Assuming 100 subs/sec, the Workflow has to Continue-As-New roughly every minute.
- If subscribers are ‘topic-based’ long-running Workflows, they will also be forced to Continue-As-New because of the high number of incoming events.
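The back-of-the-envelope arithmetic behind these estimates can be sketched as below. The `event_budget` of 10,000 history events per run is purely my assumption for illustration, not a verified Temporal limit, and `events_per_signal` assumes each Signal adds a fixed number of history events; the actual limits for a given deployment should be checked in the Temporal documentation.

```python
def seconds_between_continue_as_new(
    signals_per_sec: float,
    event_budget: int = 10_000,  # ASSUMPTION: history events allowed per run
    events_per_signal: int = 1,  # ASSUMPTION: history events added per Signal
) -> float:
    """Estimate how long one run lasts before it must Continue-As-New."""
    if signals_per_sec <= 0:
        raise ValueError("signals_per_sec must be positive")
    return event_budget / (signals_per_sec * events_per_signal)


if __name__ == "__main__":
    # Rates from the post: current 30/sec, projected 100/sec, and 10-100x growth.
    for rate in (30, 100, 300, 3000):
        interval = seconds_between_continue_as_new(rate)
        print(f"{rate:>5} signals/sec -> Continue-As-New every ~{interval:.1f} s")
```

Under that assumed budget, 100 signals/sec works out to a rotation about every 100 seconds, which is the same order of magnitude as my "roughly every minute" estimate; at 100x the current rate the interval drops to a few seconds.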
That leads to my questions: what is a reasonable rate of Continue-As-New operations for a single long-running workflow? What are the consequences when a workflow is rotated via Continue-As-New often (let’s say every minute, or even more frequently)? At what rate should we say that the design needs to be revised to something saner, or that features should be given up in favor of scalability? Or is it a poor idea to reinvent something Kafka-like on top of Temporal?
Sorry for the generic question, but I think it might be of general interest to understand how Temporal behaves under high-volume, low-complexity events.
Best
Andras