Will Temporal fit my scenario?

Hello there!

Let’s assume we have around 20,000 requests per second (RPS) incoming as POST requests, with an average body size of 16 kB. Each request goes through a chain of rules (which I assume should be represented as activities in Temporal) that combine into a workflow.

A single request shouldn’t take more than 5 seconds to process, with the average processing time being around 100 milliseconds. Individual activities are much faster than that, and many of them can run in parallel.

Now, I have two scenarios:

  1. Packets are grouped by user session: If I understand correctly, I could use signals to route all packets from the same session into a single workflow execution (and therefore to a single worker).
  2. Packets are individual transactions: These must be processed separately.

In the first case, I don’t need guarantees for restoring and replaying workflows, and I’m not overly concerned about losing some packets during sessions.
In the second case, restore and replay functionality could be useful, but I’m concerned it might consume too many resources.
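
For the first (session-grouped) scenario, here is a minimal sketch of how the routing could look with the Go SDK. Everything in it (the Packet type, ProcessPacket, the "analytics" task queue) is a placeholder of my own, not something that exists yet:

```go
package analytics

import (
	"context"
	"time"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/workflow"
)

// Packet is a placeholder for the ~16 kB request body plus its session key.
type Packet struct {
	SessionID string
	Body      []byte
}

// ProcessPacket stands in for the rule chain; the real logic lives elsewhere.
func ProcessPacket(ctx context.Context, p Packet) error { return nil }

// SessionWorkflow drains packets for one session from a signal channel.
func SessionWorkflow(ctx workflow.Context) error {
	ch := workflow.GetSignalChannel(ctx, "packet")
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 5 * time.Second,
	})
	for i := 0; i < 1000; i++ { // bound the history, then continue-as-new
		var p Packet
		ch.Receive(ctx, &p)
		if err := workflow.ExecuteActivity(ctx, ProcessPacket, p).Get(ctx, nil); err != nil {
			workflow.GetLogger(ctx).Error("rule chain failed", "error", err)
		}
	}
	// In real code you would drain any buffered signals before doing this.
	return workflow.NewContinueAsNewError(ctx, SessionWorkflow)
}

// RoutePacket delivers a packet to its session workflow, starting it if needed.
func RoutePacket(ctx context.Context, c client.Client, p Packet) error {
	_, err := c.SignalWithStartWorkflow(ctx,
		"session-"+p.SessionID, // workflow ID keyed by session
		"packet",               // signal name
		p,                      // signal payload
		client.StartWorkflowOptions{TaskQueue: "analytics"},
		SessionWorkflow)
	return err
}
```

Each signal plus its activity still costs a handful of history events per packet, which is where the techniques below come in.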

From what I understand, my only option is to use Elasticsearch as the Temporal database. However, if I route requests directly to workflows through Temporal Services, it could consume a lot of database space. Additionally, we already have many rules that save packet information in Elasticsearch, and we want to avoid duplicates.

From the documentation, I’ve discovered a few techniques to reduce the impact:

  1. Avoid passing packets directly to workflows: Instead, store them in intermediate storage and restore them in the workflow by some ID (though I’m not sure how much this will help; see the sketch after this list).
  2. Use local activities: This seems like a must-have in my case.
  3. Decrease event storage time to a minimum: For example, one day, though even that might be too much.
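
A minimal sketch of how points 1 and 2 could combine, assuming a hypothetical intermediate store behind fetchPacket; only the small packet ID and a compact result ever enter the event history:

```go
package analytics

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// RuleResult is whatever compact summary the rule chain produces.
type RuleResult struct {
	Verdict string
	Score   float64
}

// fetchPacket is a hypothetical lookup in the intermediate store (Redis,
// Cassandra, ...) where the raw body was parked before starting the workflow.
func fetchPacket(ctx context.Context, packetID string) ([]byte, error) { return nil, nil }

// applyRules is a stub for the combined rule chain.
func applyRules(ctx context.Context, body []byte) (RuleResult, error) { return RuleResult{}, nil }

// ProcessByID loads the packet by ID and runs the whole rule chain inside
// one activity, so the large body never appears in the event history.
func ProcessByID(ctx context.Context, packetID string) (RuleResult, error) {
	body, err := fetchPacket(ctx, packetID)
	if err != nil {
		return RuleResult{}, err
	}
	return applyRules(ctx, body)
}

// PacketWorkflow receives only the ID. The same activity could be run with
// workflow.ExecuteLocalActivity to cut the per-request event count further.
func PacketWorkflow(ctx workflow.Context, packetID string) (RuleResult, error) {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 5 * time.Second,
	})
	var res RuleResult
	err := workflow.ExecuteActivity(ctx, ProcessByID, packetID).Get(ctx, &res)
	return res, err
}
```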

So, my questions are:

  1. Does anyone have experience using Temporal under similar conditions?
  2. Is Temporal a good fit for my requirements, or should I consider something else?
  3. How many resources could Temporal potentially consume in my case?

Let me know your thoughts!

I also want to add that my main goal is to rewrite the analytical core of our system, which is currently written in Node.js, while all other components are in Go.

I need a system that allows me to:

  • Define, change, and update rule chains.
  • Dispatch traffic efficiently.

Some rules in these chains are definitely not idempotent, so I need a solution that can handle this gracefully.
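
One concrete knob for that (a sketch of one option, not the whole answer): disable Temporal’s automatic activity retries for the activities that wrap non-idempotent rules, and handle retries or compensation explicitly.

```go
package analytics

import (
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// withSingleAttempt returns a workflow context whose activities are attempted
// exactly once, which is the safer default when the wrapped rules are not idempotent.
func withSingleAttempt(ctx workflow.Context) workflow.Context {
	return workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 5 * time.Second,
		RetryPolicy:         &temporal.RetryPolicy{MaximumAttempts: 1},
	})
}
```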

Temporal looks like a perfect solution for this; my only concern is the overhead of its durability guarantees.

I’ve also examined Redpanda Connect (formerly Benthos), but it doesn’t seem as mature or feature-complete as Temporal.

Thanks in advance.

I also want to add that my main goal is to rewrite the analytical core of our system,

It is hard to give concrete recommendations without understanding the type of aggregations and queries you have to support.

The main limitation is that a single workflow instance has limited throughput. So if your use case only needs low throughput per instance (for example, one instance per user), Temporal is a good fit. If you need high throughput through a single instance (like global counters), Temporal might struggle with your workload.

I have a few follow-up questions to better understand your response regarding workflow instances. When you mention a workflow instance, are you referring to individual workers, an API server, or a group of workers servicing a single topic?

If it’s the first or second case (individual workers or an API server), wouldn’t this simply be resolved through horizontal scaling? If it’s the third case (a group of workers servicing a single topic), I was under the impression that a well-distributed RequestID within the topic should spread the load properly across workers. And in the first scenario, wouldn’t it be possible to direct signals to the specific worker handling a particular session?

Regarding the workflow scenario, it involves aggregating data from the current request as well as historical data stored in microservices and databases. These historical data points are queried during the execution of the Activity.

Additionally, while reading the documentation and reflecting on it, I’ve come to the conclusion that the only acceptable solution for my case might be storing incoming packets in a separate distributed cache and minimizing both the retention time of events and the number of Activities. However, I’m still not entirely confident that this is the right approach, as many of the analytical rules are not idempotent and could return different results in the event of a restart.
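
On the retention side, this is roughly how a short retention could be set from Go; a sketch assuming a recent SDK/API version (where the retention field is a protobuf Duration) and a hypothetical "analytics" namespace. The server also enforces a minimum retention, typically one day:

```go
package main

import (
	"context"
	"time"

	"go.temporal.io/api/workflowservice/v1"
	"go.temporal.io/sdk/client"
	"google.golang.org/protobuf/types/known/durationpb"
)

func main() {
	// Namespace-level retention controls how long closed workflow histories
	// are kept before they are deleted.
	nc, err := client.NewNamespaceClient(client.Options{HostPort: client.DefaultHostPort})
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	err = nc.Register(context.Background(), &workflowservice.RegisterNamespaceRequest{
		Namespace:                        "analytics", // hypothetical namespace name
		WorkflowExecutionRetentionPeriod: durationpb.New(24 * time.Hour),
	})
	if err != nil {
		panic(err)
	}
}
```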

I’m also currently considering the goflow library. While it’s a less mature solution than Temporal, it seems like it might offer better performance and lower database resource consumption in my specific scenario.

Thank you for your time and insights!

When you mention a workflow instance, are you referring to individual workers, an API server, or a group of workers servicing a single topic?

None of the above. I meant a Temporal workflow execution. If you have a money transfer workflow, a specific money transfer from account A to B is one execution (instance).

Temporal scales out with the number of parallel workflow executions. It cannot scale up the throughput of a specific execution.
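
As a rough illustration (the session count below is an assumption, not a number from this thread): at 20,000 RPS spread over, say, 5,000 concurrently active sessions, each session workflow sees about 4 signals per second, which a single execution can generally sustain. A single global workflow trying to absorb all 20,000 requests per second would be far beyond that limit.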

Regarding the workflow scenario, it involves aggregating data from the current request as well as historical data stored in microservices and databases. These historical data points are queried during the execution of the Activity.

This description makes it unclear what type of aggregations are needed. What data should be stored vs preaggregated, and what is the write throughput of a single aggregation?

Sorry, I wasn’t clear enough. Let me explain.
I’m talking about a chain of analytical rules (which can change dynamically), roughly 100 rules per request.
Each rule can:

  • Use data from the request directly
  • Enrich data from a database (mostly cached for reads)
  • Enrich data from stateless/stateful microservices

…and then perform logical, mathematical, or other operations on that data.

Each request results in roughly 10–20 database read requests (if there’s no cache) or zero otherwise, about 50 database write requests (we use Cassandra and batches to Elastic, so that’s manageable), and ~100 requests to various microservices.

Initially, I thought each rule should be a separate activity, but now I see it will have to be a single combined activity, either local or remote depending on the scenario.
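
To make the trade-off concrete (rough arithmetic from the numbers above, and the code names are mine): at 20,000 RPS with ~100 rules per request, one activity per rule would mean on the order of millions of history events per second written to the Temporal database, whereas a single combined activity keeps it to a handful of events per request. A sketch of that combined-activity shape:

```go
package analytics

import "context"

// Request and Result are placeholders for the packet and the rule-chain output.
type Request struct {
	Body []byte
}

type Result struct {
	Fields map[string]any
}

// A rule uses the request directly, enriches it from databases or
// microservices, and updates the accumulated result.
type ruleFunc func(ctx context.Context, req *Request, acc *Result) error

// ruleChain would be built from the dynamic rule configuration; a plain
// slice is used here for brevity.
var ruleChain []ruleFunc

// RunRuleChain is the single combined activity: all ~100 rules, including
// their database and microservice calls, execute inside one activity run.
func RunRuleChain(ctx context.Context, req Request) (Result, error) {
	res := Result{Fields: map[string]any{}}
	for _, rule := range ruleChain {
		if err := rule(ctx, &req, &res); err != nil {
			// The workflow (or the caller) decides whether to retry or compensate.
			return res, err
		}
	}
	return res, nil
}
```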

I see. So you don’t plan to store any data in the workflow? Then having a workflow per request is fine. If you put all the processing into a single activity, you are essentially using Temporal as a queue. Its features make it a very powerful queue, but I’m not sure that is the best use of the technology.

Got your point, thanks a lot! I’ll do some practical research and come back with feedback.