Similarities / Differences between Temporal and Flink Stateful Functions

I’ve been evaluating tools for my use case and I came across

How does Temporal compare to Flink stateful functions?

I sort of understand that Temporal is more about orchestrating different (often third-party) microservices in a dynamic (or static) workflow. Event processing, on the other hand, deals with high-throughput applications, which are not a focus of Temporal.

Flink stateful functions, however, are not restricted to DAG-based workflows and also support remote execution (although function startup latency for a FaaS is another issue).

In an application where workflow executions are triggered in response to events generated at scale (2-3M peak concurrency) and the desired throughput (which also depends on downstream services) is in the hundreds of thousands, how would Temporal and a system like Flink stateful functions compare?

Disclaimer: The last time I really worked with Flink was more than 3 years ago. I'm not an expert on the current implementation, so please correct me if I'm misrepresenting it.

Flink stateful functions are exactly what they are called. They are functions that can explicitly load and store state through the provided API. They are closer to Akka and Microsoft Orleans than to Temporal. They give you the ability to send asynchronous events to other functions by their business ID, but they don't help you much with implementing complex business interactions. You are still responsible for programming your business logic fully asynchronously, the way you usually do with normal RPC services and databases.
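To make that programming model concrete, here is a minimal sketch in plain Python. It is not the Flink StateFun API: `state_store`, `mailbox`, `send`, `account_function` and `run_until_idle` are hypothetical in-memory stand-ins that only illustrate the shape of the code, where each invocation loads state, handles one message, stores state, and addresses other functions by ID.

```python
from collections import deque

# Hypothetical in-memory stand-ins; none of this is the Flink StateFun API.
state_store = {}    # function id -> persisted state
mailbox = deque()   # pending (target_id, message) pairs


def send(target_id, message):
    """Queue an asynchronous message addressed by business ID."""
    mailbox.append((target_id, message))


def account_function(function_id, message):
    """One invocation: load state, handle one message, store state."""
    state = state_store.get(function_id, {"balance": 0})  # explicit load
    if message["type"] == "deposit":
        state["balance"] += message["amount"]
    elif message["type"] == "transfer":
        # Multi-step logic is split across messages and invocations.
        state["balance"] -= message["amount"]
        send(message["to"], {"type": "deposit", "amount": message["amount"]})
    state_store[function_id] = state  # explicit store


def run_until_idle():
    """Deliver queued messages one at a time until none remain."""
    while mailbox:
        target_id, message = mailbox.popleft()
        account_function(target_id, message)


send("alice", {"type": "deposit", "amount": 100})
send("alice", {"type": "transfer", "to": "bob", "amount": 30})
run_until_idle()
print(state_store)
```

Note how the transfer is not one piece of sequential code: the second half of it runs later, in another invocation, triggered by a message. That is the "fully asynchronous" style described above.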

Temporal offers a much higher level of abstraction for developers. There is no explicit persistent state management, as all the state of a workflow, including local variables and the stack, is always preserved. It allows writing synchronous code that blocks on external operations for an unlimited amount of time.
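As a rough illustration of that style, here is a sketch using only Python's `asyncio`, not a Temporal SDK. The names `order_workflow`, `charge_card` and the approval event are hypothetical. Plain `asyncio` keeps this state only in memory, so the sketch shows the shape of the code, not the durability that Temporal adds on top of it.

```python
import asyncio


async def charge_card(amount):
    """Hypothetical external call; a real one could take minutes or days."""
    await asyncio.sleep(0)
    return f"receipt-{amount}"


async def order_workflow(amount, approval):
    # Progress lives in ordinary local variables, with no explicit load or store.
    steps = ["created"]
    await approval.wait()          # blocks here until someone approves
    steps.append("approved")
    receipt = await charge_card(amount)
    steps.append("charged")
    return steps, receipt


async def main():
    approval = asyncio.Event()
    task = asyncio.create_task(order_workflow(42, approval))
    await asyncio.sleep(0)         # let the workflow reach the wait
    approval.set()                 # the external operation completes
    return await task


result = asyncio.run(main())
print(result)
```

The business logic reads top to bottom as one function, even though it pauses in the middle for an external event.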

I don't know much about the internals of how stateful functions are implemented, but my uneducated guess is that they don't use the optimizations that standard Flink stream processing uses to reduce the number of checkpoints by replaying Kafka streams. So they don't offer many advantages over Temporal in the number of IOPS executed on updates. It would be interesting to see how they perform given similar hardware.

At the current point, I would advise against using Temporal for such a high rate of events. The architecture allows it, but we never had a business need or the hardware capacity to perform any testing at such a high scale. I would also double-check whether the reliability and consistency guarantees that Temporal offers are really needed for a use case that requires 3 million events per second. Usually, such high-rate use cases can tolerate some data loss and inconsistency.

One clarification is that Temporal is an awesome fit for event processing when there is a large number of business entities and each entity doesn't receive a high rate of events. For example, it is OK to have hundreds of millions of workflows, with each individual entity receiving no more than a few requests per second at peak. The use case it doesn't support (and where Flink works much better at this point) is when a single entity has to aggregate a high rate of events.


@maxim thank you for a prompt and comprehensive answer.

Can you please illustrate the above point through an example? I am not sure if I understand it correctly.

As far as I understand, it is okay to have many workflows, but each of these unique workflows should expect only a limited number of execution requests at peak.

For example, you can implement a loyalty program (airline points style). Each member would have an associated, always-running workflow that receives notifications (in the form of signals) about each relevant event (like a newly completed flight) and calculates the points according to the business logic, possibly calling other services. It then takes the appropriate action based on the number of points or on timers, for example notifying other services when the customer reaches a certain status, or giving them a promotion at the end of the month based on their points.
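The loyalty example above could look roughly like the sketch below. It uses plain `asyncio` rather than a Temporal SDK, with an `asyncio.Queue` standing in for signals. The points rule, the `GOLD_THRESHOLD` value, the `close` event and all function names are assumptions made for illustration, and the end-of-month timer is left out to keep it short.

```python
import asyncio

GOLD_THRESHOLD = 1000  # hypothetical status threshold


def points_for_flight(miles):
    """Hypothetical business rule: one point per ten miles flown."""
    return miles // 10


async def loyalty_workflow(member_id, signals, notifications):
    """One long-running workflow per member; state is just local variables."""
    points = 0
    status = "basic"
    while True:
        event = await signals.get()   # wait for the next signal
        if event["type"] == "close":  # hypothetical way to end the sketch
            return points, status
        if event["type"] == "flight_completed":
            points += points_for_flight(event["miles"])
        if status == "basic" and points >= GOLD_THRESHOLD:
            status = "gold"
            # Stand-in for notifying another service about the new status.
            notifications.append((member_id, "reached gold"))


async def main():
    signals = asyncio.Queue()
    notifications = []
    task = asyncio.create_task(
        loyalty_workflow("member-1", signals, notifications)
    )
    for miles in (4000, 7000):
        await signals.put({"type": "flight_completed", "miles": miles})
    await signals.put({"type": "close"})
    points, status = await task
    return points, status, notifications


points, status, notifications = asyncio.run(main())
print(points, status, notifications)
```

Each member gets their own instance of this workflow, and each instance only sees that member's events, which is why a few signals per second per workflow is enough even with a very large number of members.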