Similarities / Differences between Temporal and Flink Stateful Functions

f0rr0 · August 6, 2020, 1:29pm

I’ve been evaluating tools for my use case and I came across Apache Flink: Stateful Functions — Event-driven Applications on Apache Flink

How does Temporal compare to Flink stateful functions?

As I understand it, Temporal is more about orchestrating different (often third-party) microservices in a dynamic (or static) workflow. Event processing, on the other hand, deals with high-throughput applications, which are not a focus of Temporal.

Flink stateful functions, however, are not restricted to DAG-based workflows and also support remote execution (although function startup latency on a FaaS platform is a separate issue).

In an application where workflow executions are triggered in response to events generated at scale (2-3M peak concurrency) and the desired throughput (which also depends on downstream services) is in the hundreds of thousands, how would Temporal and a system like Flink stateful functions compare?

maxim · August 6, 2020, 3:33pm

Disclaimer: The last time I really worked with Flink was more than 3 years ago, so I’m not an expert on the current implementation. Please correct me if I’m misrepresenting it.

Flink stateful functions are exactly what the name says: functions that can explicitly load and store state through the provided API. They are closer to Akka and Microsoft Orleans than to Temporal. They give you the ability to send asynchronous events to other functions by their business ID, but they don’t help you much with implementing complex business interactions. You are still responsible for programming your business logic fully asynchronously, the way you usually do with normal RPC services and databases.

Temporal offers a much higher level of abstraction for developers. There is no explicit persistent state management, because all state of a workflow, including local variables and the stack, is always preserved. This allows writing synchronous code that blocks on external operations for an unlimited amount of time.
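To make the contrast between the two programming styles concrete, here is a minimal plain-Python sketch. It uses neither the Flink Stateful Functions API nor a Temporal SDK; `store`, `on_event`, and `running_total` are hypothetical names, and a generator only stands in for code whose local variables survive between events. A real workflow engine would also persist that state durably, which this sketch does not do.

```python
# Plain-Python illustration only; none of these names come from Flink or Temporal.
from typing import Dict, Generator

# Style 1: explicit state management.
# Every event handler loads state by business ID, updates it, and stores it back.
store: Dict[str, int] = {}


def on_event(entity_id: str, amount: int) -> int:
    total = store.get(entity_id, 0)  # explicit load
    total += amount
    store[entity_id] = total         # explicit store
    return total


# Style 2: sequential code whose local variables live across events.
# The generator is a stand-in for a workflow that "blocks" waiting for input.
def running_total() -> Generator[int, int, None]:
    total = 0                        # ordinary local variable, no load/store calls
    while True:
        amount = yield total         # wait here until the next event arrives
        total += amount


on_event("order-1", 5)
explicit_result = on_event("order-1", 7)

wf = running_total()
next(wf)                             # run up to the first wait point
wf.send(5)
blocking_result = wf.send(7)
```

Both variants end with the same total; the difference is who keeps track of the state between events.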

I don’t know much about the internals of how stateful functions are implemented, but my uneducated guess is that they don’t use the optimizations that standard Flink stream processing uses to reduce the number of checkpoints by replaying Kafka streams. If so, they don’t offer many advantages over Temporal in the number of IOPS executed on updates. It would be interesting to see how they perform given similar hardware.

At this point, I would advise against using Temporal for such a high rate of events. The architecture allows it, but we have never had the business need or hardware capacity to perform any testing at such a high scale. I would also double-check whether the reliability and consistency guarantees that Temporal offers are really needed for a use case that requires 3 million events per second. Usually, such high-rate use cases can tolerate some data loss and inconsistency.

One clarification: Temporal is an awesome fit for event processing when there is a large number of business entities and each entity doesn’t receive a high rate of events. For example, it is OK to have hundreds of millions of workflows, with each individual entity receiving no more than a few requests per second at peak. The use case it doesn’t support (and where Flink works much better at this point) is when a single entity has to aggregate a high rate of events.

f0rr0 · August 6, 2020, 4:02pm

@maxim thank you for a prompt and comprehensive answer.

Can you please illustrate the above point through an example? I am not sure if I understand it correctly.

As far as I understand, it is okay to have many workflows, but each of these unique workflows should expect only a limited rate of requests at peak.

maxim · August 6, 2020, 7:21pm

For example, you can implement a loyalty program (airline points style). Each member would have an always-running associated workflow that receives notifications (in the form of signals) about each relevant event (like a newly completed flight) and calculates the points according to the business logic, possibly calling other services. It then takes appropriate action based on the number of points or on timers, for example notifying other services when the customer reaches a certain status, or giving them a promotion at the end of the month based on their points.
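A rough sketch of that per-member logic, in plain Python rather than a Temporal SDK, might look like the following. The class name, the `GOLD_THRESHOLD` value, the one-point-per-mile rule, and the method names are all assumptions made up for illustration; in a real workflow the two methods would correspond to a signal handler and a timer firing.

```python
# Plain-Python illustration only; this does not use a Temporal SDK.
from dataclasses import dataclass, field
from typing import List

GOLD_THRESHOLD = 50_000  # hypothetical status threshold


@dataclass
class LoyaltyMember:
    """One instance per member, standing in for one long-running workflow."""
    member_id: str
    points: int = 0
    status: str = "basic"
    notifications: List[str] = field(default_factory=list)

    def on_flight_completed(self, miles: int) -> None:
        """Stand-in for a signal: called once per completed flight."""
        self.points += miles  # hypothetical rule: one point per mile
        if self.status == "basic" and self.points >= GOLD_THRESHOLD:
            self.status = "gold"
            # Stand-in for notifying another service about the status change.
            self.notifications.append(f"{self.member_id} reached gold")

    def on_month_end(self) -> None:
        """Stand-in for a timer firing at the end of the month."""
        if self.points > 0:
            self.notifications.append(
                f"{self.member_id} promotion for {self.points} points"
            )


member = LoyaltyMember("member-42")
member.on_flight_completed(30_000)
member.on_flight_completed(25_000)
member.on_month_end()
```

Each member sees only a handful of such events, which matches the "many entities, low rate per entity" shape described above.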
