Workflow Maintainability: Abstract into a state machine vs. standard workflow function

nathan815 · February 9, 2023, 7:01am

Hi all! I wanted to discuss state machines and adding an abstraction on top of Temporal.

I’ve recently joined a sub-team within the product I work on, and we’re in the middle of planning a rewrite of our workflows using Temporal, and then introducing it to the broader organization.

There is a belief by some in the team that workflow-as-function would become too complex over time and thus it requires a further abstraction. One concern they had is that if the workflows are written as standard functions they might eventually grow to hundreds/thousands of lines and become too complex to unit test.

Their abstraction idea for this is a sort of state machine framework around Temporal, to 1) allow developers to write workflows without much knowledge of Temporal and 2) attempt to make complex workflows easier to maintain. In this framework, developers have to define the business logic across separate “states”, where each state is a struct implementing an interface:

Execute
GetNextState
GetStatus - returns workflow status while in this state (separate from Temporal’s workflow status)
IsFinalState

Then, there is a generic Temporal workflow which loops and calls an activity which calls the current state’s Execute function, then calls GetNextState to determine the next state.
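To make the proposal concrete, here is a minimal sketch of the interface and the generic loop in plain Go. It makes no Temporal SDK calls: in the framework described above, Execute would be invoked through an activity from a generic Temporal workflow. The Ctx property bag, the three example states, and RunStateMachine are hypothetical names made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Ctx is a hypothetical shared property bag passed between states.
type Ctx map[string]interface{}

// State mirrors the four methods listed above.
type State interface {
	Execute(ctx Ctx) error
	GetNextState(ctx Ctx) State
	GetStatus() string
	IsFinalState() bool
}

// reserveStock is an illustrative first state.
type reserveStock struct{}

func (reserveStock) Execute(ctx Ctx) error  { ctx["reserved"] = true; return nil }
func (reserveStock) GetNextState(Ctx) State { return chargeCard{} }
func (reserveStock) GetStatus() string      { return "RESERVING" }
func (reserveStock) IsFinalState() bool     { return false }

// chargeCard depends on a key written by the previous state.
type chargeCard struct{}

func (chargeCard) Execute(ctx Ctx) error {
	reserved, _ := ctx["reserved"].(bool)
	if !reserved {
		return errors.New("nothing reserved")
	}
	ctx["charged"] = true
	return nil
}
func (chargeCard) GetNextState(Ctx) State { return done{} }
func (chargeCard) GetStatus() string      { return "CHARGING" }
func (chargeCard) IsFinalState() bool     { return false }

// done is the terminal state.
type done struct{}

func (done) Execute(Ctx) error      { return nil }
func (done) GetNextState(Ctx) State { return nil }
func (done) GetStatus() string      { return "COMPLETED" }
func (done) IsFinalState() bool     { return true }

// RunStateMachine is the generic loop: execute the current state,
// stop on error or on a final state, otherwise ask for the next one.
// It returns the status of every state it visited.
func RunStateMachine(start State, ctx Ctx) ([]string, error) {
	var statuses []string
	for s := start; s != nil; s = s.GetNextState(ctx) {
		statuses = append(statuses, s.GetStatus())
		if err := s.Execute(ctx); err != nil {
			return statuses, fmt.Errorf("state %s: %w", s.GetStatus(), err)
		}
		if s.IsFinalState() {
			break
		}
	}
	return statuses, nil
}
```

Calling RunStateMachine(reserveStock{}, Ctx{}) walks all three states in order. Note that chargeCard only works if an earlier state wrote the "reserved" key, so the states communicate through the shared bag rather than through function arguments.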

Personally, I’d rather write workflows as regular functions and use standard code organization techniques (like breaking them up into smaller functions) to manage complexity, but I will need to convince others of this.

I’d like to get any thoughts from others in the Temporal community/company on this sort of idea. Has anyone tried something like this yet? Would a state machine framework/abstraction like this make workflows easier to maintain in practice?

And lastly, are there any examples of some more complex Temporal workflows out in the open that we could reference?

maxim · February 9, 2023, 12:38pm

If state machines were a better way to organize complex code, then all software, including the Linux kernel, would be written as state machines. I’m not claiming that there are no situations where they are useful. They are useful when an event can apply to many states and requires different handling in each state. BTW, the Temporal SDKs rely heavily on state machines internally. For example, activity cancellation is handled very differently depending on whether the ScheduleActivityTask command hasn’t been sent to the service yet or the activity has already started. Here is the activity state transitions diagram:

I’m yet to encounter a situation in which a business application would benefit from state machine notation as opposed to logic specified directly in a programming language. Most of the complexity of nontrivial workflows is not in the sequencing of function calls. It is in executing multiple branches and callbacks in parallel and in state representation. State machine based solutions use a global bag of properties as shared data. This creates very tight coupling between different parts of the application. At some point, this complexity makes the application practically impossible to maintain. IMHO the only advantage of state machines is the ability to generate a diagram from them. But in my experience, this benefit holds only in trivial cases. Any nontrivial code has too many possible states, and the diagram becomes too complex to make sense of.

Use standard language techniques to deal with complexity. Temporal allows unit testing individual classes and methods.
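As a rough illustration of that advice, the sketch below writes a workflow as an ordinary Go function split into small helpers, each testable on its own. It is plain Go with no Temporal SDK calls; the Activities interface, OrderWorkflow, the helper names, and the fake used for testing are all hypothetical. In a real Temporal workflow the side-effecting steps would be activities invoked through the SDK.

```go
package main

import "errors"

// Activities is a hypothetical stand-in for the side-effecting steps
// that a real workflow would run as Temporal activities.
type Activities interface {
	ReserveStock(orderID string) error
	ChargeCard(orderID string, cents int) error
	ReleaseStock(orderID string) error
}

// OrderWorkflow reads top to bottom like ordinary code. Its state lives
// in local variables and arguments, not in a shared property bag.
func OrderWorkflow(a Activities, orderID string, cents int) (string, error) {
	if err := validate(orderID, cents); err != nil {
		return "REJECTED", err
	}
	if err := a.ReserveStock(orderID); err != nil {
		return "FAILED", err
	}
	if err := chargeOrRelease(a, orderID, cents); err != nil {
		return "FAILED", err
	}
	return "COMPLETED", nil
}

// validate is a pure helper that can be unit tested in isolation.
func validate(orderID string, cents int) error {
	if orderID == "" || cents <= 0 {
		return errors.New("invalid order")
	}
	return nil
}

// chargeOrRelease charges the card and, if that fails, compensates by
// releasing the reserved stock before returning the charge error.
func chargeOrRelease(a Activities, orderID string, cents int) error {
	err := a.ChargeCard(orderID, cents)
	if err != nil {
		// Best-effort compensation; the release error is ignored in this sketch.
		_ = a.ReleaseStock(orderID)
	}
	return err
}

// fakeActivities is a test double that records whether stock was released.
type fakeActivities struct {
	failCharge bool
	released   bool
}

func (f *fakeActivities) ReserveStock(string) error { return nil }
func (f *fakeActivities) ChargeCard(string, int) error {
	if f.failCharge {
		return errors.New("card declined")
	}
	return nil
}
func (f *fakeActivities) ReleaseStock(string) error { f.released = true; return nil }
```

With a fake such as &fakeActivities{failCharge: true}, a test can check both the returned status and that the compensation step ran, without any state machine framework in between.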

There is a reason Temporal is popular: it uses a durable execution abstraction that is very generic and scales to any complexity. State machines as a way to specify workflows have existed for 50+ years. How many of them are really popular and used to write complex business flows?

nathan815 · February 11, 2023, 3:48am

Thanks for the reply, Maxim. Good point about most other complex software not using state machines.

“Most of the complexity of nontrivial workflows is not in the sequencing of function calls. It is in executing multiple branches and callbacks in parallel and in state representation.”

I think this hits the nail on the head. And Temporal pretty much takes care of all these complexities.

David_Khourshid · February 16, 2023, 12:24am

@maxim I hate to be a devil’s advocate here, but if state machines weren’t the best solution for authoring workflows, then why do most workflow tools (most of which are more popular than Temporal) essentially use state machines to define workflows?

Zapier, n8n, Pipedream, Make, Tray.io, and numerous other tools: these are all basic, sequential state machines. It’s a bit disingenuous to claim that plain code is better for defining workflows, especially if you need to share and explain that logic to the rest of the team, including non-developers.

State machines may not be explicitly mentioned everywhere, but they’re implicitly used in a lot of code, especially in many workflow tools. Code is not sufficient for understanding the logic clearly, especially as the complexity increases.

maxim · February 16, 2023, 6:06am

By this logic, any code is a state machine. Thus any Temporal workflow is indeed a state machine.

I’m talking about explicit state machines used to define application logic. None of the tools mentioned uses a state machine definition language. They are more like abstract syntax trees with a visual representation.

I’m also discussing tools for developers. None of these tools targets developers, and they can be used only for applications of very limited complexity.

nathan815 · February 16, 2023, 7:30am

I’d consider all those tools as essentially flow chart builders. They don’t utilize or expose the concept of finite state machines.

Lercher · February 19, 2023, 7:46pm

On “the only advantage of state machines is the ability to generate a diagram”:

I agree that real-world diagrams bring little benefit. However, for me, state machines allow a program to reason about the code and, e.g., easily discover and inspect the available state transitions while a workflow instance is running. So if you can live with limitations such as sequential execution, and for some reason the workflows need to be customizable by poorly trained staff, a simple state machine can be a proper solution.

I created a DSL for that, including an interpreter that uses Temporal to execute such state machines, and IMO the two play well together, even if the DSL hides a lot of Temporal features. But that’s intentional. It makes it easier to use, assuming the limitations don’t hinder the planned use cases.

maxim · February 19, 2023, 9:45pm

I believe it works well in your case because of the domain specificity. I love DSLs when they are domain-specific, which allows them to hide most of the workflow definition complexity. I have zero objections to a DSL that uses state machine definitions if this is a good fit for the domain.

What I advise against is using state machine definitions as a general-purpose workflow definition language.
