Workflow fuzzing?

Is anyone doing workflow fuzzing (or ideally has done a blog post about it)?

Workflows seem like they have enough internal state and state transitions plus potential for exploitability (if the flow involves commerce, permissions, resources, etc.) to warrant fuzzing, and they seem lightweight enough computationally and constrained enough from an interaction standpoint to be amenable to fuzzing.

As a similar question (or maybe a precursor question?), is it realistic to wrap a replay regression test harness around Temporal that uses the existing event logs to automatically run regressions against historical event streams when checking in modified code?

I haven’t seen any efforts here, but I think it’s a good idea and have been putting some thought into it myself.

Like you said, they fit fuzzing requirements well (self-contained state and deterministic execution). In fact, I think with a coverage-guided fuzzer (AFL, the upcoming fuzzer in Go 1.18, etc.), you’d reach the end of possible code paths quite quickly. In many cases there might be only one code path, but in languages like Go there are technically non-deterministic paths internally because of goroutines, even though the resulting workflow executions may have no conditional forks.
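To make the fuzzing idea concrete, here is a minimal sketch of what a fuzz target could look like. It does not use the Temporal SDK: the Order type, the signal names, and CheckInvariants are all hypothetical stand-ins for a small workflow state machine, and the invariants are just examples of the kind of property (no shipping without payment) you might care about in a commerce flow.

```go
package main

import "fmt"

// Hypothetical signals for a toy order workflow (names are illustrative only).
const (
	SigPay byte = iota
	SigShip
	SigRefund
	SigCancel
)

// Order is the workflow's internal state.
type Order struct {
	Paid, Shipped, Cancelled bool
}

// Apply handles one signal; signals that are invalid in the current state are ignored.
func (o *Order) Apply(sig byte) {
	switch sig % 4 { // map any fuzzed byte onto one of the four signals
	case SigPay:
		if !o.Cancelled {
			o.Paid = true
		}
	case SigShip:
		if o.Paid && !o.Cancelled {
			o.Shipped = true
		}
	case SigRefund:
		if o.Paid && !o.Shipped {
			o.Paid = false
		}
	case SigCancel:
		if !o.Shipped {
			o.Cancelled = true
		}
	}
}

// CheckInvariants feeds a signal sequence to a fresh order and checks
// the safety properties after every step.
func CheckInvariants(signals []byte) error {
	var o Order
	for i, s := range signals {
		o.Apply(s)
		if o.Shipped && !o.Paid {
			return fmt.Errorf("step %d: shipped without payment", i)
		}
		if o.Shipped && o.Cancelled {
			return fmt.Errorf("step %d: shipped a cancelled order", i)
		}
	}
	return nil
}

// In a _test.go file this could be wired to Go's native fuzzer:
//
//	func FuzzOrder(f *testing.F) {
//		f.Add([]byte{SigPay, SigShip})
//		f.Fuzz(func(t *testing.T, signals []byte) {
//			if err := CheckInvariants(signals); err != nil {
//				t.Fatal(err)
//			}
//		})
//	}

func main() {
	err := CheckInvariants([]byte{SigShip, SigPay, SigRefund, SigShip, SigCancel})
	fmt.Println("invariant violation:", err)
}
```

Because the state space is this small, a coverage-guided fuzzer should exhaust the branches in Apply almost immediately, which is the point made above.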

Besides just catching bugs with abnormal input, one of the neat values of fuzzing here would be catching non-determinism. While one might naively think workflow-level code paths could be compared to ensure they are the same, technically “Temporal determinism” is not the same as runtime determinism, because Temporal workflows only require that the steps are deterministic. For example, in Go you may do a non-deterministic map iteration just to count something, which is safe by Temporal standards but not deterministic code-path-wise.
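A rough illustration of that distinction, as a plain Go sketch rather than real Temporal code: countWorkflow and the Run type are made up for this example, and the "commands" are just strings standing in for the steps a workflow engine would record. The map iteration order (the code path) can change between runs, while the recorded step stays the same.

```go
package main

import (
	"fmt"
	"strings"
)

// Run captures one execution of the toy workflow.
type Run struct {
	Trace    []string // code path: map keys in the order they were visited
	Commands []string // steps a workflow engine would record
}

// countWorkflow sums the map values and schedules one activity with the total.
// The iteration order is randomized by the Go runtime, but the total is not.
func countWorkflow(items map[string]int) Run {
	var r Run
	total := 0
	for k, v := range items {
		r.Trace = append(r.Trace, k)
		total += v
	}
	r.Commands = append(r.Commands, fmt.Sprintf("ScheduleActivity(charge, %d)", total))
	return r
}

func main() {
	items := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
	traces := map[string]bool{}
	commands := map[string]bool{}
	for i := 0; i < 50; i++ {
		r := countWorkflow(items)
		traces[strings.Join(r.Trace, ",")] = true
		commands[strings.Join(r.Commands, ";")] = true
	}
	// Typically several distinct traces, but always exactly one command sequence.
	fmt.Println("distinct code paths:", len(traces))
	fmt.Println("distinct command sequences:", len(commands))
}
```

So a determinism check that compares code-path coverage would flag this workflow, while one that compares the emitted steps would correctly pass it.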

Then, you reach the idea that you can just loop on the same input to check determinism…
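A minimal sketch of that loop, again in plain Go with made-up names (CheckDeterministic, sortedCharges, and unsortedCharges are hypothetical, and the string commands stand in for recorded workflow steps): run the same input repeatedly and compare the emitted steps, not the code path.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Workflow is a toy workflow: it takes an input and returns the steps it emitted.
type Workflow func(input map[string]int) []string

// CheckDeterministic runs wf on the same input several times and reports
// the first run whose emitted steps differ from the first run's.
func CheckDeterministic(wf Workflow, input map[string]int, runs int) error {
	want := wf(input)
	for i := 1; i < runs; i++ {
		if got := wf(input); !reflect.DeepEqual(got, want) {
			return fmt.Errorf("run %d diverged:\n  first: %v\n  now:   %v", i, want, got)
		}
	}
	return nil
}

// sortedCharges emits one step per customer in sorted key order (deterministic).
func sortedCharges(input map[string]int) []string {
	keys := make([]string, 0, len(input))
	for k := range input {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var cmds []string
	for _, k := range keys {
		cmds = append(cmds, fmt.Sprintf("ScheduleActivity(charge, %s, %d)", k, input[k]))
	}
	return cmds
}

// unsortedCharges emits steps directly in map iteration order, so the
// step sequence itself depends on the runtime's randomized ordering.
func unsortedCharges(input map[string]int) []string {
	var cmds []string
	for k, v := range input {
		cmds = append(cmds, fmt.Sprintf("ScheduleActivity(charge, %s, %d)", k, v))
	}
	return cmds
}

func main() {
	input := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": 7, "h": 8}
	fmt.Println("sorted:  ", CheckDeterministic(sortedCharges, input, 100))
	fmt.Println("unsorted:", CheckDeterministic(unsortedCharges, input, 100))
}
```

This kind of check is probabilistic: it can only show non-determinism that actually surfaces within the number of runs you give it.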

Absolutely. I do this already for debugging and other purposes. Running a replayer against existing history with new code can definitely tell you whether your alterations would have affected that history.

Similarly, in many cases you may want CI constantly replaying existing production history runs just to confirm you aren’t introducing non-determinism. Since there is no server involvement, using your existing workflow history as a test corpus is very reasonable.
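To show the shape of such a CI check, here is a simplified model in plain Go. It is not the Temporal API: History, Record, Replay, and ReplayCorpus are hypothetical, and a history here is just an input plus the list of steps the old code emitted. In a real project you would use the replayer that ships with the SDK (in Go, the worker package's WorkflowReplayer) against exported histories.

```go
package main

import "fmt"

// Workflow is a toy workflow: input in, emitted steps out.
type Workflow func(amount int) []string

// History is a recorded past execution: the input and the steps it produced.
type History struct {
	ID       string
	Input    int
	Commands []string
}

// Record builds a corpus by running wf on each input (stands in for production history).
func Record(wf Workflow, inputs []int) []History {
	var corpus []History
	for i, in := range inputs {
		corpus = append(corpus, History{ID: fmt.Sprintf("run-%d", i), Input: in, Commands: wf(in)})
	}
	return corpus
}

// Replay runs new code against one history and reports the first step that differs.
func Replay(wf Workflow, h History) error {
	got := wf(h.Input)
	for i := 0; i < len(got) || i < len(h.Commands); i++ {
		g, w := "<none>", "<none>"
		if i < len(got) {
			g = got[i]
		}
		if i < len(h.Commands) {
			w = h.Commands[i]
		}
		if g != w {
			return fmt.Errorf("%s: step %d: recorded %q, new code produced %q", h.ID, i, w, g)
		}
	}
	return nil
}

// ReplayCorpus replays every history and collects the failures.
func ReplayCorpus(wf Workflow, corpus []History) []error {
	var failures []error
	for _, h := range corpus {
		if err := Replay(wf, h); err != nil {
			failures = append(failures, err)
		}
	}
	return failures
}

// orderV1 is the code that produced the recorded histories.
func orderV1(amount int) []string {
	cmds := []string{"ScheduleActivity(charge)"}
	if amount > 100 {
		cmds = append(cmds, "ScheduleActivity(fraudCheck)")
	}
	return append(cmds, "ScheduleActivity(ship)")
}

// orderV2 lowers the fraud-check threshold, which changes some past executions.
func orderV2(amount int) []string {
	cmds := []string{"ScheduleActivity(charge)"}
	if amount > 50 {
		cmds = append(cmds, "ScheduleActivity(fraudCheck)")
	}
	return append(cmds, "ScheduleActivity(ship)")
}

func main() {
	corpus := Record(orderV1, []int{20, 75, 150})
	for _, err := range ReplayCorpus(orderV2, corpus) {
		fmt.Println("incompatible change:", err)
	}
}
```

Note that only the history whose input falls between the old and new thresholds fails, which is why a broad corpus of real histories is more useful than a handful of hand-written cases.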