Recommended approach to testing backwards compatibility?

Hi! I want to ensure that changes to my workflows are backwards compatible with workflows already running in Temporal. I know we can use the Temporal versioning API to version workflow logic, but inevitably I miss cases where logic should have been versioned, and running workflows break after rollout.

Is there a recommended pattern for preventing bad rollouts that may break workflows due to backwards incompatible code? Currently the approach I’ve come up with is to download the workflow histories for all running workflows and run them using the workflow replayer against the new worker code - it will throw a replay/nondeterminism exception if they aren’t compatible and I can go back and version the code that I missed.

There is a slight caveat, though: the workflow replayer injects a static WorkflowInfo object (see WorkflowExecutionHistory.java in temporalio/sdk-java at commit 37a0e463fae2312882b5c946f423a0dbf85f9666), which can break the replay if the code under test relies on that metadata. For example, we can’t replay a child workflow that signals back to its parent, because the parent workflow ID will not match.

I understand we could persist workflow histories as JSON and run unit tests against them, but my concern is that they may not stay in sync with the workflows in production. A pre-release compatibility check against fresh workflow histories seems like the safest way to ensure the workflows won’t break.
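A minimal sketch of the pre-release check described above, written in plain Java with no SDK dependency so that it runs on its own. `ReplayGate`, `HistoryReplayer`, and `findIncompatible` are hypothetical names invented for illustration; the actual replay call is left as a pluggable hook, and the histories are assumed to have been downloaded already as JSON strings keyed by workflow ID.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReplayGate {

    /** Hypothetical hook: replays one serialized history and throws if it is incompatible. */
    @FunctionalInterface
    interface HistoryReplayer {
        void replay(String historyJson) throws Exception;
    }

    /**
     * Replays every history against the new code and returns the workflow IDs
     * that failed, mapped to a short reason. An empty map means every history
     * replayed cleanly.
     */
    static Map<String, String> findIncompatible(
            Map<String, String> historiesByWorkflowId, HistoryReplayer replayer) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : historiesByWorkflowId.entrySet()) {
            try {
                replayer.replay(entry.getValue());
            } catch (Exception e) {
                // Keep going so that one run reports every incompatible workflow.
                failures.put(entry.getKey(), e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        return failures;
    }
}
```

With the Java SDK on the classpath, the hook could plausibly be `json -> WorkflowReplayer.replayWorkflowExecution(json, MyWorkflowImpl.class)`, where `MyWorkflowImpl` is a placeholder for the workflow implementation under test. A CI job could then block the rollout whenever the returned map is non-empty. Collecting failures instead of stopping at the first one is a design choice: it lists every workflow that needs versioning in a single pass.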

The other thing I was thinking of trying was to use ContinueAsNew to discard the old event histories before rolling out, so that I never have to keep versioned branches for more than the one previous release. Is that something you’d recommend?

Hey there,

So what you’re doing right now by downloading and replaying histories makes pretty good sense. With respect to that WorkflowInfo issue in Java, not sure what that’s about. At first blush it looks wrong to me but @spikhalskiy can comment.

As for the general approach, we’re actively working on making this better. See the proposal here: https://github.com/temporalio/proposals/blob/master/versioning/worker-versions.md

It has a section about “safe rollout” which, once we get to implementing it, will automatically do something along the lines of what you’re doing now by hand (as well as possibly some other more advanced techniques). So stay tuned for improvements here!

“use ContinueAsNew to discard the old event histories before rolling out”

That’s also a pretty decent technique. If you structure your workflows such that all your state is contained in one serializable object, you can pass it along as input when you continue as new, and pick up where you left off however you would like to.
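To make the shape of that concrete, here is a rough sketch in plain Java that only models the pattern and does not use the SDK. `CarriedState`, `ContinueAsNewSketch`, `runOnce`, and `MAX_ITEMS_PER_RUN` are all hypothetical names, and the per-run cap is an assumed stand-in for whatever condition triggers the hand-off (for example, a pending deployment or a history-size check).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical state holder: everything a run needs in order to resume. */
final class CarriedState {
    final List<String> pending; // work not yet done
    final int completed;        // work done across all earlier runs

    CarriedState(List<String> pending, int completed) {
        this.pending = new ArrayList<>(pending);
        this.completed = completed;
    }
}

final class ContinueAsNewSketch {
    /** Assumed cap on work per run; stands in for the real hand-off trigger. */
    static final int MAX_ITEMS_PER_RUN = 2;

    /**
     * Models one workflow run. Returns the state for the next run, or null when
     * nothing is left to do. In a real workflow, the non-null return would
     * instead be a call to Workflow.continueAsNew(next).
     */
    static CarriedState runOnce(CarriedState state) {
        List<String> pending = new ArrayList<>(state.pending);
        int done = 0;
        while (!pending.isEmpty() && done < MAX_ITEMS_PER_RUN) {
            pending.remove(0); // stand-in for the real activity call
            done++;
        }
        if (pending.isEmpty()) {
            return null;
        }
        return new CarriedState(pending, state.completed + done);
    }

    /** Chains runs until the work is finished and reports how many runs it took. */
    static int runToCompletion(CarriedState initial) {
        int runs = 0;
        CarriedState state = initial;
        while (state != null) {
            state = runOnce(state);
            runs++;
        }
        return runs;
    }
}
```

Because each run depends only on the `CarriedState` it receives, a new run starts with an empty event history and has nothing from the old code to replay. What then has to stay compatible across releases is the shape of the state object itself, not the sequence of steps the old code took.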