Workflow versioning and worker deployment

Hey all –

I am just starting to learn about workflow versioning and I have a question regarding how it works in conjunction with worker deployment.

Here is the scenario I am thinking about: say I have a workflow to which I'm adding a new activity call, and this new activity is wrapped in a getVersion() check. While this change is being deployed to my workers, production traffic is routed both to workers running the “old” code and to workers running the “new” code. In this state, consider a workflow that starts on a worker with the new code and makes progress, writing the version marker and the new activity into the workflow history. If that new worker then dies and the workflow is rescheduled on an old worker, replay on the old worker will fail, because the old code has neither the version check nor the new activity.

Is there a mechanism that prevents this scenario? Or am I missing something here?

Thanks for any help!

Are you maintaining worker pools per version on different task queues?

With workflow versioning, you can have any number of versions running from the same worker. It also lets you update code while workflows are running (for example, to fix bugs in workflows that have already started). Note that with Temporal you don't version the entire workflow; each change in your workflow code is versioned independently of the others.
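To make the per-change versioning concrete, here is a toy Python model (not the Temporal SDK) of how a getVersion()-style check lets one worker replay histories written both before and after a change. The names `ToyContext`, `get_version`, `run`, the `"add-B"` change ID, and the event tuples are all hypothetical. The rules assumed are: the first execution of the check records a marker with the newest supported version, replay returns the recorded version, and a missing marker yields a default version of -1 (mirroring `DEFAULT_VERSION` in the Java and Go SDKs).

```python
from dataclasses import dataclass, field

# Toy model only: mirrors the idea of DEFAULT_VERSION (-1) in the Java/Go SDKs.
DEFAULT_VERSION = -1


class NonDeterminismError(Exception):
    """Raised when (toy) workflow code diverges from the recorded history."""


@dataclass
class ToyContext:
    history: list                  # recorded events, e.g. ("activity", "A")
    cursor: int = 0                # index of the next recorded event to replay
    new_events: list = field(default_factory=list)  # events produced after replay

    def _replaying(self) -> bool:
        return self.cursor < len(self.history)

    def _emit(self, event) -> None:
        if self._replaying():
            recorded = self.history[self.cursor]
            if recorded != event:
                raise NonDeterminismError(f"history has {recorded}, code produced {event}")
            self.cursor += 1
        else:
            self.new_events.append(event)

    def activity(self, name: str) -> None:
        self._emit(("activity", name))

    def get_version(self, change_id: str, min_supported: int, max_supported: int) -> int:
        if self._replaying():
            recorded = self.history[self.cursor]
            if recorded[0] == "marker" and recorded[1] == change_id:
                self.cursor += 1
                version = recorded[2]
            else:
                # No marker here: this workflow passed this point before the change existed.
                version = DEFAULT_VERSION
        else:
            # First time this point executes: record the newest version in history.
            version = max_supported
            self.new_events.append(("marker", change_id, version))
        if not min_supported <= version <= max_supported:
            raise NonDeterminismError(f"unsupported version {version} for {change_id}")
        return version


def new_workflow(ctx: ToyContext) -> None:
    """Hypothetical 'new' code: activity B is added behind the 'add-B' change ID."""
    ctx.activity("A")
    if ctx.get_version("add-B", DEFAULT_VERSION, 1) == 1:
        ctx.activity("B")
    ctx.activity("C")


def run(workflow, history):
    """Replay `history`, then keep executing; return the full resulting history."""
    ctx = ToyContext(history=list(history))
    workflow(ctx)
    return ctx.history + ctx.new_events


# A history written by old code (no marker) replays cleanly on the new code.
old_history = [("activity", "A"), ("activity", "C")]
print(run(new_workflow, old_history))
# [('activity', 'A'), ('activity', 'C')]

# A fresh workflow on the new code records the marker and takes the new branch.
print(run(new_workflow, []))
# [('activity', 'A'), ('marker', 'add-B', 1), ('activity', 'B'), ('activity', 'C')]
```

The version is effectively decided once per workflow by what its history already contains, which is why workflows started before and after the change can share the same worker code.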

The replay on the old worker will fail, which causes the workflow task to be retried until it lands on a worker running the new code. So some workflow task failures are expected during a rolling worker deployment when workflow code has changed.
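The failure-then-retry behavior during a rolling deploy can be sketched with another toy Python model (again, not the Temporal SDK). Each code version is reduced to the fixed sequence of commands it would produce for this one workflow, replay is a strict event-by-event comparison, and the retry loop simply hands the failed workflow task to the next worker in a round-robin. The worker names, `OLD_CODE`, `NEW_CODE`, `replay`, and `process_workflow_task` are hypothetical; real task dispatch depends on which worker polls the task queue next, not on a fixed rotation, and in this toy the old code fails at the first unmatched event (the marker).

```python
from collections import deque


class NonDeterminismError(Exception):
    """Toy stand-in for a non-deterministic replay failure."""


# Hypothetical commands each code version produces for this workflow (toy model).
OLD_CODE = [("activity", "A"), ("activity", "C")]
NEW_CODE = [("activity", "A"), ("marker", "add-B", 1), ("activity", "B"), ("activity", "C")]


def replay(code, history):
    """Every recorded event must match what the code produces at that position."""
    for i, event in enumerate(history):
        if i >= len(code) or code[i] != event:
            produced = code[i] if i < len(code) else None
            raise NonDeterminismError(f"event {i}: history has {event}, code produced {produced}")
    return code[len(history):]  # commands still to execute after replay


def process_workflow_task(history, workers, max_attempts=10):
    """Toy retry loop: a failed workflow task goes to the next worker in rotation."""
    queue = deque(workers)
    failures = []
    for attempt in range(1, max_attempts + 1):
        name, code = queue[0]
        queue.rotate(-1)  # the next worker gets the following attempt
        try:
            remaining = replay(code, history)
            return name, attempt, failures, remaining
        except NonDeterminismError as err:
            failures.append((name, str(err)))
    raise RuntimeError("workflow task never reached a worker with compatible code")


# History written by a new-code worker that died after scheduling activity B.
history = [("activity", "A"), ("marker", "add-B", 1), ("activity", "B")]
workers = [("old-1", OLD_CODE), ("old-2", OLD_CODE), ("new-1", NEW_CODE)]
name, attempt, failures, remaining = process_workflow_task(history, workers)
print(name, attempt, len(failures), remaining)
# new-1 3 2 [('activity', 'C')]
```

The two failures on `old-1` and `old-2` correspond to the expected workflow task failures during the deploy; once the task reaches `new-1`, replay succeeds and the workflow continues with activity C.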

Thanks for the responses!

@maxim Do these failures look like typical “non-deterministic workflow” failures, or does the system see the version marker without a corresponding call to getVersion() and report the failure differently?

It depends on the SDK. I believe the Java and Go SDKs will report typical “non-deterministic” failures. The new TypeScript SDK reports it more clearly, as we redesigned the versioning API for it.

@tihomir No, I'm not doing anything fancy with task queue versioning; I'm trying to do this the prescribed way, with workflow versioning (e.g. calls to Workflow.getVersion()).