Workflow versioning and worker deployment

skrul · December 3, 2021, 5:41pm

Hey all –

I am just starting to learn about workflow versioning and I have a question regarding how it works in conjunction with worker deployment.

Here is the scenario I am thinking about – say I have a workflow that I’m adding a new activity call to, and this new activity is wrapped in a getVersion() check. During the deploy of this change to my workers, production traffic is routed to both workers with “old” code and workers with “new” code. While in this state, consider a workflow started on a worker with new code: it makes progress and writes the version marker + new activity into the workflow history. Now, if for some reason this new worker dies and the workflow is rescheduled on an old worker, replay on the old worker will fail because the old code does not have the version check + new activity.

Is there a mechanism that prevents this scenario? Or am I missing something here?

Thanks for any help!

tihomir · December 3, 2021, 6:21pm

Are you maintaining worker pools per version on different task queues?

With workflow versioning you can have any number of versions running from the same worker. It also allows you to update code while workflows are running (for example to fix bugs on already started workflows). Note that with Temporal you don’t version the entire workflow, but just parts of your workflow code independently.
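To make the "many versions on one worker" point concrete, here is a toy model in plain Java. It is not the Temporal SDK: the class, the string-based history, and the change ID "add-activity-b" are all made up for illustration, and the marker lookup is simplified to a search over the whole history. It only sketches the idea that a run started before the change replays down the old branch, while a fresh run records a marker and takes the new branch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a version marker. NOT the Temporal SDK; every name here is hypothetical.
public class VersionMarkerSketch {
    static final int DEFAULT_VERSION = -1; // means "history predates the change"

    // Returns the recorded version if a marker exists; otherwise DEFAULT_VERSION
    // on replay, or maxSupported (after recording a marker) on a first execution.
    static int getVersion(List<String> history, boolean replaying,
                          String changeId, int maxSupported) {
        String prefix = "marker:" + changeId + ":";
        for (String event : history) {
            if (event.startsWith(prefix)) {
                return Integer.parseInt(event.substring(prefix.length()));
            }
        }
        if (replaying) {
            return DEFAULT_VERSION; // no marker was written when this run first executed
        }
        history.add(prefix + maxSupported); // first execution: record the marker
        return maxSupported;
    }

    // One workflow definition that serves both old and new runs.
    static List<String> run(List<String> history, boolean replaying) {
        List<String> activities = new ArrayList<>();
        int version = getVersion(history, replaying, "add-activity-b", 1);
        activities.add("activityA");
        if (version != DEFAULT_VERSION) {
            activities.add("activityB"); // the newly added call
        }
        return activities;
    }

    public static void main(String[] args) {
        // Run started before the change: its history has no marker.
        List<String> oldRun = new ArrayList<>(List.of("activity:activityA"));
        // Run started after the change: empty history, first execution.
        List<String> newRun = new ArrayList<>();
        System.out.println(run(oldRun, true));  // [activityA]
        System.out.println(run(newRun, false)); // [activityA, activityB]
        System.out.println(newRun);             // [marker:add-activity-b:1]
    }
}
```

The same `run` method handles both histories, which is why only the changed part of the workflow code needs a version check.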

maxim · December 3, 2021, 6:39pm

The replay on the old worker will fail, which causes the workflow task to be retried until it lands on a new worker. So some workflow task failures are expected during a rolling worker deployment when workflow code has changed.
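The retry behaviour described here can be sketched with a small plain-Java toy. It is not the Temporal SDK or server: the exception type, the list-of-strings "worker code", and the round-robin dispatch are all assumptions made for illustration (real task routing is not round-robin). The point is only that a replay mismatch fails the task, and the task is retried until a worker with matching code picks it up.

```java
import java.util.List;

// Toy model of a rolling deployment. NOT the Temporal SDK or server; names are hypothetical.
public class RollingDeploySketch {
    // Thrown when replayed code does not match the recorded history.
    static class NonDeterministicException extends RuntimeException {
        NonDeterministicException(String message) { super(message); }
    }

    // A worker is modeled as the ordered list of events its code would produce.
    static void replay(List<String> workerCode, List<String> history) {
        for (int i = 0; i < history.size(); i++) {
            String produced = i < workerCode.size() ? workerCode.get(i) : "<nothing>";
            if (!history.get(i).equals(produced)) {
                throw new NonDeterministicException(
                    "history has " + history.get(i) + " but code produced " + produced);
            }
        }
    }

    // Retries the task on successive workers (round-robin, an assumption) until
    // replay succeeds. Returns the number of failed attempts before success.
    static int dispatch(List<List<String>> workers, List<String> history, int maxAttempts) {
        int failures = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<String> worker = workers.get(attempt % workers.size());
            try {
                replay(worker, history);
                return failures;
            } catch (NonDeterministicException e) {
                failures++; // task failed on this worker; try the next one
            }
        }
        throw new IllegalStateException("no worker could replay the history");
    }

    public static void main(String[] args) {
        List<String> oldCode = List.of("activity:activityA");
        List<String> newCode =
            List.of("marker:add-activity-b:1", "activity:activityB", "activity:activityA");
        // History written by a new worker before it died.
        List<String> history = List.of("marker:add-activity-b:1", "activity:activityB");
        // Mid-deploy pool: two old workers, one new worker.
        System.out.println(dispatch(List.of(oldCode, oldCode, newCode), history, 10)); // 2
    }
}
```

In this toy the workflow makes no progress while it bounces between old workers, but nothing is lost: the history is untouched and the first new worker that picks up the task continues from it.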

skrul · December 3, 2021, 7:03pm

Thanks for the responses!

@maxim do these failures look like typical “non deterministic workflow” failures? or does the system see the version marker without corresponding call to getVersion() and report the failure differently?

maxim · December 3, 2021, 7:05pm

It depends on the SDK. I believe the Java and Go SDKs report typical “non deterministic” failures. The new TypeScript SDK reports it more clearly, as we redesigned the version-related API for it.

skrul · December 3, 2021, 7:05pm

@tihomir no, not doing anything fancy with task queue versioning – trying to do this the prescribed way with workflow versioning (e.g. calls to Workflow.getVersion())
