Versioning activity methods

Hi, we’re using the Java SDK (with Kotlin, if that matters).
I’d like to ask about versioning changes in the activities.

So far, my approach to versioning a change to someFun has been:

  1. Introduce a new @ActivityMethod named someFunV2.
  2. Use Workflow.getVersion inside the workflow and call someFunV2 if we’re not on the default version of said change.

Now, this causes errors during the deployment phase (say, 3 different instances of the application with the workflow definition).

I assumed that the issue was caused by:

  1. new workflow version starting on one instance
  2. said workflow then executing on an older instance that has not yet been terminated during deployment

i.t.f.ApplicationFailure: message='Activity Type "someFunV2" is not registered with a worker. Known types are ...

But looking at the logs, the inverse is also true.

The workflow started on an old instance and then received the above-mentioned errors after it executed on a new instance.

  1. My first question, then, is how to prevent this and why it happens. The workflow started on an old instance, and the call to someFunV2 is in the else branch:
if (Workflow.getVersion("some-change-id-59f46953", Workflow.DEFAULT_VERSION, 1) == Workflow.DEFAULT_VERSION) {
} else {

Thus my understanding is that the workflow would have the DEFAULT_VERSION of said change and the else branch would be unreachable.
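One way to reason about this is a simplified model of version markers. The sketch below is plain Java, not Temporal SDK code: the class VersionModel and its replaying flag are hypothetical. It assumes that getVersion records maxSupported the first time the call is reached in a live (non-replay) execution, and that DEFAULT_VERSION is returned only when the call is reached while replaying history that has no marker for that change.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of version markers for ONE workflow execution.
// It is not Temporal SDK code; it only mirrors the decision rule assumed above.
class VersionModel {
    static final int DEFAULT_VERSION = -1;

    // Markers already written to this execution's history, keyed by change id.
    private final Map<String, Integer> markers = new HashMap<>();

    // replaying = true means the call is reached while re-running recorded history.
    int getVersion(String changeId, int minSupported, int maxSupported, boolean replaying) {
        Integer recorded = markers.get(changeId);
        int version;
        if (recorded != null) {
            version = recorded;          // a marker exists: always reuse it
        } else if (replaying) {
            version = DEFAULT_VERSION;   // history passed this point before the change existed
        } else {
            version = maxSupported;      // first live execution of this call: record the newest version
            markers.put(changeId, version);
        }
        if (version < minSupported || version > maxSupported) {
            throw new IllegalStateException("version " + version + " not supported for " + changeId);
        }
        return version;
    }

    public static void main(String[] args) {
        // Started on old code, but reaches the call for the first time on a new worker.
        VersionModel notYetPassed = new VersionModel();
        System.out.println(notYetPassed.getVersion("some-change-id", DEFAULT_VERSION, 1, false)); // prints 1

        // Already passed this point on old code; a new worker replays the history.
        VersionModel alreadyPassed = new VersionModel();
        System.out.println(alreadyPassed.getVersion("some-change-id", DEFAULT_VERSION, 1, true)); // prints -1
    }
}
```

Under these assumptions, a workflow that started on an old instance but had not yet reached the getVersion call when a new instance picked it up would take the else branch, so the else branch is not necessarily unreachable for it.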

  2. My second question is how to avoid a similar situation in the reverse order. Is that scenario even possible?
    What I mean is: the workflow starts on a new instance. What happens if it then executes on an old instance? Or is that not possible, because Workflow.getVersion is somehow checked before the workflow would be replayed there?

Now, back to the original question: are we using versioning wrong? Or is it not supported to have multiple versions of workers deployed at the same time (we don’t do that long term, but a rolling deployment of N instances takes some time)?

If you are doing rolling deployments, workflow replay on an old (not yet updated) worker will fail, as you mentioned. This failure causes the workflow task to be retried until it gets assigned to one of your new (updated) workers during the restart.
For rolling deployments, some task failures are to be expected (when workflow code has changed).
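The retry behavior described here can be pictured with a toy model. This is a plain-Java sketch, not SDK code: RollingDeployModel, the round-robin poll order, and the fleet layout are all assumptions made for illustration. Each worker is represented only by the set of types it has registered, and a task is retried until a worker that knows the type picks it up.

```java
import java.util.List;
import java.util.Set;

// Toy model of task retries during a rolling deployment; not Temporal SDK code.
class RollingDeployModel {

    // Each worker is represented by the set of activity types it has registered.
    // pollOrder is a hypothetical round-robin order in which workers pick up attempts.
    // Returns the 1-based attempt that succeeds, or -1 if maxAttempts is exhausted.
    static int attemptsUntilSuccess(List<Set<String>> pollOrder, String type, int maxAttempts) {
        if (pollOrder.isEmpty()) {
            return -1; // nobody is polling, so no attempt can succeed
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Set<String> registered = pollOrder.get((attempt - 1) % pollOrder.size());
            if (registered.contains(type)) {
                return attempt;
            }
            // Otherwise this attempt fails ("... is not registered with a worker") and is retried.
        }
        return -1;
    }

    public static void main(String[] args) {
        Set<String> oldWorker = Set.of("someFun");
        Set<String> newWorker = Set.of("someFun", "someFunV2");
        // Mid-deployment: two old instances are still polling, one is already updated.
        List<Set<String>> fleet = List.of(oldWorker, oldWorker, newWorker);
        System.out.println(attemptsUntilSuccess(fleet, "someFunV2", 10)); // prints 3
        System.out.println(attemptsUntilSuccess(fleet, "someFunV2", 2));  // prints -1
    }
}
```

In this model the task only fails for good when the attempt budget runs out before an updated worker polls, which is why an unlimited number of attempts rides out the deployment window.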

I would understand it in the way you’re describing but that is not the behavior I am seeing.

  1. The failure occurs on an updated worker as well. The one with new activity implementation.

  2. It does not necessarily get “retried up until it gets assigned to a new one”. I did not change the RetryOptions, so maximumAttempts should be unlimited if I read the Javadoc correctly, but the workflow eventually fails. I see some failures that clearly get retried, but after about 30 seconds (the timeouts for activity tasks here are set to 1 minute) the workflow execution terminates.

    Should I manually set the maximumAttempts to some arbitrary huge number for this case?

  3. As I was typing, I noticed something very strange in the logs. These are logs filtered to one of the workflow IDs. Somehow, some of the errors were logged just now, even though the workflow already failed yesterday. I am not sure whether that is useful information (in case what I am describing in 1 or 2 is not expected behavior).

    It is also possible that the workflows failed for a different reason (there was a bug that made them fail on an updated worker in some cases). Is it possible that the events somehow got “lost”, just like the logs, and that the workflow in the first picture actually failed for a different reason? I ask because I don’t see this many failures in the workflow history.

  1. Should I manually set the maximumAttempts to some arbitrary huge number for this case?

I would keep the default (unlimited) for maximumAttempts. We recommend relying on the ScheduleToClose timeout to limit the duration of retries. In most cases, you don’t want to limit this duration either.
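To see how a timeout, rather than maximumAttempts, can end up bounding retries, here is a small arithmetic sketch. It is plain Java and not SDK code; RetryBudget and its parameters are hypothetical. It assumes a 1 second initial interval and a backoff coefficient of 2.0 (the default retry policy values as I understand them), attempts that fail immediately, and that no retry is scheduled if it would start after an overall deadline on the activity.

```java
// Arithmetic sketch of exponential backoff under an overall deadline; not Temporal SDK code.
class RetryBudget {

    // Counts the attempts that can start within deadlineSeconds, assuming every attempt
    // fails instantly and the wait grows by `coefficient` each time, capped at maxIntervalSeconds.
    static int attemptsWithin(double deadlineSeconds, double initialSeconds,
                              double coefficient, double maxIntervalSeconds) {
        if (initialSeconds <= 0) {
            throw new IllegalArgumentException("initialSeconds must be positive");
        }
        int attempts = 0;
        double startTime = 0.0;        // start time of the next attempt
        double wait = initialSeconds;  // wait before the following retry
        while (startTime <= deadlineSeconds) {
            attempts++;
            startTime += wait;
            wait = Math.min(wait * coefficient, maxIntervalSeconds);
        }
        return attempts;
    }

    // Start time (in seconds) of the last attempt that still fits within the deadline.
    static double lastAttemptStart(double deadlineSeconds, double initialSeconds,
                                   double coefficient, double maxIntervalSeconds) {
        if (initialSeconds <= 0) {
            throw new IllegalArgumentException("initialSeconds must be positive");
        }
        double last = 0.0;
        double startTime = 0.0;
        double wait = initialSeconds;
        while (startTime <= deadlineSeconds) {
            last = startTime;
            startTime += wait;
            wait = Math.min(wait * coefficient, maxIntervalSeconds);
        }
        return last;
    }

    public static void main(String[] args) {
        // Attempts start at 0, 1, 3, 7, 15 and 31 seconds; the next would start at 63.
        System.out.println(attemptsWithin(60, 1, 2.0, 100));   // prints 6
        System.out.println(lastAttemptStart(60, 1, 2.0, 100)); // prints 31.0
    }
}
```

With a 60 second deadline, this model allows 6 attempts, the last starting at 31 seconds. Whether that is related to the roughly 30 seconds reported earlier in the thread depends on which timeout was actually set to 1 minute, so treat it as an illustration only.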