How do I implement a Feature Flag check that can be removed safely?

askreet · November 19, 2024, 1:30pm

We’ve been using Temporal for a while now and have developed a way to have global and per-user feature flags that control the execution of Workflows. To ensure that workflows execute consistently to completion, we check the value of the flag in an Activity so that its result is recorded in the workflow event history. We’ve written a small helper function for this check:

// this function is called at the workflow level to determine
// if a flag is set for a user / globally
func IsEnabled(ctx workflow.Context, name string) (bool, error) {
    // if the workflow executing is before we cared about this flag, assume it is not
    // set.
    v := workflow.GetVersion(ctx, name, workflow.DefaultVersion, 1)
    if v < 1 {
        return false, nil
    }

    var val bool
    err := workflow.ExecuteActivity(ctx, (*Activities).IsFeatureFlagEnabled, name).Get(ctx, &val)

    return val, err
}

The trouble comes when we want to remove the flag from the system. We assume that it has been set to true for all users and globally, but removing this workflow function would cause determinism errors, because in-flight workflows would suddenly have no replay step for the IsFeatureFlagEnabled activity.

What we’ve been doing is something like the following as an interim step while all in-flight work completes (error handling elided for brevity):

// workflow source, before flag removal
useV2, _ := ff.IsEnabled(ctx, "my-flag-name")
if useV2 { newBehavior() } else { oldBehavior() }

// workflow source, during flag removal
v := workflow.GetVersion(ctx, "removeMyFlag", workflow.DefaultVersion, 1)
if v < 1 {
    // ensure we still do a "flag lookup" for in-flight workflows,
    // but assume they are always returning true, since the flag is
    // rolled out - post-release workflows will stop this check
    _, _ = ff.IsEnabled(ctx, "my-flag-name")
}
newBehavior()

// workflow source, after all in-flight workflows finish
newBehavior()

What I’d like is a programming model similar to workflow.GetVersion itself, where it’s safe for me to simply drop the feature flag check once I am sure no in-flight workflows are using the old path, without the absence of an IsFeatureFlagEnabled activity causing a determinism error.

Are SideEffect or LocalActivityExecution a solution here, or do they both risk the same determinism error since they write to the event history as well? Is there some other solution I should look at?

maxim · November 19, 2024, 4:42pm

I might be missing some edge cases but I believe the following might work:

func IsEnabled(ctx, flagName) bool {
    // This is not an activity, but a direct call to the flag API
    enabled := IsFlagEnabled(flagName)
    // The current flag value only selects the max version passed to GetVersion
    var maxVersion workflow.Version
    if enabled {
        maxVersion = 2
    } else {
        maxVersion = 1
    }
    // GetVersion returns the version recorded for this workflow, if there is one
    version := workflow.GetVersion(ctx, flagName, workflow.DefaultVersion, maxVersion)
    return version == 2
}

askreet · November 19, 2024, 5:03pm

Is ctx here a workflow.Context or an activity.Context in your example? Is the idea here that each execution would hit the flag API, but the call to GetVersion is cached so even if the flag flips or disappears the version from workflow.GetVersion is stable? I’ll have to think about this a bit.

But it sounds like you’re advocating for some direct interaction with APIs/DBs within the workflow executor if it helps in these cases. If that’s the case - should I also wrap it in a SideEffect to prevent multiple executions?

maxim · November 19, 2024, 5:11pm

workflow.Context. You cannot wrap this direct interaction in a SideEffect as removing it would break determinism. The trick is that the direct interaction result is used only as argument to GetVersion which will produce the same result even if the max version argument changes.

maxim · November 20, 2024, 5:00am

I forgot that GetVersion panics if the maxVersion is below the recorded version. Here is the corrected version of the function:

func IsEnabled(ctx workflow.Context, flagName string) (enabled bool) {
	// This is not an activity, but a direct call to the flag API
	flagEnabled := IsFlagEnabled(flagName)
	maxVersion := workflow.DefaultVersion
	if flagEnabled {
		maxVersion = 2
	} else {
		maxVersion = 1
	}
	// Ignore panic which happens when maxVersion is below the recorded version
	defer func() {
		if r := recover(); r != nil {
			// Handle the panic and set result to false
			enabled = false
		}
	}()
	version := workflow.GetVersion(ctx, flagName, workflow.DefaultVersion, maxVersion)
	if version == 2 {
		enabled = true
	}
	return
}

Note that the assumption is that the flag can be removed only once there are no open workflows that used a different value of the flag (and no closed ones either, if queries are still run against them).

askreet · November 20, 2024, 11:26am

After looking at using this API directly in the workflow context, I’ve found no good option for dependency injection. I think I’ll keep this simple and just pass the set of flags into the workflow input where I need them, perhaps via a context propagator as a generic solution. This way I don’t have to read the flag database on every workflow task execution.
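
A rough sketch of what I have in mind, in plain Go without the SDK wiring. The names `Flags`, `OrderInput` and `chooseBehavior` are made up for illustration. The point is that the workflow input is recorded when the workflow starts, so every replay sees the same flag values without any extra activity or marker:

```go
package main

// Flags is a hypothetical snapshot of feature flags, resolved by the
// caller once, before the workflow is started.
type Flags map[string]bool

// IsEnabled reports whether the named flag was on in the snapshot.
// Flags missing from the snapshot count as disabled.
func (f Flags) IsEnabled(name string) bool {
	return f[name]
}

// OrderInput is a hypothetical workflow input that carries the snapshot
// next to the business arguments.
type OrderInput struct {
	UserID string
	Flags  Flags
}

// chooseBehavior stands in for the branch inside the workflow function.
// It reads only its input, so it gives the same answer on every replay.
func chooseBehavior(in OrderInput) string {
	if in.Flags.IsEnabled("my-flag-name") {
		return "new"
	}
	return "old"
}
```

Removing the flag later means deleting the branch once no open workflow was started with the flag off; there is no recorded activity that replay would expect to find.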
