Temporal Workflow/Activity cancellation non determinism

Looking at the documentation regarding workflow/activity cancellation for the .NET SDK and I noticed that it creates a new CancellationTokenSource to create a token detached from the original workflow token since it has already been cancelled:

However, earlier in the documentation it talks about non-determinism in .NET and to not use CancellationTokenSource in your workflows:

Could someone explain this to me? Is it that once a workflow has been cancelled, you can use a CancellationTokenSource?

I’m currently trying to cancel a workflow but every time it starts, I get a non-determinism error before it can even run the activity I want to use to clean up.

I have the following activity which fails:

[Activity]
public async Task<int> SaveSomeData()
{
    ActivityExecutionContext.Current.Heartbeat();
    
    // This line fails which is when I want to trigger my cancellation
    var recordsSaved = await myRepository.SaveDataAsync(ActivityExecutionContext.Current.CancellationToken);

    return recordsSaved;
}

When I trigger the cancellation, my workflow responds accordingly:

try
{
    ...
}
catch (Exception e) when (TemporalException.IsCanceledException(e))
{
    Workflow.Logger.LogError(e, "Cancellation requested. Performing cleanup.");

    // We use a different cancellation token since the default one on Workflow.CancellationToken
    // is now marked cancelled.
    using var detachedCancelSource = new CancellationTokenSource();
    // Get non determinism error here
    await Workflow.ExecuteActivityAsync((MyActivities act) => act.CancellationCleanup(),
        new ActivityOptions
        {
            StartToCloseTimeout = TimeSpan.FromMinutes(5),
            CancellationToken = detachedCancelSource.Token
        });

    // Rethrow the cancellation
    throw;
}

I have also tried the sample here and am getting the same non-determinism error when cancelling the workflow:

It doesn’t say not to use CancellationTokenSource, it specifically states not to use “timeout-based CancellationTokenSource”. This is because timeout-based cancellation sources are based on system timers which are non-deterministic. Use of CancellationTokenSource is acceptable, especially in the way you’re using it (detached for cleanup).

This should not happen. Can you show the details of your non-determinism error? We will also try to replicate.

This is the stacktrace that was generated:

   at System.Environment.get_StackTrace()
   at Temporalio.Worker.WorkflowTracingEventListener.OnFrameworkEventWritten(WorkflowInstance instance, EventWrittenEventArgs evt)
   at Temporalio.Worker.WorkflowTracingEventListener.OnEventWritten(EventWrittenEventArgs evt)
   at System.Diagnostics.Tracing.EventSource.DispatchToAllListeners(EventWrittenEventArgs eventCallbackArgs)
   at System.Diagnostics.Tracing.EventSource.WriteEventWithRelatedActivityIdCore(Int32 eventId, Guid* relatedActivityId, Int32 eventDataCount, EventData* data)
   at System.Diagnostics.Tracing.FrameworkEventSource.WriteEvent(Int32 eventId, Int64 arg1, Int32 arg2, String arg3, Boolean arg4, Int32 arg5, Int32 arg6)
   at System.Diagnostics.Tracing.FrameworkEventSource.ThreadTransferSendObj(Object id, Int32 kind, String info, Boolean multiDequeues, Int32 intInfo1, Int32 intInfo2)
   at System.Threading.TimerQueueTimer.Change(UInt32 dueTime, UInt32 period)
   at System.Threading.Timer.Change(Int32 dueTime, Int32 period)
   at NLog.Targets.Wrappers.AsyncTargetWrapper.StartLazyWriterThread(Boolean instant)
   at NLog.Targets.Wrappers.AsyncTargetWrapper.StartTimerUnlessWriterActive(Boolean instant)
   at NLog.Targets.Wrappers.AsyncTargetWrapper.StartLazyWriterTimer()
   at NLog.Targets.Wrappers.AsyncTargetWrapper.Write(AsyncLogEventInfo logEvent)
   at NLog.Targets.Wrappers.AsyncTargetWrapper.WriteAsyncThreadSafe(AsyncLogEventInfo logEvent)
   at NLog.Targets.Target.WriteAsyncLogEvent(AsyncLogEventInfo logEvent)
   at NLog.LoggerImpl.WriteToTargetWithFilterChain(Target target, FilterResult result, LogEventInfo logEvent, AsyncContinuation onException)
   at NLog.LoggerImpl.Write(Type loggerType, TargetWithFilterChain targetsForLevel, LogEventInfo logEvent, LogFactory logFactory)
   at NLog.Internal.TargetWithFilterChain.WriteToLoggerTargets(Type loggerType, LogEventInfo logEvent, LogFactory logFactory)
   at NLog.Logger.WriteToTargets(Type wrapperType, LogEventInfo logEvent, ITargetWithFilterChain targetsForLevel)
   at NLog.Logger.Log(Type wrapperType, LogEventInfo logEvent)
   at NLog.Extensions.Logging.NLogLogger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter)
   at Microsoft.Extensions.Logging.Logger.<Log>g__LoggerLog|14_0[TState](LogLevel logLevel, EventId eventId, ILogger logger, Exception exception, Func`3 formatter, List`1& exceptions, TState& state)
   at Microsoft.Extensions.Logging.Logger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter)
   at Temporalio.Worker.ReplaySafeLogger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter)
   at Microsoft.Extensions.Logging.LoggerExtensions.Log(ILogger logger, LogLevel logLevel, EventId eventId, Exception exception, String message, Object[] args)
   at Microsoft.Extensions.Logging.LoggerExtensions.LogInformation(ILogger logger, String message, Object[] args)
   at InstrumentMicroservice.Command.DataAccess.Temporal.Workflows.DataLoad.MyWorkflow.RunAsync() in C:\LGIM\code\c#\ilid-instrument-command-service\src\InstrumentMicroservice.Command.DataAccess\Temporal\Workflows\DataLoad\MyWorkflow.cs:line 17
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
   at System.Runtime.CompilerServices.TaskAwaiter.<>c.<OutputWaitEtwEvents>b__12_0(Action innerContinuation, Task innerTask)
   at System.Threading.Tasks.TaskSchedulerAwaitTaskContinuation.<>c.<Run>b__2_0(Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
   at System.Threading.Tasks.Task.ExecuteEntry()
   at Temporalio.Worker.WorkflowInstance.RunOnce(Boolean checkConditions)
   at Temporalio.Worker.WorkflowInstance.Activate(WorkflowActivation act)
   at Temporalio.Worker.WorkflowWorker.<>c__DisplayClass11_1.<HandleActivationAsync>b__3()
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)

I got this using this dotnet sample samples-dotnet/src/ActivityHeartbeatingCancellation at 06303cde4888dcbe929170dba51151404687204a · temporalio/samples-dotnet · GitHub

That meaning I copied the workflow and activities directly from the sample without changing anything.

We were unable to replicate any detached cancellation issues. We noticed there wasn’t a clear test here so we opened a PR to add one.

This is a different error/situation. This is caused because something called inside the workflow is leaving the workflow task scheduler. In this case it looks like the logger implementation starts a thread which our non-determinism event listener catches as non-deterministic.

We do not current have a way to disable these safe-threading checks for specific parts. To workaround currently, you can either set TemporalWorkerOptions.DisableWorkflowTracingEventListener to true or you can see if NLog can eagerly start that thread instead of lazily. We may be able to add an enhancement request to disable the safe-threading check for certain portions of code such as loggers.

Using the DisableWorkflowTracingEventListener option seemed to do the trick. Would you be able to explain what this option does?

From the docs section on .NET task determinism:

In order to help catch wrong scheduler use, by default the Temporal .NET SDK adds an event source listener for info-level task events. While this technically receives events from all uses of tasks in the process, we make sure to ignore anything that is not running in a Workflow in a high performant way (basically one thread local check). For code that does run in a Workflow and accidentally starts a task in another scheduler, an InvalidWorkflowOperationException will be thrown which “pauses” the Workflow (fails the Workflow Task which continually retries until the code is fixed). This is unfortunately a runtime-only check, but can help catch mistakes early. If this needs to be turned off for any reason, set DisableWorkflowTracingEventListener to true in Worker options.

1 Like

We’ve switched to a different logger implementation now that does not spawn lazy threads and the cancellation is working like a charm! Thanks for all your advice @Chad_Retz it’s greatly appreciated.

1 Like