Order of thread execution

I understand workflow threads operate using cooperative multithreading, so the thread is not pre-empted until it yields control. I also understand that await conditions are re-evaluated after a thread makes progress, but I am curious about the following potential race condition.

Let’s say update1 is waiting on a condition. The workflow thread sets the condition, which will unblock update1. Meanwhile, another update, update2, has come in and is waiting to execute. Is update1 guaranteed to execute before update2, since it was unblocked by the workflow thread’s progress?

The event loop diagram (Handling Signals, Queries, & Updates | Temporal Platform Documentation) just states that after the workflow thread yields, the loop will progress any signals/updates. So it seems possible that update2 could start executing before update1.

This is the logic of the code in question with the condition I’m wondering about.

boolean finishUpdate = false;

void workflow() {
    // stuff
    finishUpdate = true;
    // activity
}

String update1() {
    Workflow.await(() -> finishUpdate);
    return "update1 done";
}

String update2() {
    finishUpdate = false;
    return "update2 done";
}

  1. update1() is called using executeUpdateWithStart.
  2. update1() calls await and yields control.
  3. workflow() runs and sets finishUpdate = true.
  4. workflow() calls an activity and yields control.
  5. update2() arrives and is waiting to be run.
  6. update1() await condition is re-evaluated after workflow() made progress,
    but is not yet scheduled for execution.
  7. update2() gets scheduled before update1().
  8. update2() sets finishUpdate = false, finishes execution and yields control.
  9. update1() await condition is re-evaluated after update2() made progress.
  10. finishUpdate = false and update1() hangs.

Can step 6 happen or is the thread always run after re-evaluation? If so, what is the order if multiple threads are affected by the condition?

Steps 1 through 4 are a single workflow task, and you should move 6 to 4.5 there and phrase it as “update1() await condition is reevaluated (like all wait conditions every event loop tick) so update1()’s coroutine is immediately completed in that same task”.

Your code would have update1 complete in the same task in which it was started via execute-update-with-start. But yes, depending on the situation, it is possible for a variable to be set to one value and then to another before a wait condition is evaluated, if that’s what you are asking.

Ok so if I understand correctly, in this case step 6 (the await condition re-evaluation) will always happen after step 4, since step 4 is going to finish the workflow task. This makes sense and I think this was answered to me in a previous question I asked.

You are also saying the update1() coroutine will be completed immediately, but we are using Java, so my understanding is that it has a different thread for each update/signal/workflow, which it will schedule once a thread makes progress and yields control. Will it be the same in Java? After the await is re-evaluated, is this update thread guaranteed to be scheduled next even if a signal comes in?

I guess more generally, let’s say there are multiple update threads which are awaiting a condition and other threads which are not. The workflow thread updates that condition which would cause the awaits to break. Are the threads awaiting the condition guaranteed to be scheduled before the threads which are not awaiting the condition? And if so, is there a guaranteed ordering among those which are awaiting?

Yes, this should be the same for Java. There is nothing that “comes in” while the workflow task is being processed, though technically the server may reject certain workflow completion situations if the SDK tries to complete a task with a workflow completion and something “comes in” server-side while the task was processing.

A workflow task for the most part is a CPU bound quick set of work to run the event loop until everything is waiting on external stimulus and no wait conditions evaluate to true.

While the ordering is guaranteed and deterministic, it’s not necessarily predictable (usually coroutines are processed in awaited order). But wait conditions are evaluated on each event loop iteration after all coroutines are processed, and therefore we will not complete the workflow task while any wait condition is satisfied. So the wait condition will always be seen as satisfied after the field is set in that same workflow task (even if it was also evaluated before the field was set and was therefore false).
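To make the "run coroutines, then re-check wait conditions, repeat" idea concrete, here is a minimal toy model in plain Java. It is only a sketch: ToyWorkflowTask, schedule, await, and runTask are hypothetical names invented for illustration, it uses callbacks instead of real threads, and it is not how the Temporal Java SDK is actually implemented. It mirrors the update1/update2 example from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Toy model only: not the Temporal SDK and not its real scheduler.
class ToyWorkflowTask {
    /** A blocked coroutine: a wait condition plus the code that runs once it holds. */
    private static final class Waiter {
        final BooleanSupplier condition;
        final Runnable continuation;
        Waiter(BooleanSupplier condition, Runnable continuation) {
            this.condition = condition;
            this.continuation = continuation;
        }
    }

    private final List<Runnable> ready = new ArrayList<>();
    private final List<Waiter> waiting = new ArrayList<>();
    final List<String> log = new ArrayList<>();
    boolean finishUpdate = false;

    void schedule(Runnable coroutine) { ready.add(coroutine); }

    void await(BooleanSupplier condition, Runnable continuation) {
        waiting.add(new Waiter(condition, continuation));
    }

    /** Processes one task: loop until nothing is ready and no wait condition holds. */
    void runTask() {
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            // Run every coroutine that is currently ready.
            List<Runnable> batch = new ArrayList<>(ready);
            ready.clear();
            for (Runnable coroutine : batch) {
                coroutine.run();
                progressed = true;
            }
            // Then re-evaluate all wait conditions and resume any that are now true.
            for (Waiter w : new ArrayList<>(waiting)) {
                if (w.condition.getAsBoolean()) {
                    waiting.remove(w);
                    w.continuation.run();
                    progressed = true;
                }
            }
        }
    }

    static List<String> demo() {
        ToyWorkflowTask wf = new ToyWorkflowTask();
        // Task 1: update-with-start delivers update1, then the workflow body runs.
        wf.schedule(() -> {
            wf.log.add("update1 awaiting");
            wf.await(() -> wf.finishUpdate, () -> wf.log.add("update1 done"));
        });
        wf.schedule(() -> {
            wf.finishUpdate = true;
            wf.log.add("workflow set finishUpdate");
        });
        wf.runTask();
        // Task 2: update2 arrives on a later task, after update1 already completed.
        wf.schedule(() -> {
            wf.finishUpdate = false;
            wf.log.add("update2 done");
        });
        wf.runTask();
        return wf.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this toy model the first runTask() call cannot return while update1's condition is true, so update1 finishes before update2 ever runs. The real SDK's ordering within a single task may differ from this sketch.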

I hope that helps.

Hey thanks Chad, I think this makes sense. My understanding was that the SDK could receive an incoming signal/update and create a thread for the execution while the workflow task was executing, it just wouldn’t be executed until the next event loop tick. Maybe my understanding of how signals get received by the server and made available to the workers is incorrect. If a signal arrives at the Temporal Server in the middle of a workflow task or activity, is it only made available on the queue once the task/activity is completed?

While I am curious about the thread scheduling, the only thing I really needed to know was that if we had an update awaiting a condition and the workflow task changed that condition, would that be guaranteed to be run next before any signal/update that is not currently awaiting that condition.

This is not correct. The task completes and then the signal comes on a successive task. The exception is when the task completion includes a workflow completion: the server will reject it, causing the task to be retried with the signal included this time (at least at first; future signals from the user side are rejected if they keep coming).

This statement is mixing up wholly unrelated activity tasks with workflow tasks. Signals come in workflow tasks (as do activity completions, timer firings, etc.), so it makes no difference whether activities are running.

Only if the next signal/update is in a successive workflow task. If multiple signals/updates are being processed in the same task, we may not explicitly guarantee order of wait condition vs other coroutine scheduling within a task.

But you should not author your workflow as if whether an update arrives on a successive task or same task matters. In your original code, technically a workflow can receive both update1 and update2 updates before workflow is called.

Thanks for this information Chad. I’m still not fully understanding how/when signals can be received by server and worker. Is there any documentation or learning you know of that goes in depth on this topic?

Events and Event History | Temporal Platform Documentation touches on the loop, and Handling Signals, Queries, & Updates | Temporal Platform Documentation touches on the processing aspect. In general, our course at Temporal 102: Exploring Durable Execution | Learn Temporal covers how the event processing works, and our course at Interacting with Workflows | Learn Temporal covers signals in particular.

From the latter link, the video for the Java course section “Signals in Your Event History” may be particularly enlightening: https://www.youtube.com/watch?v=k7SVLvAsP-Y (it is about workflows passing signals to each other, but it still helps with understanding signal eventing and workflow tasks towards the end of the video, where it talks about the recipient workflow).

I am not sure there are any low-level docs about how signals and workflow tasks interact (signals are not much different than any other event).

Thank you again Chad. I will go through these resources and ask some more questions if needed.

Is a signal always accompanied by a workflow task when looking at the history? And there can be multiple signals with a single workflow task that will execute all of the signals?

Yes (if it gets processed as normal; there are terminations and non-worker-running scenarios that can affect this).

Yes, along with any other events that need to be processed since the previous task.

Ok I think I’m understanding where I was confused. Let me make sure.

A signal can come to the Temporal Server at any time during a workflow execution, but it only gets picked up by the worker during the first step of the event loop tick. So if a signal arrives at the Temporal Server after ‘Workflow Task Started’ and before ‘Workflow Task Completed’, it only shows up in the worker (and in the history) once the workflow task completes and the event loop goes back to step 1 (Process Signals and Updates until blocked).

Now at step 1 of the event loop the worker receives any pending signals/updates from the server and processes them first as part of a workflow task. As a part of the same workflow task, it will also try and progress the workflow (step 2 of event loop).

After any of these threads make progress, all await conditions will be re-evaluated. If re-evaluation causes any of these threads to unblock, they will be progressed immediately. Even if it is a signal that is unblocked, it will not wait until step 1 of the event loop to process it since it is an existing signal. It will immediately continue the coroutine until it is blocked again.

Did I get anything wrong?

Mostly correct, though I wouldn’t think of it as “picked up during first step of the event loop tick”. Rather, the event loop only runs as the result of a task from the server, and it runs until all coroutines have yielded (including on wait conditions).

Ok understood. The event loop step doesn’t actually pull tasks from the server; it gets invoked as a result of tasks from the server.

I just want to be doubly sure on 2 things:

  1. Once the event loop is invoked with ‘N’ signals/updates, any new signals/updates will not run until next event loop tick.
  2. Any signal/update whose await condition changes during the current event loop tick will continue before the end of the current event loop tick and before the new signals/updates start to process.

I would rephrase this as “once the task is received with N signals/updates, any new signals/updates will not run until the next task (except in certain workflow completion situations where it may fail-then-retry the final task)”. The event loop has nothing to do with this part; it’s just an implementation detail of task processing.
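As a rough illustration of that rephrasing, here is a small toy model in plain Java. It is a sketch under stated assumptions: ToyTaskBatching, arrive, and dispatchTask are hypothetical names, not Temporal APIs, and the model deliberately ignores the workflow completion fail-then-retry case mentioned above. It only shows that whatever arrives while a task is being processed lands in the next task.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: the names here are hypothetical, not Temporal APIs.
class ToyTaskBatching {
    private final Deque<String> pendingOnServer = new ArrayDeque<>();
    final List<List<String>> processedTasks = new ArrayList<>();

    /** A signal or update reaches the server and is buffered for a future task. */
    void arrive(String name) {
        pendingOnServer.add(name);
    }

    /** Builds one task from everything pending, then lets the worker process it. */
    void dispatchTask(Runnable arrivalsWhileProcessing) {
        // The task's contents are fixed at the moment it is created.
        List<String> task = new ArrayList<>(pendingOnServer);
        pendingOnServer.clear();
        // Anything arriving while the worker is busy stays buffered on the server.
        arrivalsWhileProcessing.run();
        processedTasks.add(task);
    }

    static List<List<String>> demo() {
        ToyTaskBatching server = new ToyTaskBatching();
        server.arrive("update1");
        server.arrive("signalA");
        // update2 arrives while the first task is still being processed.
        server.dispatchTask(() -> server.arrive("update2"));
        // Nothing new arrives during the second task.
        server.dispatchTask(() -> { });
        return server.processedTasks;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Here the first task carries update1 and signalA, and update2 only appears in the second task, regardless of when during the first task it arrived.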

Kinda. A simpler way of saying it is “a workflow task will not complete successfully while any wait conditions evaluate to true”.

Ok I think this is very clear now. Thank you so much for your time and for clarifying all of this for me!!
