What exactly is a "workflow task" and how often are they scheduled?

What exactly are “workflow tasks”? The documentation did not make that very clear to me. I ran a workflow that generated 50,000 events and terminated itself after hitting the hard limit, and I noticed 17,000 of these events were “WorkflowTasks”. Wondering how I can account for them in my throughput math, if at all.

@subhannadeem See the flow clearly where “Workflow Task” happens in the Event loop diagram → Workflows | Temporal Documentation

The definition is mentioned clearly in the docs → Tasks | Temporal Documentation

You can think of a workflow task as a unit of execution of your workflow code. It’s also the way for server and your workers to communicate together to complete your workflows.

Temporal server manages the overall workflow execution, persistence durable timers etc, but your workflow and activity code is executed by your workers.

A workflow task starts when the server says basically “hey workers I have completed my work so far for this workflow execution, its your turn now to tell me what to do next”. It ends when workers tell server "hey server I’ve executed all the code i can do for now, here are the commands i need you to perform so workflow exec can make progress. (or in case of last workflow task - hey server i have completed execution of all the workflow code here is the result).

Some operations that would cause your worker to block code execution and send commands to server are things like sleep, sync activity/child wf invocation etc.

We can look at a couple of simple code examples and corresponding workflow history maybe that would help.
Samples are in java but would apply to any sdk:

  1. Simple

// Workflow method impl / workflow function
public String simple(String name) {
   return "Hello: "  + name;
}

Corresponding wf history:

  1. WorkflowExecutionStarted (your client requested execution to be created and started)
  2. WorkflowTaskScheduled (server scheduled a task and placed it on the tq that client requested)
  3. WorkflowTaskStarted (your worker picked up the task)

at this point your worker is going to start executing workflow code, it will execute the return command. It then has to report back to the server that the execution has completed.

  1. WorkflowTaskCompleted (worker has completed wf execution and sent a CompleteWorkflowExecution command back to server, note this command is not recorded in history, but some others are which we will see in next sample).
  2. WorkflowExecutionCompleted (server has completed the execution, if there is a result the server reports the result to client if the client is waiting for it)

(will put next sample in another post)

1 Like

Activity (sync invocation sample):

// Workflow method impl / workflow function
public String syncactivity(String name) {
   String result = activities.doSomething(name);
   return result;
}

Corresponding history:

  1. WorkflowExecutionStarted
  2. WorkflowTaskScheduled
  3. WorkflowTaskStarted (worker started executing your workflow code)

At this point your worker executes the sync activity invocation line in your wf code.
Activity invocations are like remote procedure calls, they can be done on a completely different worker. So worker at this point has to block and complete the workflow task and send command(s) to the server, asking it to start invocation of your activity.

  1. WorkflowTaskCompleted
  2. ActivityTaskScheduled

4 and 5 here are important. WorkflowTaskCompleted means that worker has blocked your wf code exec and has sent the ScheduleActivityTask command to server. Server records the ActivityTaskScheduled event in history. Worker at this point is going to wait until it receives a new task on the task queue for this workflow execution that the activity invocation has completed, or failed or timed out.

  1. ActivityTaskStarted (server records that the activity worker again could be completely diffefent one has started activity invocation)
  2. ActivityTaskCompleted (server records the activity has completed)
  3. WorkflowTaskScheduled (server puts another task on tq for your workers to get and continue wf code exec)
  4. WorkflowTaskStarted (your worker picks up the task)

at this point your worker continues execution and just has the return statement left. It has to report to the server that the exec is done as in previous example.

  1. WorkflowTaskCompleted
  2. WorkflowExecutionCompleted
2 Likes

Activities (async) sample.

// Workflow method impl / workflow function
public String asyncactivities(String name) {
      Promise<String> res1 = Async.function(activities::doSomething, name);
      Promise<String> res2 = Async.function(activities::doSomethingElse, name);

       String r1 = res1.get();
       String r2 = res2.get();
       return r1 + r2;
}

History:

  1. WorkflowExecutionStarted
  2. WorkflowTaskScheduled
  3. WorkflowTaskStarted
  4. WorkflowTaskCompleted
  5. ActivityTaskScheduled
  6. ActivityTaskScheduled

4,5,6 are important here. So your worker has executed the three lines:
Promise res1 = Async.function(activities::doSomething, name);
Promise res2 = Async.function(activities::doSomethingElse, name);
which are two futures, and then
res1.get() is a blocking code which requires us to wait for the results of the res1 promise.
Events 5 and 6 are the events server wrote into history that correspond to the two ScheduleActivity commands that worker has sent to it, one for each of the async activity invocations we requested in code.

  1. ActivityTaskStarted (exec of activity started)
  2. ActivityTaskCompleted (exec of activity completed)
  3. WorkflowTaskScheduled (server asks workers to take a look at progress)
  4. ActivityTaskStarted (exec of activity started)
  5. ActivityTaskCompleted (exec of activity completed)

This is a little bit of an optimization that is done by Temporal, in this example the two async activities completed very fast. The first blocking call to res1.get(); was unblocked but since both activities completed they can be handled within the same workflow task. So the second blocking call res2.get(); was unblocked right away since the res2 activity result was already available in history.

  1. WorkflowTaskStarted
  2. WorkflowTaskCompleted

Worker has executed the return statement

  1. WorkflowExecutionCompleted (server records exec completed)

Hope this helps.

3 Likes

Also a StackOverflow post for this question: temporal workflow - What exactly is a Cadence decision task? - Stack Overflow

1 Like

Your explanation is very clear, thanks.

Thank you for the explanation, it’s very clear.

What happens if the Worker that was waiting for a ActivityTaskCompleted dies?

A worker cannot wait for ActivityTaskCompleted. The worker receives an activity task and then reports its completion to the service using the CompleteActivityTask API. If a worker dies before reporting task completion then task will timeout according to its StartToClose timeout. Setting this timeout to a high value (or not setting it at all which defaults it to ScheduleToClose) leads to situations when activities take a long time to be retried.

I would recommend you to read this history protocol to understand: Cadence Internals(2):Cadence Workflow History Protocol · uber/cadence · Discussion #4544 · GitHub

Hi @tihomir that’s really helpful explanation.
two more questions from this:

  1. when the worker finished 1,2,3,4 so it completes the workflow task, is it waiting for the activity to be completed or releasing the resource to poll other tasks from the task queue to work on.
  2. assumed it’s waiting for the activity execution result in 1), my question now is how does it know it’s polling the task exactly the result from that activity. is it a different task queue it’s polling from? other than the one it polls the workflow task initially?