Domain-specific workflow state aggregated from child workflow state

I’m implementing an administrative UI that will display domain-specific state of currently executing root workflows. I define a root workflow as one that was not executed as a child workflow. Child workflows will contribute to the state of their root workflow. Is there a suggested pattern to implement this?

Here are some options I’ve identified:

  1. Each child workflow signals its root with an update of its state. The root stores the aggregate state in its search attributes. A downside to this approach is added latency (1) and load on the Temporal server.

  2. Query for root workflows then send a workflow query to each root workflow that would then query all descendants. Downsides are inefficient sort and pagination on descendant workflow state, and the latency/cost of rehydrating stale workflows.

  3. Each workflow (root or child) stores its state along with a correlation ID(2) in its search attributes. The administrative UI would query for all the root workflows’ correlation IDs and then for each correlation ID query for the state of currently running correlated workflows(3). The root’s state would be assembled by aggregating the results. A downside to this approach is the inability to efficiently sort and paginate on state aggregated from descendant workflows.

  4. Use a mechanism outside of Temporal to hold this state.

(1) possibly exacerbated by contention on the root

(2) Correlation ID prevents having to walk the linked list of parents back to their root

(3) A single query to retrieve all workflow states wouldn’t allow for efficient pagination as the state of any single root workflow wouldn’t be known until all workflow states have been processed
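The client-side aggregation in option 3 could be sketched roughly as follows. This is plain Python over rows the UI has already fetched, and it calls no Temporal API. The `WorkflowRow` type, its fields, and the `SEVERITY` ranking are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowRow:
    """Hypothetical row returned by a visibility query (not a Temporal type)."""
    workflow_id: str
    correlation_id: str  # shared by a root and all of its descendants
    state: str


# Hypothetical ranking: a higher number is "worse" and wins the aggregate.
SEVERITY = {"OK": 0, "WAITING": 1, "FAILED": 2}


def aggregate_by_root(rows):
    """Group rows by correlation ID and reduce each group to its worst state."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.correlation_id].append(row.state)
    return {
        cid: max(states, key=SEVERITY.__getitem__)
        for cid, states in groups.items()
    }


rows = [
    WorkflowRow("root-a", "a", "OK"),
    WorkflowRow("child-a1", "a", "FAILED"),
    WorkflowRow("root-b", "b", "OK"),
    WorkflowRow("child-b1", "b", "WAITING"),
]
aggregated = aggregate_by_root(rows)
# Sorting by aggregated state is only possible after every row has been seen.
ranked = sorted(aggregated.items(), key=lambda kv: SEVERITY[kv[1]], reverse=True)
```

The final `sorted` call illustrates the downside noted in (3): the ordering of roots depends on every descendant row, so a page of results cannot be produced until all rows have been aggregated.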

How many child workflows per parent? How frequently is the state updated, per child and across all children? For example, in (1), how many updates will be executed per second, and how many over the lifetime of a parent?

Good questions. My wild guess is <100 descendants per root workflow, <10 updates per descendant, updates are likely to come in bursts of <100 per second per root workflow with seconds to months between bursts. Lifetime of workflows will be seconds to months.
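Treating those guesses as upper bounds, the signal volume for option 1 can be worked out with some quick arithmetic. The numbers below are the estimates from this post, not measurements.

```python
# Upper bounds taken from the estimates above (guesses, not measurements).
max_descendants_per_root = 100
max_updates_per_descendant = 10
max_burst_per_second = 100

# Worst-case number of signals a single root receives over its lifetime.
max_signals_per_root = max_descendants_per_root * max_updates_per_descendant

# Shortest time in which a root could receive all of them at the peak burst rate.
min_seconds_at_peak = max_signals_per_root / max_burst_per_second

print(max_signals_per_root)  # 1000
print(min_seconds_at_peak)   # 10.0
```

Each signal adds at least one event to the root's history, so the lifetime total is the figure that matters for history size, while the burst rate is the figure that matters for contention on the root.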

I don’t have a sense of how much of our overall operational cost such a decision represents as I don’t yet have a sense of how much Temporal server will cost.

In this case I would make each child workflow signal its progress to the parent, and have the parent return the current state of the computation through a query.
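A minimal sketch of the state handling this implies is shown below. It is plain Python that does not import the Temporal SDK. The `RootProgress` class and its method names are hypothetical. In the Temporal Python SDK, the two methods would correspond to handlers on the root workflow class decorated with `@workflow.signal` and `@workflow.query`.

```python
from dataclasses import dataclass, field


@dataclass
class RootProgress:
    """Hypothetical aggregate state a root workflow would keep as a field."""
    child_states: dict = field(default_factory=dict)

    def on_child_progress(self, child_id: str, state: str) -> None:
        """Body of what would be the signal handler: record the latest state."""
        self.child_states[child_id] = state

    def current_state(self) -> dict:
        """Body of what would be the query handler: build a read-only snapshot."""
        counts = {}
        for state in self.child_states.values():
            counts[state] = counts.get(state, 0) + 1
        # Return copies so the caller cannot mutate the workflow's own state.
        return {"children": dict(self.child_states), "counts": counts}


progress = RootProgress()
progress.on_child_progress("child-1", "RUNNING")
progress.on_child_progress("child-2", "RUNNING")
progress.on_child_progress("child-1", "DONE")  # a later update overwrites the earlier one
snapshot = progress.current_state()
```

Keying the state by child ID keeps the aggregate bounded by the number of children rather than the number of updates. The query side only reads, since query handlers must not change workflow state.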

@maxim I’m at a similar crossroads on how to aggregate domain state across parent and child workflows. Could you please provide examples or guidelines on the following?

  1. How to safely mutate the aggregated state in the parent workflow?
  2. Is there any size limit on the aggregated object persisted in the parent workflow?
  3. If we go with option 2, where the query recursively calls the child queries, can the response come back in under a second, or is it a costly process?
  4. My understanding of option 3 is that the root workflow instance keeps the workflow IDs of all its descendants, at every level, in its search attributes, and that we can then run queries against all of those IDs in parallel and aggregate the results. If this understanding is right, I have follow-up questions:
     a. How many parallel calls can we make from the query to get the state of all the child workflows? Are there any limitations?
     b. Are search attributes needed here? Could we keep the workflow graph in a workflow variable and persist that rather than using search attributes? Are there any samples to refer to for this approach?