Domain-specific workflow state aggregated from child workflow state

I’m implementing an administrative UI that will display domain-specific state of currently executing root workflows. I define a root workflow as one that was not executed as a child workflow. Child workflows will contribute to the state of their root workflow. Is there a suggested pattern to implement this?

Here are some options I’ve identified:

  1. Each child workflow signals its state updates to its root. The root stores the aggregate state in its search attributes. A downside to this approach is added latency(1) and load on the Temporal server.

  2. List the root workflows, then send a workflow query to each root, which would then query all of its descendants. Downsides are inefficient sorting and pagination on descendant workflow state, and the latency and cost of rehydrating stale workflows.

  3. Each workflow (root or child) stores its state along with a correlation ID(2) in its search attributes. The administrative UI would query for all the root workflows’ correlation IDs and then for each correlation ID query for the state of currently running correlated workflows(3). The root’s state would be assembled by aggregating the results. A downside to this approach is the inability to efficiently sort and paginate on state aggregated from descendant workflows.

  4. Use a mechanism outside of Temporal to hold this state.

(1) Possibly exacerbated by contention on the root.

(2) A correlation ID avoids having to walk the linked list of parents back to the root.

(3) A single query to retrieve all workflow states wouldn’t allow for efficient pagination, as the state of any single root workflow wouldn’t be known until all workflow states have been processed.
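To make the pagination downside of option 3 concrete, here is a minimal Python sketch of the client-side aggregation. It does not call Temporal at all: `WorkflowRecord`, `correlation_id`, and `items_done` are hypothetical names standing in for whatever a visibility listing and the custom search attributes would actually return.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowRecord:
    """One row from a (hypothetical) listing of running workflows."""
    workflow_id: str
    correlation_id: str  # assumed search attribute shared by a root and its descendants
    items_done: int      # assumed domain-specific state; replace with the real fields


def aggregate_by_root(records):
    """Sum the per-workflow state for each correlation ID."""
    totals = defaultdict(int)
    for record in records:
        totals[record.correlation_id] += record.items_done
    return dict(totals)


def page_sorted_by_aggregate(records, page_size, page):
    """Sort roots by aggregate state, then slice out one page.

    Every record has to be read before the first page can be produced,
    which is the sort and pagination downside described in option 3.
    """
    totals = aggregate_by_root(records)
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    start = page * page_size
    return ordered[start:start + page_size]


records = [
    WorkflowRecord("root-a", "a", 1),
    WorkflowRecord("child-a1", "a", 4),
    WorkflowRecord("root-b", "b", 7),
    WorkflowRecord("child-b1", "b", 2),
    WorkflowRecord("root-c", "c", 3),
]
first_page = page_sorted_by_aggregate(records, page_size=2, page=0)
```

Even when only the first page is requested, the sort cannot start until all descendants of all roots have been folded into their totals.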

How many child workflows per parent? How frequently is the state updated, per child and across all children? For example, in (1), how many updates per second and how many over the lifetime of a parent will be executed?

Good questions. My wild guess is <100 descendants per root workflow and <10 updates per descendant, which puts an upper bound of roughly 1,000 updates over a root’s lifetime. Updates are likely to come in bursts of <100 per second per root workflow, with seconds to months between bursts. Workflow lifetimes will range from seconds to months.

I don’t have a sense of how much of our overall operational cost such a decision represents, as I don’t yet have a sense of how much the Temporal server will cost.

In this case, I would have each child workflow signal its progress to the parent, and have the parent return the current state of the computation through a query.
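As a rough illustration of that suggestion, the state the parent keeps could look like the plain-Python sketch below. It deliberately leaves out the Temporal SDK wiring: `RootState`, `apply_update`, and `snapshot` are hypothetical names, and the status strings are assumptions. The idea is that the parent’s signal handler would call `apply_update` and its query handler would return `snapshot()`.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RootState:
    """Aggregate state held by the root workflow (hypothetical shape)."""
    # Maps child workflow ID to the latest status that child reported.
    child_status: dict = field(default_factory=dict)

    def apply_update(self, child_id: str, status: str) -> None:
        """What the signal handler would do: record the child's latest status."""
        self.child_status[child_id] = status

    def snapshot(self) -> dict:
        """What the query handler would return: a read-only count per status."""
        return dict(Counter(self.child_status.values()))


state = RootState()
state.apply_update("child-1", "RUNNING")
state.apply_update("child-2", "RUNNING")
state.apply_update("child-1", "COMPLETED")  # a later signal overwrites the earlier one
summary = state.snapshot()
```

Keeping only the latest status per child keeps the parent’s state bounded by the number of children rather than by the number of signals received, and `snapshot` never mutates anything, which suits a query.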
