Long-running workflow with significant fan-out of child workflows

Hi Everyone,

I have implemented a workflow for test purposes. The main workflow implementation is fairly simple: it just spawns child workflows, waits for them to complete, and then aggregates the results (the actual work is negligible here). Child workflows are not terribly complex either (consisting of 4-5 activities), but they can take hours or even days to complete, since they are coupled to real-world activities and events (which are represented as signals to the child workflows). So far so good.

The real-world usage of this workflow would mean fanning out potentially hundreds of thousands of child workflows. As far as I can tell from the documentation and forums, this is not an issue for Temporal itself, but reading this article (Managing Long-Running Workflows with Temporal | Temporal) made me wonder whether spawning that many child workflows would cause the parent to hit the event history limit, ultimately forcing me to incorporate continue-as-new into my parent workflow implementation.

I have two questions:
Q1: Is it idiomatic in Temporal to implement a workflow as described above?
Q2: The main part of the implementation of the parent workflow looks like the following:

        location_insight_futures = [
            asyncio.create_task(
                workflow.execute_child_workflow(
                    LocationReport.run, args=[location, params.report_id], id=location.id
                )
            )
            for location in params.locations
        ]
        (completed, location_insight_futures) = await workflow.wait(location_insight_futures)
        location_insights = [fut.result() for fut in completed]

Do I need to worry about hitting the event history size limit (assuming e.g. 500,000 spawned child workflows)? If so, what is the best strategy for implementing continue-as-new? The best approach I can come up with consists of two elements:

  1. Break up the fan-out list comprehension into smaller batches and check whether continue-as-new is suggested between spawning batches of child workflows.
  2. Change workflow.wait from the default ALL_COMPLETED to FIRST_COMPLETED, and also check for continue-as-new before waiting again.

While 2. makes some sense, since result aggregation can overlap with child execution, 1. really mixes underlying infrastructure concerns (working around limits) into mundane business logic.
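For what it's worth, here is a plain-asyncio sketch of the control flow I have in mind for combining both elements. It is not Temporal code: `child`, `continue_as_new_suggested`, and `BATCH_SIZE` are stand-ins for `workflow.execute_child_workflow`, `workflow.info().is_continue_as_new_suggested()`, and a batch size I would have to pick; in a real workflow the wait would be `workflow.wait` rather than `asyncio.wait`.

```python
import asyncio

BATCH_SIZE = 100  # assumed cap on in-flight children per round


async def child(n: int) -> int:
    # Stand-in for a child workflow execution; just returns a result.
    await asyncio.sleep(0)
    return n * 2


def continue_as_new_suggested() -> bool:
    # Stand-in for workflow.info().is_continue_as_new_suggested().
    return False


async def fan_out(items: list[int]) -> list[int]:
    results: list[int] = []
    pending: set[asyncio.Task] = set()
    remaining = list(items)
    while remaining or pending:
        # Element 1: top up the in-flight set in batches instead of
        # spawning everything up front.
        while remaining and len(pending) < BATCH_SIZE:
            pending.add(asyncio.create_task(child(remaining.pop(0))))
        # Element 2: drain with FIRST_COMPLETED so aggregation overlaps
        # child execution, re-checking continue-as-new between rounds.
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        results.extend(t.result() for t in done)
        if continue_as_new_suggested():
            # In Temporal this is where the parent would carry its
            # remaining/pending state into workflow.continue_as_new(...).
            break
    return results
```

Running `asyncio.run(fan_out(list(range(10))))` collects the doubled values as children finish.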

Any thoughts on this?

Thanks in advance
Andras

update: With some experimenting I already see a limitation biting this use case: the number of outstanding child workflows cannot exceed 2,000. That's a big problem for this implementation approach.

You can create a tree of children. A parent spawns 1k children, each of them in turn spawns 1k children, and you get 1 million children total.
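The recursive structure can be sketched like this (plain asyncio stand-ins, not Temporal API: `leaf_work` stands for the actual child workflow, `tree_fan_out` for an intermediate child, and `MAX_CHILDREN` is kept tiny here for illustration where you would use something like 1,000):

```python
import asyncio

MAX_CHILDREN = 3  # e.g. 1000 in practice; small here for illustration


async def leaf_work(item: int) -> int:
    # Stand-in for the real child workflow doing the actual work.
    await asyncio.sleep(0)
    return item * 2


async def tree_fan_out(items: list[int]) -> list[int]:
    # If the list fits under the per-parent limit, run leaves directly;
    # otherwise split into MAX_CHILDREN slices and delegate each slice
    # to an intermediate child, recursively.
    if len(items) <= MAX_CHILDREN:
        return list(await asyncio.gather(*(leaf_work(i) for i in items)))
    step = -(-len(items) // MAX_CHILDREN)  # ceiling division
    slices = [items[i:i + step] for i in range(0, len(items), step)]
    nested = await asyncio.gather(*(tree_fan_out(s) for s in slices))
    return [r for sub in nested for r in sub]
```

Each level multiplies the fan-out by MAX_CHILDREN, so two levels of 1,000 already cover a million leaves while no single parent ever has more than 1,000 pending children.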

Hi Maxim,

Thanks for your response. I have implemented a tail-recursive variation of your idea, and it works perfectly. There is a minor concern about mixing this logic with the business logic, but the technical issue is solved for the time being.

Thanks for your help!

Best
Andras
