Greetings,
One of the teams at our company is working on a use case where they want to fan out a large number of activities within a single workflow in a scatter gather type fashion (sample code below)
What we observe however is that a single workflow is only able to execute about 30 activities per second no matter what worker\server knobs we turn despite resource utilization being seemingly low and 100% sync match rate (happy to share any other relevant metrics)
In the workflow event history - we see that all of the ActivityTaskScheduled events are all dispatched at the same time (presumably the SDK sends all of the commands in a single workflow task execution which is expected)
The ‘ActivityTaskStarted’ events, however, are staggered by about 10-30ms each
We tried increasing number of workers, maximum concurrent activities and pollers, and task queue partitions.
Our question is whether this is some sort of inherent limitation of the platform (i.e. all scheduled activities within a workflow must be delivered to workers sequentially) or whether there are knobs to tune we are not aware of (or if we tuned something incorrectly).
My understanding is that Temporal’s unit of concurrency is a history shard and that a given workflow always maps to the same history shard so I wonder if higher concurrency is just not possible within a single workflow.
We are currently on server version 1.21.6
.
This is how we scatter-gather these activities in Python:
@workflow.run
async def execute(self) -> None:
activities = [workflow.start_activity(dummy_fanout, start_to_close_timeout=timedelta(seconds=11))
for _ in range(1000)
]
await asyncio.gather(*activities)
It also looks like temporal.lock_latency.99percentile
goes up to 3.3 (presumably milliseconds though not sure what unit the server emits. The code just takes ‘Duration’)
I tried searching slack and forum and did not find a definitive answer to this but let me know if I missed a post.
Thank you!