Hi, me again
I have the following use case:
Given a list of .parquet files located on S3, my workflow processes each row and makes gRPC calls to an external service. I’m using a variation of the pattern used in the Sliding Window Example:
- My main workflow creates the partitions to be processed, using a configurable `partition_size`.
- For each partition, it spawns a child workflow that goes over the entire partition and spawns another child workflow for each row that does the actual reading, processing, and gRPC calling (the first two workflows only use offsets and limits).
For example, with 200 rows total and a `partition_size` of 10, there would be 1 main workflow, 20 child workflows for partitions, and 200 child workflows processing individual rows.
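To make the structure concrete, here is a simplified sketch of the three levels (Go SDK for illustration; `PartitionWorkflow`, `RowWorkflow`, and the `ProcessRow` activity name are placeholders, and the actual S3 reading and gRPC logic is omitted):

```go
package batch

import (
	"fmt"
	"time"

	"go.temporal.io/sdk/workflow"
)

// MainWorkflow splits the file into partitions and spawns one child workflow per partition.
func MainWorkflow(ctx workflow.Context, fileKey string, totalRows, partitionSize int) error {
	var futures []workflow.ChildWorkflowFuture
	for offset := 0; offset < totalRows; offset += partitionSize {
		limit := partitionSize
		if offset+limit > totalRows {
			limit = totalRows - offset
		}
		childCtx := workflow.WithChildOptions(ctx, workflow.ChildWorkflowOptions{
			WorkflowID: fmt.Sprintf("partition-%s-%d", fileKey, offset),
		})
		futures = append(futures, workflow.ExecuteChildWorkflow(childCtx, PartitionWorkflow, fileKey, offset, limit))
	}
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}

// PartitionWorkflow spawns one child workflow per row in its offset/limit slice.
func PartitionWorkflow(ctx workflow.Context, fileKey string, offset, limit int) error {
	var futures []workflow.ChildWorkflowFuture
	for row := offset; row < offset+limit; row++ {
		childCtx := workflow.WithChildOptions(ctx, workflow.ChildWorkflowOptions{
			WorkflowID: fmt.Sprintf("row-%s-%d", fileKey, row),
		})
		futures = append(futures, workflow.ExecuteChildWorkflow(childCtx, RowWorkflow, fileKey, row))
	}
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}

// RowWorkflow does the per-row work: the "ProcessRow" activity (placeholder name)
// reads the row from S3, processes it, and makes the gRPC call.
func RowWorkflow(ctx workflow.Context, fileKey string, row int) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 5 * time.Minute,
	})
	return workflow.ExecuteActivity(ctx, "ProcessRow", fileKey, row).Get(ctx, nil)
}
```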
The issue I’m struggling with is that there seems to be a bottleneck somewhere in the process: the external gRPC service is only receiving requests at around 15 RPS, despite roughly 200 workflows concurrently making gRPC calls.
I’m looking for leads on how and where to look for the bottleneck.
Thanks!