How can I tell if my worker is busy enough?

Hi, me again :slight_smile:

I have the following use case:

Given a list of .parquet files located on S3, my workflow processes each row and makes gRPC calls to an external service. I’m using a variation of the pattern used in the Sliding Window Example:

  1. My main workflow creates the partitions to be processed, using a configurable partition_size.
  2. For each partition, it spawns a child workflow that iterates over the partition and, for each row, spawns a grandchild workflow that does the actual reading, processing, and gRPC calls (the first two workflow levels operate only on offsets and limits; see the sketch below).

For example, with 200 rows total and partition_size of 10, there would be 1 main workflow, 20 child workflows for partitions, and 200 child workflows processing individual rows.
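
For concreteness, here is a rough sketch of the partition-level fan-out in Go. It assumes the Go SDK; `PartitionWorkflow`, `RowWorkflow`, and the offset/limit parameters are names I made up for illustration, not the actual code:

```go
package app

import (
	"fmt"

	"go.temporal.io/sdk/workflow"
)

// RowWorkflow is a placeholder for the grandchild workflow that handles a
// single row; a fuller sketch of it appears in a later reply.
func RowWorkflow(ctx workflow.Context, row int) error { return nil }

// PartitionWorkflow fans out one grandchild workflow per row in its slice.
// It only knows its offset and limit; the grandchildren do the real work.
func PartitionWorkflow(ctx workflow.Context, offset, limit int) error {
	var futures []workflow.ChildWorkflowFuture
	for row := offset; row < offset+limit; row++ {
		opts := workflow.ChildWorkflowOptions{
			WorkflowID: fmt.Sprintf("row-%d", row), // hypothetical ID scheme
		}
		childCtx := workflow.WithChildOptions(ctx, opts)
		// Start every row workflow before waiting on any of them,
		// so they all run concurrently.
		futures = append(futures, workflow.ExecuteChildWorkflow(childCtx, RowWorkflow, row))
	}
	// Wait for all grandchildren to finish before completing the partition.
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}
```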

The issue I’m struggling with is a bottleneck somewhere in the pipeline: the external gRPC service is only receiving around 15 RPS, even though roughly 200 workflows are concurrently making gRPC calls.

I’m looking for leads on how and where I should look to find the bottleneck.

Thanks!


Can you share your worker options? How many worker processes do you have running to support your fan-out use case?
Do you have service and SDK worker metrics available?
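
For reference, "worker options" means knobs like the concurrency limits below. This is a minimal Go SDK sketch with illustrative values and a hypothetical task queue name, not recommended settings:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	c, err := client.Dial(client.Options{}) // connects to localhost:7233 by default
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	w := worker.New(c, "fanout-task-queue", worker.Options{
		// Caps how many activities this process runs at once; too low a
		// value throttles the whole fan-out regardless of workflow count.
		MaxConcurrentActivityExecutionSize: 200,
		// Caps concurrent workflow tasks; relevant with many child workflows.
		MaxConcurrentWorkflowTaskExecutionSize: 100,
	})
	// w.RegisterWorkflow(...) and w.RegisterActivity(...) calls omitted here.

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker exited:", err)
	}
}
```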

Hi, thanks for responding! :heart:

I recently started exploring the metrics after reading some past posts here, which were very helpful.

It turns out the issue was with my implementation. I had initially used the approach from the Sliding Window Batching example, where the "grandchild" workflow responsible for the actual processing was implemented as a single activity containing all the logic. After splitting that one activity into separate activities, I saw the performance boost I was looking for. This makes sense, as it gives Temporal more opportunities to parallelize the work; the change looked roughly like the sketch below.
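
In case it helps someone else, here is a Go sketch of the split, with hypothetical names (`ReadRow`, `ProcessRow`, `CallService`): instead of one activity that reads, processes, and calls the service, the row workflow chains three smaller activities, so each short step releases its worker slot as soon as it finishes and more rows can be in flight at once.

```go
package app

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// Hypothetical types and activity signatures, stubbed for illustration.
type Row struct{ Offset int }
type Record struct{ Payload []byte }

func ReadRow(row Row) (Record, error)     { return Record{}, nil } // read the row from parquet on S3
func ProcessRow(r Record) (Record, error) { return r, nil }        // transform the record
func CallService(r Record) error          { return nil }           // make the gRPC call

// RowWorkflow after the split: three small activities instead of one
// monolithic activity that did everything.
func RowWorkflow(ctx workflow.Context, row Row) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})

	var record Record
	if err := workflow.ExecuteActivity(ctx, ReadRow, row).Get(ctx, &record); err != nil {
		return err
	}
	var processed Record
	if err := workflow.ExecuteActivity(ctx, ProcessRow, record).Get(ctx, &processed); err != nil {
		return err
	}
	return workflow.ExecuteActivity(ctx, CallService, processed).Get(ctx, nil)
}
```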

I’m still new to Temporal, so this is all part of my learning curve! :blush: