Workflow Performance with Java SDK

java_dev · February 20, 2023, 8:31am

Hello!
My team and I are adopting Temporal. Our workflow consists of 4 activities (1 of them local). Under a load of 200 RPS Temporal works correctly, but I'm not satisfied with its speed. The test launches 12,000 workflows, most of which get stuck in the WorkflowTaskScheduled state.

The latency reported by the temporal_workflow_task_schedule_to_start_latency_seconds metric also keeps growing for as long as the test runs.

Increasing maxConcurrentWorkflowTaskExecutionSize, maxConcurrentActivityExecutionSize, maxWorkflowThreadCount, and so on does not affect the result.

Additional info:
frontend replica count - 3
matching replica count - 3
history replica count - 5
worker replica count - 2
cassandra cluster size - 5
numHistoryShards - 4096

Please help.

tihomir · February 20, 2023, 3:08pm

I would start by looking at the sync match rate (server metrics):

sum(rate(poll_success_sync{}[1m])) / sum(rate(poll_success{}[1m]))

It should stay around 1 (100%). If it dips, or never reaches 100%, then most likely you need either more worker pods or more workflow task/activity task pollers on your workers.
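To make the ratio concrete, here is a minimal Java sketch of the same arithmetic done by hand on two raw counter samples. The class name, method names, and sample values are all hypothetical, and the rate calculation is a simplification: unlike the Prometheus rate() function, it does not handle counter resets or extrapolation.

```java
import java.util.Locale;

final class SyncMatchRate {
    /** Per-second increase of a monotonically increasing counter between two samples. */
    static double ratePerSecond(double earlier, double later, double windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("windowSeconds must be positive");
        }
        return (later - earlier) / windowSeconds;
    }

    /** Fraction of successful polls that were sync-matched; NaN when there were no polls. */
    static double syncMatchRate(double syncRate, double totalRate) {
        return totalRate == 0.0 ? Double.NaN : syncRate / totalRate;
    }

    public static void main(String[] args) {
        // Hypothetical samples of poll_success_sync and poll_success, taken 60 seconds apart.
        double syncRate = ratePerSecond(10_000, 19_000, 60);   // 150 sync-matched polls per second
        double totalRate = ratePerSecond(12_000, 24_000, 60);  // 200 successful polls per second

        // Prints "sync match rate = 0.75"
        System.out.println(String.format(Locale.ROOT, "sync match rate = %.2f",
                syncMatchRate(syncRate, totalRate)));
    }
}
```

With these made-up numbers, a quarter of the tasks were not handed directly to a waiting poller, which is the kind of dip described above.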

Another server metric to look at is:

sum(rate(persistence_requests{operation="CreateTask"}[1m]))

If this graph shows values > 0, then again it seems you might have underprovisioned workers (this metric shows tasks written to the DB when no pollers are available).

On the worker side, please look at CPU and memory utilization % and share them. We need to see whether to add more workers or increase the poller counts.
Please share the results of all of the above, along with your dynamic config values of

numTaskqueueWritePartitions
numTaskqueueReadPartitions

as well as the number of pollers you define in your worker config.

Please also share your graphs for the SDK metrics for workflow task and activity schedule-to-start latencies.
