As you can see, temporal_workflow_task_schedule_to_start_latency_seconds_bucket, temporal_workflow_task_execution_latency_seconds_sum, temporal_workflow_endtoend_latency_seconds_bucket, temporal_activity_schedule_to_start_latency_seconds_sum, and temporal_activity_execution_latency_seconds_sum are all very low. Yet, as you can see in the web UI, all the workflow tasks are timing out. Can you please help?
When you say “add more workers / increase worker capacity”, does that mean increasing the number of concurrent pollers, increasing the execution size, increasing the number of pods of the application where we create the workers, or something else?
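To make the question concrete, here is a rough sketch of the knobs we could tune on our side, assuming a recent Java SDK; the task queue name and the numbers are just placeholders:

```java
import io.temporal.client.WorkflowClient;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerOptions;

public class WorkerCapacityExample {

  static WorkerFactory startWorker(WorkflowClient client) {
    WorkerFactory factory = WorkerFactory.newInstance(client);

    WorkerOptions options =
        WorkerOptions.newBuilder()
            // "Concurrent pollers": how many task-queue long polls the worker keeps open.
            .setMaxConcurrentWorkflowTaskPollers(10)
            .setMaxConcurrentActivityTaskPollers(10)
            // "Execution size": how many polled tasks the worker executes in parallel.
            .setMaxConcurrentWorkflowTaskExecutionSize(200)
            .setMaxConcurrentActivityExecutionSize(200)
            .build();

    // "More workers" would then mean more Worker instances and/or more pods
    // running this process against the same task queue.
    Worker worker = factory.newWorker("my-task-queue", options);
    // worker.registerWorkflowImplementationTypes(MyWorkflowImpl.class);
    // worker.registerActivitiesImplementations(new MyActivitiesImpl());
    factory.start();
    return factory;
  }
}
```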
We had tried this with 512 shards and observed low latency with the default number of matching task queue partitions, but when we run the same test with 8 matching service partitions, the latency increases again. We would like to understand why this is happening.
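For completeness, this is roughly how we set the partitions in the dynamic config (a sketch based on our understanding of the key names; the task queue name is a placeholder):

```yaml
matching.numTaskqueueReadPartitions:
  - value: 8
    constraints:
      taskQueueName: "my-task-queue"
matching.numTaskqueueWritePartitions:
  - value: 8
    constraints:
      taskQueueName: "my-task-queue"
```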
@tihomir Also, a few follow-up questions regarding the above setup.
We were able to achieve a decent workflow execution response time up to 60 TPS, but it degrades badly above that despite changing the number of shards, increasing the number of Temporal service pods, and increasing the number of partitions. No luck yet.
Another thing: it looks like once we go above 60 TPS, the workflow active thread count increases drastically, the sync match rate drops, and the latency of the service call that starts the async workflow execution increases. Is there any parameter to check the latency of a workflow invocation?
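To clarify what we mean by invocation latency, here is roughly how we could measure the start call itself (a sketch assuming the Java SDK; MyWorkflow and the task queue name are placeholders for our real types):

```java
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowOptions;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

public class StartLatencyCheck {

  // Placeholder workflow interface; substitute the real one.
  @WorkflowInterface
  public interface MyWorkflow {
    @WorkflowMethod
    void run(String input);
  }

  static void startAndMeasure(WorkflowClient client, String input) {
    MyWorkflow stub =
        client.newWorkflowStub(
            MyWorkflow.class,
            WorkflowOptions.newBuilder().setTaskQueue("my-task-queue").build());

    long begin = System.nanoTime();
    // WorkflowClient.start returns once StartWorkflowExecution is accepted by the
    // frontend, so this measures invocation latency only, not end-to-end latency.
    WorkflowClient.start(stub::run, input);
    long elapsedMs = (System.nanoTime() - begin) / 1_000_000;
    System.out.println("StartWorkflowExecution round trip: " + elapsedMs + " ms");
  }
}
```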
@Wenquan_Xing We have one pod per Temporal service, i.e. frontend, history, matching, etc.
MaxConns is set to 200; it is the value provided in the yaml file for the default and visibility databases. Do you have any recommendation for this value?
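For reference, this is roughly the relevant part of our persistence yaml (a trimmed sketch assuming a SQL datastore; the plugin name is a placeholder and connection details are omitted):

```yaml
persistence:
  defaultStore: default
  visibilityStore: visibility
  datastores:
    default:
      sql:
        pluginName: "postgres"   # placeholder, depends on the datastore
        maxConns: 200
        maxIdleConns: 200
        maxConnLifetime: "1h"
    visibility:
      sql:
        pluginName: "postgres"
        maxConns: 200
        maxIdleConns: 200
        maxConnLifetime: "1h"
```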
We are currently driving workflow starts at around 70 TPS; our target is to achieve more than 3000 TPS.