- try to use 3 frontend pods, 3 matching pods, 3 history pods, and 1 worker pod
- check CPU / mem utilization of the above pods while running the load
- make
`sum(rate(poll_success_sync[1m])) / sum(rate(poll_success[1m]))`
as close to 100% as possible by increasing `MaxConcurrentActivityTaskPollers` and `MaxConcurrentWorkflowTaskPollers`
- use 4 for both
`matching.numTaskqueueReadPartitions`
and `matching.numTaskqueueWritePartitions`
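In the server's dynamic config file, setting both partition counts to 4 would look roughly like this (shape assumed from the standard dynamic-config YAML format; adjust constraints per namespace / task queue as needed):

```yaml
matching.numTaskqueueReadPartitions:
  - value: 4
    constraints: {}
matching.numTaskqueueWritePartitions:
  - value: 4
    constraints: {}
```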
- try to use local activities where they fit, which will improve latency (they avoid unnecessary round trips to the server)
- assuming my understanding is correct: with ~18 normal activities per workflow and a target of 100 workflows per second, that ← translates to roughly 5K DB transactions per second
- check if your DB is overloaded (DB CPU / mem)
- check the metrics emitted from the temporal persistence layer
`histogram_quantile(0.99, sum(rate(persistence_latency_bucket[1m])) by (operation, le))`
← p99 latency; target is ~50ms (create workflow execution, update workflow execution)
- try to use more workflows for testing; a start-to-completion rate measured over only 1,000 workflows may not be accurate (long tail?), try 10,000 or maybe more