Temporal performance with a Golang microservice, Cassandra & Elasticsearch

  1. try 3 frontend pods, 3 matching pods, 3 history pods, and 1 worker pod
  2. check CPU / memory utilization of the above pods while running the load
  3. keep the sync match rate `sum(rate(poll_success_sync[1m])) / sum(rate(poll_success[1m]))` as close to 100% as possible by increasing `MaxConcurrentActivityTaskPollers` and `MaxConcurrentWorkflowTaskPollers` on your workers (see the worker options sketch after this list)
  4. set both `matching.numTaskqueueReadPartitions` and `matching.numTaskqueueWritePartitions` to 4 (see the dynamic config sketch after this list)
  5. try using local activities, which improve latency by cutting out unnecessary round trips through the task queue (see the local activity sketch after this list)
  6. assuming my understanding is correct: each workflow runs 18 normal activities and the target is 100 workflows per second, which translates to roughly 5K DB transactions per second (18 × 100 = 1,800 activity completions/sec, each costing a few persistence updates, on top of workflow starts and closes):
    • check if your DB is overloaded (DB CPU / mem)
    • check the metrics emitted from the Temporal persistence layer: `histogram_quantile(0.99, sum(rate(persistence_latency_bucket[1m])) by (operation, le))` gives the p99 latency per operation; the target is around 50ms for CreateWorkflowExecution and UpdateWorkflowExecution
  7. use more workflows for testing; a start-to-completion rate measured over 1,000 workflows may not be accurate (long tail?), so try 10,000 or maybe more (see the load-generation sketch after this list)
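
For the poller tuning in item 3, here is a minimal sketch using the Go SDK; the frontend address, task queue name, and poller counts are illustrative assumptions to be tuned against the sync match rate, not recommendations:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Hypothetical frontend address; point this at your cluster.
	c, err := client.Dial(client.Options{HostPort: "temporal-frontend:7233"})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// More pollers means a poller is more likely to be waiting when a task
	// arrives, which pushes poll_success_sync / poll_success toward 100%.
	w := worker.New(c, "my-task-queue", worker.Options{
		MaxConcurrentWorkflowTaskPollers: 8, // illustrative; SDK default is 2
		MaxConcurrentActivityTaskPollers: 8, // illustrative; SDK default is 2
	})

	// w.RegisterWorkflow(MyWorkflow)
	// w.RegisterActivity(MyActivity)

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker exited:", err)
	}
}
```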
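For item 4, the partition counts live in the server's dynamic config file. A minimal sketch; the file path is an assumption, and the empty constraints apply the value to every namespace and task queue (you can also scope per task queue):

```yaml
# dynamicconfig/custom.yaml (path is an assumption; match your deployment)
matching.numTaskqueueReadPartitions:
  - value: 4
    constraints: {}
matching.numTaskqueueWritePartitions:
  - value: 4
    constraints: {}
```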
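For item 5, a minimal sketch of a local activity in the Go SDK; `ValidateInput` and `MyWorkflow` are hypothetical names. Local activities run inside the workflow worker's process, so short, idempotent operations skip the task queue round trip entirely:

```go
package app

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// ValidateInput is a hypothetical short, idempotent activity, the kind
// that benefits most from local execution.
func ValidateInput(ctx context.Context, input string) (string, error) {
	return input, nil
}

func MyWorkflow(ctx workflow.Context, input string) error {
	ctx = workflow.WithLocalActivityOptions(ctx, workflow.LocalActivityOptions{
		StartToCloseTimeout: 5 * time.Second,
	})

	var validated string
	// Executes in-process on the workflow worker: no matching service hop.
	return workflow.ExecuteLocalActivity(ctx, ValidateInput, input).Get(ctx, &validated)
}
```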
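For item 7, a minimal sketch that starts a larger batch and measures the start-to-completion rate; the workflow name, argument, task queue, and address are assumptions:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.temporal.io/sdk/client"
)

func main() {
	c, err := client.Dial(client.Options{HostPort: "temporal-frontend:7233"})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	const n = 10000 // larger sample to smooth out the long tail
	start := time.Now()
	runs := make([]client.WorkflowRun, 0, n)

	for i := 0; i < n; i++ {
		run, err := c.ExecuteWorkflow(context.Background(), client.StartWorkflowOptions{
			ID:        fmt.Sprintf("load-test-%d", i),
			TaskQueue: "my-task-queue",
		}, "MyWorkflow", "input") // hypothetical workflow name and argument
		if err != nil {
			log.Fatalln("failed to start workflow:", err)
		}
		runs = append(runs, run)
	}

	// Wait for every workflow to finish before computing the rate.
	for _, run := range runs {
		if err := run.Get(context.Background(), nil); err != nil {
			log.Println("workflow failed:", err)
		}
	}

	elapsed := time.Since(start)
	log.Printf("%d workflows in %s (%.1f workflows/sec)", n, elapsed, float64(n)/elapsed.Seconds())
}
```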