How to get the best Temporal performance?

Can you describe the use case for your load test? What do the workflows you are running actually do, and are they representative of the real workload you intend to run on your cluster?

  • Persistence latency seems pretty high. It should typically be in the 100s of ms, so you might need to look at the size of your DB.

  • Sync match rate: it looks like we need to look at your workers next, once we are done with the server side.

  • Workflow lock contention: this seems pretty high, which suggests your individual executions receive a lot of updates (signals? async activities or child workflows completing at the same time? timers firing at the same time? a combination of all of these?).

  • Shard lock contention: what numHistoryShards value did you set in the static config? It seems we need to try increasing it. How many history hosts are you running? I would look into this and the persistence latencies first and try to reduce those, then move on to the next steps.
    (Note: if you change numHistoryShards you need to stand up a new cluster, including the persistence store (need to re-index).)

  • Resource exhausted on BusyWorkflow: this is related to your high workflow lock contention. It means you are probably starting too many activities/child workflows from a single workflow execution, or your activities may all be heartbeating at a very high rate. The typical recommendation is to start fewer activities/child workflows concurrently or, if the issue is heartbeats, to increase the heartbeat timeout by a small amount and test again.
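
To illustrate the "start fewer activities/child workflows concurrently" suggestion, here is a rough sketch of capping how many tasks are in flight at once. It uses plain Python asyncio rather than any Temporal SDK API. The names run_bounded, FakeActivity and max_concurrent are made up for this example, and FakeActivity is only a stand-in for whatever activity or child workflow call your workflow makes. Treat it as the general pattern, not as the way it must be done in your code.

```python
import asyncio


async def run_bounded(items, task_fn, max_concurrent):
    """Run task_fn(item) for each item with at most max_concurrent in flight.

    Results are returned in the same order as items.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await task_fn(item)

    return await asyncio.gather(*(guarded(item) for item in items))


class FakeActivity:
    """Hypothetical stand-in for an activity call; records peak concurrency."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def __call__(self, item):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # simulate some work
        self.in_flight -= 1
        return item * 2


if __name__ == "__main__":
    activity = FakeActivity()
    results = asyncio.run(run_bounded(list(range(20)), activity, max_concurrent=5))
    print("completed:", len(results), "peak in flight:", activity.peak)
```

With a cap like this, a single execution has fewer completions landing at the same moment, which is what drives the workflow lock contention mentioned above. The right value for the cap depends on your workload, so test a few.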