Different results in AWS and on-premise when running workflows

Hi All,

I need some help here. I am testing in two different environments:

  • in AWS, the clients run in AWS EKS and connect to the Temporal instances and Cassandra, which run on the same EC2 VM (32 CPUs, 64 GB RAM)
  • on-premise, the clients run in OpenShift pods and connect to the Temporal instances and Cassandra, which run on the same VM (also 32 CPUs, 64 GB RAM)

Using the same settings, I get very different results: AWS reaches about 48 workflows per second, while on-premise only reaches about 15 workflows per second.

I suspect the latency between the clients and Temporal is the cause: in AWS the latency is under 1 ms (0.877 ms to 0.984 ms), while on-premise it varies from 3.90 ms up to 15.8 ms.

Can this latency cause the slower performance? Or are there other things I should check?

Please help.

Thanks & regards,

ridwan

Yes, network latency can increase database latency, and both can reduce overall performance.
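To see why latency caps throughput, here is a rough back-of-the-envelope sketch based on Little's law (throughput = work in flight / time per item). It is not how Temporal schedules work internally: the concurrency (`IN_FLIGHT`) and the number of sequential round trips per workflow (`ROUND_TRIPS`) are hypothetical values, and server and Cassandra processing time are ignored. Only the latency ranges come from the question.

```python
# Back-of-the-envelope model based on Little's law:
#   throughput = items in flight / time per item
# ASSUMPTIONS (hypothetical, not measured in this thread): each workflow needs
# `round_trips` sequential network round trips, and the load generator keeps
# `in_flight` workflows in progress at once. Server/DB processing time is ignored.


def max_workflows_per_sec(in_flight: int, round_trips: int, rtt_ms: float) -> float:
    """Upper bound on workflows/s when network round trips are the only cost."""
    seconds_per_workflow = round_trips * rtt_ms / 1000.0
    return in_flight / seconds_per_workflow


# Latency ranges reported in the question (ms): (lowest, highest).
RTT_MS = {"AWS": (0.877, 0.984), "on-premise": (3.90, 15.8)}

IN_FLIGHT = 10   # hypothetical concurrency
ROUND_TRIPS = 5  # hypothetical sequential round trips per workflow

for env, (low, high) in RTT_MS.items():
    best = max_workflows_per_sec(IN_FLIGHT, ROUND_TRIPS, low)
    worst = max_workflows_per_sec(IN_FLIGHT, ROUND_TRIPS, high)
    print(f"{env:>10}: {worst:8.1f} to {best:8.1f} workflows/s (upper bound)")
```

Whatever values you plug in, the bound scales with 1/latency, so a round-trip time that is roughly 4x to 18x higher lowers the ceiling by the same factor. Real throughput also includes time spent in the server and in Cassandra, which the persistence metrics measure.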
The server emits persistence_latency metrics that you can check and compare between the two environments, broken down by operation:

```promql
histogram_quantile(0.95, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))
```

The operations to focus on are probably CreateWorkflowExecution, UpdateWorkflowExecution and UpdateShard.
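One way to do that comparison is to run the same p95 query against each environment's Prometheus and compare the named operations side by side. The sketch below uses the standard Prometheus HTTP API (`/api/v1/query`); the Prometheus URLs are hypothetical, and the metric's unit depends on your server's metrics configuration, so it only reports raw values and the on-premise/AWS ratio, which is unit-independent.

```python
import json
import urllib.parse
import urllib.request

# The p95 query from the reply, grouped by persistence operation.
QUERY = (
    "histogram_quantile(0.95, "
    "sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))"
)

# Operations suggested in the reply.
FOCUS_OPS = ("CreateWorkflowExecution", "UpdateWorkflowExecution", "UpdateShard")


def fetch_p95(prometheus_url: str) -> dict:
    """Run QUERY as an instant query against Prometheus' /api/v1/query endpoint."""
    params = urllib.parse.urlencode({"query": QUERY})
    url = prometheus_url.rstrip("/") + "/api/v1/query?" + params
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def p95_by_operation(response: dict) -> dict:
    """Map operation name -> p95 value from an instant-vector query response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    result = {}
    for sample in response["data"]["result"]:
        op = sample["metric"].get("operation", "<none>")
        # Prometheus encodes sample values as strings: [timestamp, "value"].
        result[op] = float(sample["value"][1])
    return result


def compare(aws: dict, onprem: dict, ops=FOCUS_OPS) -> dict:
    """Return operation -> on-prem/AWS p95 ratio (None if missing or AWS value is not > 0)."""
    ratios = {}
    for op in ops:
        a, o = aws.get(op), onprem.get(op)
        ratios[op] = o / a if a is not None and o is not None and a > 0 else None
    return ratios
```

For example, with hypothetical hosts: `compare(p95_by_operation(fetch_p95("http://aws-prometheus:9090")), p95_by_operation(fetch_p95("http://onprem-prometheus:9090")))`. A ratio well above 1 for these operations would point at slower persistence on-premise rather than only client-side latency.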