Result differences in AWS and on-premise when running workflow

ridwan.santoso · July 2, 2022, 3:15pm

Hi All,

I need some help here. I am testing in two different environments:

In AWS, the clients run in AWS EKS and connect to Temporal instances and Cassandra that share the same EC2 VM (32 CPU, 64 GB RAM).
On-premise, the clients run in OpenShift pods and connect to Temporal instances and Cassandra that share the same VM (also 32 CPU, 64 GB RAM).

Using the same settings I got different results: AWS performs much better (about 48 workflows per second), while on-premise only reaches 15 workflows per second.

I suspect the latency between the client and Temporal is the cause: in AWS the latency is less than 1 ms (between 0.877 ms and 0.984 ms), while on-premise it varies from 3.90 ms to as much as 15.8 ms.

Can this latency cause the slower performance? Or is there anything else I should check?

Please help.

Thanks & regards,

ridwan

tihomir · July 2, 2022, 4:10pm

Yes, network latency can affect DB latency and, in turn, overall performance.
The server emits persistence_latency metrics that you can check and compare across both environments for different operations:

histogram_quantile(0.95, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))

Operations to focus on first are CreateWorkflowExecution, UpdateWorkflowExecution, and UpdateShard.
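
To make the comparison above concrete, here is a minimal Python sketch that narrows the same query to those three operations and compares the p95 values from two Prometheus servers through the standard `/api/v1/query` HTTP endpoint. The helper names (`build_query`, `parse_result`, `fetch_latencies`, `compare`) and the two base URLs in the usage comment are hypothetical and only for illustration. It assumes each environment's Temporal server is scraped by a Prometheus instance you can reach, and that the `operation` label can be used as a selector in the same way the query above groups by it.

```python
import json
import urllib.parse
import urllib.request

# Operations suggested in the reply above.
OPERATIONS = ["CreateWorkflowExecution", "UpdateWorkflowExecution", "UpdateShard"]


def build_query(operations=OPERATIONS, quantile=0.95, window="1m"):
    """Build the persistence-latency quantile query, limited to some operations."""
    selector = 'operation=~"{}"'.format("|".join(operations))
    return (
        f"histogram_quantile({quantile}, "
        f"sum(rate(persistence_latency_bucket{{{selector}}}[{window}])) "
        f"by (operation, le))"
    )


def parse_result(payload):
    """Map operation name -> latency from a Prometheus instant-query response."""
    return {
        sample["metric"].get("operation", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


def fetch_latencies(base_url, query, timeout=10):
    """Run an instant query against one Prometheus server (hypothetical helper)."""
    params = urllib.parse.urlencode({"query": query})
    url = base_url.rstrip("/") + "/api/v1/query?" + params
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_result(json.load(resp))


def compare(aws, onprem):
    """Return operation -> (on-premise latency / AWS latency) where both exist."""
    return {
        op: onprem[op] / aws[op]
        for op in aws
        if op in onprem and aws[op] > 0
    }


# Usage (the URLs are placeholders, not real endpoints):
#   query = build_query()
#   aws = fetch_latencies("http://prometheus.aws.example:9090", query)
#   onprem = fetch_latencies("http://prometheus.onprem.example:9090", query)
#   for op, ratio in compare(aws, onprem).items():
#       print(f"{op}: on-premise p95 is {ratio:.1f}x the AWS p95")
```

A ratio well above 1 for these operations would point at the persistence path on-premise. Ratios close to 1 would suggest looking elsewhere, for example at the client-to-Temporal network hop mentioned in the question.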
