Temporal test bench with maru

I am testing Temporal cluster throughput with maru. I deployed the Temporal cluster via the Helm chart on three Kubernetes nodes; each service has two pod instances, and the database is PostgreSQL.
To limit history service memory usage, I modified the history cache sizes:
history.cacheInitialSize: 50
history.cacheMaxSize: 100
history.eventsCacheInitialSize: 50
history.eventsCacheMaxSize: 100
I also reduced the number of history shards to 4.
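For reference, in a Helm deployment these overrides typically go into the dynamic config file; a sketch of the format (each key maps to a list of value/constraints pairs, and the cache sizes are applied per shard):

```yaml
# Sketch of a Temporal dynamic config override (dynamicconfig/*.yaml).
# Exact wiring into the Helm values depends on your chart version.
history.cacheInitialSize:
  - value: 50
    constraints: {}
history.cacheMaxSize:
  - value: 100
    constraints: {}
history.eventsCacheInitialSize:
  - value: 50
    constraints: {}
history.eventsCacheMaxSize:
  - value: 100
    constraints: {}
```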

My maru yaml configuration is:
step.count: 1000
step.ratePerSecond: 50
step.concurrency: 100
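For anyone reproducing this, the flattened keys above correspond roughly to a maru scenario step like the following (a sketch only; field names are taken from my config above, so check them against your maru version):

```yaml
# Hypothetical maru scenario step matching the settings above:
# start 1000 workflows total, at 50 starts/second, with up to 100 concurrent.
steps:
  - count: 1000            # total workflows to start
    ratePerSecond: 50      # workflow start rate
    concurrency: 100       # max concurrent workflow starts
```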
With this I got the table below. The total workflow execution time was 760s; it actually took only 170s to execute 962 workflows, and the rest of the time was spent processing the remaining 38 workflows (backlog).
My question is how to locate the performance bottleneck of the cluster. In other words, what key performance metrics should I look at, and which parameters should I adjust to improve cluster throughput?

Following another topic, here are some metric screenshots from my test environment.




Could you show your persistence latencies by operation?

histogram_quantile(0.95, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))

Also, it seems your sync match rate is not optimal (from the last two graphs); can you show:

sum(rate(poll_success_sync{}[1m])) / sum(rate(poll_success{}[1m]))

Is there a reason to set the number of history shards to 4? There is a nice writeup on this here if it helps.
Regarding the history cache size (dynamic config) reduction from the defaults: these are per-shard configurations. Were you running out of resources on your history pods during the load test? What was CPU/memory utilization during the test?

Hi @tihomir, thanks a lot for your reply.




I also tried running two workers (previously just one), but ‘sum(rate(poll_success_sync{}[1m])) / sum(rate(poll_success{}[1m]))’ doesn’t seem to have changed much, and cluster throughput does not seem any better.
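To rule out worker capacity as the bottleneck, one thing worth graphing (assuming the workers export SDK metrics to Prometheus; the metric name may carry an SDK-specific prefix depending on your setup) is the available worker task slots:

```
sum(worker_task_slots_available{worker_type="WorkflowWorker"})
sum(worker_task_slots_available{worker_type="ActivityWorker"})
```

If these stay well above zero throughout the run, the workers are not saturated and the bottleneck is more likely on the cluster/persistence side.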

The reason I set the number of shards to 4, and reduced the history cache sizes, is to limit the memory consumption of the Temporal cluster as much as possible while still meeting our performance requirements.
My goal is to find a balance between performance and resource consumption.

Hi @tihomir, there is one thing that confuses me: both PollWorkflowTaskQueue latency and PollActivityTaskQueue latency are at the minute level, while the latencies of other operations are at the millisecond level. Also, both poll latencies drop when workflows start executing. How should I understand these two metrics?