Cluster Info
I have deployed Temporal using the Helm chart in an EKS cluster. The cluster has 1024 shards, with 2 history, 2 matching, and 2 frontend pods; the Docker image is temporalio/server:1.18.4. The persistence store is AWS RDS MySQL on an r6g.large instance (2 vCPU, 16 GB RAM).
Problem
During a maru run of 12k workflows, workflows execute at a good rate until we hit a short period in which no workflows are executed at all. After this period, the backlogged workflows are executed, but they close at a much slower rate. What could cause this behaviour? Any pointers on how to avoid it are much appreciated.
histogram csv (attached)
maru config
{
  "steps": [{
    "count": 12000,
    "ratePerSecond": 100,
    "concurrency": 10
  }],
  "workflow": {
    "name": "basic-workflow",
    "args": {
      "sequenceCount": 3
    }
  },
  "report": {
    "intervalInSeconds": 10
  }
}
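For reference, here is a rough sketch of the load this config asks for. It assumes that ratePerSecond is the target rate at which workflows are started and that the report produces one histogram row per intervalInSeconds; both are my reading of the field names, not something confirmed here. The helper name submission_summary is made up for this example.

```python
import json
import math

# The maru config from this post, embedded so the sketch is self-contained.
MARU_CONFIG = """
{
  "steps": [{"count": 12000, "ratePerSecond": 100, "concurrency": 10}],
  "workflow": {"name": "basic-workflow", "args": {"sequenceCount": 3}},
  "report": {"intervalInSeconds": 10}
}
"""


def submission_summary(config):
    """Estimate, per step, how long starting all workflows takes at the target rate.

    Assumption: ratePerSecond caps the workflow start rate, so count / ratePerSecond
    is a lower bound on the duration of the start phase.
    """
    interval = config["report"]["intervalInSeconds"]
    summary = []
    for step in config["steps"]:
        min_seconds = step["count"] / step["ratePerSecond"]
        summary.append({
            "count": step["count"],
            "rate_per_second": step["ratePerSecond"],
            "min_start_seconds": min_seconds,
            # Number of report intervals the start phase spans (assumed one row each).
            "min_report_intervals": math.ceil(min_seconds / interval),
        })
    return summary


config = json.loads(MARU_CONFIG)
summary = submission_summary(config)
for step in summary:
    print(step)
```

Under those assumptions, starting all 12000 workflows takes at least 120 seconds, or 12 report intervals, so a run that takes noticeably longer to close everything points at the server or persistence side rather than at the start rate.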