We are using RDS Postgres as the DB for our OSS Temporal cluster, and we saw that the DB was the bottleneck. We have upsized it twice and it is better now, but in my opinion we aren't really doing that much. We are now on a db.r6g.4xlarge instance with 512 shards, running 50 workflows per second. Each workflow has 11 activities, 47 state transitions, and about a 30 second runtime. DB CPU hits about 60%. When we were on a db.r6g.2xlarge we were pegging the CPU and getting lots of problems.
Are there any rules of thumb for how much you can run per DB size? And what are the major contributors to DB CPU in general? The connection count looks pretty stable, so I don't think it is opening lots of connections and eating up CPU. Performance Insights says COMMIT is the big item, but there isn't a ton of detail there. The next items on the list are INSERT INTO cluster_membership, then INSERT INTO history_node.
My guess is that something about how we built our workflows is causing undue load on the DB, but I don't understand enough about what is happening under the hood to know what that might be. So I wanted to get a general idea of what size DB we should expect to need, and what kinds of things add CPU load to the DB.
One thing to look into is that the server runs background jobs per shard, so even when the service is idle these can generate DB reads at some frequency. We have seen users increase the default intervals in cases where this background work was adding unwanted load (such as bumping up DB CPU use). Useful dynamic configs you can look into increasing for this are (see the sketch after this list):
- history.transferProcessorMaxPollInterval (default 1m)
- history.timerProcessorMaxPollInterval (default 5m)
- history.visibilityProcessorMaxPollInterval (default 1m)
- history.shardUpdateMinInterval (default 5m)
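For reference, a minimal sketch of what raising these could look like in the file-based dynamic config (the file your server's dynamic config client points at). The values are illustrative only, not recommendations; duration strings like "5m" are accepted by recent server versions:

```yaml
# illustrative values only: longer intervals mean less idle per-shard polling,
# at the cost of the queue processors reacting a bit more slowly in some cases
history.transferProcessorMaxPollInterval:
  - value: "5m"
history.timerProcessorMaxPollInterval:
  - value: "10m"
history.visibilityProcessorMaxPollInterval:
  - value: "5m"
history.shardUpdateMinInterval:
  - value: "10m"
```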
Another thing to look into is how much extra pressure under-provisioned SDK workers might be adding on the DB.
For this you can look at your service metric:
sum(rate(persistence_requests{operation="CreateTask"}[1m]))
When a workflow/activity task is created and routed to a matching service task queue partition, if there is no SDK worker poller available to pick it up, matching has to persist the task to the DB and then read it back out when a poller becomes available. So provisioning your SDK workers so that the service rarely needs to write/read tasks to/from the DB can help lessen the DB load.
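To illustrate the worker-side knobs involved, here is a Go SDK sketch; the task queue name and the numbers are placeholders, not recommendations, and other SDKs have equivalent options under different names:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Assumes a Temporal frontend reachable at the default address.
	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatalln("unable to create client", err)
	}
	defer c.Close()

	// Placeholder values: more pollers / larger execution slots increase the
	// chance a task is handed directly to a waiting poller instead of being
	// persisted to the DB by matching and read back later.
	w := worker.New(c, "your-task-queue", worker.Options{
		MaxConcurrentWorkflowTaskPollers:       8,
		MaxConcurrentActivityTaskPollers:       8,
		MaxConcurrentWorkflowTaskExecutionSize: 200,
		MaxConcurrentActivityExecutionSize:     200,
	})

	// Register workflows/activities as usual, e.g. w.RegisterWorkflow(...).

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker exited", err)
	}
}
```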
You can also look at your workflow execution event histories, specifically how large they are, using the visibility search attributes HistorySizeBytes and HistoryLength.
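For example, to find executions with oversized histories you could query on those attributes. A Go SDK sketch; the 1 MiB threshold is arbitrary, and your server/visibility store needs to be on a version that exposes these attributes:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"go.temporal.io/api/workflowservice/v1"
	"go.temporal.io/sdk/client"
)

func main() {
	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatalln("unable to create client", err)
	}
	defer c.Close()

	// Arbitrary example threshold: executions whose history exceeds 1 MiB.
	resp, err := c.ListWorkflow(context.Background(), &workflowservice.ListWorkflowExecutionsRequest{
		Query: "HistorySizeBytes > 1048576",
	})
	if err != nil {
		log.Fatalln("list failed", err)
	}
	for _, e := range resp.Executions {
		fmt.Println(e.Execution.WorkflowId, e.Execution.RunId)
	}
}
```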
If your event histories are very large in terms of bytes, that means larger reads/writes; if they are large in terms of event count, your workers may have to call the GetWorkflowExecutionHistory API more often since that API is paginated.
Tuning your SDK worker cache size against the memory (heap size) you give your worker pods can lower the need to re-fetch event histories, which also helps on the DB end.
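In the Go SDK, for example, the workflow cache is process-wide and is set before starting any workers. A minimal sketch; the value is just an example, size it against the heap you actually give the pod:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Must be called before any worker starts; applies to the whole process.
	// Example value only: each cached workflow holds its replayed state in memory.
	worker.SetStickyWorkflowCacheSize(4096)

	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatalln("unable to create client", err)
	}
	defer c.Close()

	w := worker.New(c, "your-task-queue", worker.Options{})
	// Register workflows/activities and call w.Run(...) as usual.
	_ = w
}
```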
Protect your DB: regardless of DB size, Temporal has configuration settings that can help you avoid overloading it, which can cause bigger issues.
Useful dynamic configs for this (see the sketch after this list):
- frontend.persistenceMaxQPS (frontend persistence max QPS)
- history.persistenceMaxQPS (history persistence max QPS)
- matching.persistenceMaxQPS (matching persistence max QPS)
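A sketch of what capping these could look like in the same dynamic config file; the numbers are placeholders, and the right ceiling depends on what your RDS instance can actually sustain:

```yaml
# illustrative caps only: these bound the rate of persistence calls from each
# service role so the DB degrades gracefully instead of getting pegged
frontend.persistenceMaxQPS:
  - value: 2000
history.persistenceMaxQPS:
  - value: 6000
matching.persistenceMaxQPS:
  - value: 3000
```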
Watch your resource exhausted server metrics:
sum(rate(service_errors_resource_exhausted{}[1m])) by (operation, resource_exhausted_cause)
The resource exhausted cause SystemOverloaded is typically related to DB overload.
Would also look at your persistence latencies:
histogram_quantile(0.95, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))
If you can share the graph, we can take a look at possible culprits and maybe give more recommendations.
Overall, your workload size heavily drives the DB size you need now and the size you can predict you will need in the near/far future as your workloads grow.
Note that the Temporal multi-cluster replication feature would allow you to set up a bigger cluster with a bigger DB and then migrate existing workloads to it; the new cluster can have a shard count that is a multiple of the one you are currently running, so upgrading to a bigger environment is possible.