We’re using AWS Aurora PostgreSQL (db.r6g.2xlarge) as the underlying database for our temporal-server. When around 6,000 workflows were triggered within 4 minutes, DB CPU utilization spiked. We’re looking for best practices to prevent such spikes.
Can we configure the Temporal service to throttle requests (e.g., to X requests/second)? If so, how?
Additionally, are there other recommended approaches (apart from upgrading the DB instance) to handle this more efficiently?
I would check how much extra pressure unprovisioned SDK workers put on the DB:

`sum(rate(persistence_requests{operation="CreateTask"}[1m]))`
You can also look at adjusting your QPS limits to protect your DB. Relevant dynamic configs:
- Per host dynamic configs:
- frontend.persistenceMaxQPS - default 2400
- matching.persistenceMaxQPS - default 2400
- history.persistenceMaxQPS - default 6000
- Per service type dynamic configs:
- history.persistenceGlobalMaxQPS - default 36000
- Per shard dynamic configs:
- history.persistencePerShardNamespaceMaxQPS - default 500
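As a sketch, these keys can be lowered in your dynamic config file (the path is set via `dynamicConfigFilePath` in the server config; the values below are illustrative, not recommendations — tune them to your DB capacity):

```yaml
# Example dynamic config fragment — values are placeholders
frontend.persistenceMaxQPS:
  - value: 1200          # per frontend host, down from the default 2400
    constraints: {}
history.persistenceMaxQPS:
  - value: 3000          # per history host, down from the default 6000
    constraints: {}
history.persistencePerShardNamespaceMaxQPS:
  - value: 250           # per shard per namespace, down from the default 500
    constraints: {}
```

Lowering these trades DB protection for latency: once a limit is hit, persistence calls are throttled and retried rather than passed through, so workflow starts spread out instead of spiking the DB CPU.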