High CPU Utilization in DB Due to Timer Fires

Hi,

We have 1M workflows that trigger timers simultaneously, and this is causing a significant spike in DB CPU utilization, which reaches ~90% for about an hour. Our Temporal DB is running on a db.r8g.2xlarge instance.

Is there a recommended way to handle or mitigate this issue?

Can you show these spikes? For example:
sum by (operation) (rate(task_requests{operation=~"TimerActive.*"}[1m]))

(add a namespace filter if needed)

Can you also show the resource exhausted errors:
sum(rate(service_errors_resource_exhausted{}[1m])) by (operation, resource_exhausted_cause)

The best approach is probably to spread these user timers out over a period of time by always adding jitter to the sleep duration.
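A minimal sketch of the jitter idea in Python, assuming the Temporal Python SDK on the worker side. Workflow code must be deterministic, so instead of calling `random` directly, the offset here is derived from a hash of the workflow ID: each workflow gets an offset that stays stable across replays, while the whole fleet spreads roughly uniformly over the window. The helper name `jittered_sleep`, the 24h base and the 1h window are illustrative assumptions, not part of any SDK. Inside a workflow you would pass `workflow.info().workflow_id` and `await asyncio.sleep(delay.total_seconds())`, which the SDK records as a durable timer; `workflow.random()` is an alternative deterministic source of randomness.

```python
import hashlib
from collections import Counter
from datetime import timedelta


def jittered_sleep(base: timedelta, window: timedelta, workflow_id: str) -> timedelta:
    """Return base plus a per-workflow offset in [0, window).

    The offset comes from a hash of the workflow ID, so it is deterministic
    (safe to recompute on replay) and spreads many workflows roughly
    uniformly across the window instead of firing them at the same instant.
    """
    window_ms = window // timedelta(milliseconds=1)  # integer milliseconds
    if window_ms <= 0:
        return base  # no jitter requested (or window below 1 ms)
    digest = hashlib.sha256(workflow_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    bucket = int.from_bytes(digest[:8], "big")
    return base + timedelta(milliseconds=bucket % window_ms)


if __name__ == "__main__":
    base = timedelta(hours=24)
    window = timedelta(hours=1)
    # Count how many of 10,000 hypothetical workflow IDs land in each minute.
    per_minute = Counter(
        (jittered_sleep(base, window, f"order-{i}") - base) // timedelta(minutes=1)
        for i in range(10_000)
    )
    print(f"minutes used: {len(per_minute)}, busiest minute: {max(per_minute.values())} timers")
```

With a 1-hour window, 1M timers that would otherwise fire at the same instant average about 1,000,000 / 3,600 ≈ 280 fires per second; widening the window lowers that peak proportionally.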

DB CPU utilization—reaching ~90% for about an hour

You might want to look at lowering the history QPS limits.
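Lowering a QPS limit does not reduce the total work a timer burst causes; it caps the peak rate against the DB and stretches the burst over a longer period. A back-of-envelope sketch of that trade-off, where the persistence operations per fired timer and the limit value are hypothetical numbers you would measure and choose yourself (in Temporal the limit is set through dynamic config, for example a key such as `history.persistenceMaxQPS`; check the exact key for your server version). The function name `burst_duration_seconds` is illustrative.

```python
def burst_duration_seconds(
    timer_count: int,
    persistence_ops_per_timer: float,
    persistence_qps_limit: float,
) -> float:
    """Seconds needed to work through a timer burst if DB operations are capped.

    Assumes each fired timer costs a fixed number of persistence operations
    and that the limit is the only bottleneck (a simplification).
    """
    if persistence_qps_limit <= 0:
        raise ValueError("persistence_qps_limit must be positive")
    total_ops = timer_count * persistence_ops_per_timer
    return total_ops / persistence_qps_limit


if __name__ == "__main__":
    # Hypothetical numbers: 1M timers, ~4 persistence ops each, 2,000 QPS limit.
    seconds = burst_duration_seconds(1_000_000, 4, 2_000)
    print(f"burst drains in ~{seconds / 60:.0f} minutes")
```

A lower limit keeps CPU below saturation, but timers fire later than scheduled while the backlog drains, so it pairs naturally with jitter rather than replacing it.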



I have attached the requested metric screenshots.

Yes, adding jitter is definitely an option, and we can implement that.
Are there any additional approaches or best practices you would recommend for handling large timer bursts?

Also, are there any configuration changes we could consider if firing all timers at the same time is unavoidable?