Database spike because of retention deletion process

ehsan.qlub · February 21, 2025, 1:25pm

Hi,

We are running Temporal in our production environment, but we occasionally experience significant CPU spikes in our PostgreSQL database (specifically, temporal_visibility). Given the high number of DELETE queries, we suspect this is due to the retention cleanup process.

Is there a way to optimize this process or restrict its execution to avoid peak hours? At times, it runs during our busiest periods, which impacts performance.

Best regards,
Ehsan

tihomir · February 22, 2025, 8:28pm

Do you have server metrics, and can you share your persistence requests graph for the peak periods when you think retention deletion kicked in?

sum by(operation) (rate(persistence_requests[1m]))

You can try increasing the dynamic config
history.retentionTimerJitterDuration (default 30 minutes)
to spread out the deletion of event histories for completed workflow executions once their retention period is reached. Try setting it to maybe 3 hours and see if that helps.
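
To get a rough feel for why a longer jitter window might help, here is a minimal Python sketch, not Temporal's actual implementation. It assumes a batch of workflows all reach retention at the same moment and that each deletion is delayed by a uniformly random whole number of minutes inside the jitter window. The function name, the workflow count, and the uniform distribution are all illustrative assumptions.

```python
import random
from collections import Counter


def peak_deletes_per_minute(num_workflows, jitter_minutes, seed=0):
    """Return the delete count of the busiest minute.

    Hypothetical model: every workflow reaches retention at minute 0 and
    its delete is delayed by a uniform random whole number of minutes in
    [0, jitter_minutes). This is an assumption for illustration only.
    """
    rng = random.Random(seed)
    # Count how many deletes land in each one-minute bucket.
    buckets = Counter(rng.randrange(jitter_minutes) for _ in range(num_workflows))
    return max(buckets.values())


if __name__ == "__main__":
    # Compare the 30-minute default with the suggested 3-hour window.
    for jitter in (30, 180):
        peak = peak_deletes_per_minute(100_000, jitter)
        print(f"jitter={jitter} min -> busiest minute: {peak} deletes")
```

Under these assumptions the same number of deletes is spread over six times as many minutes, so the busiest minute should carry a correspondingly smaller share. Real behavior depends on when your workflows actually close, so treat this as intuition only and check it against your own persistence metrics.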

ehsan.qlub · February 24, 2025, 11:44am

It looks like this is the problematic SQL:

DELETE FROM executions_visibility WHERE namespace_id = $1 AND run_id = $2

We had to offload our workflows from Temporal; as you can see, it produced a spike in our database, and this DELETE command looks like the cause.

maxim · February 24, 2025, 3:32pm

We recommend Elasticsearch for visibility in high-load environments. If ES is not an option, you can use a separate database instance just for visibility.
