PostgreSQL (AWS RDS) - Scale up

Hi, we have temporal.io running on AWS RDS PostgreSQL 13.7, and we are facing bottlenecks with the history_node table that are driving up our CPU cost drastically.

If possible, could you give us some insight into solving this problem?

Below are some metrics collected during peak load.

SELECTs: 6345/min
INSERTs: 15685/min
DELETEs: 758/min (14589 rows/min)

Observation: our project is going to grow to many times this workload.

Row Octet Size

MIN: 246 (246 B)
AVG: 5613 (5.48 KB)
MAX: 73538 (71.81 KB)
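For reference, one way to measure these per-row payload sizes is a query like the sketch below. It assumes history_node stores its payload in a bytea column named data (as in Temporal's default PostgreSQL schema); adjust if your schema differs.

-- Approximate payload sizes per row; octet_length() reports the uncompressed
-- size, pg_column_size() would report the size as actually stored (compressed/TOASTed).
SELECT min(octet_length(data))        AS min_bytes,
       round(avg(octet_length(data))) AS avg_bytes,
       max(octet_length(data))        AS max_bytes
FROM history_node;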

We noticed that this is generating a lot of TOAST data.
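To confirm how much of history_node actually lives in its TOAST table, a standard catalog query (nothing Temporal-specific) can be used, for example:

-- Heap size vs. TOAST size for history_node.
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid))           AS heap_size,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_class c
WHERE c.relname = 'history_node';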

If we isolate the wait events for these INSERTs in AWS Performance Insights, we see the following values.

LWLock:WALWrite: 58%
IO:WALSync: 24%
CPU: 9%
Client:ClientRead: 9%
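Since the waits are dominated by WAL writes and syncs, it may also be worth checking the WAL-related settings on the instance. The query below is just a generic way to list them; all of these are standard PostgreSQL parameters (for example, wal_compression can reduce the volume of full-page writes at the cost of some CPU).

-- Current WAL-related settings on this instance.
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_compression', 'wal_buffers', 'full_page_writes',
               'synchronous_commit', 'commit_delay', 'max_wal_size',
               'checkpoint_timeout');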

Can you help us remove this bottleneck and scale up the Temporal application?


Can you show your persistence requests:

sum by (operation) (rate(persistence_requests[1m]))

and check for “UpdateWorkflowExecution” operation.

Also, can you share the DB instance type, the number of CPUs, and the memory size of your DB?

Hi, @tihomir.
Thank you for your answer.

Now my instance is an AWS RDS m5.2xlarge with 8 vCPU and 32 GB of RAM.

Below is a screenshot showing persistence_requests by operation.

Below is a screenshot filtered to just UpdateWorkflowExecution.

Hello @natan.araujo, I'm seeing something similar with the same RDS PostgreSQL setup. Did you manage to find the bottleneck?

In my case we sometimes have more than 40 KB per row in history_node, so PostgreSQL uses TOAST to store the data, and that is bad in some scenarios.

Past roughly 2000 bytes per tuple, PostgreSQL compresses the oversized values and moves them out to the TOAST table (the relevant storage knobs are sketched after the list of cons below).

Cons:
Every time a TOAST page is touched, PostgreSQL may have to write the full page to the WAL (a full-page image after each checkpoint), even if the change concerns a chunk of an unrelated tuple.
As I understand it, if a page holds 2 chunks of id1 and 2 chunks of id2 and you update only id1, PostgreSQL writes the id2 chunks to the WAL as well, which generates a lot of IO:WALWrite and LWLock:WALWrite.
The data can't be read without being decompressed, not even partially (at least not on PG 13, which is what I'm on).
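The per-column TOAST behaviour can be tuned. The statements below are only a sketch of the available knobs, assuming the large column is the bytea data column; they are not a recommendation for changing Temporal's schema, and any change like this should be tested on a copy first.

-- Raise the size at which out-of-line TOAST storage kicks in
-- (default is ~2040 bytes; table-level option since PostgreSQL 11).
ALTER TABLE history_node SET (toast_tuple_target = 4080);

-- Store the column out of line but uncompressed (STORAGE EXTERNAL),
-- which avoids pglz compression/decompression on reads and writes.
ALTER TABLE history_node ALTER COLUMN data SET STORAGE EXTERNAL;

Note that SET STORAGE only affects newly written rows, and EXTERNAL trades larger on-disk and WAL volume for less CPU, so it can make the WAL pressure worse rather than better.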

Hey @natan.araujo, did you find any solution to this?

I see this issue as well. The pattern I notice between problematic and non-problematic clusters in our setup is that in the problematic clusters the pg_statistic and pg_toast tables have too many dead tuples; this doesn't happen in the other clusters.
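A quick way to see which TOAST tables are carrying dead tuples is the standard statistics view; the query below is generic and not specific to Temporal.

-- Dead vs. live tuples for TOAST tables, worst offenders first.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
WHERE schemaname = 'pg_toast'
ORDER BY n_dead_tup DESC
LIMIT 10;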