We have configured 2048 history shards and run 3 replicas of the history service in Kubernetes. These pods consistently use between 2 GB and 3 GB of memory each. Our retention period is 3 days. I am looking for reasons why memory usage in the history service is so high.
I noticed there is a tctl command,

tctl admin db scan

which, when run, produces a lot of information about corrupted workflow executions. My questions are:
- If there are a lot of corrupted workflow executions, could that explain why the history service memory usage is so high, i.e. history hanging around that is not being cleaned up?
- What causes corrupted workflow executions, and is there a guide on how to avoid them with proper configuration?
- Would running the scan and then the db clean resolve any of these issues? What does the clean actually do?
We are using Cassandra as the DB engine.
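For reference, this is roughly how I have been invoking the scan (the output redirect is just my own way of capturing the report for review; I have not run the clean yet, and I am omitting any flags it may require):

```
tctl admin db scan > scan_output.txt
tctl admin db clean
```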
Thanks.