Our backend uses the Python Flask framework with Temporal as an asynchronous task queue to handle our scheduled and asynchronous tasks. In our system, Temporal is treated as a Celery-like component, and we run the Temporal workers on a separate server.
We then ran into a memory leak around Temporal: the memory of the server running the Temporal worker keeps growing.
I would like to ask for troubleshooting suggestions, and whether anyone has encountered a similar problem.
As a temporary workaround we restart the Docker container with a scheduled task, but the server's memory usage still climbs to nearly 100%.
We are not aware of any memory leaks, but memory usage can grow until limits are reached. Specifically, running activities use memory, but the number of concurrently running activities can be bounded with max_concurrent_activities. Also, workflows use memory in a cache for optimization and will fill that cache when they can; the cache size can be controlled with max_cached_workflows.
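For illustration, both limits are constructor arguments on the Python SDK worker. A minimal sketch, where the workflow, activity, task queue name, server address, and limit values are all placeholders:

```python
from temporalio.client import Client
from temporalio.worker import Worker

from my_app.workflows import MyWorkflow    # placeholder workflow
from my_app.activities import my_activity  # placeholder activity


async def run_worker() -> None:
    client = await Client.connect("localhost:7233")  # example server address

    worker = Worker(
        client,
        task_queue="my-task-queue",    # placeholder task queue name
        workflows=[MyWorkflow],
        activities=[my_activity],
        max_concurrent_activities=50,  # bounds memory used by running activities
        max_cached_workflows=100,      # bounds the sticky workflow cache
    )
    await worker.run()
```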
I have a question; I think this may be exactly our situation. Our worker activities are always running, and after an old worker ends a new worker starts, so each worker keeps workflows in its in-memory cache for optimization. In that case our memory would gradually grow to 100%. Is this the reason?
We configured values for max_cached_workflows and max_concurrent_activities in our old code, but the server's memory continued to grow, which confuses me, so I wonder whether this is the cause of the memory leak.
Usually workers live for the life of the process, but regardless, workflow cache is per worker and is collected when the worker is. So yes, more workers mean more memory usage.
In the past few months, our server memory has continued to grow because of Temporal. We upgraded the cloud server's memory from 8 GB to 16 GB, and it is now 64 GB.
If we don't upgrade the physical memory, Temporal's share of memory keeps increasing until it occupies the full memory space (this did happen).
For now we reduce Temporal's memory consumption by restarting our Docker container with a daily scheduled task. So I suspect there is a problem with our Temporal settings, or perhaps a memory leak somewhere in our system.
As time goes by, memory usage eventually reaches 100%, even on a server with 64 GB of memory. This is the serious problem we are facing now.
This could also be due to activities not completing and continually increasing memory. Are you heartbeating properly in your activities? Granted, the number of activities is still bounded by max_concurrent_activities.
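For example, heartbeating from an activity and setting a heartbeat timeout look roughly like this in the Python SDK; the activity, workflow, and durations below are only illustrative:

```python
from datetime import timedelta

from temporalio import activity, workflow


@activity.defn
async def process_batch(batch_id: str) -> None:
    for chunk in range(100):
        # ... process one chunk of work ...
        # Heartbeat so the server knows the activity is still making progress;
        # combined with heartbeat_timeout below, a hung activity gets timed out
        # instead of sitting there indefinitely.
        activity.heartbeat(chunk)


@workflow.defn
class BatchWorkflow:
    @workflow.run
    async def run(self, batch_id: str) -> None:
        await workflow.execute_activity(
            process_batch,
            batch_id,
            start_to_close_timeout=timedelta(minutes=10),
            heartbeat_timeout=timedelta(seconds=30),
        )
```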
We are not aware of any memory leaks. If you are able to replicate this, and can then keep reducing it to a small standalone replication, we can help debug and see whether there is an issue with the SDK.
In our code, activities are automatically terminated after a certain duration (a simplified sketch of how we bound activity duration is below). There are no leftover historical activities causing the memory increase; I can guarantee there is no situation where an activity fails to complete and keeps occupying memory.
So this is why I suspect there is a memory leak problem.
So this is where I'm confused now.
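For context, a minimal sketch of how we bound activity duration; the activity name and timeout value here are placeholders, not our exact code:

```python
from datetime import timedelta

from temporalio import workflow

from my_app.activities import sync_records  # placeholder activity


@workflow.defn
class SyncWorkflow:
    @workflow.run
    async def run(self) -> None:
        # start_to_close_timeout bounds how long a single activity attempt may
        # run before the server times it out, so no attempt runs forever.
        await workflow.execute_activity(
            sync_records,
            start_to_close_timeout=timedelta(minutes=30),
        )
```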
I am afraid that from these posts alone I cannot diagnose where the leak is. We are unaware of any leaks in the SDK at this time (though it is possible they exist in circumstances we haven't found yet). We would need a small standalone replication to debug. If you can reliably replicate the growth, can you keep reducing your replication down to a small one and provide it so that we can debug the memory leak?
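As a starting point, a standalone replication is usually a single self-contained file along these lines. Everything below is a hypothetical skeleton (names, values, and the local server address are assumptions), meant to be run while watching the worker process's memory:

```python
# repro.py - hypothetical skeleton for a minimal standalone replication
import asyncio
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.client import Client
from temporalio.worker import Worker


@activity.defn
async def noop_activity(n: int) -> int:
    return n


@workflow.defn
class LoopWorkflow:
    @workflow.run
    async def run(self, iterations: int) -> None:
        for i in range(iterations):
            await workflow.execute_activity(
                noop_activity,
                i,
                start_to_close_timeout=timedelta(seconds=10),
            )


async def main() -> None:
    client = await Client.connect("localhost:7233")
    async with Worker(
        client,
        task_queue="leak-repro",
        workflows=[LoopWorkflow],
        activities=[noop_activity],
        max_concurrent_activities=50,
        max_cached_workflows=100,
    ):
        # Run a batch of workflows while watching the worker process's RSS
        # (e.g. with top/ps) to see whether memory keeps climbing afterwards.
        await asyncio.gather(
            *(
                client.execute_workflow(
                    LoopWorkflow.run,
                    100,
                    id=f"leak-repro-{i}",
                    task_queue="leak-repro",
                )
                for i in range(50)
            )
        )


if __name__ == "__main__":
    asyncio.run(main())
```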