First of all, congrats on the V1 release of Temporal.
We were using only one worker for approximately 50,000 running workflows. The workflow instances are mostly waiting for signals to continue. We also have a single poller workflow (only one instance) that calls NewContinueAsNewError every 2 minutes.
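For reference, the poller workflow follows the usual continue-as-new pattern, roughly like this sketch (PollerWorkflow and pollActivity are placeholder names, not our actual code):

```go
package main

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// PollerWorkflow is a rough sketch of a poller that does one round of work,
// sleeps for two minutes, and then continues-as-new to keep its history short.
func PollerWorkflow(ctx workflow.Context) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: time.Minute}
	ctx = workflow.WithActivityOptions(ctx, ao)

	// One round of polling work (placeholder activity name).
	if err := workflow.ExecuteActivity(ctx, "pollActivity").Get(ctx, nil); err != nil {
		return err
	}

	// Wait two minutes before the next iteration.
	if err := workflow.Sleep(ctx, 2*time.Minute); err != nil {
		return err
	}

	// Restart the workflow with a fresh event history.
	return workflow.NewContinueAsNewError(ctx, PollerWorkflow)
}
```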
Everything seemed to be working as expected, but at some point tasks stopped being processed by the worker.
Looking at the metrics, memory usage of the worker usually climbs all the way to 100% and then drops a few times. But on one occasion it stopped executing tasks and the memory stayed at 100% until a restart. After the restart, task execution was normal again. The memory slowly climbed all the way up once more, so we started another instance of the worker (on 10/02), but the result was the same: memory slowly climbed all the way up again. Below is the graph of memory usage:
I believe workers keep a cache of workflow state so that handling the same workflow again later is faster. Does that mean memory will always climb all the way to the top if the worker is never restarted?
If my assumption above is correct, is there any way to limit the memory available to workers, or to change how the recycling of the workers' cache is handled, just to avoid reaching 100%?
I would also like to ask about the PurgeStickyWorkflowCache function.
Is it recommended to call it when the worker or workflow is terminated in the pod?
worker.SetStickyWorkflowCacheSize(cacheSize) sets the size of the sticky workflow cache. This cache is shared by all workers running in the same process, and the function must be called before any workers are started.
worker.PurgeStickyWorkflowCache() resets this sticky workflow cache across all workers. It can/should only be called when all workers are stopped.
The worker cache is in memory, so calling PurgeStickyWorkflowCache after the worker process has terminated is not needed and would have no effect.
Once your workers are back up the server will treat them as completely new workers.
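To make that concrete, here is a minimal sketch of limiting the shared sticky cache in the Go SDK. The task queue name, the cache size value, and the registration step are placeholders, not a recommendation for specific values:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Limit the sticky workflow cache shared by all workers in this process.
	// Must be called before any worker is started; 2000 is just an example value.
	worker.SetStickyWorkflowCacheSize(2000)

	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	w := worker.New(c, "example-task-queue", worker.Options{})
	// Register workflows and activities here, e.g. w.RegisterWorkflow(PollerWorkflow).

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker exited", err)
	}

	// If all workers in the process are stopped and you want to drop the cache
	// before starting new ones, worker.PurgeStickyWorkflowCache() could be called here.
}
```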
Hi @maxim,
I have a problem: when I send 500 requests at the same time, it consumes a lot of memory, and even after waiting a long time the memory does not decrease. Please help me, thank you.
We configured the values of max_cached_workflows and max_concurrent_activities in our Python code, but our memory keeps increasing.
I searched all the posts about memory leaks in the forum and found this answer, but I don't know how to configure the worker.SetStickyWorkflowCacheSize method in the Python SDK. Where can I set it?
Thanks,
alchemya
I don’t think anything changed about this. We are not aware of any memory leaks in the SDKs. Setting the cache size is a way to limit a worker’s memory usage.
If you believe an SDK has a memory leak, we would need a reproduction to investigate.