Temporal memory leak

Hello,
We are encountering a memory leak in a workflow. The code behaves as follows:

Optional nextObject = getNextObjectActivity.getNextObject();
while (nextObject != null && nextObject.isPresent()) {
	someActivity.run(nextObject);
	nextObject = getNextObjectActivity.getNextObject();
}

The getNextObject() method uses a store method that wraps the response in a Mono using PostgresReactiveWrapper.
The problem is that this workflow causes a memory spike, and we have no reliable way to reproduce it. The following was observed in the logs:

Any insights into the possible cause of this issue and how to fix it?

Did you look at the heap dump? It would show exactly what retains the memory. On the Temporal side, you can reduce the number of cached workflows by lowering WorkerFactoryOptions.maxWorkflowThreadCount. You can also reduce the parallelism of activities by lowering WorkerOptions.maxConcurrentActivityExecutionSize.
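For reference, here is a minimal sketch of where those two settings go in the Java SDK. The numeric values and the task queue name are placeholders chosen for illustration, not recommendations. The setWorkflowCacheSize call is an addition not mentioned above: it is the SDK option that bounds the workflow cache directly, so it may be worth tuning alongside the thread count.

```java
import io.temporal.client.WorkflowClient;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerFactoryOptions;
import io.temporal.worker.WorkerOptions;

class WorkerTuningSketch {

    // Builds a worker with smaller limits than the defaults.
    // All numbers below are illustrative placeholders.
    static WorkerFactory buildFactory(WorkflowClient client) {
        WorkerFactoryOptions factoryOptions = WorkerFactoryOptions.newBuilder()
            // Fewer workflow threads means fewer workflows can stay cached.
            .setMaxWorkflowThreadCount(200)
            // Assumption: also cap the cache itself (not part of the advice above).
            .setWorkflowCacheSize(100)
            .build();
        WorkerFactory factory = WorkerFactory.newInstance(client, factoryOptions);

        WorkerOptions workerOptions = WorkerOptions.newBuilder()
            // Fewer activities running in parallel on this worker.
            .setMaxConcurrentActivityExecutionSize(20)
            .build();

        // "example-task-queue" is a hypothetical name.
        Worker worker = factory.newWorker("example-task-queue", workerOptions);
        // Register workflow and activity implementations on `worker` here,
        // then call factory.start().
        return factory;
    }
}
```

Lower limits trade throughput for memory, so it is probably best to change one value at a time and compare heap usage after each change.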

I looked into the heap dump and found thousands of instances of nextObject, referenced from CompletablePromiseImpl instances, that retain a significant amount of memory. Also, what are cached workflows? Are they the ones that have already completed or terminated, or something else?

It looks like your workflows keep references to these objects, and these objects are large. To improve caching performance, avoid passing large objects through the workflow. Store them somewhere else (for example, a database or S3) and pass references to them instead.
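A minimal sketch of that idea, in plain Java with no Temporal dependency so it runs on its own. ObjectStore, ObjectActivities and ClaimCheckDemo are hypothetical names; the in-memory map stands in for a database or S3, and in a real worker the two methods on ObjectActivities would be activity methods. The point is that the loop only ever holds a short string reference, never the payload.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a database table or an S3 bucket.
class ObjectStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    // Stores the payload and returns a small reference to it.
    String put(byte[] payload) {
        String ref = UUID.randomUUID().toString();
        blobs.put(ref, payload);
        return ref;
    }

    byte[] get(String ref) {
        byte[] payload = blobs.get(ref);
        if (payload == null) {
            throw new IllegalArgumentException("Unknown reference: " + ref);
        }
        return payload;
    }
}

// Activity-side logic: large payloads never leave this class.
class ObjectActivities {
    private final ObjectStore store;
    private final Iterator<byte[]> pending;

    ObjectActivities(ObjectStore store, List<byte[]> objects) {
        this.store = store;
        this.pending = objects.iterator();
    }

    // Returns a reference to the next object, or empty when none are left.
    Optional<String> getNextObjectRef() {
        if (!pending.hasNext()) {
            return Optional.empty();
        }
        return Optional.of(store.put(pending.next()));
    }

    // Loads the payload by reference and returns only a small result.
    int run(String ref) {
        return store.get(ref).length;
    }
}

class ClaimCheckDemo {
    // Workflow-side loop: it sees references and small results only.
    static long runLoop(ObjectActivities activities) {
        long totalBytes = 0;
        Optional<String> ref = activities.getNextObjectRef();
        while (ref.isPresent()) {
            totalBytes += activities.run(ref.get());
            ref = activities.getNextObjectRef();
        }
        return totalBytes;
    }

    public static void main(String[] args) {
        ObjectActivities activities = new ObjectActivities(
            new ObjectStore(),
            List.of(new byte[1024], new byte[2048], new byte[4096]));
        System.out.println("Processed bytes: " + runLoop(activities));
    }
}
```

With this shape, whatever the workflow retains per iteration is a 36-character UUID string rather than the object itself.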

What are cached workflows? Are they the ones that have already completed or terminated, or something else?

They are a subset of the workflows that are currently executing. When workflow code needs to handle new events, the workflow is rehydrated in a worker process and then cached there. If it then receives no new events for a while, other workflows push it out of the cache.
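To make the eviction behavior concrete, here is a toy model of a bounded cache in plain Java. This is not Temporal's implementation; it assumes a least-recently-used policy, which matches the description above but is an assumption on my part. WorkflowCacheSketch and the workflow ids are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy bounded cache: the entry that was used least recently is evicted first.
class WorkflowCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    WorkflowCacheSketch(int capacity) {
        // accessOrder = true keeps entries ordered by most recent access.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict once the cache is over capacity.
        return size() > capacity;
    }
}

class WorkflowCacheDemo {
    static WorkflowCacheSketch<String, String> simulate() {
        WorkflowCacheSketch<String, String> cache = new WorkflowCacheSketch<>(2);
        cache.put("wf-1", "state of wf-1"); // wf-1 handles an event, gets cached
        cache.put("wf-2", "state of wf-2"); // wf-2 handles an event, gets cached
        cache.get("wf-1");                  // wf-1 receives another event
        cache.put("wf-3", "state of wf-3"); // cache is full, so idle wf-2 is evicted
        return cache;
    }

    public static void main(String[] args) {
        System.out.println("Cached workflows: " + simulate().keySet());
    }
}
```

An evicted workflow is still running as far as the service is concerned. In this model it simply has to be rehydrated again the next time it receives an event.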