My understanding is that the memory consumed by a history pod is essentially an LRU cache. We're seeing a huge uptick in memory usage by our history pods. We have a lot of short-running workflows, and my theory is that this is the cause.
Is there a config to restrict this history cache size?
You can tune the history (mutable state) and events cache sizes for your cluster through dynamic config. The relevant knobs are:
history.cacheInitialSize
history.cacheMaxSize
history.eventsCacheInitialSize
history.eventsCacheMaxSize
These sizes represent the number of cached entries per shard, so you may need to take the size of the cluster (number of history hosts) into account before choosing a value that makes sense for your deployment. For instance, let's say you have 100 shards and 4 history hosts; then on average 25 shards would be placed on each history host. So if you set history.cacheMaxSize to 10, that would mean 250 cached items on a single host on average.
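As a sketch, these knobs could be set in a dynamic config YAML file along these lines (the values below are illustrative only, not recommendations; check them against your cluster's shard count and host count as described above):

```yaml
# Illustrative dynamic config entries (per-shard cache limits).
history.cacheInitialSize:
  - value: 128
    constraints: {}
history.cacheMaxSize:
  - value: 512
    constraints: {}
history.eventsCacheInitialSize:
  - value: 128
    constraints: {}
history.eventsCacheMaxSize:
  - value: 512
    constraints: {}
```

With 100 shards spread across 4 history hosts (25 shards per host on average), a `history.cacheMaxSize` of 512 would mean roughly 12,800 cached mutable-state entries per host on average.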