Hi,
I have some questions regarding memory usage of history pods.
Scenario
I run 1 frontend pod, 1 history pod, 1 matching pod, and 1 worker pod.
I use the dynamic config below, do not set the history cache size config at all (defaults are used), and use NUM_HISTORY_SHARDS=4096.
I load-test the pods at 100 workflows/sec to observe the behaviour.
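For reference, this is roughly where that shard count ends up in the server's static config (just a sketch; I set it through the NUM_HISTORY_SHARDS env var of the docker image, which as far as I understand is templated into persistence.numHistoryShards):

persistence:
  numHistoryShards: 4096  # fixed when the cluster is first provisioned; cannot be changed afterwards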
Question 1
The history pod used a lot of memory during load (as expected).
But after being idle for 1 hour, the history pod's memory did not decrease at all and stayed at around 60% memory usage.
I read that the defaults for HistoryCacheTTL & EventsCacheTTL are 1 hour (time.Hour), so why did the history pod's memory usage not decrease at all after 1 hour of idle time?
I use the docker image temporalio/server:1.16.2 for the pods.
Question 2
So, regarding this post,
my single history pod should be able to hold up to 4096 * 512 (default HistoryCacheMaxSize) = 2,097,152 cached items.
How do I calculate the memory required for that maximum number of cached items?
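For example, the kind of back-of-envelope estimate I have in mind (the 8 KB per item below is only a hypothetical average mutable-state size for illustration, not a measured number):

4096 shards * 512 items/shard = 2,097,152 cached items
2,097,152 items * 8 KB/item  ≈ 16 GB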
Cached items should be removed when the cache size limit is reached.
But if cached items are not removed after some interval, then what are HistoryCacheTTL & EventsCacheTTL in the dynamic config for? I thought they were the cache TTLs in the dynamic config.
// HistoryCacheTTL is TTL of history cache
HistoryCacheTTL = "history.cacheTTL"
// EventsCacheTTL is TTL of events cache
EventsCacheTTL = "history.eventsCacheTTL"
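For context, this is roughly how I would expect those keys to be set in a file-based dynamic config YAML (a sketch only; the values are placeholders and the accepted duration format may differ between server versions, so please treat this as an assumption):

history.cacheTTL:
  - value: 1h
    constraints: {}
history.eventsCacheTTL:
  - value: 1h
    constraints: {}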
Can you provide an example query showing how to query the history_size metric?
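Something along these lines is what I am after (just a sketch, assuming the server metrics are scraped by Prometheus and the metric is exported as a histogram named history_size; the exact metric name and labels depend on the metrics reporter and any configured prefix, e.g. it may appear as temporal_history_size):

histogram_quantile(0.95, sum(rate(history_size_bucket[5m])) by (le))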
So, currently I have a problem where the history pods always OOM.
I already increased the history service to 2 pods with max memory of 24 GB, but still got OOMs.
My current configuration uses 4096 shards and the default HistoryCacheMaxSize.
Do you have any recommendation on how to manage the history pods so they do not get OOM with 4096 shards?
Hi, we are also experiencing OOM errors on the history service.
Any update here?
Specifically, if the TTL is not working, how do we remove cached items from the history service?
It looks like the Temporal history service needs an unacceptably high amount of memory.
There are many issues about high memory usage, and the Temporal team has not yet given valid feedback.
Do you have any recommendation on how to manage the history pods so they do not get OOM with 4096 shards?
How many history hosts do you deploy? Temporal tries to evenly distribute shards across history hosts.
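For example (plain arithmetic, not a measurement): with 4096 shards and 2 history hosts, each host would own roughly 4096 / 2 = 2048 shards, so its cache bound would be about 2048 * 512 = 1,048,576 cached workflow executions.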
the Temporal team has not yet given valid feedback
Can you give more info: server version, persistence store used, namespace retention duration? I don't think it's that we haven't provided “valid” feedback; rather, the solution is often quite dependent on the user's deployment setup.
Temporal does provide dynamic configs:
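For example, a minimal sketch of the cache-related keys discussed in this thread, as they might appear in a file-based dynamic config YAML (the values are placeholders for illustration, not tuning recommendations):

history.cacheMaxSize:
  - value: 256
    constraints: {}
history.eventsCacheMaxSize:
  - value: 256
    constraints: {}

Lowering these reduces how much each shard's history and events caches may hold, at the cost of more reads from persistence.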