Recently, we got the following error when trying to create a new schedule:
StatusRuntimeException: UNAVAILABLE: Not enough hosts to serve the request
After a bit of research, it turned out that the temporaltest-history pods were being killed by the OOM killer (exit code 137) and restarting over and over again.
The deployment is on AWS EKS, and the pods have the minimum amount of resources for memory and CPU: 0.25 units for CPU and 0.5 GiB for memory.
As we are just starting to use Temporal, our avg/peak WPS is < 10 WPS.
These values might be too high for your setup. The cache sizes are per shard, so if your event payloads are large, they can quickly add up and take up all the memory on your history service. Based on the numbers you provided above:
MutableStateCacheSize = 512 (shards) X 512 (Default Max Items) = 256K Items
HistoryEvent Cache Size = 512 (shards) X 512 (Default Max Items) = 256K Items
So if you assume an average of 10 KB per item, these 2 caches could take up to 5 GB of memory. Can you share how many history service pods you have and what the memory limit is on those pods?
Can you try configuring the cache sizes to a smaller number and see if this helps with the OOM?
We have workstreams in progress to make this simpler by configuring it per history host and setting the limit in bytes rather than in cache items.
And yes, please use dynamic_config to override these limits.
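The estimate above can be reproduced with a few lines of Python. This is only a rough sketch: the 10 KB average item size is the assumption used in the reply, the function names are made up for illustration, and the 256 MiB budget in the second call is a hypothetical figure, not a recommendation.

```python
# Rough sizing sketch for the two per-shard history caches.
# All inputs are assumptions taken from the discussion, not measured values.

SHARDS = 512                 # number of history shards
MAX_ITEMS_PER_SHARD = 512    # default max items per cache, per shard
AVG_ITEM_BYTES = 10 * 1024   # assumed average item size (10 KB)
NUM_CACHES = 2               # mutable state cache + history events cache


def cache_memory_bytes(shards, items_per_shard, avg_item_bytes, num_caches=NUM_CACHES):
    """Worst-case memory if every cache on every shard is full."""
    return shards * items_per_shard * avg_item_bytes * num_caches


def max_items_for_budget(budget_bytes, shards, avg_item_bytes, num_caches=NUM_CACHES):
    """Largest per-shard item limit that keeps the full caches within the budget."""
    return budget_bytes // (shards * avg_item_bytes * num_caches)


total = cache_memory_bytes(SHARDS, MAX_ITEMS_PER_SHARD, AVG_ITEM_BYTES)
print(f"Worst case with defaults: {total / 1024**3:.1f} GiB")

# Hypothetical budget of 256 MiB for both caches combined.
limit = max_items_for_budget(256 * 1024**2, SHARDS, AVG_ITEM_BYTES)
print(f"Per-shard item limit for a 256 MiB budget: {limit}")
```

The total covers all shards, so how much lands on one pod depends on how many history pods share the shards. With a single history pod, the whole amount applies to that pod.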
Would an Item in this case be a single line (an Event) in the workflow, or the entire workflow (comprised of multiple events)? I clicked on download on a workflow, and it showed 30K for the JSON file.
Also, is the cache here so the workers can continue from where the last execution left off, or is it for the Web UI to display results quickly?
What are the ramifications of the following:
No cache
Small cache
Large cache (which we know takes up too much memory in the history pods)
The setup from above works perfectly. I just had to escape the . when using --set during helm install. The . is part of the config name and does not indicate a sub-property.
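For anyone scripting this, here is a minimal sketch of the escaping. Helm's --set treats an unescaped . as a path separator, so a literal dot in a key has to be written as \. instead. The key name history.cacheMaxSize, the server.dynamicConfig values path, and the value 128 are assumptions for illustration only; check them against your chart and Temporal version.

```python
# Sketch: build a Helm --set argument for a key that contains literal dots.
# The path, key name, and value used below are illustrative assumptions.


def escape_helm_key(key: str) -> str:
    """Escape literal dots so Helm does not split the key into sub-properties."""
    return key.replace(".", r"\.")


def helm_set_arg(parent_path: str, key: str, suffix: str, value: object) -> str:
    """Join the parent path, the escaped key, and a trailing suffix into one --set argument."""
    return f"{parent_path}.{escape_helm_key(key)}{suffix}={value}"


arg = helm_set_arg("server.dynamicConfig", "history.cacheMaxSize", "[0].value", 128)
print(arg)
```

When the resulting string is passed on a command line, it should be wrapped in single quotes so the shell does not consume the backslash before Helm sees it.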