Hi,
I am looking for recommendations around Temporal history service configuration. We are running Temporal 1.26 against a PostgreSQL backend (AWS RDS). Twice a day we see a latency spike caused by a burst of traffic when a large number of schedules fire. After some time latencies recover and the cluster comfortably serves requests again, even though the frontend request rate only drops slightly.
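For context, one mitigation we have been considering (not yet tried) is capping persistence QPS via Temporal's dynamic config, so the schedule burst is absorbed by Temporal's own rate limiter instead of hammering RDS. A minimal sketch of what I mean; the values are illustrative, not tuned for our workload:

```yaml
# dynamicconfig/production.yaml -- illustrative values, not a recommendation
# Cap history persistence QPS so schedule bursts queue inside Temporal
# instead of overloading the database.
history.persistenceMaxQPS:
  - value: 3000
    constraints: {}
# Same idea for the frontend service.
frontend.persistenceMaxQPS:
  - value: 2000
    constraints: {}
```

Is that a sensible knob for this failure mode, or does it just move the latency elsewhere?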
We also use horizontal autoscaling on CPU and memory, and autoscaling does kick in in these situations. However, we are wondering whether autoscaling makes things worse rather than better here: the cluster is already under increased load and DB latencies go up, and bringing up another history node causes shard rebalancing, which means even more traffic to the database to load the shard contexts onto the new node.
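For reference, our autoscaler is a plain CPU/memory HPA on the history deployment. One idea we have been toying with is damping scale-up so that a short spike does not immediately trigger shard rebalancing. A sketch under our assumptions (the name temporal-history and all thresholds are placeholders, not our actual values):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: temporal-history   # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: temporal-history
  minReplicas: 3
  maxReplicas: 6
  behavior:
    scaleUp:
      # Wait out short spikes before adding pods, so a transient schedule
      # burst does not immediately trigger shard rebalancing.
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 300   # add at most one history pod per 5 minutes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

Does damping scale-up like this make sense for history, or would you rather pin the history replica count and only autoscale the stateless services?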
Has anyone had a similar experience? Any other ideas about what the problem could be?
Thanks for any feedback,
–Hardy