We have a Kubernetes deployment with the following Temporal services: Frontend, 3 pods with 4 cores and 4 GB each; History, 5 pods with 8 cores and 8 GB each; Matching, 3 pods with 4 cores and 4 GB each. When we scaled the frontend pods down and then back up to 3, we observed that the service_pending_requests metric was being emitted from only a single pod, even though all the pods were in the Running state. We believe this in turn degraded our response times during load testing, since the same load performed well before the frontend pods were scaled down.
Later, we scaled back down to 0 and then scaled up gradually, one pod after another, and this time the metric was emitted from all pods.
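To make the symptom concrete, here is a rough sketch of how the gap between running pods and pods emitting the metric could be checked. It assumes the metric is scraped by Prometheus and that each series carries a `pod` label; both are assumptions about the monitoring setup, and the pod names and values below are hypothetical. The function works on the JSON body returned by a Prometheus instant query for service_pending_requests.

```python
from typing import Iterable, List


def pods_missing_metric(
    running_pods: Iterable[str],
    query_response: dict,
    pod_label: str = "pod",  # assumed label name; depends on the scrape config
) -> List[str]:
    """Return the running pods that have no series in a Prometheus
    instant-query response (the JSON body of /api/v1/query)."""
    if query_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    # Collect the pod label of every series present in the result.
    emitting = {
        series["metric"].get(pod_label)
        for series in query_response["data"]["result"]
    }
    return sorted(set(running_pods) - emitting)


# Hypothetical data mirroring the behavior described above:
# three frontend pods are Running, but only one emits the metric.
running = ["frontend-a", "frontend-b", "frontend-c"]
response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "service_pending_requests",
                    "pod": "frontend-a",
                },
                "value": [1700000000, "12"],
            }
        ],
    },
}

print(pods_missing_metric(running, response))  # ['frontend-b', 'frontend-c']
```

In our case the first scale-up would have reported two missing pods, and the gradual scale-up would have reported none.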
We need help with this, as we are unable to pinpoint the reason behind this behavior.
Also, is there a recommended way to verify the readiness of all the Temporal services during deployment?