Rate limit exceeded when replica count > 1


I am deploying Temporal to Kubernetes (with the Istio service mesh) via the Helm charts, and I ran into a strange problem. At first, all services deploy properly and seem to communicate with each other over both the membership ports and the regular gRPC ports. But when I launch a test workflow to check that everything is running properly, the services become very “chatty”, and I often get a “rate limit exceeded” error from the history service when I try to view the workflow details in the UI. I’m not running any load tests; a single workflow instance seems to be enough to cause this. I also haven’t changed the default RPS settings.

After reading this, I thought I’d try reducing the replica count for all services from 2 to 1. After this change, everything went back to normal: the workflows started to execute properly, and I can view the details without any problem.

So the question is: what am I doing wrong? I haven’t touched settings like broadcastAddress, so by default it uses the pod IP. I also use the default port numbers. Could it be a routing or load-balancing problem in my service definitions or Istio configuration that causes redundant loops in the network traffic?

I’m using server version 1.22.3.

Thank you! 🙂