Hi everyone,
We recently deployed Temporal on Kubernetes, with each service (frontend, history, matching, and worker) running in its own Deployment. Each service is configured with 2 replicas.
While observing the cluster, we noticed that only one replica of some services (the worker service in particular) appears to handle most of the traffic, while the second replica remains almost idle.
My understanding is that communication between Temporal services relies on gRPC. Since gRPC uses long-lived HTTP/2 connections, traffic is typically pinned to a single backend once a connection is established. As a result, Kubernetes Services alone may not provide effective load balancing across replicas unless additional mechanisms (service mesh, L7 load balancer, etc.) are introduced.
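To make the pinning concern concrete, here is a toy simulation in plain Go. It does not use gRPC or Temporal at all; the function names and pod IPs are made up for illustration. It only contrasts choosing a backend once per connection with choosing one per request.

```go
package main

import "fmt"

// pickPinned models connection-level balancing: the backend is chosen once,
// when the connection is opened, and every later request reuses it.
func pickPinned(backends []string, requests int) map[string]int {
	counts := make(map[string]int)
	if len(backends) == 0 {
		return counts
	}
	chosen := backends[0] // decided at connect time, never revisited
	for i := 0; i < requests; i++ {
		counts[chosen]++
	}
	return counts
}

// pickRoundRobin models request-level balancing: each request goes to the
// next backend in turn.
func pickRoundRobin(backends []string, requests int) map[string]int {
	counts := make(map[string]int)
	if len(backends) == 0 {
		return counts
	}
	for i := 0; i < requests; i++ {
		counts[backends[i%len(backends)]]++
	}
	return counts
}

func main() {
	// Hypothetical pod IPs for a 2-replica Deployment.
	backends := []string{"10.0.0.1", "10.0.0.2"}
	fmt.Println("pinned:", pickPinned(backends, 100))
	fmt.Println("round robin:", pickRoundRobin(backends, 100))
}
```

With 100 requests and two backends, the pinned variant sends all 100 to the first backend, while the round-robin variant sends 50 to each. That matches the pattern we seem to be observing with one busy replica and one idle one.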
In our environment, deploying a service mesh is not an option. However, we do use Cilium, which can perform load balancing for ClusterIP Services through specific annotations. More details can be found here.
The issue is that the Temporal Helm chart creates headless Services for inter-service communication. Because a headless Service has no ClusterIP and its DNS name resolves directly to pod IPs, Cilium's Service load balancing is bypassed, and traffic appears to remain pinned to a single replica.
This raises a few questions:
- What is the rationale behind using headless Services for Temporal’s internal services?
- Can these headless Services be safely replaced with regular ClusterIP Services?
- Is there a supported configuration (Helm chart or otherwise) to deploy ClusterIP Services instead of, or in addition to, the headless Services?
- If headless Services are required, shouldn’t the client-side gRPC implementation perform load balancing across the returned endpoints?
For example, something along these lines:
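```go
// Needs: fmt, log, google.golang.org/grpc,
// and google.golang.org/grpc/credentials/insecure.

// The dns scheme makes the client resolve every address behind the headless Service.
target := fmt.Sprintf("dns:///%s:%s", host, port)
dialOpts := []grpc.DialOption{
	grpc.WithTransportCredentials(insecure.NewCredentials()),
	// Spread RPCs across all resolved addresses instead of the default pick_first.
	grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
}
// NewClient does not connect eagerly; connections are opened on first use.
conn, err := grpc.NewClient(target, dialOpts...)
if err != nil {
	log.Fatalf("Failed to create gRPC client: %v", err)
}
defer conn.Close()
```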
From what I can see, this would allow the gRPC client to resolve multiple pod IPs from the headless Service and distribute requests across them.
Am I misunderstanding how Temporal’s internal service discovery and load balancing work, or is there another mechanism that already handles this?
Thanks!