Traffic is not load balanced among all worker-service instances deployed in Kubernetes

Rami · June 24, 2026, 2:50pm

Hi everyone,

We recently deployed Temporal on Kubernetes, with each service (frontend, history, matching, and worker) running in its own Deployment. Each service is configured with 2 replicas.

While observing the cluster, we noticed that only one replica of some services (the worker service in particular) appears to handle most of the traffic, while the second replica remains almost idle.

My understanding is that communication between Temporal services relies on gRPC. Since gRPC uses long-lived HTTP/2 connections that multiplex many requests, traffic is typically pinned to a single backend once a connection is established. Because a Kubernetes Service balances at the connection level (L4) rather than per request, Services alone may not spread traffic effectively across replicas unless additional mechanisms (a service mesh, an L7 load balancer, etc.) are introduced.
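To make the pinning effect concrete, here is a toy model in Go comparing connection-level balancing, where the backend is picked once per TCP connection, with per-RPC balancing. It is not kube-proxy's or Cilium's actual algorithm: it assumes connections are assigned to backends round-robin, whereas real implementations usually pick randomly. That difference does not change the conclusion when there are only a few long-lived connections.

```go
package main

// Toy model only. Assumption: connections are assigned to backends
// round-robin; real kube-proxy/Cilium usually choose randomly.

// connectionLevel returns the number of RPCs each backend receives when
// conns long-lived connections each carry rpcsPerConn RPCs and the backend
// is chosen once, when the connection is opened (L4 balancing).
func connectionLevel(backends, conns, rpcsPerConn int) []int {
	counts := make([]int, backends)
	for c := 0; c < conns; c++ {
		b := c % backends // fixed for the whole lifetime of the connection
		counts[b] += rpcsPerConn
	}
	return counts
}

// requestLevel returns the per-backend RPC counts when every individual RPC
// is balanced round-robin, as an L7 proxy or a client-side round_robin
// policy would do.
func requestLevel(backends, conns, rpcsPerConn int) []int {
	counts := make([]int, backends)
	for i := 0; i < conns*rpcsPerConn; i++ {
		counts[i%backends]++
	}
	return counts
}
```

With 2 backends and a single connection carrying 1000 RPCs, `connectionLevel(2, 1, 1000)` returns `[1000 0]`, while `requestLevel(2, 1, 1000)` returns `[500 500]`. Opening more connections only helps in proportion to the number of connections, not the number of RPCs they carry.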

In our environment, deploying a service mesh is not an option. However, we do use Cilium, which can perform load balancing for ClusterIP Services through specific annotations; more details can be found here.

The issue is that the Temporal Helm chart creates headless Services for inter-service communication. A headless Service has no cluster IP, so DNS returns the pod IPs directly and Cilium's ClusterIP load balancing never comes into play. As a result, traffic appears to remain pinned to a single replica.

This raises a few questions:

What is the rationale behind using headless Services for Temporal’s internal services?
Can these headless Services be safely replaced with regular ClusterIP Services?
Is there a supported configuration (Helm chart or otherwise) to deploy ClusterIP Services instead of, or in addition to, the headless Services?
If headless Services are required, shouldn’t the client-side gRPC implementation perform load balancing across the returned endpoints?

For example, something along these lines:

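// The "dns" scheme makes gRPC use its DNS resolver, which hands the client
// every A record behind the headless Service.
target := fmt.Sprintf("dns:///%s:%s", host, port)

dialOpts := []grpc.DialOption{
    // Plaintext connection (no TLS).
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    // Replace the default pick_first policy so RPCs rotate across all resolved addresses.
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
}

// NewClient does not dial; connections are created lazily, so an error here
// means an invalid target or options, not an unreachable server.
conn, err := grpc.NewClient(target, dialOpts...)
if err != nil {
    log.Fatalf("failed to create gRPC client: %v", err)
}
defer conn.Close()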

From what I can see, this would allow the gRPC client to resolve multiple pod IPs from the headless Service and distribute requests across them.
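The difference between gRPC's default policy and `round_robin` can be illustrated with a toy model of how RPCs land on the addresses a headless Service returns. This is a simplification, not grpc-go's balancer code: it assumes every address is reachable and ignores connection failures and reconnects.

```go
package main

// Toy model of two gRPC client-side load-balancing policies applied to the
// address list returned by a headless Service. Assumes all addresses are
// reachable and ignores reconnects; not grpc-go's actual implementation.

// distribute returns how many of rpcs calls each address receives.
//   - "pick_first" (gRPC's default): everything goes to the first address.
//   - "round_robin": calls rotate across all addresses.
func distribute(policy string, addrs []string, rpcs int) map[string]int {
	counts := make(map[string]int, len(addrs))
	if len(addrs) == 0 {
		return counts
	}
	for i := 0; i < rpcs; i++ {
		if policy == "round_robin" {
			counts[addrs[i%len(addrs)]]++
		} else { // treat anything else as pick_first
			counts[addrs[0]]++
		}
	}
	return counts
}
```

With two made-up pod addresses, `distribute("pick_first", []string{"10.0.0.1:7239", "10.0.0.2:7239"}, 100)` puts all 100 RPCs on the first address, while `"round_robin"` splits them 50/50. One caveat worth verifying against the grpc-go documentation: the DNS resolver does not watch the record continuously and typically re-resolves only when a connection is lost, so pods added after the client connected may not receive traffic right away.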

Am I misunderstanding how Temporal’s internal service discovery and load balancing work, or is there another mechanism that already handles this?

Thanks!
