Hi,
I want to understand how gRPC connections to the Temporal server are load balanced internally. Say we use a Kubernetes service to expose an endpoint for the underlying Temporal server pods and configure that endpoint in the Temporal client: how are the gRPC connections established and maintained, and how is load evenly distributed across the Temporal server pods?
Also, please tell me if any other configuration is required to ensure this. Currently I am supplying a DNS name and port in the hostPort config.
I went through the questions posted through July and didn't find this topic, so I'm posting it as a new one.
Temporal is a normal gRPC service. Its frontends expect requests to be more or less evenly distributed across them. Any existing technique that applies to gRPC services should work with Temporal frontends as well.
That was a super fast response. Thanks!
So this means that if we just configure a single DNS name in the gRPC client, we need an additional service mesh or load balancer to cycle through gRPC connections. Either that, or we have to configure the pod IPs directly for the round-robin load balancing policy you are referring to.
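For the second option, gRPC's built-in `dns` resolver plus the `round_robin` policy can do client-side balancing without a mesh, provided the client can resolve one address per pod (in Kubernetes, that means pointing it at a *headless* service). Below is a minimal sketch of the two strings involved; the service name `temporal-frontend` and namespace `temporal` are assumptions, and the commented-out SDK call is only an illustration of where these strings would plug in, not verified against a specific SDK version:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// frontendTarget returns a gRPC target using the dns resolver against an
// assumed headless Kubernetes Service (clusterIP: None), so resolution
// yields one A record per frontend pod instead of a single virtual IP.
func frontendTarget() string {
	return "dns:///temporal-frontend.temporal.svc.cluster.local:7233"
}

// roundRobinServiceConfig builds the gRPC service config JSON selecting
// the round_robin policy, which opens a subchannel to every resolved
// address and spreads RPCs across them.
func roundRobinServiceConfig() string {
	cfg := map[string]any{
		"loadBalancingConfig": []any{
			map[string]any{"round_robin": map[string]any{}},
		},
	}
	raw, err := json.Marshal(cfg)
	if err != nil {
		panic(err)
	}
	return string(raw)
}

func main() {
	fmt.Println("target:", frontendTarget())
	fmt.Println("serviceConfig:", roundRobinServiceConfig())

	// Sketch of wiring this into the Temporal Go SDK via its
	// DialOptions passthrough (hypothetical usage, not compiled here):
	//
	//   client.Dial(client.Options{
	//       HostPort: frontendTarget(),
	//       ConnectionOptions: client.ConnectionOptions{
	//           DialOptions: []grpc.DialOption{
	//               grpc.WithDefaultServiceConfig(roundRobinServiceConfig()),
	//           },
	//       },
	//   })
}
```

One caveat with this approach: the DNS resolver only re-resolves periodically or on connection loss, so a mesh still reacts faster to pods coming and going.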
I’m not a networking/gRPC/k8s expert, but it looks like there needs to be an additional load balancer, mostly because of HTTP/2's long-lived connections. This article may be a little outdated but gives some reasons.
The simple answer is to use an L7 (application layer) load balancer such as Istio or Linkerd.
As @maxim said, HTTP/2 uses long-lived connections, which defeats L4 (transport layer) load balancing: an L4 balancer picks a backend once, when the TCP connection is opened, and every request multiplexed over that connection then lands on the same backend.
Hi, all. @vikas, did you solve your issue? What service mesh did you use? @ProximaB, your post is clarifying, but I still feel like I have gaps in my tech stack.
I posted a community post to get some feedback. I'd be glad if you would take the time to share your thoughts on it.
I was also looking into this by using Linkerd as the service mesh because other services in my cluster benefit from using it.
I think it would suit your use case of load balancing the server in a way that minimises overall latency.
In my system I have some activities that call internal services, and Linkerd should bring some benefits like reliability.
Hopefully all this won't affect the running of the Temporal system.
Did anyone end up using Linkerd? I've got it set up with the default configuration, and I see about 3% of “RespondWorkflowTaskCompleted” requests fail from my worker. If I disable Linkerd, the failures disappear. I'd love some tips if someone has this working!