Hi,
I want to understand how gRPC connections to the Temporal server are load balanced internally. Say we use a Kubernetes service to expose an endpoint for the underlying Temporal server pods and configure that endpoint in the Temporal client: how are the gRPC connections established and maintained, and how is load evenly distributed across the Temporal server pods?
Also, please tell me if any other configuration is required to ensure this. Currently I am supplying a DNS name and port in the hostPort config.
I went through the questions posted through July and didn't find this topic, so I'm posting it as a new one.
Temporal is a normal gRPC service. Its frontends expect requests to be more or less evenly distributed across them. Any existing technique that applies to gRPC services should work with Temporal frontends as well.
That was a super fast response. Thanks!
So this means that if we just configure a single DNS name in the gRPC client, we need an additional service mesh or load balancer to cycle through gRPC connections. Either that, or we have to configure the pod IPs directly for the round-robin load balancing policy you are referring to.
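For the second option, gRPC's built-in `dns` resolver plus the `round_robin` policy can do client-side balancing without a mesh, provided the client can resolve one address per pod (in Kubernetes, that means pointing it at a *headless* service). Below is a minimal sketch of the two strings involved; the service name `temporal-frontend` and namespace `temporal` are assumptions, and the commented-out SDK call is only an illustration of where these strings would plug in, not verified against a specific SDK version:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// frontendTarget returns a gRPC target using the dns resolver against an
// assumed headless Kubernetes Service (clusterIP: None), so resolution
// yields one A record per frontend pod instead of a single virtual IP.
func frontendTarget() string {
	return "dns:///temporal-frontend.temporal.svc.cluster.local:7233"
}

// roundRobinServiceConfig builds the gRPC service config JSON selecting
// the round_robin policy, which opens a subchannel to every resolved
// address and spreads RPCs across them.
func roundRobinServiceConfig() string {
	cfg := map[string]any{
		"loadBalancingConfig": []any{
			map[string]any{"round_robin": map[string]any{}},
		},
	}
	raw, err := json.Marshal(cfg)
	if err != nil {
		panic(err)
	}
	return string(raw)
}

func main() {
	fmt.Println("target:", frontendTarget())
	fmt.Println("serviceConfig:", roundRobinServiceConfig())

	// Sketch of wiring this into the Temporal Go SDK via its
	// DialOptions passthrough (hypothetical usage, not compiled here):
	//
	//   client.Dial(client.Options{
	//       HostPort: frontendTarget(),
	//       ConnectionOptions: client.ConnectionOptions{
	//           DialOptions: []grpc.DialOption{
	//               grpc.WithDefaultServiceConfig(roundRobinServiceConfig()),
	//           },
	//       },
	//   })
}
```

One caveat with this approach: the DNS resolver only re-resolves periodically or on connection loss, so a mesh still reacts faster to pods coming and going.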
I’m not a networking/gRPC/k8s expert, but it looks like there needs to be an additional load balancer, mostly because of HTTP/2's long-lived connections. This article may be a little outdated but gives some reasons.
The simple answer is to use an L7 (application layer) load balancer such as Istio or Linkerd.
As @maxim said, HTTP/2 uses long-lived connections, which defeats L4 (transport layer) load balancing: an L4 balancer picks a backend once, when the TCP connection is opened, and every request multiplexed over that connection then lands on the same backend.
Hi, all. @vikas, did you solve your issue? What service mesh did you use? @ProximaB, your post is clarifying, but I still feel like I have gaps in my tech stack.
I posted a community post to get some feedback. I'd be glad if you would take the time to share your thoughts on it.
I was also looking into this by using Linkerd as the service mesh because other services in my cluster benefit from using it.
I think it would suit your use case of load balancing the server in a way that minimises overall latency.
In my system I have some activities that call internal services, and Linkerd should bring some benefits like reliability.
Hopefully all this won't affect the running of the Temporal system.
Did anyone end up using Linkerd? I've got it set up with the default configuration, and I see about 3% of “RespondWorkflowTaskCompleted” requests fail from my worker. If I disable Linkerd, the failures disappear. I'd love some tips if someone has this working!