Not enough hosts to serve request in Temporal

We are continuously seeing this error in the frontend and history pods. We don't see any resource issues.

{"level":"error","ts":"2022-12-08T06:34:43.200Z","msg":"Internal service error","service":"frontend","error":"Not enough hosts to serve the request","logging-call-at":"workflowHandler.go:3406","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).error\n\t/temporal/service/frontend/workflowHandler.go:3406\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollActivityTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:1122\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:502\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:116\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:498\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.0/workflowservice/v1/service.pb.go:1137\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1051\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1037\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.0/workflowservice/v1/service.pb.go:1139\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1210\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1533\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:871"}


Not enough hosts to serve the request

This means the frontend's ringpop membership cannot find at least one host for the history or matching service.
Do you run health checks against each service?

./grpc-health-probe -addr=<frontend-host>:<frontend-grpc-port> -service=temporal.api.workflowservice.v1.WorkflowService

./grpc-health-probe -addr=<matching-host>:<matching-grpc-port> -service=temporal.api.workflowservice.v1.MatchingService

./grpc-health-probe -addr=<history-host>:<history-grpc-port> -service=temporal.api.workflowservice.v1.HistoryService
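
If it helps, the three probes can be wrapped in a small script so they run in one go. This is only a sketch: the host:port values are placeholders (7233, 7234 and 7235 are the usual default gRPC ports for frontend, history and matching, so adjust them to your setup), `probe_command` and `probe_all` are hypothetical helper names, and it assumes the `grpc-health-probe` binary sits in the current directory and exits with 0 only when the service reports SERVING.

```python
import subprocess

# Service names copied from the probe commands above.
SERVICES = {
    "frontend": "temporal.api.workflowservice.v1.WorkflowService",
    "matching": "temporal.api.workflowservice.v1.MatchingService",
    "history": "temporal.api.workflowservice.v1.HistoryService",
}


def probe_command(role, addr, binary="./grpc-health-probe"):
    """Build the argument list for one health probe (hypothetical helper)."""
    return [binary, f"-addr={addr}", f"-service={SERVICES[role]}"]


def probe_all(addrs, binary="./grpc-health-probe", timeout=10):
    """Probe every role in `addrs` ({role: "host:port"}).

    Returns {role: True/False}. A missing binary or a timeout counts as
    unhealthy instead of raising.
    """
    results = {}
    for role, addr in addrs.items():
        try:
            proc = subprocess.run(
                probe_command(role, addr, binary),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            results[role] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            results[role] = False
    return results


# Placeholder address (assumption): replace with your own service endpoint.
print(" ".join(probe_command("matching", "matching-host:7235")))
```

Calling `probe_all({"frontend": "frontend-host:7233", "history": "history-host:7234", "matching": "matching-host:7235"})` from a pod inside the cluster gives a quick per-role pass/fail map; any `False` entry is the service to look at first.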

It may also be worth running:

tctl admin cluster describe

(the short form is tctl adm cl d) and checking the “rings” section to confirm that you see at least one pod for each role.
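
To check the rings without reading the output by eye, something like the following could work. It is a sketch that assumes the tctl command above prints JSON with a `membershipInfo.rings` list whose entries carry a `role` and a `members` array; verify that against your own output, since the layout can differ between versions. `missing_roles` and the sample data are made up for illustration.

```python
import json

# Roles discussed in this thread; add others (e.g. worker) if you run them.
REQUIRED_ROLES = ("frontend", "history", "matching")


def missing_roles(describe_output, required=REQUIRED_ROLES):
    """Return the required roles that have no members in any ring."""
    data = json.loads(describe_output)
    # Assumed layout: {"membershipInfo": {"rings": [{"role": ..., "members": [...]}]}}
    rings = data.get("membershipInfo", {}).get("rings", [])
    populated = {ring.get("role") for ring in rings if ring.get("members")}
    return [role for role in required if role not in populated]


# Hypothetical sample in the assumed layout, not real cluster output.
sample = json.dumps({
    "membershipInfo": {
        "rings": [
            {"role": "frontend", "members": [{"identity": "10.0.0.1:7233"}]},
            {"role": "history", "members": [{"identity": "10.0.0.2:7234"}]},
            {"role": "matching", "members": []},
        ]
    }
})

print(missing_roles(sample))  # prints ['matching']
```

An empty list means every required role has at least one member in the ring; anything else names the role the frontend cannot reach.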

Could some of your pods be restarting or failing because of low resources or for some other reason? How many history and matching pods do you have running?
Do you have a single-cluster or multi-cluster deployment? If you have multiple clusters, check whether a service pod is mistakenly joining another cluster's membership ring.