Not enough hosts to serve the request in Temporal

Hi. We’re trying to deploy Temporal (1.29.2) to the company cloud. We are using PostgreSQL as both the primary database and the visibility database. The migrations ran without issues, and a namespace was created. However, the Temporal server terminates with an error:

{"level":"error","ts":"2026-01-19T15:20:19.751+0300","msg":"Unable to call matching.PollActivityTaskQueue.","service":"frontend","wf-task-queue-name":"temporal-sys-tq-scanner-taskqueue-0","timeout":"1m8.942612726s","error":"Not enough hosts to serve the request","logging-call-at":"/home/runner/work/temporal/temporal/service/frontend/workflow_handler.go:1132","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/zap_logger.go:132\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollActivityTaskQueue\n\ ...}

The matching service reports a different error:

{"level":"error","ts":"2026-01-19T16:13:23.159+0300","msg":"error fetching user data from parent","component":"matching-engine","wf-task-queue-name":"/_sys/temporal-sys-processor-parent-close-policy/3","wf-task-queue-type":"Activity","wf-namespace":"temporal-system","error":"Not enough hosts to serve the request","logging-call-at":"/home/runner/work/temporal/temporal/service/matching/user_data_manager.go:292","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/zap_logger.go:151\ngo.temporal.io/server/service/matching.(*userDataManagerImpl).fetchUserData.func1\n\t/home/runner/work/temporal/temporal/service/matching/user_data_manager.go:292\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:65\ngo.temporal.io/server/service/matching.(*userDataManagerImpl).fetchUserData\n\t/home/runner/work/temporal/temporal/service/matching/user_data_manager.go:325\ngo.temporal.io/server/common/goro.(*Group).Go.func1\n\t/home/runner/work/temporal/temporal/common/goro/group.go:34"}

This is currently a staging cluster, so each service (frontend, matching, history, worker) has two replicas.

In dynamicconfig, we have only specified this:

system.enableNexus:
- value: false

numHistoryShards is set to 512.

Full config is:

log:
  stdout: true
  level: info

persistence:
  defaultStore: postgres-default
  visibilityStore: postgres-visibility
  numHistoryShards: 512
  datastores:
    postgres-default:
      sql:
        pluginName: "postgres12_pgx"
        databaseName: "temporal"
        connectAddr: {{ app["temporal-server-matching"].db.default.host }}
        connectProtocol: "tcp"
        user: {{ app["temporal-server-matching"].db.default.user }}
        password: {{ app["temporal-server-matching"].db.default.password }}
        connectAttributes:
          target_session_attrs: read-write
        maxConns: 20
        maxIdleConns: 20
        maxConnLifetime: "1h"
    postgres-visibility:
      sql:
        pluginName: "postgres12_pgx"
        databaseName: "temporal_visibility"
        connectAddr: {{ app["temporal-server-matching"].db.visibility.host }}
        connectProtocol: "tcp"
        user: {{ app["temporal-server-matching"].db.visibility.user }}
        password: {{ app["temporal-server-matching"].db.visibility.password }}
        connectAttributes:
          target_session_attrs: read-write
        maxConns: 20
        maxIdleConns: 20
        maxConnLifetime: "1h"

global:
  membership:
    maxJoinDuration: 30s
    broadcastAddress: "{% raw %}{{ env('cloud_lan4') }}{% endraw %}"
  pprof:
    port: 7936
  metrics:
    prometheus:
      framework: "tally"
      timerType: "histogram"
      listenAddress: "0.0.0.0:23569"

services:
  frontend:
    rpc:
      grpcPort: 7233
      membershipPort: 6939
      bindOnIP: 0.0.0.0

  matching:
    rpc:
      grpcPort: 7233
      membershipPort: 6939
      bindOnIP: 0.0.0.0

  history:
    rpc:
      grpcPort: 7233
      membershipPort: 6939
      bindOnIP: 0.0.0.0

  worker:
    rpc:
      grpcPort: 7233
      membershipPort: 6939

clusterMetadata:
  enableGlobalNamespace: false
  failoverVersionIncrement: 10
  masterClusterName: "active"
  currentClusterName: "active"
  clusterInformation:
    active:
      enabled: true
      initialFailoverVersion: 1
      rpcName: "frontend"
      rpcAddress: "{{ app["temporal-server-matching"].frontend.address }}"

dcRedirectionPolicy:
  policy: "noop"

dynamicConfigClient:
  filepath: "/etc/temporal/dynamicconfig.yaml"
  pollInterval: "10s"

Where should we look to resolve this issue?

Not enough hosts to serve the request

means Ringpop was not able to find at least one instance of each service role (history, matching, worker, frontend) in the cluster.
Try checking with tctl:

tctl adm cl d | jq .membershipInfo.rings

and check whether you see at least one member of each service role up.
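
With two replicas per service you would expect all four rings to report two members each, roughly like this (illustrative output only; the identities and ports will differ in your cluster):

[
  {
    "role": "frontend",
    "memberCount": 2,
    "members": [
      { "identity": "10.0.0.11:7233" },
      { "identity": "10.0.0.12:7233" }
    ]
  },
  ... (similar entries for "history", "matching", and "worker")
]

If one of the rings is missing or empty, that role's pods are not joining the membership ring, which is what produces "Not enough hosts to serve the request".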

Are you deploying the auto-setup image, or are you using the server image and creating a host per service role? Can you bash into, say, your frontend pod and look at the generated static config in /etc/temporal/config/docker.yaml? I'm asking because of your services: section; maybe the worker service role also needs bindOnIP, as sketched below?
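
Something like this (untested, just mirroring your other service blocks):

  worker:
    rpc:
      grpcPort: 7233
      membershipPort: 6939
      bindOnIP: 0.0.0.0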

Apologies for the trouble. We found an error in the port opening policy, which was causing the pod to be terminated.
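
In case it helps anyone else: a quick way to confirm this kind of problem is to check from inside one pod that another service's ports are reachable (illustrative commands, the pod IP is a placeholder):

  # from inside e.g. the frontend pod
  nc -zv 10.0.0.12 6939   # ringpop membership port
  nc -zv 10.0.0.12 7233   # gRPC port

Once the ports were opened between pods, the services joined the membership ring and the errors stopped.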