OnStart hook failed: context deadline exceeded in k8s namespace 'temporal'

Hello @tihomir & team,

I’m facing a weird issue: when I deploy the Temporal services in a k8s namespace named ‘temporal’, the deployment fails with the following error.

Error logs from the frontend service:

temporal-frontend {"level":"error","ts":"2023-01-02T07:17:03.642Z","msg":"start failed","component":"fx","error":"OnStart hook added by go.temporal.io/server/common/resource.MembershipMonitorProvider failed: context deadline exceeded\n\ngo.temporal.io/server/common/namespace.RegistryLifetimeHooks.func1() took 2.837115ms from:\n\tgo.temporal.io/server/common/namespace.RegistryLifetimeHooks (/home/builder/temporal/common/namespace/fx.go:46)\ngo.temporal.io/server/common/cluster.MetadataLifetimeHooks.func1() took 2.748064ms from:\n\tgo.temporal.io/server/common/cluster.MetadataLifetimeHooks (/home/builder/temporal/common/cluster/fx.go:42)\ngo.temporal.io/server/common/metrics.RuntimeMetricsReporterLifetimeHooks.func1() took 247.734µs from:\n\tgo.temporal.io/server/common/metrics.RuntimeMetricsReporterLifetimeHooks (/home/builder/temporal/common/metrics/fx.go:44)\n","logging-call-at":"fx.go:1030","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/temporal.(*fxLogAdapter).LogEvent\n\t/home/builder/temporal/temporal/fx.go:1030\ngo.uber.org/fx.(*App).Start.func1\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:666\ngo.uber.org/fx.(*App).Start\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:674\ngo.temporal.io/server/temporal.(*ServerImpl).Start.func1\n\t/home/builder/temporal/temporal/server_impl.go:116"}
temporal-frontend {"level":"error","ts":"2023-01-02T07:17:29.760Z","msg":"unable to bootstrap ringpop. retrying","service":"frontend","error":"join duration of 41.10510962s exceeded max 30s","logging-call-at":"ringpop.go:109","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/common/membership.(*RingPop).bootstrap.func1\n\t/home/builder/temporal/common/membership/ringpop.go:109\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:170\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:194\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:171\ngo.temporal.io/server/common/membership.(*RingPop).bootstrap\n\t/home/builder/temporal/common/membership/ringpop.go:114\ngo.temporal.io/server/common/membership.(*RingPop).Start\n\t/home/builder/temporal/common/membership/ringpop.go:84\ngo.temporal.io/server/common/membership.(*ringpopMonitor).Start\n\t/home/builder/temporal/common/membership/rpMonitor.go:135\ngo.temporal.io/server/common/resource.MembershipMonitorProvider.func1\n\t/home/builder/temporal/common/resource/fx.go:268\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).runStartHook\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/internal/lifecycle/lifecycle.go:120\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).Start\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/internal/lifecycle/lifecycle.go:85\ngo.uber.org/fx.(*App).start\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:683\ngo.uber.org/fx.withTimeout.func1\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:773"}

This happens when Istio mTLS is enabled and the services are deployed in the k8s namespace ‘temporal’. When we deploy to any other namespace, it works fine. Any idea what causes this behaviour?

Hi, sorry for the late response.

unable to bootstrap ringpop

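Temporal uses ringpop, a gossip-based membership library.
When each service starts up, it writes its host address and port to the cluster_membership table. Other hosts use these entries to find ringpop peers to join, so the address written there has to be reachable from the other pods.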

You can get this info from tctl, for example:

tctl adm membership list_db

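In your static config, take a look at:

bindOnIP (set per service, in its rpc section) is the IP the service binds to, for example 0.0.0.0.
broadcastAddress (under global.membership) is the address a host advertises to its peers. Use it when bindOnIP is the same across several nodes (again, for example 0.0.0.0), because peers cannot connect to 0.0.0.0.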
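To make the difference between the two settings concrete, here is a minimal Go sketch that checks whether an address makes sense to advertise to peers. It is not Temporal code, and usableAsBroadcast is a hypothetical name. A wildcard such as 0.0.0.0 is fine for bindOnIP because it only means "listen on every interface", but no peer can dial it, so it is rejected here as a broadcast address, as is loopback.

```go
package main

import (
	"fmt"
	"net"
)

// usableAsBroadcast (hypothetical helper) returns an error if addr could not
// sensibly be advertised to other hosts as a membership broadcast address.
func usableAsBroadcast(addr string) error {
	ip := net.ParseIP(addr)
	switch {
	case ip == nil:
		return fmt.Errorf("%q is not an IP address", addr)
	case ip.IsUnspecified():
		// 0.0.0.0 or :: means "all interfaces" when binding; peers cannot dial it.
		return fmt.Errorf("%q is a wildcard bind address, not reachable by peers", addr)
	case ip.IsLoopback():
		// 127.0.0.0/8 or ::1 is only reachable from inside the same pod.
		return fmt.Errorf("%q is loopback, only reachable from the same pod", addr)
	}
	return nil
}

func main() {
	for _, a := range []string{"0.0.0.0", "127.0.0.1", "10.244.1.17"} {
		if err := usableAsBroadcast(a); err != nil {
			fmt.Println("reject:", err)
		} else {
			fmt.Println("ok:", a)
		}
	}
}
```

Running it prints a reject line for 0.0.0.0 and for 127.0.0.1, and an ok line for the example pod-style address 10.244.1.17.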

My guess is that when Istio is enabled, you might need to set an explicit broadcastAddress for each of your services (for example, each pod's own IP).