Unable to bootstrap ringpop

I was wondering if anyone familiar with ringpop has run into this issue: I’ve set up separate Kubernetes deployments for each service (frontend, history, matching, worker) based on the Helm chart, and think I’m close to getting the kinks worked out… but I’m getting the following error in the logs for each service that I hope someone can help with:

{"level":"error","ts":"2021-02-26T16:23:50.091Z","msg":"unable to bootstrap ringpop. retrying","service":"matching","error":"join duration of 42.351297515s exceeded max 30s","logging-call-at":"ringpop.go:114","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/common/membership.(*RingPop).bootstrap\n\t/temporal/common/membership/ringpop.go:114\ngo.temporal.io/server/common/membership.(*RingPop).Start\n\t/temporal/common/membership/ringpop.go:83\ngo.temporal.io/server/common/membership.(*ringpopMonitor).Start\n\t/temporal/common/membership/rpMonitor.go:120\ngo.temporal.io/server/common/resource.(*Impl).Start\n\t/temporal/common/resource/resourceImpl.go:371\ngo.temporal.io/server/service/matching.(*Service).Start\n\t/temporal/service/matching/service.go:100\ngo.temporal.io/server/temporal.(*Server).Start.func1\n\t/temporal/temporal/server.go:187"}

On the 6th attempt it gives up and the service restarts. I suspect the ringpop timeout is caused by a connection issue, but I don't know how it tries to connect. I've verified that each service can resolve and reach the others by name (e.g. from temporal-history I can 'ping temporal-matching-headless' successfully). When I run netstat on temporal-frontend I see a lot of CLOSE_WAIT connections to the other services once the Recv-Q reaches 34:

tcp 34 0 temporal-frontend-7967c57655-ng7r4:7233 172-17-0-9.temporal-history-headless.default.svc.cluster.local:55062 CLOSE_WAIT
tcp 33 0 temporal-frontend-7967c57655-ng7r4:7233 172-17-0-14.temporal-matching-headless.default.svc.cluster.local:58796 ESTABLISHED
tcp 34 0 temporal-frontend-7967c57655-ng7r4:7233 172-17-0-14.temporal-matching-headless.default.svc.cluster.local:57544 CLOSE_WAIT

Because I don't know what ringpop does, I'm confused about how or why it's failing… I could start digging into the code, but thought I'd see if anyone here could point me in the right direction. Does ringpop need additional ports open on the containers? Is there a specific config that I'm missing?
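For context on the ports question: ringpop isn't a separate deployable service — it's the gossip-based membership library the Temporal services use to discover each other, and each service gossips over its own membership port, separate from the gRPC port. I believe the Helm chart defaults are 6933/6934/6935/6939 for frontend/history/matching/worker, but verify those against your own config. A minimal sketch to probe those ports from inside a pod, with the headless-service names and ports above as assumptions:

```python
import socket

# Assumed headless-service names and default membership ports from the
# Temporal helm chart (frontend 6933, history 6934, matching 6935,
# worker 6939) -- verify both against your own deployments/config.
MEMBERSHIP_TARGETS = {
    "temporal-frontend-headless": 6933,
    "temporal-history-headless": 6934,
    "temporal-matching-headless": 6935,
    "temporal-worker-headless": 6939,
}

def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, timeouts, DNS failures
        return False

if __name__ == "__main__":
    for host, port in MEMBERSHIP_TARGETS.items():
        status = "reachable" if probe(host, port) else "UNREACHABLE"
        print(f"{host}:{port} {status}")
```

If the gRPC port answers but the membership port doesn't, ringpop can never complete its join, which matches the "join duration exceeded max" error above.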

I’m using docker image 'temporalio/auto-setup:1.7.0'

Some of the config properties I’m using:

# Services settings
# Frontend deployment settings
# Matching deployment settings
# History deployment settings
# Worker deployment settings
# To override the public client host port. (default is $BIND_ON_IP:$FRONTEND_GRPC_PORT)

Ah, I think I resolved it. For anyone who hits this issue, check the TEMPORAL_BROADCAST_ADDRESS env var (global.membership.broadcastAddress) for each deployment. I had set it to 'status.hostIP', which is the node's IP address, when I needed to set it to 'status.podIP'. Ringpop was attempting to reach each service on the same wrong IP…

So in each Deployment manifest I define the env:

      - name: TEMPORAL_BROADCAST_ADDRESS
        valueFrom:
          fieldRef:
            fieldPath: status.podIP

And the cluster seems happy.