Hi Team,
We are currently working on introducing Temporal workflows into our production application, and as part of that I am deploying a self-hosted Temporal server onto a GKE cluster using the Helm charts, with a PostgreSQL database.
The server deployment succeeds and all the services talk to each other without any issues when Istio mTLS is set to PERMISSIVE mode.
However, the Temporal services are unable to talk to each other when Istio mTLS is set to STRICT mode. They fail with an "unable to bootstrap ringpop" error:
{"level":"info","ts":"2023-08-23T16:39:04.597Z","msg":"Membership heartbeat upserted successfully","address":"10.68.0.151","port":6939,"hostId":"92680b58-41d3-11ee-b3b3-56731cb242a4","logging-call-at":"monitor.go:256"}
{"level":"info","ts":"2023-08-23T16:39:04.601Z","msg":"bootstrap hosts fetched","bootstrap-hostports":"10.68.0.148:6934,10.68.0.147:6933,10.68.0.149:6935,10.68.0.151:6939","logging-call-at":"monitor.go:298"}
{"level":"warn","ts":"2023-08-23T16:39:50.772Z","msg":"unable to bootstrap ringpop. retrying","error":"join duration of 46.17039065s exceeded max 30s","logging-call-at":"ringpop.go:110"}
{"level":"info","ts":"2023-08-23T16:39:59.947Z","msg":"bootstrap hosts fetched","bootstrap-hostports":"10.68.0.148:6934,10.68.0.149:6935,10.68.0.151:6939","logging-call-at":"monitor.go:298"}
{"level":"error","ts":"2023-08-23T16:40:04.587Z","msg":"start failed","component":"fx","error":"context deadline exceeded","logging-call-at":"fx.go:1120","stacktrace":"go.temporal.io/server/common/log
I have looked at the other discussions about setting POD_IP as the broadcast address, and also at the suggestion to set appProtocol: tcp on the service ports, but I am still seeing the same issue (the relevant excerpts from my manifests are repeated right after the links; the full manifests follow further down):
https://community.temporal.io/t/temporal-workload-unable-to-talk-to-each-other-when-strct-mtls-enabled-in-istio/6650
https://community.temporal.io/t/unable-to-bootstrap-ringpop/1597
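For quick reference, these are the excerpts from my manifests showing both of those suggestions applied:

# From the server ConfigMap: broadcast the pod IP for Ringpop membership
global:
  membership:
    broadcastAddress: {{ default .Env.POD_IP "0.0.0.0" }}

# From the temporaltest-worker-headless Service: appProtocol set to tcp
ports:
- appProtocol: tcp
  name: grpc-rpc
  port: 7239
  protocol: TCP
  targetPort: rpc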
Here are the config, Service, and Deployment manifests.
temporaltest-worker-headless Service:
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    meta.helm.sh/release-name: temporaltest
    meta.helm.sh/release-namespace: temp1
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
  creationTimestamp: "2023-08-23T15:32:50Z"
  labels:
    app.kubernetes.io/component: worker
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: temporal
  name: temporaltest-worker-headless
  namespace: temp1
  resourceVersion: "685687"
  uid: 858fc87e-afff-440c-9e6c-69aad7a270ea
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: tcp
    name: grpc-rpc
    port: 7239
    protocol: TCP
    targetPort: rpc
  - appProtocol: http
    name: metrics
    port: 9090
    protocol: TCP
    targetPort: metrics
  publishNotReadyAddresses: true
  selector:
    app.kubernetes.io/component: worker
    app.kubernetes.io/name: temporal
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
ConfigMap:
global:
  membership:
    name: temporal
    maxJoinDuration: 30s
    broadcastAddress: {{ default .Env.POD_IP "0.0.0.0" }}
  pprof:
    port: 7936
  metrics:
    tags:
      type: worker
    prometheus:
      timerType: histogram
      listenAddress: "0.0.0.0:9090"
services:
  frontend:
    rpc:
      grpcPort: 7233
      membershipPort: 6933
      bindOnIP: "0.0.0.0"
  history:
    rpc:
      grpcPort: 7234
      membershipPort: 6934
      bindOnIP: "0.0.0.0"
  matching:
    rpc:
      grpcPort: 7235
      membershipPort: 6935
      bindOnIP: "0.0.0.0"
  worker:
    rpc:
      grpcPort: 7239
      membershipPort: 6939
      bindOnIP: "0.0.0.0"
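If I understand the config templating correctly, with the downward-API POD_IP env var from the Deployment below, the membership block should render to something like the following inside the worker pod (using the worker pod IP from the logs above as an example):

# Expected rendered config in the worker pod when POD_IP=10.68.0.151
global:
  membership:
    name: temporal
    maxJoinDuration: 30s
    broadcastAddress: 10.68.0.151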
Deployment (worker container spec excerpt):
spec:
  containers:
  - args:
    - sleep 10
    env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: ENABLE_ES
    - name: ES_SEEDS
      value: elasticsearch-master-headless
    - name: ES_PORT
      value: "9200"
    - name: ES_VERSION
      value: v7
    - name: ES_SCHEME
      value: http
    - name: ES_VIS_INDEX
      value: temporal_visibility_v1_dev
    - name: ES_USER
    - name: ES_PWD
    - name: SERVICES
      value: worker
    - name: SQL_TLS
      value: "true"
    - name: SQL_TLS_DISABLE_HOST_VERIFICATION
      value: "true"
    - name: TEMPORAL_STORE_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: temporal-default-store
    - name: TEMPORAL_VISIBILITY_STORE_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: temporal-visibility-store
    image: temporalio/server:1.21.3
    imagePullPolicy: IfNotPresent
    name: temporal-worker
    ports:
    - containerPort: 7239
      name: rpc
      protocol: TCP
    - containerPort: 9090
      name: metrics
      protocol: TCP
Please check and advise if something needs to be fixed in the config. Let me know if any additional information is needed.
Thanks in advance.