In our Temporal deployment (plain Kubernetes manifests applied through Flux, not Helm) the worker pod is being restarted regularly due to liveness probe failures.
Workflows run fine, so it is not causing any visible problems.
~ $ kubectl get po -n temporal
NAME                                    READY   STATUS    RESTARTS   AGE
temporal-admin-tools-5df46645cb-7fvzm   1/1     Running   0          33h
temporal-frontend-7787fd7d48-l4qzv      1/1     Running   0          20h
temporal-history-f6f948749-szx98        1/1     Running   0          20h
temporal-matching-d98cd66dd-nt2kk       1/1     Running   0          20h
temporal-web-65bc777746-f4tvb           1/1     Running   0          20h
temporal-worker-57b4557498-9gh26        1/1     Running   48         20h
~ $ kubectl describe po -n temporal temporal-worker-57b4557498-9gh26
Name: temporal-worker-57b4557498-9gh26
Namespace: temporal
[...]
Events:
  Type     Reason     Age                    From     Message
  ----     ------     ----                   ----     -------
  Warning  Unhealthy  5m14s (x144 over 19h)  kubelet  Liveness probe failed: dial tcp 11.32.104.6:7239: i/o timeout
  Normal   Killing    5m14s (x48 over 19h)   kubelet  Container temporal-worker failed liveness probe, will be restarted
  Normal   Pulled     4m44s (x48 over 19h)   kubelet  Container image "remote-docker.artifactory.swisscom.com/temporalio/server:1.10.5" already present on machine
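The failed dial against port 7239 (the worker's gRPC port, per the "Created gRPC listener" log line below) suggests the probe is a plain TCP check. A sketch of what such a probe looks like in the Deployment spec; the timing values here are illustrative defaults, not necessarily what our manifest uses:

```yaml
# Sketch of a tcpSocket liveness probe matching the failing dial on 7239.
# Timing values are illustrative; the actual manifest may differ.
livenessProbe:
  tcpSocket:
    port: 7239          # worker gRPC port
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 1     # a short timeout makes transient i/o timeouts more likely
```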
Pod logs:
~ $ kubectl logs -n temporal temporal-worker-57b4557498-9gh26
2021/07/09 05:20:39 Loading config; env=docker,zone=,configDir=config
2021/07/09 05:20:39 Loading config files=[config/docker.yaml]
{"level":"info","ts":"2021-07-09T05:20:39.619Z","msg":"Updated dynamic config","logging-call-at":"file_based_client.go:235"}
{"level":"info","ts":"2021-07-09T05:20:39.619Z","msg":"Starting server for services","value":["worker"],"logging-call-at":"server.go:117"}
{"level":"info","ts":"2021-07-09T05:20:39.624Z","msg":"Get dynamic config","name":"system.advancedVisibilityWritingMode","value":"off","default-value":"off","logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T05:20:39.713Z","msg":"PProf listen on ","port":7936,"logging-call-at":"pprof.go:73"}
{"level":"info","ts":"2021-07-09T05:20:39.739Z","msg":"Get dynamic config","name":"frontend.validSearchAttributes","value":{},"default-value":{},"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T05:20:39.774Z","msg":"Get dynamic config","name":"worker.throttledLogRPS","value":20,"default-value":20,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T05:20:39.774Z","msg":"Created gRPC listener","service":"worker","address":"0.0.0.0:7239","logging-call-at":"rpc.go:135"}
{"level":"info","ts":"2021-07-09T05:20:39.774Z","msg":"Get dynamic config","name":"worker.persistenceGlobalMaxQPS","value":0,"default-value":0,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T05:20:39.774Z","msg":"Get dynamic config","name":"worker.persistenceMaxQPS","value":500,"default-value":500,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T05:20:39.803Z","msg":"worker starting","service":"worker","component":"worker","logging-call-at":"service.go:160"}
{"level":"info","ts":"2021-07-09T05:20:39.804Z","msg":"RuntimeMetricsReporter started","service":"worker","logging-call-at":"runtime.go:154"}
{"level":"info","ts":"2021-07-09T05:20:39.817Z","msg":"Membership heartbeat upserted successfully","service":"worker","address":"11.32.104.6","port":6939,"hostId":"665bf752-e075-11eb-a4de-c23e58f1f794","logging-call-at":"rpMonitor.go:222"}
{"level":"info","ts":"2021-07-09T05:20:39.826Z","msg":"bootstrap hosts fetched","service":"worker","bootstrap-hostports":"11.32.104.6:6939,11.32.104.5:6935,11.32.104.3:6934,11.32.104.4:6933","logging-call-at":"rpMonitor.go:263"}
{"level":"info","ts":"2021-07-09T05:20:39.832Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"worker","addresses":["11.32.104.6:7239"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-07-09T05:20:39.833Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"frontend","addresses":["11.32.104.4:7233"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-07-09T05:20:39.833Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"history","addresses":["11.32.104.3:7234"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-07-09T05:20:39.833Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"matching","addresses":["11.32.104.5:7235"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-07-09T05:20:39.849Z","msg":"Service resources started","service":"worker","address":"11.32.104.6:7239","logging-call-at":"resourceImpl.go:396"}
{"level":"info","ts":"2021-07-09T05:20:39.857Z","msg":"Get dynamic config","name":"worker.executionsScannerEnabled","value":false,"default-value":false,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T05:20:39.857Z","msg":"Get dynamic config","name":"worker.taskQueueScannerEnabled","value":true,"default-value":true,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T05:20:40.173Z","msg":"Started Worker","Namespace":"temporal-system","TaskQueue":"temporal-sys-tq-scanner-taskqueue-0","WorkerID":"12@temporal-worker-57b4557498-9gh26@","logging-call-at":"scanner.go:139"}
{"level":"info","ts":"2021-07-09T05:20:40.173Z","msg":"Get dynamic config","name":"worker.enableBatcher","value":true,"default-value":true,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T05:20:40.189Z","msg":"Started Worker","Namespace":"temporal-system","TaskQueue":"temporal-sys-batcher-taskqueue","WorkerID":"12@temporal-worker-57b4557498-9gh26@","logging-call-at":"batcher.go:94"}
{"level":"info","ts":"2021-07-09T05:20:40.189Z","msg":"Get dynamic config","name":"system.enableParentClosePolicyWorker","value":true,"default-value":true,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T05:20:40.208Z","msg":"Started Worker","Namespace":"temporal-system","TaskQueue":"temporal-sys-processor-parent-close-policy","WorkerID":"12@temporal-worker-57b4557498-9gh26@","logging-call-at":"processor.go:86"}
{"level":"info","ts":"2021-07-09T05:20:40.238Z","msg":"Started Worker","Namespace":"temporal-system","TaskQueue":"temporal-sys-add-search-attributes-task-queue","WorkerID":"12@temporal-worker-57b4557498-9gh26@","logging-call-at":"addsearchattributes.go:85"}
{"level":"info","ts":"2021-07-09T05:20:40.239Z","msg":"worker started","service":"worker","component":"worker","logging-call-at":"service.go:182"}
{"level":"info","ts":"2021-07-09T05:20:43.908Z","msg":"temporal-sys-tq-scanner-workflow workflow successfully started","service":"worker","logging-call-at":"scanner.go:186"}
~ $ kubectl logs --previous -n temporal temporal-worker-57b4557498-9gh26
2021/07/09 04:55:39 Loading config; env=docker,zone=,configDir=config
2021/07/09 04:55:39 Loading config files=[config/docker.yaml]
{"level":"info","ts":"2021-07-09T04:55:39.621Z","msg":"Updated dynamic config","logging-call-at":"file_based_client.go:235"}
{"level":"info","ts":"2021-07-09T04:55:39.621Z","msg":"Starting server for services","value":["worker"],"logging-call-at":"server.go:117"}
{"level":"info","ts":"2021-07-09T04:55:39.626Z","msg":"Get dynamic config","name":"system.advancedVisibilityWritingMode","value":"off","default-value":"off","logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T04:55:39.715Z","msg":"PProf listen on ","port":7936,"logging-call-at":"pprof.go:73"}
{"level":"info","ts":"2021-07-09T04:55:39.743Z","msg":"Get dynamic config","name":"frontend.validSearchAttributes","value":{},"default-value":{},"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T04:55:39.780Z","msg":"Get dynamic config","name":"worker.throttledLogRPS","value":20,"default-value":20,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T04:55:39.781Z","msg":"Created gRPC listener","service":"worker","address":"0.0.0.0:7239","logging-call-at":"rpc.go:135"}
{"level":"info","ts":"2021-07-09T04:55:39.781Z","msg":"Get dynamic config","name":"worker.persistenceGlobalMaxQPS","value":0,"default-value":0,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T04:55:39.781Z","msg":"Get dynamic config","name":"worker.persistenceMaxQPS","value":500,"default-value":500,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T04:55:39.810Z","msg":"worker starting","service":"worker","component":"worker","logging-call-at":"service.go:160"}
{"level":"info","ts":"2021-07-09T04:55:39.810Z","msg":"RuntimeMetricsReporter started","service":"worker","logging-call-at":"runtime.go:154"}
{"level":"info","ts":"2021-07-09T04:55:39.826Z","msg":"Membership heartbeat upserted successfully","service":"worker","address":"11.32.104.6","port":6939,"hostId":"e84aff55-e071-11eb-a37e-c23e58f1f794","logging-call-at":"rpMonitor.go:222"}
{"level":"info","ts":"2021-07-09T04:55:39.836Z","msg":"bootstrap hosts fetched","service":"worker","bootstrap-hostports":"11.32.104.5:6935,11.32.104.3:6934,11.32.104.4:6933,11.32.104.6:6939","logging-call-at":"rpMonitor.go:263"}
{"level":"info","ts":"2021-07-09T04:55:39.842Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"worker","addresses":["11.32.104.6:7239"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-07-09T04:55:39.843Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"frontend","addresses":["11.32.104.4:7233"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-07-09T04:55:39.843Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"history","addresses":["11.32.104.3:7234"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-07-09T04:55:39.843Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"matching","addresses":["11.32.104.5:7235"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-07-09T04:55:39.858Z","msg":"Service resources started","service":"worker","address":"11.32.104.6:7239","logging-call-at":"resourceImpl.go:396"}
{"level":"info","ts":"2021-07-09T04:55:39.865Z","msg":"Get dynamic config","name":"worker.executionsScannerEnabled","value":false,"default-value":false,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T04:55:39.865Z","msg":"Get dynamic config","name":"worker.taskQueueScannerEnabled","value":true,"default-value":true,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T04:55:40.162Z","msg":"Started Worker","Namespace":"temporal-system","TaskQueue":"temporal-sys-tq-scanner-taskqueue-0","WorkerID":"13@temporal-worker-57b4557498-9gh26@","logging-call-at":"scanner.go:139"}
{"level":"info","ts":"2021-07-09T04:55:40.162Z","msg":"Get dynamic config","name":"worker.enableBatcher","value":true,"default-value":true,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T04:55:40.184Z","msg":"Started Worker","Namespace":"temporal-system","TaskQueue":"temporal-sys-batcher-taskqueue","WorkerID":"13@temporal-worker-57b4557498-9gh26@","logging-call-at":"batcher.go:94"}
{"level":"info","ts":"2021-07-09T04:55:40.184Z","msg":"Get dynamic config","name":"system.enableParentClosePolicyWorker","value":true,"default-value":true,"logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-07-09T04:55:40.201Z","msg":"Started Worker","Namespace":"temporal-system","TaskQueue":"temporal-sys-processor-parent-close-policy","WorkerID":"13@temporal-worker-57b4557498-9gh26@","logging-call-at":"processor.go:86"}
{"level":"info","ts":"2021-07-09T04:55:40.224Z","msg":"Started Worker","Namespace":"temporal-system","TaskQueue":"temporal-sys-add-search-attributes-task-queue","WorkerID":"13@temporal-worker-57b4557498-9gh26@","logging-call-at":"addsearchattributes.go:85"}
{"level":"info","ts":"2021-07-09T04:55:40.224Z","msg":"worker started","service":"worker","component":"worker","logging-call-at":"service.go:182"}
{"level":"info","ts":"2021-07-09T04:55:43.910Z","msg":"temporal-sys-tq-scanner-workflow workflow successfully started","service":"worker","logging-call-at":"scanner.go:186"}
We did not see this behavior in Helm-based deployments.
I compared the manifests and noticed that the worker Deployment in the Helm chart does not define a liveness probe at all.
Is it intended that the worker pod does not perform any liveness checks?
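For anyone comparing their own setup, the probe configured on a running Deployment can be dumped directly; this assumes the worker container is the first (index 0) container in the pod spec:

```shell
# Print the liveness probe of the worker Deployment, if any
# (empty output means no probe is defined, as in the Helm chart).
kubectl -n temporal get deployment temporal-worker \
  -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'
```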