I have a Temporal cluster running on AWS EKS, and I've successfully set up cluster ingress for the frontend service without mTLS:
❯ env | grep "TEMPORAL_CLI"
TEMPORAL_CLI_ADDRESS=temporal-frontend.ssokolin-test.com:443
TEMPORAL_CLI_TLS_SERVER_NAME=temporal-frontend.ssokolin-test.com
TEMPORAL_CLI_TLS_CA=/Users/ssokolin/workplace/tls-certs/mtls-ca-cert.pem
TEMPORAL_CLI_TLS_KEY=/Users/ssokolin/workplace/tls-certs/mtls-client-key.pem
TEMPORAL_CLI_TLS_CERT=/Users/ssokolin/workplace/tls-certs/mtls-client-cert.pem
TEMPORAL_CLI_SHOW_STACKS=1
❯ tctl cluster health
temporal.api.workflowservice.v1.WorkflowService: SERVING
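For reference, the ingress in front of the frontend service is shaped roughly like the sketch below. This is simplified and illustrative: the resource name and TLS secret are placeholders, and the gRPC backend annotation depends on the ingress controller (the ingress-nginx form is shown here):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: temporal-frontend                 # placeholder name
  annotations:
    # illustrative; the gRPC backend annotation differs per ingress controller
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  tls:
    - hosts:
        - temporal-frontend.ssokolin-test.com
      secretName: temporal-frontend-ingress-tls   # placeholder TLS secret
  rules:
    - host: temporal-frontend.ssokolin-test.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: temporaltest-frontend
                port:
                  number: 7233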
However, when I add the following TLS server config to the config map in the Helm chart:
tls:
  frontend:
    server:
      requireClientAuth: true
      certFile: /etc/mtls/server/tls.crt
      keyFile: /etc/mtls/server/tls.key
      clientCaFiles:
        - /etc/mtls/ca/tls.crt
    client:
      serverName: temporal-frontend.ssokolin-test.com
      rootCaFiles:
        - /etc/mtls/ca/tls.crt
  worker:
    certFile: /etc/mtls/server/tls.crt
    keyFile: /etc/mtls/server/tls.key
    client:
      serverName: temporal-frontend.ssokolin-test.com
      rootCaFiles:
        - /etc/mtls/ca/tls.crt
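(The /etc/mtls paths come from the cert-manager secrets being mounted into the Temporal pods. Stripped down to the relevant fragments of the pod spec, the mounts look roughly like this, with the secret names matching the templates further down; the exact shape in my chart values may differ slightly:)
# pod-level volumes
volumes:
  - name: mtls-server
    secret:
      secretName: cert-manager-mtls-server-secret
  - name: mtls-client
    secret:
      secretName: cert-manager-mtls-client-secret
  - name: mtls-ca
    secret:
      secretName: mtls-ca-key-pair
      items:
        - key: tls.crt
          path: tls.crt
# container-level mounts
volumeMounts:
  - name: mtls-server
    mountPath: /etc/mtls/server
    readOnly: true
  - name: mtls-client
    mountPath: /etc/mtls/client
    readOnly: true
  - name: mtls-ca
    mountPath: /etc/mtls/ca
    readOnly: true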
With that config in place, tctl fails with the following error:
❯ tctl cluster health
Error: Unable to get "temporal.api.workflowservice.v1.WorkflowService" health check status.
Error Details: rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/html"
Stack trace:
goroutine 1 [running]:
runtime/debug.Stack()
runtime/debug/stack.go:24 +0x88
runtime/debug.PrintStack()
runtime/debug/stack.go:16 +0x20
go.temporal.io/server/tools/cli.printError({0x14000190240, 0x54}, {0x1060bc2a0, 0x14000386050})
go.temporal.io/server/tools/cli/util.go:391 +0x238
go.temporal.io/server/tools/cli.ErrorAndExit({0x14000190240, 0x54}, {0x1060bc2a0, 0x14000386050})
go.temporal.io/server/tools/cli/util.go:402 +0x40
go.temporal.io/server/tools/cli.HealthCheck(0x140000c5340)
go.temporal.io/server/tools/cli/clusterCommands.go:50 +0x1ac
go.temporal.io/server/tools/cli.newClusterCommands.func1(0x140000c5340)
go.temporal.io/server/tools/cli/cluster.go:36 +0x28
github.com/urfave/cli.HandleAction({0x105ccb660, 0x10609ace0}, 0x140000c5340)
github.com/urfave/cli@v1.22.5/app.go:526 +0x60
github.com/urfave/cli.Command.Run({{0x1057e2eb5, 0x6}, {0x0, 0x0}, {0x14000089250, 0x1, 0x1}, {0x10581ded0, 0x20}, {0x0, ...}, ...}, ...)
github.com/urfave/cli@v1.22.5/command.go:173 +0x610
github.com/urfave/cli.(*App).RunAsSubcommand(0x14000479880, 0x140000c5080)
github.com/urfave/cli@v1.22.5/app.go:405 +0x99c
github.com/urfave/cli.Command.startApp({{0x1057e534b, 0x7}, {0x0, 0x0}, {0x14000089330, 0x1, 0x1}, {0x105809c25, 0x18}, {0x0, ...}, ...}, ...)
github.com/urfave/cli@v1.22.5/command.go:372 +0x664
github.com/urfave/cli.Command.Run({{0x1057e534b, 0x7}, {0x0, 0x0}, {0x14000089330, 0x1, 0x1}, {0x105809c25, 0x18}, {0x0, ...}, ...}, ...)
github.com/urfave/cli@v1.22.5/command.go:102 +0x7a4
github.com/urfave/cli.(*App).Run(0x140004796c0, {0x1400019e090, 0x3, 0x3})
github.com/urfave/cli@v1.22.5/app.go:277 +0x60c
main.main()
./main.go:37 +0x44
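In case it helps with debugging, the TLS handshake can be exercised against the ingress directly with something along these lines (same hostname and cert paths as in the env listing above):
# handshake against the ingress endpoint, presenting the client cert pair
openssl s_client \
  -connect temporal-frontend.ssokolin-test.com:443 \
  -servername temporal-frontend.ssokolin-test.com \
  -CAfile /Users/ssokolin/workplace/tls-certs/mtls-ca-cert.pem \
  -cert /Users/ssokolin/workplace/tls-certs/mtls-client-cert.pem \
  -key /Users/ssokolin/workplace/tls-certs/mtls-client-key.pem
Given the text/html body, my suspicion is that the 502 is being generated by the ingress layer rather than by the Temporal frontend itself, but I haven't confirmed that.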
My mTLS config is in line with the sample tls-simple config: I have one root CA, which I've used to issue both the client and server certificate pairs. I set this all up using a cert-manager CA issuer:
Issuer Template:
apiVersion: v1
kind: Secret
metadata:
  name: mtls-ca-key-pair
  namespace: "{{ $.Release.Namespace }}"
data:
  tls.crt: <REDACTED>
  tls.key: <REDACTED>
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: mtls-ca-issuer
  namespace: "{{ $.Release.Namespace }}"
spec:
  ca:
    secretName: mtls-ca-key-pair
Certificate Templates:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert-manager-mtls-client-cert
  namespace: "{{ $.Release.Namespace }}"
spec:
  secretName: cert-manager-mtls-client-secret
  secretTemplate:
    annotations: {}
    labels: {}
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  usages:
    - server auth
    - client auth
  dnsNames:
    - client.temporal-frontend.ssokolin-test.com
  issuerRef:
    name: mtls-ca-issuer
    kind: Issuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert-manager-mtls-server-cert
  namespace: "{{ $.Release.Namespace }}"
spec:
  secretName: cert-manager-mtls-server-secret
  secretTemplate:
    annotations: {}
    labels: {}
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  usages:
    - server auth
    - client auth
  dnsNames:
    - temporal-frontend.ssokolin-test.com
  issuerRef:
    name: mtls-ca-issuer
    kind: Issuer
    group: cert-manager.io
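(To rule out an issuance problem, both the cert-manager resources and the resulting files can be sanity-checked with something like the following. mtls-server-cert.pem is a hypothetical local copy of the server cert; the other paths match my env listing above:)
# cert-manager's view of the issuer and the two certificates
kubectl get issuer,certificate -n <namespace>

# both leaf certs should chain to the shared root CA
openssl verify -CAfile /Users/ssokolin/workplace/tls-certs/mtls-ca-cert.pem \
  /Users/ssokolin/workplace/tls-certs/mtls-client-cert.pem
openssl verify -CAfile /Users/ssokolin/workplace/tls-certs/mtls-ca-cert.pem \
  mtls-server-cert.pem

# the server cert's SANs should include the serverName used by the clients
openssl x509 -in mtls-server-cert.pem -noout -text | grep -A1 "Subject Alternative Name"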
Notably, when I exec into the frontend service pod, I'm able to run tctl successfully using the certs and keys mounted in the pod:
❯ kubectl exec --stdin --tty temporaltest-frontend-76d4b9457c-tlpbq -- /bin/bash
bash-5.1$ export TEMPORAL_CLI_TLS_CERT=/etc/mtls/client/tls.crt
export TEMPORAL_CLI_TLS_KEY=/etc/mtls/client/tls.key
export TEMPORAL_CLI_TLS_SERVER_NAME=temporal-frontend.ssokolin-test.com
export TEMPORAL_CLI_TLS_CA=/etc/mtls/ca/tls.crt
bash-5.1$ tctl cluster health
temporal.api.workflowservice.v1.WorkflowService: SERVING
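(As a sanity check that requireClientAuth is actually being enforced, the same command should fail once the client cert is dropped from the environment inside the pod; I haven't pasted that output here, but the idea is:)
unset TEMPORAL_CLI_TLS_CERT TEMPORAL_CLI_TLS_KEY
tctl cluster health   # expected to fail with a TLS handshake error if client auth is enforced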
Likewise, if I port-forward the frontend service to my localhost, I'm also able to run tctl successfully:
❯ kubectl port-forward service/temporaltest-frontend 7233:7233
Forwarding from 127.0.0.1:7233 -> 7233
Forwarding from [::1]:7233 -> 7233
❯ export TEMPORAL_CLI_ADDRESS=localhost:7233
❯ env | grep "TEMPORAL_CLI"
TEMPORAL_CLI_ADDRESS=localhost:7233
TEMPORAL_CLI_TLS_SERVER_NAME=temporal-frontend.ssokolin-test.com
TEMPORAL_CLI_TLS_CA=/Users/ssokolin/workplace/tls-certs/mtls-ca-cert.pem
TEMPORAL_CLI_TLS_KEY=/Users/ssokolin/workplace/tls-certs/mtls-client-key.pem
TEMPORAL_CLI_TLS_CERT=/Users/ssokolin/workplace/tls-certs/mtls-client-cert.pem
TEMPORAL_CLI_SHOW_STACKS=1
❯ tctl cluster health
temporal.api.workflowservice.v1.WorkflowService: SERVING
Lastly, when I run kubectl get pods, I see that my worker pod is in a crash loop. When I check its logs, I see the following:
{"level":"fatal","ts":"2022-02-23T01:53:11.771Z","msg":"error starting scanner","service":"worker","error":"context deadline exceeded","logging-call-at":"service.go:432","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Fatal\n\t/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/service/worker.(*Service).startScanner\n\t/temporal/service/worker/service.go:432\ngo.temporal.io/server/service/worker.(*Service).Start\n\t/temporal/service/worker/service.go:340\ngo.temporal.io/server/service/worker.ServiceLifetimeHooks.func1.1\n\t/temporal/service/worker/fx.go:79"}
Any help here would be much appreciated!