Getting 502 (Bad Gateway) errors when setting up mTLS for Temporal client

I have a Temporal cluster running on AWS EKS, and I've successfully set up cluster ingress for the frontend service without mTLS:

❯ env | grep "TEMPORAL_CLI"
TEMPORAL_CLI_ADDRESS=temporal-frontend.ssokolin-test.com:443
TEMPORAL_CLI_TLS_SERVER_NAME=temporal-frontend.ssokolin-test.com
TEMPORAL_CLI_TLS_CA=/Users/ssokolin/workplace/tls-certs/mtls-ca-cert.pem
TEMPORAL_CLI_TLS_KEY=/Users/ssokolin/workplace/tls-certs/mtls-client-key.pem
TEMPORAL_CLI_TLS_CERT=/Users/ssokolin/workplace/tls-certs/mtls-client-cert.pem
TEMPORAL_CLI_SHOW_STACKS=1
❯ tctl cluster health
temporal.api.workflowservice.v1.WorkflowService: SERVING

However, when I add the following server TLS config to the ConfigMap via the Helm chart:

tls:
  frontend:
    server:
      requireClientAuth: true
      certFile: /etc/mtls/server/tls.crt
      keyFile: /etc/mtls/server/tls.key
      clientCaFiles:
        - /etc/mtls/ca/tls.crt
    client:
      serverName: temporal-frontend.ssokolin-test.com
      rootCaFiles:
        - /etc/mtls/ca/tls.crt
  worker:
    certFile: /etc/mtls/server/tls.crt
    keyFile: /etc/mtls/server/tls.key
    client:
      serverName: temporal-frontend.ssokolin-test.com
      rootCaFiles:
        - /etc/mtls/ca/tls.crt
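
For reference, the /etc/mtls/... paths above come from mounting the cert-manager secrets (defined below) into the server pods. A rough sketch of that wiring, assuming the Helm chart exposes additionalVolumes / additionalVolumeMounts values (the volume names here are illustrative):

server:
  additionalVolumes:
    # Server cert pair issued by cert-manager (see Certificate templates below)
    - name: mtls-server
      secret:
        secretName: cert-manager-mtls-server-secret
    # Client cert pair, used for in-cluster tctl and internal clients
    - name: mtls-client
      secret:
        secretName: cert-manager-mtls-client-secret
    # CA certificate used to verify both sides (in practice, mount a secret
    # holding only the CA cert rather than the full CA key pair)
    - name: mtls-ca
      secret:
        secretName: mtls-ca-key-pair
  additionalVolumeMounts:
    - name: mtls-server
      mountPath: /etc/mtls/server
      readOnly: true
    - name: mtls-client
      mountPath: /etc/mtls/client
      readOnly: true
    - name: mtls-ca
      mountPath: /etc/mtls/ca
      readOnly: true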

I run into the following error:

❯ tctl cluster health
Error: Unable to get "temporal.api.workflowservice.v1.WorkflowService" health check status.
Error Details: rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/html"
Stack trace:
goroutine 1 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x88
runtime/debug.PrintStack()
	runtime/debug/stack.go:16 +0x20
go.temporal.io/server/tools/cli.printError({0x14000190240, 0x54}, {0x1060bc2a0, 0x14000386050})
	go.temporal.io/server/tools/cli/util.go:391 +0x238
go.temporal.io/server/tools/cli.ErrorAndExit({0x14000190240, 0x54}, {0x1060bc2a0, 0x14000386050})
	go.temporal.io/server/tools/cli/util.go:402 +0x40
go.temporal.io/server/tools/cli.HealthCheck(0x140000c5340)
	go.temporal.io/server/tools/cli/clusterCommands.go:50 +0x1ac
go.temporal.io/server/tools/cli.newClusterCommands.func1(0x140000c5340)
	go.temporal.io/server/tools/cli/cluster.go:36 +0x28
github.com/urfave/cli.HandleAction({0x105ccb660, 0x10609ace0}, 0x140000c5340)
	github.com/urfave/cli@v1.22.5/app.go:526 +0x60
github.com/urfave/cli.Command.Run({{0x1057e2eb5, 0x6}, {0x0, 0x0}, {0x14000089250, 0x1, 0x1}, {0x10581ded0, 0x20}, {0x0, ...}, ...}, ...)
	github.com/urfave/cli@v1.22.5/command.go:173 +0x610
github.com/urfave/cli.(*App).RunAsSubcommand(0x14000479880, 0x140000c5080)
	github.com/urfave/cli@v1.22.5/app.go:405 +0x99c
github.com/urfave/cli.Command.startApp({{0x1057e534b, 0x7}, {0x0, 0x0}, {0x14000089330, 0x1, 0x1}, {0x105809c25, 0x18}, {0x0, ...}, ...}, ...)
	github.com/urfave/cli@v1.22.5/command.go:372 +0x664
github.com/urfave/cli.Command.Run({{0x1057e534b, 0x7}, {0x0, 0x0}, {0x14000089330, 0x1, 0x1}, {0x105809c25, 0x18}, {0x0, ...}, ...}, ...)
	github.com/urfave/cli@v1.22.5/command.go:102 +0x7a4
github.com/urfave/cli.(*App).Run(0x140004796c0, {0x1400019e090, 0x3, 0x3})
	github.com/urfave/cli@v1.22.5/app.go:277 +0x60c
main.main()
	./main.go:37 +0x44

My mTLS config is in line with the sample tls-simple config: one root CA, which I've used to issue both the client and server certificate pairs. I set this all up using a cert-manager CA issuer (https://cert-manager.io/docs/configuration/ca/):

Issuer Template:

apiVersion: v1
kind: Secret
metadata:
  name: mtls-ca-key-pair
  namespace: "{{ $.Release.Namespace }}"
data:
  tls.crt: <REDACTED>
  tls.key: <REDACTED>
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: mtls-ca-issuer
  namespace: "{{ $.Release.Namespace }}"
spec:
  ca:
    secretName: mtls-ca-key-pair

Certificate Templates:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert-manager-mtls-client-cert
  namespace: "{{ $.Release.Namespace }}"
spec:
  secretName: cert-manager-mtls-client-secret
  secretTemplate:
    annotations: {}
    labels: {}

  duration: 2160h # 90d
  renewBefore: 360h # 15d
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  usages:
    - server auth
    - client auth
  dnsNames:
    - client.temporal-frontend.ssokolin-test.com
  issuerRef:
    name: mtls-ca-issuer
    kind: Issuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert-manager-mtls-server-cert
  namespace: "{{ $.Release.Namespace }}"
spec:
  secretName: cert-manager-mtls-server-secret
  secretTemplate:
    annotations: {}
    labels: {}

  duration: 2160h # 90d
  renewBefore: 360h # 15d
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  usages:
    - server auth
    - client auth
  dnsNames:
    - temporal-frontend.ssokolin-test.com
  issuerRef:
    name: mtls-ca-issuer
    kind: Issuer
    group: cert-manager.io

Notably, when I exec into the frontend service pod, I can successfully run tctl using the certs and keys I've created:

❯ kubectl exec --stdin --tty temporaltest-frontend-76d4b9457c-tlpbq -- /bin/bash                                                                 
bash-5.1$ export TEMPORAL_CLI_TLS_CERT=/etc/mtls/client/tls.crt
export TEMPORAL_CLI_TLS_KEY=/etc/mtls/client/tls.key
export TEMPORAL_CLI_TLS_SERVER_NAME=temporal-frontend.ssokolin-test.com
export TEMPORAL_CLI_TLS_CA=/etc/mtls/ca/tls.crt
bash-5.1$ tctl cluster health
temporal.api.workflowservice.v1.WorkflowService: SERVING

Likewise, if I port-forward the frontend service to my localhost (bypassing the ingress entirely), I can also run tctl successfully:

❯ kubectl port-forward service/temporaltest-frontend 7233:7233
Forwarding from 127.0.0.1:7233 -> 7233
Forwarding from [::1]:7233 -> 7233
❯ export TEMPORAL_CLI_ADDRESS=localhost:7233
❯ env | grep "TEMPORAL_CLI"
TEMPORAL_CLI_ADDRESS=localhost:7233
TEMPORAL_CLI_TLS_SERVER_NAME=temporal-frontend.ssokolin-test.com
TEMPORAL_CLI_TLS_CA=/Users/ssokolin/workplace/tls-certs/mtls-ca-cert.pem
TEMPORAL_CLI_TLS_KEY=/Users/ssokolin/workplace/tls-certs/mtls-client-key.pem
TEMPORAL_CLI_TLS_CERT=/Users/ssokolin/workplace/tls-certs/mtls-client-cert.pem
TEMPORAL_CLI_SHOW_STACKS=1
❯ tctl cluster health
temporal.api.workflowservice.v1.WorkflowService: SERVING

Lastly, when I run:

kubectl get pods

I see that my worker pod is in a crash loop. When I check its logs, I see the following:

{"level":"fatal","ts":"2022-02-23T01:53:11.771Z","msg":"error starting scanner","service":"worker","error":"context deadline exceeded","logging-call-at":"service.go:432","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Fatal\n\t/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/service/worker.(*Service).startScanner\n\t/temporal/service/worker/service.go:432\ngo.temporal.io/server/service/worker.(*Service).Start\n\t/temporal/service/worker/service.go:340\ngo.temporal.io/server/service/worker.ServiceLifetimeHooks.func1.1\n\t/temporal/service/worker/fx.go:79"}

Any help here would be much appreciated!

Figured it out! It turns out you need to pass the --enable-ssl-passthrough flag to your NGINX ingress controller in addition to setting the nginx.ingress.kubernetes.io/ssl-passthrough: "true" annotation on your Ingress object; the annotation is ignored unless the controller is started with that flag. Without passthrough, NGINX terminates TLS itself and answers the gRPC client with an HTML 502 error page, which explains the unexpected content-type "text/html" in the error above. To add the flag to your ingress controller, run:

kubectl edit deploy -n ingress-nginx ingress-nginx-controller

and add the flag under the controller container's args, like so (the surrounding args vary by ingress-nginx version; the --enable-ssl-passthrough line is the only addition):
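
spec:
  template:
    spec:
      containers:
        - name: controller   # container name may differ in your install
          args:
            - /nginx-ingress-controller
            # ...existing flags from your install stay as-is...
            - --enable-ssl-passthrough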

Relevant docs:
https://kubernetes.github.io/ingress-nginx/user-guide/tls/#ssl-passthrough
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#ssl-passthrough
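
For completeness, here's a sketch of the Ingress side (hostname and service name taken from my setup above; the metadata name is illustrative). With passthrough enabled, NGINX routes on SNI and proxies the raw TLS stream, so the client certificate reaches the Temporal frontend intact:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: temporal-frontend
  annotations:
    # Requires --enable-ssl-passthrough on the controller (see above)
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: temporal-frontend.ssokolin-test.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: temporaltest-frontend
                port:
                  number: 7233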
