Tctl won't use TLS config specified in env

Hi, I’ve been trying to get tctl to talk to the frontend server over mTLS. I followed the instructions in samples-server/tls/tls-full (temporalio/samples-server on GitHub) to generate the mTLS certs.

I can connect to Temporal over mTLS with a Go client, so the certs are working. But for some reason, when I exec into the admintools container and try to use tctl, I keep getting an error that I’d expect to see if no certs were specified when initiating the connection.

I’m using Temporal 1.17.0 and also tried with 1.16.2.

bash-5.1# tctl cluster health
Error: Unable to get "temporal.api.workflowservice.v1.WorkflowService" health check status.
Error Details: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: first record does not look like a TLS handshake"
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
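
For context, Go's TLS client reports "first record does not look like a TLS handshake" when the bytes it gets back are not a TLS record, which usually means the listener is answering in plaintext rather than rejecting a certificate. A minimal sketch for telling the two cases apart from inside the container, assuming `openssl` is installed there (it may not be in every image); `classify_handshake` is a hypothetical helper name, not part of tctl:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `openssl s_client` output on stdin and reports
# whether the peer presented a certificate, i.e. whether it speaks TLS at all.
classify_handshake() {
  if grep -q -- '-----BEGIN CERTIFICATE-----'; then
    echo "tls"
  else
    echo "no-tls-or-unreachable"
  fi
}

# Usage, reusing the env vars that tctl reads:
#   openssl s_client -connect "$TEMPORAL_CLI_ADDRESS" \
#     -servername "$TEMPORAL_CLI_TLS_SERVER_NAME" \
#     -cert "$TEMPORAL_CLI_TLS_CERT" -key "$TEMPORAL_CLI_TLS_KEY" \
#     -CAfile "$TEMPORAL_CLI_TLS_CA" </dev/null 2>&1 | classify_handshake
```

A result of `no-tls-or-unreachable` points at the server side (or the network) rather than at the client certs.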

The following are the env vars for the admintools container:

bash-5.1# echo $TEMPORAL_CLI_ADDRESS
temporal-frontend:7233
bash-5.1# echo $TEMPORAL_CLI_TLS_SERVER_NAME
interservice.server.temporal.contoso.com
bash-5.1# echo $TEMPORAL_CLI_TLS_DISABLE_HOST_VERIFICATION
false
bash-5.1# echo $TEMPORAL_CLI_TLS_CERT
/var/secrets/temporal/certs/server-interservice/tls.crt
bash-5.1# echo $TEMPORAL_CLI_TLS_KEY
/var/secrets/temporal/certs/server-interservice/tls.key
bash-5.1# echo $TEMPORAL_CLI_TLS_CA
/var/secrets/temporal/certs/server-intermediate-ca/tls.crt
bash-5.1#
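
To rule out a path typo or a missing mount, each file-valued variable can be checked for readability. A small sketch, assuming bash (which the admintools prompt suggests); `check_tls_env` is a hypothetical helper, and it only tests that the files exist and are readable, not that their contents are valid:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each variable name given, verify that it is set
# and points to a readable file. Prints one line per variable and returns
# non-zero if any check fails.
check_tls_env() {
  local name path rc=0
  for name in "$@"; do
    path="${!name:-}"   # bash indirect expansion: value of the variable named $name
    if [ -z "$path" ]; then
      echo "$name: unset"
      rc=1
    elif [ ! -r "$path" ]; then
      echo "$name: not readable: $path"
      rc=1
    else
      echo "$name: ok"
    fi
  done
  return "$rc"
}

# Usage:
#   check_tls_env TEMPORAL_CLI_TLS_CERT TEMPORAL_CLI_TLS_KEY TEMPORAL_CLI_TLS_CA
```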

Are these env vars incorrect? Any reason for tctl to not pick them up?

Thanks!

Hi, did you set these env vars yourself? I ran the tls-full sample locally, and here are the preset vars on the temporalio/admin-tools container:

docker exec -it 3ad9e2dc9b6d bash
bash-5.1# tctl cl h
creating config dir: /root/.config/temporalio
creating config file: /root/.config/temporalio/tctl.yaml
temporal.api.workflowservice.v1.WorkflowService: SERVING

bash-5.1# echo $TEMPORAL_CLI_ADDRESS
temporal:7233
bash-5.1# echo $TEMPORAL_CLI_TLS_SERVER_NAME
internode.cluster-x.contoso.com
bash-5.1# echo $TEMPORAL_CLI_TLS_DISABLE_HOST_VERIFICATION

bash-5.1# echo $TEMPORAL_CLI_TLS_CERT
/etc/temporal/config/certs/cluster/internode/cluster-internode.pem
bash-5.1# echo $TEMPORAL_CLI_TLS_KEY
/etc/temporal/config/certs/cluster/internode/cluster-internode.key
bash-5.1# echo $TEMPORAL_CLI_TLS_CA
/etc/temporal/config/certs/cluster/ca/server-intermediate-ca.pem

Also, which admintools container did you run these commands from? I ran them from samples-server/docker-compose.yml (temporalio/samples-server on GitHub), but did the same from the other two and it worked OK as well.

Thanks @tihomir for your response and checking the TLS setup on compose. I’ve verified that I can get it working in the compose setup.

However, the deployment method I’m using here is Helm charts, and I’ve updated admintools-deployment.yaml to pick up additional env vars from values:

          env:
            - name: TEMPORAL_CLI_ADDRESS
              value: {{ include "temporal.fullname" . }}-frontend:{{ include "temporal.frontend.grpcPort" . }}
          {{- if $.Values.admintools.additionalEnv }}
          {{- toYaml $.Values.admintools.additionalEnv | nindent 12}}
          {{- end }}

And then in values:

admintools:
  additionalEnv:
    - name: TEMPORAL_CLI_TLS_SERVER_NAME
      value: interservice.server.temporal.contoso.com
    - name: TEMPORAL_CLI_TLS_DISABLE_HOST_VERIFICATION
      value: "false"
    - name: TEMPORAL_CLI_TLS_CERT
      value: /var/secrets/temporal/certs/server-interservice/tls.crt
    - name: TEMPORAL_CLI_TLS_KEY
      value: /var/secrets/temporal/certs/server-interservice/tls.key
    - name: TEMPORAL_CLI_TLS_CA
      value: /var/secrets/temporal/certs/server-intermediate-ca/tls.crt

The certs are loaded from secrets using volume/mounts and they are loading correctly.

Finally this is the TLS config in the values:

server:
  config:
    global:
      tls:
        internode:
          server:
            certFile: /var/secrets/temporal/certs/server-interservice/tls.crt
            keyFile: /var/secrets/temporal/certs/server-interservice/tls.key
            requireClientAuth: true
            clientCaFiles:
              - /var/secrets/temporal/certs/server-intermediate-ca/tls.crt
          client:
            serverName: interservice.server.temporal.contoso.com
            rootCaFiles:
              - /var/secrets/temporal/certs/server-intermediate-ca/tls.crt
        frontend:
          server:
            certFile: /var/secrets/temporal/certs/server-interservice/tls.crt
            keyFile: /var/secrets/temporal/certs/server-interservice/tls.key
            requireClientAuth: true
            clientCaFiles:
              - /var/secrets/temporal/certs/server-intermediate-ca/tls.crt
          client:
            serverName: interservice.server.temporal.contoso.com
            rootCaFiles:
              - /var/secrets/temporal/certs/server-intermediate-ca/tls.crt

I didn’t include all this info in the original post because I thought that, regardless of how I deploy, tctl should be able to pick up the env vars and connect. But now I’m starting to think the global TLS config is not being picked up by the Temporal frontend from the values file. I have a similar Helm/mTLS setup working in 1.14.2, but I’m trying to upgrade.

Thanks for the detailed info. I’ll try to set this up locally with your configs and see if I can reproduce it, what could be going on, and what to advise.
Just FYI, there is an open issue to add TLS config to the Helm charts repo.

Thanks @tihomir I was just able to figure it out. In the newer Helm chart version, config is read from server-configmap.yaml, so I updated that file to read mTLS config from values:

    global:
      membership:
        name: temporal
        maxJoinDuration: 30s
        broadcastAddress: {{ `{{ default .Env.POD_IP "0.0.0.0" }}` }}

      {{- if $.Values.server.config.global.tls }}
      tls:
        {{- toYaml $.Values.server.config.global.tls | nindent 8}}
      {{- end }}

Now it works as expected:

$ kc exec -it temporal-admintools-7d5f8698b-68w5t -n temporal -- bash
bash-5.1# tctl cluster health
temporal.api.workflowservice.v1.WorkflowService: SERVING
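
Before rolling out a template change like this, the rendered ConfigMap can be checked for the `tls:` block without touching the cluster. A rough sketch, assuming Helm 3 is run from the chart directory, that the template lives at `templates/server-configmap.yaml`, and that values are in a file called `values.yaml` (all assumptions, as is the release name `temporal`); `has_tls_block` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads rendered YAML on stdin and succeeds only if it
# contains an indented `tls:` key on a line of its own. Top-level keys are
# ignored on purpose, since the block is expected under `global:`.
has_tls_block() {
  grep -Eq '^[[:space:]]+tls:[[:space:]]*$'
}

# Usage:
#   helm template temporal . -f values.yaml \
#     --show-only templates/server-configmap.yaml | has_tls_block \
#     && echo "tls block rendered" || echo "tls block missing"
```

Other sections of the rendered config (for example a datastore TLS block) can also contain a `tls:` key, so a match is a hint to read the rendered output, not proof that the frontend settings are present.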

For anyone else in the community setting up mTLS with Helm charts, I also have additionalEnvs applied for web:

web:
  additionalEnv:
    - name: TEMPORAL_TLS_SERVER_NAME
      value: interservice.server.temporal.contoso.com
    - name: TEMPORAL_TLS_ENABLE_HOST_VERIFICATION
      value: "true"
    - name: TEMPORAL_TLS_CERT_PATH
      value: /var/secrets/temporal/certs/server-interservice/tls.crt
    - name: TEMPORAL_TLS_KEY_PATH
      value: /var/secrets/temporal/certs/server-interservice/tls.key
    - name: TEMPORAL_TLS_CA_PATH
      value: /var/secrets/temporal/certs/server-intermediate-ca/tls.crt

And then in web-deployment.yaml:

      containers:
        - name: {{ .Chart.Name }}-web
          image: "{{ .Values.web.image.repository }}:{{ .Values.web.image.tag }}"
          imagePullPolicy: {{ .Values.web.image.pullPolicy }}
          env:
            - name: TEMPORAL_ADDRESS
              value: "{{ include "temporal.fullname" . }}-frontend.{{ .Release.Namespace }}.svc:{{ .Values.server.frontend.service.port }}"
          {{- if $.Values.web.additionalEnv }}
          {{- toYaml $.Values.web.additionalEnv | nindent 12}}
          {{- end }}
          {{- with $.Values.admintools.volumeMounts }}

I’ll try to find some time to package these up into a Helm chart PR that gets the chart up to speed with mTLS.

Thanks again for your kind help @tihomir.

@tareque thanks a lot for posting your info! This will definitely help a lot of people in the community.

An update on the Helm chart TLS situation: it seems that the worker TLS config changed somewhat in a recent update, and although admintools now works, I keep getting the following error when connecting a client to the Helm-based frontend:

failed reaching server: last connection error: connection closed before server preface received

This happens for clients connecting to the server names in the hostOverride section.

I tried adding the new systemWorker section to the TLS config, with no luck. The docker compose setup works just fine with the same set of certs for host overrides, and without specifying a separate system worker cert config. However, the compose setup does not differentiate between frontend and worker.

Finally got everything working with Temporal 1.17.0 + Helm + mTLS.

The above issue was caused by a CA cert mismatch that I introduced when creating the K8s secrets (used to load the certs as volumes) for the deployments.
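
A mismatch like this can be caught by checking that the mounted certificate chains to the mounted CA. A minimal sketch, assuming `openssl` is available wherever the secrets are mounted; `chains_to_ca` is a hypothetical helper. `-partial_chain` lets a non-root CA act as the trust anchor, which is assumed to be needed here because the CA file in this setup is an intermediate:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the certificate in $2 verifies against
# the CA file in $1. Expired certificates also fail this check.
chains_to_ca() {
  local ca="$1" cert="$2"
  openssl verify -partial_chain -CAfile "$ca" "$cert" >/dev/null 2>&1
}

# Usage with the paths from this thread:
#   chains_to_ca /var/secrets/temporal/certs/server-intermediate-ca/tls.crt \
#                /var/secrets/temporal/certs/server-interservice/tls.crt \
#     && echo "cert verifies against CA" || echo "verification failed"
```

Running it in each pod that mounts the secrets (frontend, worker, admintools) should show whether one deployment received a different CA than the others.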