Extending the custom authorisation (JWT) to the System Worker

Hi, my team is building an internal multi-tenant platform for microservices orchestration, and I’m currently looking at options for handling authorisation via a JWT access token. I hope someone can help bring more clarity to the matter and see whether this feature is worth adding.

We are using the default claim mapper, which does a good job for the Client to Frontend access scenario. We haven’t added mTLS yet because it would complicate the onboarding story: a number of internal teams will be using the solution in the long run, which makes certificate distribution possible but complex.

What is not entirely clear is how to deal with the System Worker. Even with internode traffic running on mTLS, the System Worker still goes via the Frontend (everything points to that) and is subject to the same authorisation policy. This is likely also the reason why the default authoriser has an exclude rule on the temporal-system namespace, with a comment saying not to do this in PROD.

Options considered so far:

If we combine the TLS and JWT pipelines in the claim mapper and assign the claim based on a well-known certificate attribute of the System Worker without enabling mTLS for everything, then we are opening the Frontend up to an exploit, because a fake cert with the same attributes will pass the auth as client certs are not being validated. I guess the only way for that story to work is to enable mTLS, issue client certificates and internode certificates, and have client cert validation, which would give the platform team control over the certificate attributes so nobody could pretend to be a system worker.

Integrate the JWT flow into the Temporal server. I’m not an expert and don’t know the code base well, so there could be a much simpler way to do that. What I could come up with is adding the logic via a WithHeaderProvider enhancement, similar to how it is done for the claim mapper, and extending the newBootstrapParams client options with a HeadersProvider. This would enable inserting custom Frontend authorisation logic into the System Worker, so there would be no need to enable mTLS, as all workers would use the same authorisation pipeline.

Which one of those stories makes more sense? Is there a better way?

If we combine the TLS and JWT pipelines in the claim mapper and assign the claim based on a well-known certificate attribute of the System Worker without enabling mTLS for everything, then we are opening the Frontend up to an exploit, because a fake cert with the same attributes will pass the auth as client certs are not being validated.

Instead of using a well-known certificate, the approach here is to configure the server endpoint with a client CA cert, so that only certificates issued by that CA will be accepted by the server. That way you are not open to any exploits, as long as you keep the CA private key a secret.
Maybe that’s exactly what you meant by the next sentence? Can you clarify?

I guess the only way for that story to work is to enable mTLS, issue client certificates and internode certificates, and have client cert validation, which would give the platform team control over the certificate attributes so nobody could pretend to be a system worker.
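
The client CA validation discussed above could look roughly like this in plain Go. This is only a sketch built on the standard crypto/tls and crypto/x509 packages, not Temporal's own configuration loader; the helper names clientCAPool and mutualTLSConfig are made up for illustration.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// clientCAPool parses PEM-encoded CA certificates into a pool.
// Only client certificates that chain to one of these CAs will be trusted.
func clientCAPool(caPEM []byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid CA certificate found in PEM input")
	}
	return pool, nil
}

// mutualTLSConfig returns a server-side config that requires every client to
// present a certificate and verifies it against the given CA pool.
func mutualTLSConfig(serverCert tls.Certificate, clientCAs *x509.CertPool) *tls.Config {
	return &tls.Config{
		Certificates: []tls.Certificate{serverCert},
		ClientCAs:    clientCAs,
		ClientAuth:   tls.RequireAndVerifyClientCert,
		MinVersion:   tls.VersionTLS12,
	}
}

func main() {
	// Input that is not a certificate is rejected before any config is built.
	_, err := clientCAPool([]byte("not a certificate"))
	fmt.Println("invalid CA input:", err)
}
```

With tls.RequireAndVerifyClientCert, a certificate that merely copies the expected attributes is rejected during the handshake unless it was signed by the trusted CA, which is the property the reply above relies on.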

What I could come up with is adding the logic via a WithHeaderProvider enhancement, similar to how it is done for the claim mapper, and extending the newBootstrapParams client options with a HeadersProvider.

You mean change the system worker code to inject additional http headers for authorization? That should work I think.
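
For illustration, a headers provider that injects a bearer token might be sketched as below. The HeadersProvider interface is redeclared locally so the example compiles on its own; it is meant to mirror the shape of the Go SDK's client.HeadersProvider, which is an assumption to check against the SDK version in use. TokenSource, staticToken and jwtHeadersProvider are hypothetical names, and wiring this into the system worker is exactly the enhancement being discussed, so it is not shown.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// HeadersProvider is a local copy of the interface shape assumed for the
// Go SDK's client.HeadersProvider.
type HeadersProvider interface {
	GetHeaders(ctx context.Context) (map[string]string, error)
}

// TokenSource is a hypothetical callback that returns a current JWT,
// for example from a cache that refreshes the token before it expires.
type TokenSource func(ctx context.Context) (string, error)

// staticToken is a hypothetical source that always returns the same token.
func staticToken(token string) TokenSource {
	return func(context.Context) (string, error) { return token, nil }
}

// jwtHeadersProvider adds an authorization header to every outgoing request.
type jwtHeadersProvider struct {
	source TokenSource
}

func (p jwtHeadersProvider) GetHeaders(ctx context.Context) (map[string]string, error) {
	token, err := p.source(ctx)
	if err != nil {
		return nil, fmt.Errorf("fetch token: %w", err)
	}
	if token == "" {
		return nil, errors.New("token source returned an empty token")
	}
	return map[string]string{"authorization": "Bearer " + token}, nil
}

func main() {
	var provider HeadersProvider = jwtHeadersProvider{source: staticToken("example.jwt.value")}
	headers, err := provider.GetHeaders(context.Background())
	fmt.Println(headers, err)
}
```

Fetching the token on every call, rather than once at start-up, lets a long-running worker pick up refreshed tokens without a restart.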

Which one of those stories makes more sense? Is there a better way?

Here’s another alternative to consider. I don’t remember if we tested enabling TLS just for frontend and not for internode. I would try that.

  1. Enable TLS for the frontend, but turn off RequireClientAuth. This will make it TLS instead of mTLS. You can even set DisableHostVerification in the client settings if you trust your infra environment.
  2. Configure a separate host override for system workers (via hostOverrides).
  3. Configure SystemWorker TLS settings to match the host override config.
  4. Create a custom authorizer to handle tokens and certificates the way you need.

Thanks Sergey,

By well-known certificate I refer to the cert + key which is distributed to all the clients as part of the onboarding and packaged with the worker runtime. In this case we can enable mTLS client auth for the authorisation flow saying we trust that client to call the Frontend, Client will also have to supply authorisation JWT token to get to the next stage. Internode certificate is going to have a different claim to the client cert so we could implement the claim mapper logic in a way it only trust the TLS claim from the internode cert where the client certs have to provide JWT or get denied. Getting certificates distributed and rotated would be an overhead as we would have to come up with a geo-distributed system for that, which we would do if we can’t find another way…

Here’s another alternative to consider. I don’t remember if we tested enabling TLS just for frontend and not for internode. I would try that.

  1. Enable TLS for the frontend, but turn off RequireClientAuth. This will make it TLS instead of mTLS. You can even set DisableHostVerification in the client settings if you trust your infra environment.
  2. Configure a separate host override for system workers (via hostOverrides).
  3. Configure SystemWorker TLS settings to match the host override config.
  4. Create a custom authorizer to handle tokens and certificates the way you need.

I was also thinking about that option but was not 100% sure it would achieve the goal. Do you know what would happen if client auth is not required and a client provides a client certificate regardless? Will authInfo.TLSConnection be nil in this case? If authInfo.TLSConnection==nil when RequireClientAuth=false, then the Frontend hostOverride setup will enable us to split the External and Internal flows: map a System Worker claim according to the TLS cert attribute, leave the rest to the JWT, and reject all cases without a JWT or TLS.

Do you know what would happen if the client auth is not required and a client provides the client certificate regardless, will authInfo.TLSConnection be nil in this case?

I believe this should work. There was a recent PR that added a ForceTLS flag to client settings.

If authInfo.TLSConnection==nil when RequireClientAuth=false, then the Frontend hostOverride setup will enable us to split the External and Internal flows: map a System Worker claim according to the TLS cert attribute, leave the rest to the JWT, and reject all cases without a JWT or TLS.

I think this is worth a try.

Thanks Sergey, I’ll give it a go (probably this week) as this may be the least invasive way of implementing the feature. Will post back if this does the job or not.

Here are some results as promised

With the setup below, all connections default to mTLS, which includes the SystemWorker as well. The claim mapper is then triggered with TLS authInfo, and internode can be granted the admin claim.

Public clients call Temporal via the frontend FQDN, which sends them to the non-mTLS path. All public clients have to supply an authorization header in order to receive a claim; in that case the claim mapper is triggered on the JWT path.

      frontend:
          server:
              requireClientAuth: true
              certFile: ./certificates/internode/cluster-internode.pem
              keyFile: ./certificates/internode/cluster-internode.key
              clientCaFiles:
                - ./certificates/internode/server-intermediate-ca.pem
          hostOverrides:
              frontend.dev-westeurope.purplesea.maersk.com:
                  requireClientAuth: false
                  certFile: ./certificates/frontend/cluster-internode.pem
                  keyFile: ./certificates/frontend/cluster-internode.key
                  clientCaFiles:
                      - ./certificates/frontend/server-intermediate-ca.pem
      systemWorker:
          certFile: ./certificates/internode/cluster-internode.pem
          keyFile: ./certificates/internode/cluster-internode.key
          client:
              serverName: internode.dev-westeurope.purplesea.maersk.com
              rootCaFiles:
                  - ./certificates/internode/server-intermediate-ca.pem

I’ve yet to test the scenario where an authorization header is provided together with a client TLS certificate, to see what the claim mapper does.

When requireClientAuth: false is set, it does not matter whether the client cert is provided or not: authInfo.TLSSubject is going to be nil. With the default authorizer this results in a crash, so I had to add another check for the TLSSubject to handle the scenario, like below:

if authInfo.TLSConnection != nil && authInfo.TLSSubject != nil {
	// Add claims based on the client's TLS certificate
	claims.Subject = authInfo.TLSSubject.CommonName
}

Otherwise it is looking good: we can split the public path, which is configured with plain TLS and uses a JWT for authorization, from the internode path, which requires mTLS.
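
The split between the two paths can be sketched as a small decision function. This is a simplified illustration, not the real Temporal claim mapper: AuthInfo, Claims, TokenVerifier, fakeVerify, subjectWithCN and the common name in systemWorkerCN are all hypothetical stand-ins, and a real implementation would validate the JWT signature, expiry and audience. Letting a matching TLS subject win when a token is also present is a design choice made here, since that combination is the untested case mentioned above.

```go
package main

import (
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"strings"
)

// AuthInfo is a simplified stand-in for what the server passes to a claim
// mapper. It is NOT the real Temporal type.
type AuthInfo struct {
	AuthToken  string     // raw authorization header, empty if absent
	TLSSubject *pkix.Name // nil when no client certificate was verified
}

// Claims is a simplified, hypothetical result type.
type Claims struct {
	Subject     string
	SystemAdmin bool
}

// systemWorkerCN is a made-up common name for the internode certificate.
const systemWorkerCN = "internode.example.internal"

var errUnauthenticated = errors.New("neither a trusted TLS subject nor a bearer token")

// TokenVerifier is a hypothetical hook that validates a JWT and returns its subject.
type TokenVerifier func(token string) (string, error)

// mapClaims grants admin rights on the mTLS internode path and otherwise
// insists on a bearer token, rejecting requests that carry neither.
func mapClaims(info AuthInfo, verify TokenVerifier) (Claims, error) {
	// Internode path: the nil check guards against the crash described above.
	if info.TLSSubject != nil && info.TLSSubject.CommonName == systemWorkerCN {
		return Claims{Subject: systemWorkerCN, SystemAdmin: true}, nil
	}
	// Public path: a bearer token is mandatory.
	const prefix = "Bearer "
	if !strings.HasPrefix(info.AuthToken, prefix) {
		return Claims{}, errUnauthenticated
	}
	subject, err := verify(strings.TrimPrefix(info.AuthToken, prefix))
	if err != nil {
		return Claims{}, fmt.Errorf("verify token: %w", err)
	}
	return Claims{Subject: subject}, nil
}

// subjectWithCN builds a certificate subject for the examples below.
func subjectWithCN(cn string) *pkix.Name { return &pkix.Name{CommonName: cn} }

// fakeVerify is a placeholder that accepts a single hard-coded token.
func fakeVerify(token string) (string, error) {
	if token == "valid-token" {
		return "alice", nil
	}
	return "", errors.New("token rejected")
}

func main() {
	internode, err := mapClaims(AuthInfo{TLSSubject: subjectWithCN(systemWorkerCN)}, fakeVerify)
	fmt.Println(internode, err)
	public, err := mapClaims(AuthInfo{AuthToken: "Bearer valid-token"}, fakeVerify)
	fmt.Println(public, err)
	_, err = mapClaims(AuthInfo{}, fakeVerify)
	fmt.Println(err)
}
```

This is only safe when the TLS subject is populated exclusively on the listener that requires and verifies client certificates, which is what the host override setup is meant to guarantee.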

Hi,
I’m trying to implement a scenario using the steps you suggested:

  1. enable TLS for frontend and turn off RequireClientAuth
  2. I try to use the tctl cli with option --tls_disable_host_verification
  3. I implement a custom authorizer with JWT

I get the following tctl error (the request is blocked by the server):

Error Details: rpc error: code = Unavailable desc = connection closed before server preface received

If I use client certificates in tctl instead, the same request passes successfully.

As I understand it, it should be possible for the server to accept all incoming requests, leaving it to the custom authorizer to work out whether mTLS is present and whether a JWT header is present.

Is this correct?
Thanks for your support

Hi, it may or may not be the issue you are facing, but tctl does not use TLS by default and you have to pass at least one flag that toggles TLS on (disabling host verification is not one of the flags that activates TLS) - Toggle the TLS configuration based on the Frontend URL format · Issue #2290 · temporalio/temporal · GitHub