I have enabled mTLS for both internode and front-end, but now when I try to start a workflow, I get an `Unable to execute workflow: context deadline exceeded` error. This error is not very informative, and I'm not really sure how to troubleshoot from here. Is there a way to get more verbose errors, or to diagnose the problem further?
I'm using the Go SDK, running on my local workstation, connecting to the front-end over a kubectl port-forwarded connection.
I'm able to run some tctl commands (for example `tctl wf l`) using the same cert, key, and root CA.
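For reference, the invocation looks roughly like this (using tctl's standard TLS flags; the address, paths, and server name here are assumptions based on my setup):

```sh
tctl --address localhost:7233 \
     --tls_cert_path /certs/tls.crt \
     --tls_key_path /certs/tls.key \
     --tls_ca_path /certs/ca.crt \
     --tls_server_name temporal-frontend.temporal.svc.cluster.local \
     workflow list
```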
The logs on the worker pod (which is also running in k8s) show similar errors:
The fact that the error is coming from the scanner makes me think that the system workers are unable to connect to the frontend. Can you share how exactly you configured TLS?
I realize that the internode and front-end certs ultimately should be different, but I'm just trying to get it basically working first…
I have created a private CA issuer in cert-manager and, for each server, generated a certificate signed by that issuer. Each server then has the relevant secret mounted into the pods at `/certs`, where `tls.crt` is the cert, `tls.key` is the private key, and `ca.crt` is the CA cert. I verified that these are mounted on the worker, and the following command runs properly from the worker:
Try adding an explicit `systemWorker` section similar to this example. That should configure the system worker (which includes the scanner) to connect to the frontend.
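A sketch of the shape such a section takes (it lives under `global.tls` in the server config YAML; the file paths and `serverName` here are assumptions based on your `/certs` layout):

```yaml
global:
  tls:
    systemWorker:
      certFile: /certs/tls.crt
      keyFile: /certs/tls.key
      client:
        serverName: temporal-frontend.temporal.svc.cluster.local
        rootCaFiles:
          - /certs/ca.crt
```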
That didn't seem to resolve the errors. Is there anything I can do to get more detailed error messages? Is there an environment variable to enable verbose logging, or anything like that?
That’s why I suggested above to add a systemWorker: section to your config.
I wonder what the delta is between my config and yours; I'm not seeing any. Even if I add a `client:` section within `frontend:` (unnecessary with `systemWorker:`), Temporal still starts fine for me.
Did you get this to work? I also enabled mTLS and am seeing the workers crash with "error starting scanner" / "context deadline exceeded". I only configured the server frontend (no internode), similar to
Note: In the case that client authentication is enabled, the internode.server certificate is used as the client certificate among services. This adds the following requirements:
The internode.server certificate must be specified on all roles, even for a frontend-only configuration.
Internode server certificates must be minted with either no Extended Key Usages or both ServerAuth and ClientAuth EKUs.
If your Certificate Authorities are untrusted, such as in the previous example, the internode server CA will need to be specified in the following places:
internode.server.clientCaFiles
internode.client.rootCaFiles
frontend.server.clientCaFiles
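Putting those three placements together, the relevant part of the server config would look roughly like this (paths and DNS names are assumptions; note the internode CA appearing in all three spots listed above):

```yaml
global:
  tls:
    internode:
      server:
        certFile: /certs/internode/tls.crt
        keyFile: /certs/internode/tls.key
        requireClientAuth: true
        clientCaFiles:
          - /certs/internode/ca.crt      # internode.server.clientCaFiles
      client:
        serverName: temporal-internode.temporal.svc.cluster.local
        rootCaFiles:
          - /certs/internode/ca.crt      # internode.client.rootCaFiles
    frontend:
      server:
        certFile: /certs/frontend/tls.crt
        keyFile: /certs/frontend/tls.key
        requireClientAuth: true
        clientCaFiles:
          - /certs/frontend/ca.crt
          - /certs/internode/ca.crt      # frontend.server.clientCaFiles
      client:
        serverName: temporal-frontend.temporal.svc.cluster.local
        rootCaFiles:
          - /certs/frontend/ca.crt
```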
and also this (can't remember where I saw it):
`client.serverName` - The server name that is validated against the server's certificate. Because Temporal connects via IP addresses, and IP addresses are ephemeral in Kubernetes, we MUST set this value, and it MUST match a name in the DNS section of the certificates for the relevant services.
In my case, I had to add a DNS name in my certs for the history and matching servers that matched the `internode.client.serverName` setting. Hope this helps…
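For reference, a cert-manager `Certificate` that satisfies both the EKU requirement and the DNS-name requirement quoted above might look like this (the names, namespace, and issuer are assumptions):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: temporal-internode
  namespace: temporal
spec:
  secretName: temporal-internode-tls   # mounted at /certs on each pod
  issuerRef:
    name: my-private-ca                # assumption: your cert-manager CA issuer
    kind: Issuer
  # Both EKUs, because the internode.server cert doubles as the client cert
  usages:
    - server auth
    - client auth
  # Must match internode.client.serverName, since Temporal dials ephemeral IPs
  dnsNames:
    - temporal-internode.temporal.svc.cluster.local
```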