Worker fails with Error:invalid peer certificate: UnknownIssuer

Hello,

We are exploring Temporal as a potential candidate for running our workflows. I created a certificate and key to connect to Temporal Cloud using the Docker utility temporalio/client-certificate-generation.

These certificates work well when we use them on a local machine for testing. But as soon as we deploy the solution in Kubernetes, workflows are still started using the same certificates, while the worker fails to connect.

The workers are started as part of application startup and fail with the following error:

TransportError: tonic::transport::Error(Transport, hyper::Error(Connect, Custom { kind: InvalidData, error: InvalidCertificateData("invalid peer certificate: UnknownIssuer") }))
[0]     at NativeConnection.connect (/usr/src/app/node_modules/@temporalio/worker/lib/connection.js:55:23)
[0]     at async run (/usr/src/app/temporal/src/worker.js:13:22)
[0] TransportError: tonic::transport::Error(Transport, hyper::Error(Connect, Custom { kind: InvalidData, error: InvalidCertificateData("invalid peer certificate: UnknownIssuer") }))
[0]     at NativeConnection.connect (/usr/src/app/node_modules/@temporalio/worker/lib/connection.js:55:23)
[0]     at async run (/usr/src/app/temporal/src/worker.js:13:22)

As a result, no workflows are able to complete execution.

The main question is: if the same certificate configuration works for workflows in the same application, why does it fail for the worker?

The client and worker packages use different gRPC implementations: one is pure JS (grpc-js), the other uses a Rust client (tonic). These may behave differently.

If you’re running in a Docker container, you might want to verify that you have ca-certificates installed.
Otherwise, can you please share your connection creation code?
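
A quick way to check for the CA bundle from inside the container is sketched below. The list of bundle locations and the `findCaBundle` helper are assumptions for illustration, not part of any Temporal package; adjust the paths for your base image.

```javascript
const fs = require('fs');

// Common CA bundle locations (assumed; adjust for your base image).
const CANDIDATE_CA_PATHS = [
  '/etc/ssl/certs/ca-certificates.crt', // Debian, Ubuntu
  '/etc/pki/tls/certs/ca-bundle.crt',   // RHEL, CentOS, Fedora
  '/etc/ssl/cert.pem',                  // Alpine
];

// Returns the first path for which `exists` returns true, or null if none does.
// `exists` is injectable so the lookup can be exercised without touching the disk.
function findCaBundle(paths = CANDIDATE_CA_PATHS, exists = fs.existsSync) {
  for (const candidate of paths) {
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

Calling `findCaBundle()` at worker startup and logging the result shows whether the image actually ships a bundle at one of these locations.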

Hello @bergundy, thanks for the response.

Yes, I have ca-certificates installed as part of the Docker build. The following is my Dockerfile:

FROM node:19.7.0-bullseye-slim

RUN apt-get -qq update && apt-get -qq -y install bzip2 fontconfig

RUN apt-get update \
    && apt-get install -y ca-certificates \
    && rm -rf /var/lib/apt/lists/*

ENV NODE_ENV production

# Create app directory
WORKDIR /usr/src/app

# Install app dependencies
COPY package*.json ./

USER root

RUN npm install -g concurrently

RUN npm install

# Bundle app source
COPY --chown=node:node  . .

EXPOSE 3000

USER node

CMD ["npm", "start"]

The following is how I create the connection for the Worker:

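const fs = require('fs');
const { NativeConnection, Worker } = require('@temporalio/worker');
// Assumed location of the activities module; adjust to your project layout.
const activities = require('./activities');

async function run() {
  console.log('=== Worker starting... ===');
  const address = process.env.TEMPORAL_ADDRESS;
  const namespace = process.env.TEMPORAL_NAMESPACE;
  const clientCertPath = process.env.TEMPORAL_CLIENT_CERT_PATH;
  const clientKeyPath = process.env.TEMPORAL_CLIENT_KEY_PATH;
  // Debug output: the certificate is public, but never print the private key itself.
  console.log("CERT ==== ", fs.readFileSync(clientCertPath));
  console.log("KEY bytes ==== ", fs.readFileSync(clientKeyPath).length);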
  const connection = await NativeConnection.connect({
    address: address,
    tls: {
      // See docs for other TLS options
      clientCertPair: {
        crt: fs.readFileSync(clientCertPath),
        key: fs.readFileSync(clientKeyPath),
      },
    },
  });

  console.log('=== Worker connected... ===');
  // Step 1: Register Workflows and Activities with the Worker and connect to
  // the Temporal server.
  const provisionWorker = await Worker.create({
    connection,
    namespace: namespace,
    workflowsPath: require.resolve('./workflows/newCustomerProvisionWorkflows'),
    activities,
    taskQueue: 'new-customer-provision-workflows',
  });
  console.log('=== Worker customer provision created... ===');

  const clientWorker = await Worker.create({
    connection,
    namespace: namespace,
    workflowsPath: require.resolve('./workflows/clientProvisionWorkflows'),
    activities,
    taskQueue: 'client-provision-workflows',
  });
  console.log('=== Worker client provision created... ===');

  // Step 2: Start accepting tasks on both task queues

  console.log('=== Worker Started ===');
  await Promise.all([provisionWorker.run(), clientWorker.run()]);
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});

The console.log statements print the certificate correctly. The workflows are started from the same k8s pod, and they run fine with the same certificate configuration.

The certificate, key and other connection information are passed through environment variables. Please let me know if anything is wrong here.

If I run this code locally, it works fine.
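
To rule out a missing or empty variable in the pod, the environment-based configuration could be validated before connecting. The sketch below is only an illustration: `loadTemporalEnv` and `REQUIRED_VARS` are hypothetical names, and the variable names are the ones used in the worker code in this post.

```javascript
// Hypothetical helper: collects the connection settings from the environment
// and fails fast, naming every variable that is missing or empty.
const REQUIRED_VARS = [
  'TEMPORAL_ADDRESS',
  'TEMPORAL_NAMESPACE',
  'TEMPORAL_CLIENT_CERT_PATH',
  'TEMPORAL_CLIENT_KEY_PATH',
];

function loadTemporalEnv(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    address: env.TEMPORAL_ADDRESS,
    namespace: env.TEMPORAL_NAMESPACE,
    clientCertPath: env.TEMPORAL_CLIENT_CERT_PATH,
    clientKeyPath: env.TEMPORAL_CLIENT_KEY_PATH,
  };
}
```

Calling `loadTemporalEnv()` at the top of `run()` would turn a misconfigured pod into a clear startup error instead of a TLS failure later on.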

Based on the error message, AFAICT, the client is failing to verify the server certificate; it’s not an issue with the client certs.

You’re saying this doesn’t happen when creating a @temporalio/client Connection and when connecting from a laptop?
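
Since the failure is in verifying the server certificate, one experiment is to hand the worker connection a CA bundle explicitly through the `serverRootCACertificate` field of the `tls` options. This is a diagnostic sketch, not a confirmed fix for this thread: `buildTlsOptions` and its `caPath` parameter are hypothetical names, and which bundle file to use depends on the image.

```javascript
const fs = require('fs');

// Builds the `tls` object for NativeConnection.connect().
// `caPath` is optional; when set, its PEM contents are passed as
// `serverRootCACertificate` so the server certificate is verified against it.
// `readFile` is injectable so the function can be exercised without real files.
function buildTlsOptions({ certPath, keyPath, caPath }, readFile = fs.readFileSync) {
  const tls = {
    clientCertPair: {
      crt: readFile(certPath),
      key: readFile(keyPath),
    },
  };
  if (caPath) {
    tls.serverRootCACertificate = readFile(caPath);
  }
  return tls;
}
```

The result would be passed as `tls` in `NativeConnection.connect({ address, tls })`. If the connection succeeds only when a bundle is supplied, that points at the trust store in the image rather than at the client certificates.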

Yes, that is correct.

The worker starts and runs flawlessly when we run it from a laptop.
Workflows start without problems on k8s as well as locally, but on k8s they never complete because there is no worker.

One thing to try is running the container from your laptop and seeing if the same thing happens.
If it does, you might want to try a different image.
Otherwise, I’m out of ideas.

Hello @bergundy,
I think I resolved this by using a different Node image. Thanks for your suggestions. I would really like to know the best way to put this configuration together. Do you know where and whom I should contact on the Temporal side?

What do you mean by “put this configuration together”? Is there anything missing or not working for you?

I’m linking this guide here for future reference:

I am referring to best practices. As I am new to Temporal and want to run it in production, there are questions like:

  1. How exactly should we run our workers? Should we start them before running the workflows and terminate them after each workflow execution is over? Or should we run the worker as part of application startup?

  2. Should we maintain one client connection across the application lifecycle?

  3. What are the best approaches to scalability? Does scaling workers ensure smooth execution? Which metrics should we take into consideration to decide when to scale? For example, one worker can handle x number of workflows.

These are some of the questions we have. I am continuously searching for material on this.

Hi, I think I am facing the same problem. What was the image you used?

Before anything else, try using node:20-bullseye as your base image.

See this doc page for details on alternatives.
