Temporal worker is unable to connect to Cluster

I am using the TypeScript SDK with a docker-compose TLS set-up.

I initially created certs by running “generate-test-certs.sh” present in “tls-simple” project.

I am using “NativeConnection” in worker:

    const connection = await NativeConnection.connect(temporalConnectionOptions);
    
    const worker = await Worker.create({
      connection,
      workflowsPath: require.resolve('./workflows'),
      activities,
      taskQueue
    });

Temporal worker is running inside a docker container whose entry is present in the same docker-compose file as the temporal cluster.

Here is the value of “temporalConnectionOptions”

{
  address: 'temporal:7233',
  tls: {
    serverNameOverride: 'tls-sample',
    serverRootCACertificate: <Buffer 2d 2d 2d 2d 2d 42 45 47 49 4e 20 43 45 52 54 49 46 49 43 41 54 45 2d 2d 2d 2d 2d 0a 4d 49 49 46 52 54 43 43 41 79 32 67 41 77 49 42 41 67 49 55 61 4a ... 1837 more bytes>,
    clientCertPair: {
      crt: <Buffer 2d 2d 2d 2d 2d 42 45 47 49 4e 20 43 45 52 54 49 46 49 43 41 54 45 2d 2d 2d 2d 2d 0a 4d 49 49 46 65 6a 43 43 41 32 4b 67 41 77 49 42 41 67 49 55 59 73 ... 1910 more bytes>,
      key: <Buffer 2d 2d 2d 2d 2d 42 45 47 49 4e 20 50 52 49 56 41 54 45 20 4b 45 59 2d 2d 2d 2d 2d 0a 4d 49 49 4a 51 67 49 42 41 44 41 4e 42 67 6b 71 68 6b 69 47 39 77 ... 3222 more bytes>
    }
  }
}

Output of “docker ps” command:

IMAGE                           COMMAND                 PORTS                                                                      NAMES
piucd-workflows:v1.0.0          "docker-entrypoint.s…"  0.0.0.0:4000->4000/tcp                                                     piucd-workflows
temporalio/ui:latest            "./start-ui-server.sh"  0.0.0.0:8080->8080/tcp                                                     temporal-ui
temporalio/admin-tools:latest   "tini -- sleep infin…"                                                                             temporal-admin-tools
temporalio/auto-setup:latest    "/etc/temporal/entry…"  6933-6935/tcp, 6939/tcp, 7234-7235/tcp, 7239/tcp, 0.0.0.0:7233->7233/tcp   temporal
cassandra:3.11                  "docker-entrypoint.s…"  7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9042->9042/tcp                  cassandra

Here is the Dockerfile for the Node.js service. The Node.js service contains the Temporal worker, which tries to connect to the Temporal cluster.

FROM node:16.20.2 AS build

WORKDIR /app

COPY mw-workflows/package*.json ./
COPY mw-commons ./mw-commons
RUN npm install 

COPY mw-workflows . 
EXPOSE 4000
CMD npm start

Below is part of my docker-compose file:


    temporal:
        container_name: temporal
        image: temporalio/auto-setup:${SERVER_TAG:-latest}
        environment:
            - "CASSANDRA_SEEDS=cassandra"
            - "DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/development.yaml"
            - "SKIP_DEFAULT_NAMESPACE_CREATION=true"
            - "TEMPORAL_TLS_SERVER_CA_CERT=${TEMPORAL_TLS_CERTS_DIR}/ca.cert"
            - "TEMPORAL_TLS_SERVER_CERT=${TEMPORAL_TLS_CERTS_DIR}/cluster.pem"
            - "TEMPORAL_TLS_SERVER_KEY=${TEMPORAL_TLS_CERTS_DIR}/cluster.key"
            - "TEMPORAL_TLS_REQUIRE_CLIENT_AUTH=true"
            - "TEMPORAL_TLS_FRONTEND_CERT=${TEMPORAL_TLS_CERTS_DIR}/cluster.pem"
            - "TEMPORAL_TLS_FRONTEND_KEY=${TEMPORAL_TLS_CERTS_DIR}/cluster.key"
            - "TEMPORAL_TLS_CLIENT1_CA_CERT=${TEMPORAL_TLS_CERTS_DIR}/ca.cert"
            - "TEMPORAL_TLS_CLIENT2_CA_CERT=${TEMPORAL_TLS_CERTS_DIR}/ca.cert"
            - "TEMPORAL_TLS_INTERNODE_SERVER_NAME=tls-sample"
            - "TEMPORAL_TLS_FRONTEND_SERVER_NAME=tls-sample"
            - "TEMPORAL_TLS_FRONTEND_DISABLE_HOST_VERIFICATION=false"
            - "TEMPORAL_TLS_INTERNODE_DISABLE_HOST_VERIFICATION=false"
            - "TEMPORAL_CLI_ADDRESS=temporal:7233" # used by tctl. Will be deprecated
            - "TEMPORAL_CLI_TLS_CA=${TEMPORAL_TLS_CERTS_DIR}/ca.cert"
            - "TEMPORAL_CLI_TLS_CERT=${TEMPORAL_TLS_CERTS_DIR}/cluster.pem"
            - "TEMPORAL_CLI_TLS_KEY=${TEMPORAL_TLS_CERTS_DIR}/cluster.key"
            - "TEMPORAL_CLI_TLS_ENABLE_HOST_VERIFICATION=true"
            - "TEMPORAL_CLI_TLS_SERVER_NAME=tls-sample"
            - "TEMPORAL_ADDRESS=temporal:7233" # used by Temporal CLI
            - "TEMPORAL_TLS_CA=${TEMPORAL_TLS_CERTS_DIR}/ca.cert"
            - "TEMPORAL_TLS_CERT=${TEMPORAL_TLS_CERTS_DIR}/cluster.pem"
            - "TEMPORAL_TLS_KEY=${TEMPORAL_TLS_CERTS_DIR}/cluster.key"
            - "TEMPORAL_TLS_ENABLE_HOST_VERIFICATION=true"
            - "TEMPORAL_TLS_SERVER_NAME=tls-sample"
        ports:
            - 7233:7233
        restart: on-failure
        depends_on:
            - cassandra
        networks:
            - temporal-network
        volumes:
            - ${DYNAMIC_CONFIG_DIR:-./dynamicconfig}:/etc/temporal/config/dynamicconfig
            - ${TEMPORAL_LOCAL_CERT_DIR}:${TEMPORAL_TLS_CERTS_DIR}

However, I get the following error message:

TransportError: tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })))
    at Function.connect (/app/node_modules/@temporalio/worker/src/connection.ts:51:15)
    at async Object.run (/app/src/temporal/agents/dnac/worker.ts:19:24)

Could you please help me understand the issue & identify a fix?

The error message you are getting in your worker indicates that the TCP connection was refused, which means that nothing is listening on that port. That specific error message generally wouldn’t appear for a TLS-level error, unless the server refused to listen on that port because it was unable to load the cert or key. Look for error messages in your temporal container’s log that would indicate a problem with your frontend node.

I get the feeling, however, that the error is actually caused by your temporal and piucd-workflows containers starting at the same time, so that the worker in piucd-workflows tries to connect to the server before the server is ready to accept connections. Try delaying the start of your worker by a few seconds.
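As an alternative to a fixed delay, the worker could retry the connection until the server accepts it. The following is only a minimal sketch: `connectWithRetry` is a hypothetical helper (not part of the Temporal SDK), and the attempt count and delay are arbitrary assumptions you would tune for your setup.

```typescript
// Hypothetical helper (not part of the Temporal SDK): call an async
// connect function repeatedly until it succeeds or attempts run out.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  maxAttempts = 10, // assumed default, adjust as needed
  delayMs = 2000, // assumed pause between attempts, in milliseconds
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Return as soon as one attempt succeeds.
      return await connect();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

In the worker from the question, this would presumably wrap the existing call, for example `const connection = await connectWithRetry(() => NativeConnection.connect(temporalConnectionOptions));`. Note that this retries on any error, including genuine TLS misconfiguration, so the final error should still be inspected if all attempts fail.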


You are spot on @jwatkins! The issue was indeed that my Node.js service is in the same docker-compose file and I wasn’t waiting at all before trying to connect to the Temporal cluster. I will use wait-for-it.sh to wait a few seconds before starting my Node.js container.

Thanks a lot, I really appreciate your help!