Worker TLS errors - "first record does not look like a TLS handshake"

Hi there,

We are seeing this error continuously in the worker service:

encounter error when describing the mutable state

last connection error: connection error: desc = "transport: authentication handshake failed: tls: first record does not look like a TLS handshake"

Sometimes the message also contains "history client encountered error".

We have these certs/vars set:

"TEMPORAL_TLS_REQUIRE_CLIENT_AUTH"      = "true"
"TEMPORAL_TLS_CLIENT1_CA_CERT"          = "/temporal/ca.cert"
"TEMPORAL_TLS_CLIENT2_CA_CERT"          = "/temporal/ca.cert"
"TEMPORAL_TLS_FRONTEND_CERT"            = "/temporal/cluster.pem"
"TEMPORAL_TLS_FRONTEND_KEY"             = "/temporal/cluster.key"
"TEMPORAL_TLS_FRONTEND_SERVER_NAME"     = "temporal-frontend.${var.dns_suffix}"
"TEMPORAL_CLI_TLS_CA"                   = "/temporal/ca.cert"
"TEMPORAL_CLI_TLS_CERT"                 = "/temporal/cluster.pem"
"TEMPORAL_CLI_TLS_KEY"                  = "/temporal/cluster.key"
"TEMPORAL_TLS_ENABLE_HOST_VERIFICATION" = "true"
"TEMPORAL_TLS_SERVER_NAME"              = "temporal.${var.dns_suffix}"
"TEMPORAL_TLS_SERVER_CA_CERT"           = "/temporal/ca.cert"
"TEMPORAL_TLS_SERVER_CERT"              = "/temporal/cluster.pem"
"TEMPORAL_TLS_SERVER_KEY"               = "/temporal/cluster.key"
"TEMPORAL_TLS_CA"                       = "/temporal/ca.cert"
"TEMPORAL_TLS_CERT"                     = "/temporal/cluster.pem"
"TEMPORAL_TLS_KEY"                      = "/temporal/cluster.key"
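As a rough illustration of what a CA / cert / key / server-name group like the ones above amounts to on the client side, here is a minimal Go sketch that turns those four values into a `tls.Config` for mutual TLS. `buildClientTLS` is a hypothetical helper, not how Temporal itself loads these settings, and reading the `TEMPORAL_CLI_TLS_*` and `TEMPORAL_TLS_SERVER_NAME` variables in `main` is only an assumption made to tie it to this setup.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// buildClientTLS (hypothetical helper) builds a mutual-TLS client config:
// caPEM verifies the server, certPEM/keyPEM identify the client, and
// serverName is the name checked against the server certificate.
func buildClientTLS(caPEM, certPEM, keyPEM []byte, serverName string) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM data")
	}
	pair, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("loading client key pair: %w", err)
	}
	return &tls.Config{
		RootCAs:      pool,
		Certificates: []tls.Certificate{pair},
		ServerName:   serverName,
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	// Assumption: the variables hold file paths, as in the list above.
	read := func(env string) []byte {
		data, err := os.ReadFile(os.Getenv(env))
		if err != nil {
			fmt.Fprintf(os.Stderr, "reading %s: %v\n", env, err)
		}
		return data
	}
	cfg, err := buildClientTLS(
		read("TEMPORAL_CLI_TLS_CA"),
		read("TEMPORAL_CLI_TLS_CERT"),
		read("TEMPORAL_CLI_TLS_KEY"),
		os.Getenv("TEMPORAL_TLS_SERVER_NAME"),
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, "TLS config error:", err)
		return
	}
	fmt.Println("TLS config ready for server name", cfg.ServerName)
}
```

A config like this only helps if the endpoint being dialed is actually serving TLS. A mismatch between a TLS-enabled client and a plaintext listener fails before any of these certificates are used.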

Everything seems to be working fine, so I'm not sure what it's complaining about.

I generated the certs with the instructions/script here

Any suggestions? :slight_smile:

Stacktrace:

go.temporal.io/server/common/log.(*zapLogger).Error
	/home/builder/temporal/common/log/zap_logger.go:150
go.temporal.io/server/service/worker/scanner/history.(*Scavenger).handleTask
	/home/builder/temporal/service/worker/scanner/history/scavenger.go:284
go.temporal.io/server/service/worker/scanner/history.(*Scavenger).taskWorker
	/home/builder/temporal/service/worker/scanner/history/scavenger.go:214

:thinking: Is it better to just set TEMPORAL_TLS_CERTS_DIR?

I saw that is an option - does that work for all services and the UI?

I am facing similar issues :smile: Are there any other options available, or are more cert environment variables needed for the Temporal servers?

One thing I’ve noticed: these errors seem to happen like clockwork every 12 hours (5am and 5pm PST) and last for 3-4 hours.

Still have no idea what’s causing it though :thinking:

I get nothing if I search for the workflowId either, which makes me wonder if it's something internal.