Trying to create new workflow after upgrading to v0.17.1. Seeing `context deadline exceeded` but no corresponding error logs in frontend container

Same issue when creating using TCTL and Go SDK. Other operations seem to be working as expected - able to list namespaces, do a health check etc. Any help would be really appreciated!

Error trace from TCTL -
Error Details: context deadline exceeded

Stack trace:


goroutine 1 [running]:

runtime/debug.Stack()

runtime/debug/stack.go:24 +0x65

runtime/debug.PrintStack()

runtime/debug/stack.go:16 +0x19

github.com/temporalio/tctl/cli_curr.printError({0x2d127a1, 0x1a}, {0x3179e40, 0xc00000fb60})

github.com/temporalio/tctl/cli_curr/util.go:392 +0x21e

github.com/temporalio/tctl/cli_curr.ErrorAndExit({0x2d127a1?, 0x24?}, {0x3179e40?, 0xc00000fb60?})

github.com/temporalio/tctl/cli_curr/util.go:403 +0x28

github.com/temporalio/tctl/cli_curr.startWorkflowHelper.func1()

github.com/temporalio/tctl/cli_curr/workflowCommands.go:225 +0x1a5

github.com/temporalio/tctl/cli_curr.startWorkflowHelper(0xc000237340, 0x0)

github.com/temporalio/tctl/cli_curr/workflowCommands.go:262 +0x557

github.com/temporalio/tctl/cli_curr.StartWorkflow(...)

github.com/temporalio/tctl/cli_curr/workflowCommands.go:173

github.com/temporalio/tctl/cli_curr.newWorkflowCommands.func3(0xc000237340?)

github.com/temporalio/tctl/cli_curr/workflow.go:57 +0x1b

github.com/urfave/cli.HandleAction({0x28a4e60?, 0x2dc54c0?}, 0x5?)

github.com/urfave/cli@v1.22.5/app.go:526 +0x50

github.com/urfave/cli.Command.Run({{0x2ce11ac, 0x5}, {0x0, 0x0}, {0x0, 0x0, 0x0}, {0x2d1ff3a, 0x1e}, {0x0, ...}, ...}, ...)

github.com/urfave/cli@v1.22.5/command.go:173 +0x652

github.com/urfave/cli.(*App).RunAsSubcommand(0xc0008bc000, 0xc000237080)

github.com/urfave/cli@v1.22.5/app.go:405 +0x91b

github.com/urfave/cli.Command.startApp({{0x2ce8a78, 0x8}, {0x0, 0x0}, {0xc000889920, 0x1, 0x1}, {0x2d0fee4, 0x19}, {0x0, ...}, ...}, ...)

github.com/urfave/cli@v1.22.5/command.go:372 +0x6e7

github.com/urfave/cli.Command.Run({{0x2ce8a78, 0x8}, {0x0, 0x0}, {0xc000889920, 0x1, 0x1}, {0x2d0fee4, 0x19}, {0x0, ...}, ...}, ...)

github.com/urfave/cli@v1.22.5/command.go:102 +0x808

github.com/urfave/cli.(*App).Run(0xc0000b9500, {0xc0001a0000, 0xc, 0xc})

github.com/urfave/cli@v1.22.5/app.go:277 +0x8a7

main.main()

./main.go:45 +0xa6

Frontend container has no corresponding error logs, but higher up in the logs, I do see something interesting, but not sure how related it is to this issue -

{"level":"info","ts":"2022-07-15T16:04:19.964Z","msg":"matching client encountered error","service":"frontend","error":"last connection error: connection closed before server preface received","service-error-type":"serviceerror.Unavailable","logging-call-at":"metric_client.go:217"}

{"level":"error","ts":"2022-07-15T16:04:20.000Z","msg":"Unable to call matching.PollActivityTaskQueue.","service":"frontend","wf-task-queue-name":"/_sys/APP-04-STAGE_TASK_QUEUE_merchantsettlement/19","timeout":"1m9.999717038s","error":"last connection error: connection closed before server preface received","logging-call-at":"workflowHandler.go:1178","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollActivityTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:1178\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:564\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:125\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:560\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.9.0/workflowservice/v1/service.pb.go:1443\ngo.temporal.io/server/common/rpc/interceptor.(*SDKVersionInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/sdk_version.go:64\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1117\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:152\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:89\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:99\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceValidatorInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_validator.go:112\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:135\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1\n\t/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.32.0/interceptor.go:325\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:135\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_logger.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1122\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.9.0/workflowservice/v1/service.pb.go:1445\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922"}

{"level":"error","ts":"2022-07-15T16:04:20.000Z","msg":"unavailable error","operation":"PollActivityTaskQueue","wf-namespace":"merchantsettlement","error":"last connection error: connection closed before server preface received","logging-call-at":"telemetry.go:280","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).handleError\n\t/temporal/common/rpc/interceptor/telemetry.go:280\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:144\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1\n\t/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.32.0/interceptor.go:325\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:135\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_logger.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1122\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.9.0/workflowservice/v1/service.pb.go:1445\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922"}

{"level":"info","ts":"2022-07-15T16:04:20.018Z","msg":"matching client encountered error","service":"frontend","error":"last connection error: connection closed before server preface received","service-error-type":"serviceerror.Unavailable","logging-call-at":"metric_client.go:217"}

{"level":"info","ts":"2022-07-15T16:04:20.063Z","msg":"matching client encountered error","service":"frontend","error":"last connection error: connection closed before server preface received","service-error-type":"serviceerror.Unavailable","logging-call-at":"metric_client.go:217"}

{"level":"error","ts":"2022-07-15T16:04:20.089Z","msg":"Unable to call matching.PollActivityTaskQueue.","service":"frontend","wf-task-queue-name":"/_sys/APP-04-STAGE_TASK_QUEUE_merchantsettlement/19","timeout":"1m9.999759215s","error":"last connection error: connection closed before server preface received","logging-call-at":"workflowHandler.go:1178","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollActivityTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:1178\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:564\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:125\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:560\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.9.0/workflowservice/v1/service.pb.go:1443\ngo.temporal.io/server/common/rpc/interceptor.(*SDKVersionInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/sdk_version.go:64\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1117\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:152\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:89\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:99\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceValidatorInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_validator.go:112\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:135\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1\n\t/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.32.0/interceptor.go:325\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:135\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_logger.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1122\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.9.0/workflowservice/v1/service.pb.go:1445\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922"}

{"level":"error","ts":"2022-07-15T16:04:20.090Z","msg":"unavailable error","operation":"PollActivityTaskQueue","wf-namespace":"merchantsettlement","error":"last connection error: connection closed before server preface received","logging-call-at":"telemetry.go:280","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).handleError\n\t/temporal/common/rpc/interceptor/telemetry.go:280\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:144\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1\n\t/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.32.0/interceptor.go:325\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:135\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_logger.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1120\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1122\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.9.0/workflowservice/v1/service.pb.go:1445\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922"}

{"level":"info","ts":"2022-07-15T16:04:20.135Z","msg":"matching client encountered error","service":"frontend","error":"last connection error: connection closed before server preface received","service-error-type":"serviceerror.Unavailable","logging-call-at":"metric_client.go:217"}

What persistence are you using? And what version of the server did you upgrade from?
Could you paste the tctl command that is working with list namespace, and the command that did not working with start workflow.

The log seems to indicate matching is not reachable by frontend. Could you look at logs from matching?

Persistence - Vitess MySQL

Upgraded from 1.7.0. Already done the schema migration. (to MySQL version 1.8 from 1.5)

tctl command that is working with list namespace

tctl --tls_disable_host_verification namespace list

the command that did not working with start workflow.

tctl --ns --tls_disable_host_verification workflow start --taskqueue -workflow_type

The log seems to indicate matching is not reachable by frontend. Could you look at logs from matching?

No errors in the logs in the container that’s currently running (For confidentiality reasons I can’t really share the non-error logs).

But I see that a container was terminated earlier because of the following failure - let me investigate and get back. The error trace is -
Unable to start server. Error: could not build arguments for function "go.temporal.io/server/common/pprof".LifetimeHooks (/temporal/common/pprof/fx.go:39): failed to build *pprof.PProfInitializerImpl: could not build arguments for function "go.temporal.io/server/common/pprof".NewInitializer (/temporal/common/pprof/pprof.go:56): failed to build *config.PProf: received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider (/temporal/temporal/fx.go:162): ringpop config validation error: ringpop config malformed broadcastAddress param

This seem to be the problem.

Also notice that, temporal server only guarantee backward compatible between 2 versions. It is required that you upgrade through every minor versions. Upgrade directly from 1.7.0 to 1.17.1 might have some unexpected result.

Okay the root cause was a missing default value for the current_version field in the current_executions table (the update schema script didn’t fully take effect for some reason :frowning: ). It showed up in the history logs. The immediate issue is solved. Will look into the ringpop and metric_client issues now.