Context deadline exceeded when trying to start workflow (v1.7.1)

Hi all, our Temporal worker is currently spewing errors related to a failure to connect to the Temporal frontend service. Our Temporal cluster and worker are both deployed on Kubernetes. This problem pops up often, but it doesn’t usually block workflow execution the way it is now.

The errors from the Java worker:

15:39:12.389 [Activity Poller taskQueue="singer-activity-task-list", namespace="singer-activity-namespace": 1] WARN  io.temporal.internal.worker.Poller - Failure in thread Activity Poller taskQueue="singer-activity-task-list", namespace="singer-activity-namespace": 1
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 69.999511231s. [remote_addr=temporalio-frontend-headless.temporalio-prod.svc/172.20.177.44:7233]
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.pollActivityTaskQueue(WorkflowServiceGrpc.java:2683)
	at io.temporal.internal.worker.ActivityPollTask.poll(ActivityPollTask.java:105)
	at io.temporal.internal.worker.ActivityPollTask.poll(ActivityPollTask.java:39)
	at io.temporal.internal.worker.Poller$PollExecutionTask.run(Poller.java:265)
	at io.temporal.internal.worker.Poller$PollLoopTask.run(Poller.java:241)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

And from the frontend service:

{"level":"error","ts":"2022-03-23T15:39:19.028Z","msg":"PollWorkflowTaskQueue failed.","service":"frontend","wf-task-queue-name":"temporalio-worker-68b54d9db5-n7rm8:8894b8cf-4d18-4387-a661-a72924b2781a","value":"1m9.999762486s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:783","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollWorkflowTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:783\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:532\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:528\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1084\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1086\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"error","ts":"2022-03-23T15:39:19.450Z","msg":"PollWorkflowTaskQueue failed.","service":"frontend","wf-task-queue-name":"temporalio-worker-68b54d9db5-n7rm8:7be3954b-01e4-4fc0-a2d6-ce997096579a","value":"1m9.99972652s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:783","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollWorkflowTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:783\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:532\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:528\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1084\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1086\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"error","ts":"2022-03-23T15:39:20.433Z","msg":"PollActivityTaskQueue failed.","service":"frontend","wf-task-queue-name":"/_sys/temporal-sys-processor-parent-close-policy/3","value":"1m9.999747836s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:1045","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollActivityTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:1045\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:502\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:498\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1138\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1140\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"error","ts":"2022-03-23T15:39:21.030Z","msg":"PollWorkflowTaskQueue failed.","service":"frontend","wf-task-queue-name":"/_sys/temporal-sys-processor-parent-close-policy/1","value":"1m9.999741537s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:783","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollWorkflowTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:783\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:532\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:528\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1084\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1086\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"error","ts":"2022-03-23T15:39:21.232Z","msg":"PollWorkflowTaskQueue failed.","service":"frontend","wf-task-queue-name":"temporal-sys-batcher-taskqueue","value":"1m9.999763481s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:783","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollWorkflowTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:783\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:532\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:528\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1084\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1086\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"error","ts":"2022-03-23T15:39:21.427Z","msg":"PollActivityTaskQueue failed.","service":"frontend","wf-task-queue-name":"/_sys/temporal-sys-processor-parent-close-policy/1","value":"1m9.999735015s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:1045","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollActivityTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:1045\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:502\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:498\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1138\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1140\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"warn","ts":"2022-03-23T15:39:33.424Z","msg":"Failed to cancel outstanding poller.","service":"frontend","wf-task-queue-name":"/_sys/1@singer-worker-dep-6d76455557-6vpkc:7b8f572c-3e9d-43dc-a8ea-a91bc9462e54/1","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:3389"}

{"level":"warn","ts":"2022-03-23T15:39:33.893Z","msg":"Failed to cancel outstanding poller.","service":"frontend","wf-task-queue-name":"singer-activity-task-list","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:3389"

This is fairly pressing for us to resolve as our workflows are blocked by these connection issues. Any help pointing us in the right direction is greatly appreciated, thanks.

"msg": "PollWorkflowTaskQueue failed.",
"service": "frontend",

This can typically happen when there are no tasks on the task queues your workers poll and the frontend service times out the long-poll request. It should be benign and resolve itself. Are you experiencing performance issues when you see these logs? How did you determine that this is blocking your workflow executions, as you mentioned?
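One quick way to check whether your workers' pollers are actually reaching the frontend is to ask the frontend to describe the task queue and list its registered pollers. Here is a minimal sketch, using the task queue, namespace, and frontend address from your logs (adjust these to however you normally connect):

```java
import io.temporal.api.enums.v1.TaskQueueType;
import io.temporal.api.taskqueue.v1.PollerInfo;
import io.temporal.api.taskqueue.v1.TaskQueue;
import io.temporal.api.workflowservice.v1.DescribeTaskQueueRequest;
import io.temporal.api.workflowservice.v1.DescribeTaskQueueResponse;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

public class ListPollers {
  public static void main(String[] args) {
    // Connect to the same frontend address the worker logs point at.
    WorkflowServiceStubs service = WorkflowServiceStubs.newInstance(
        WorkflowServiceStubsOptions.newBuilder()
            .setTarget("temporalio-frontend-headless.temporalio-prod.svc:7233")
            .build());

    // Ask the frontend which workers have recently polled this task queue.
    DescribeTaskQueueResponse response = service.blockingStub().describeTaskQueue(
        DescribeTaskQueueRequest.newBuilder()
            .setNamespace("singer-activity-namespace")
            .setTaskQueue(TaskQueue.newBuilder().setName("singer-activity-task-list").build())
            .setTaskQueueType(TaskQueueType.TASK_QUEUE_TYPE_ACTIVITY)
            .build());

    // An empty list would mean the activity pollers are not reaching the frontend at all.
    for (PollerInfo poller : response.getPollersList()) {
      System.out.println("poller identity: " + poller.getIdentity());
    }
  }
}
```

This is roughly the same information tctl's task queue describe command reports, if you prefer the CLI.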

Hi tihomir, I can tell it is blocking workflow execution because the HTTP requests that create the workflows are fulfilled, but the workflows never register as started in Temporal Web, and the activities never log as started or show any progress. Instead the worker just spews connection errors like the ones above. This happens to us frequently, but it rarely impacts regular operation of the worker the way it is here, so it does not seem to always be benign.

When I ping the worker to begin a workflow, it logs this error:

18:15:29.430 [qtp1233308726-28] WARN  i.t.internal.common.GrpcRetryer - Retrying after failure
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 9.999912507s. [remote_addr=temporalio-frontend-headless.temporalio-prod.svc/172.20.177.44:7233]
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.startWorkflowExecution(WorkflowServiceGrpc.java:2614)
	at io.temporal.internal.external.GenericWorkflowClientExternalImpl.lambda$start$0(GenericWorkflowClientExternalImpl.java:88)
	at io.temporal.internal.common.GrpcRetryer.retryWithResult(GrpcRetryer.java:97)
	at io.temporal.internal.external.GenericWorkflowClientExternalImpl.start(GenericWorkflowClientExternalImpl.java:81)
	at io.temporal.internal.sync.WorkflowStubImpl.startWithOptions(WorkflowStubImpl.java:155)
	at io.temporal.internal.sync.WorkflowStubImpl.start(WorkflowStubImpl.java:268)
	at io.temporal.internal.sync.WorkflowInvocationHandler$StartWorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:245)
	at io.temporal.internal.sync.WorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:181)
	at com.sun.proxy.$Proxy15.singerWorkflow(Unknown Source)
	at main.java.io.surfside.api.SingerController$el$1.invoke(SingerController.kt:120)
	at main.java.io.surfside.api.SingerController$el$1.invoke(SingerController.kt:43)
	at main.java.io.surfside.api.SingerController$sam$io_temporal_workflow_Functions_Func2$0.apply(SingerController.kt)
	at io.temporal.internal.sync.WorkflowClientInternal.lambda$start$4ed02937$1(WorkflowClientInternal.java:273)
	at io.temporal.internal.sync.WorkflowClientInternal.lambda$start$c6190765$1(WorkflowClientInternal.java:263)
	at io.temporal.internal.sync.WorkflowClientInternal.start(WorkflowClientInternal.java:218)
	at io.temporal.internal.sync.WorkflowClientInternal.start(WorkflowClientInternal.java:261)
	at io.temporal.internal.sync.WorkflowClientInternal.start(WorkflowClientInternal.java:273)
	at io.temporal.client.WorkflowClient.start(WorkflowClient.java:372)
	at main.java.io.surfside.api.SingerController.el(SingerController.kt:120)
	at main.java.io.surfside.api.ServerMain$run$3$1$4$1.invoke(ServerMain.kt:52)
	at main.java.io.surfside.api.ServerMain$run$3$1$4$1.invoke(ServerMain.kt:16)
	at main.java.io.surfside.api.ServerMain$sam$io_javalin_http_Handler$0.handle(ServerMain.kt)
	at io.javalin.core.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:23)
	at io.javalin.http.JavalinServlet$addHandler$protectedHandler$1.handle(JavalinServlet.kt:119)
	at io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:45)
	at io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:24)
	at io.javalin.http.JavalinServlet$service$1.invoke(JavalinServlet.kt:126)
	at io.javalin.http.JavalinServlet$service$2.invoke(JavalinServlet.kt:40)
	at io.javalin.http.JavalinServlet.service(JavalinServlet.kt:78)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at io.javalin.websocket.JavalinWsServlet.service(JavalinWsServlet.kt:51)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:763)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
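For reference, the start path in that trace boils down to something like the following. This is a simplified, hypothetical sketch; the workflow interface name, arguments, and options are approximations of our actual code:

```java
import io.temporal.api.common.v1.WorkflowExecution;
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowClientOptions;
import io.temporal.client.WorkflowOptions;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

public class StartSingerWorkflow {

  // Hypothetical, trimmed-down version of the real workflow interface.
  @WorkflowInterface
  public interface SingerWorkflow {
    @WorkflowMethod
    String singerWorkflow(String tenantId, String payload);
  }

  public static void main(String[] args) {
    // Same frontend address the worker logs show.
    WorkflowServiceStubs service = WorkflowServiceStubs.newInstance(
        WorkflowServiceStubsOptions.newBuilder()
            .setTarget("temporalio-frontend-headless.temporalio-prod.svc:7233")
            .build());

    WorkflowClient client = WorkflowClient.newInstance(
        service,
        WorkflowClientOptions.newBuilder()
            .setNamespace("singer-activity-namespace")
            .build());

    SingerWorkflow workflow = client.newWorkflowStub(
        SingerWorkflow.class,
        WorkflowOptions.newBuilder()
            .setTaskQueue("singer-activity-task-list")
            .build());

    // This is the WorkflowClient.start(...) frame in the trace above; it issues
    // StartWorkflowExecution and is the call that hits DEADLINE_EXCEEDED after ~10s.
    WorkflowExecution execution =
        WorkflowClient.start(workflow::singerWorkflow, "tenant-123", "{}");
    System.out.println("started workflow id: " + execution.getWorkflowId());
  }
}
```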

Thanks for that info. For the first timeout you showed, the deadline is governed by WorkflowServiceStubsOptions#rpcLongPollTimeout, which defaults to 70 seconds. There is a related issue that’s been resolved here, but it’s related to the test workflow environment, just FYI.
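For reference, both deadlines live on the service stub options and can be adjusted when the stubs are created. A minimal sketch follows; the values shown are the documented defaults rather than recommendations, and the exact builder methods may differ slightly between SDK versions:

```java
import java.time.Duration;

import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

public class ServiceStubTimeouts {
  public static void main(String[] args) {
    WorkflowServiceStubsOptions options = WorkflowServiceStubsOptions.newBuilder()
        .setTarget("temporalio-frontend-headless.temporalio-prod.svc:7233")
        // Deadline for long-poll calls (PollWorkflowTaskQueue / PollActivityTaskQueue).
        // The 70s default is the "deadline exceeded after 69.99...s" in the poller logs.
        .setRpcLongPollTimeout(Duration.ofSeconds(70))
        // Deadline for ordinary unary calls such as StartWorkflowExecution; the ~10s
        // deadline in the start-workflow trace above is in this range.
        .setRpcTimeout(Duration.ofSeconds(10))
        .build();

    // Workers and WorkflowClient instances built from these stubs inherit the deadlines.
    WorkflowServiceStubs service = WorkflowServiceStubs.newInstance(options);
  }
}
```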

Regarding

DEADLINE_EXCEEDED: deadline exceeded after 9.999912507s
WorkflowServiceBlockingStub.startWorkflowExecution(WorkflowServiceGrpc.java:2614)

@alec **Update:** checking on this again, as apparently I was looking at a newer SDK version (not the 1.7.1 you specified).