Context deadline exceeded when trying to start workflow (v1.7.1)

Hi all, our Temporal worker is currently spewing errors related to a failure to connect to the Temporal frontend service. Our Temporal cluster and worker are both deployed on Kubernetes. This problem pops up often, but it usually doesn't fully block workflow execution the way it is now.

The errors from the Java worker:

15:39:12.389 [Activity Poller taskQueue="singer-activity-task-list", namespace="singer-activity-namespace": 1] WARN  io.temporal.internal.worker.Poller - Failure in thread Activity Poller taskQueue="singer-activity-task-list", namespace="singer-activity-namespace": 1
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 69.999511231s. [remote_addr=temporalio-frontend-headless.temporalio-prod.svc/172.20.177.44:7233]
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.pollActivityTaskQueue(WorkflowServiceGrpc.java:2683)
	at io.temporal.internal.worker.ActivityPollTask.poll(ActivityPollTask.java:105)
	at io.temporal.internal.worker.ActivityPollTask.poll(ActivityPollTask.java:39)
	at io.temporal.internal.worker.Poller$PollExecutionTask.run(Poller.java:265)
	at io.temporal.internal.worker.Poller$PollLoopTask.run(Poller.java:241)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

And from the frontend service:

{"level":"error","ts":"2022-03-23T15:39:19.028Z","msg":"PollWorkflowTaskQueue failed.","service":"frontend","wf-task-queue-name":"temporalio-worker-68b54d9db5-n7rm8:8894b8cf-4d18-4387-a661-a72924b2781a","value":"1m9.999762486s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:783","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollWorkflowTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:783\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:532\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:528\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1084\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1086\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"error","ts":"2022-03-23T15:39:19.450Z","msg":"PollWorkflowTaskQueue failed.","service":"frontend","wf-task-queue-name":"temporalio-worker-68b54d9db5-n7rm8:7be3954b-01e4-4fc0-a2d6-ce997096579a","value":"1m9.99972652s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:783","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollWorkflowTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:783\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:532\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:528\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1084\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1086\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"error","ts":"2022-03-23T15:39:20.433Z","msg":"PollActivityTaskQueue failed.","service":"frontend","wf-task-queue-name":"/_sys/temporal-sys-processor-parent-close-policy/3","value":"1m9.999747836s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:1045","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollActivityTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:1045\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:502\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:498\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1138\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1140\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"error","ts":"2022-03-23T15:39:21.030Z","msg":"PollWorkflowTaskQueue failed.","service":"frontend","wf-task-queue-name":"/_sys/temporal-sys-processor-parent-close-policy/1","value":"1m9.999741537s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:783","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollWorkflowTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:783\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:532\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:528\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1084\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1086\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"error","ts":"2022-03-23T15:39:21.232Z","msg":"PollWorkflowTaskQueue failed.","service":"frontend","wf-task-queue-name":"temporal-sys-batcher-taskqueue","value":"1m9.999763481s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:783","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollWorkflowTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:783\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:532\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:528\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1084\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1086\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"error","ts":"2022-03-23T15:39:21.427Z","msg":"PollActivityTaskQueue failed.","service":"frontend","wf-task-queue-name":"/_sys/temporal-sys-processor-parent-close-policy/1","value":"1m9.999735015s","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:1045","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:136\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollActivityTaskQueue\n\t/temporal/service/frontend/workflowHandler.go:1045\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:502\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollActivityTaskQueue\n\t/temporal/service/frontend/dcRedirectionHandler.go:498\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1138\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:84\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:85\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:79\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:91\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollActivityTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.1-0.20210319015452-3dc250bb642a/workflowservice/v1/service.pb.go:1140\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}

{"level":"warn","ts":"2022-03-23T15:39:33.424Z","msg":"Failed to cancel outstanding poller.","service":"frontend","wf-task-queue-name":"/_sys/1@singer-worker-dep-6d76455557-6vpkc:7b8f572c-3e9d-43dc-a8ea-a91bc9462e54/1","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:3389"}

{"level":"warn","ts":"2022-03-23T15:39:33.893Z","msg":"Failed to cancel outstanding poller.","service":"frontend","wf-task-queue-name":"singer-activity-task-list","error":"context deadline exceeded","logging-call-at":"workflowHandler.go:3389"

This is fairly pressing for us to resolve, as our workflows are blocked by these connection issues. Any help pointing us in the right direction is greatly appreciated. Thanks.

"msg": "PollWorkflowTaskQueue failed.",
"service": "frontend",

This typically happens when there are no tasks on the task queues your workers poll, and the frontend service times out the long-poll request. It should be benign and resolve itself. Are you experiencing performance issues when you see these logs? How did you determine that it is blocking your workflow executions, as you mentioned?

Hi tihomir, I can tell it is blocking workflow execution because the HTTP requests that create the workflows are fulfilled, but the workflows themselves never register as started in Temporal Web, and the activities neither log as begun nor show any progress. Instead, the worker just spews connection errors of this kind. This happens to us frequently, but it rarely impacts regular operation of the worker the way it is doing here, so it seems it is not always benign.

When I ping the worker to begin a workflow, it logs this error:

18:15:29.430 [qtp1233308726-28] WARN  i.t.internal.common.GrpcRetryer - Retrying after failure
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 9.999912507s. [remote_addr=temporalio-frontend-headless.temporalio-prod.svc/172.20.177.44:7233]
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.startWorkflowExecution(WorkflowServiceGrpc.java:2614)
	at io.temporal.internal.external.GenericWorkflowClientExternalImpl.lambda$start$0(GenericWorkflowClientExternalImpl.java:88)
	at io.temporal.internal.common.GrpcRetryer.retryWithResult(GrpcRetryer.java:97)
	at io.temporal.internal.external.GenericWorkflowClientExternalImpl.start(GenericWorkflowClientExternalImpl.java:81)
	at io.temporal.internal.sync.WorkflowStubImpl.startWithOptions(WorkflowStubImpl.java:155)
	at io.temporal.internal.sync.WorkflowStubImpl.start(WorkflowStubImpl.java:268)
	at io.temporal.internal.sync.WorkflowInvocationHandler$StartWorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:245)
	at io.temporal.internal.sync.WorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:181)
	at com.sun.proxy.$Proxy15.singerWorkflow(Unknown Source)
	at main.java.io.surfside.api.SingerController$el$1.invoke(SingerController.kt:120)
	at main.java.io.surfside.api.SingerController$el$1.invoke(SingerController.kt:43)
	at main.java.io.surfside.api.SingerController$sam$io_temporal_workflow_Functions_Func2$0.apply(SingerController.kt)
	at io.temporal.internal.sync.WorkflowClientInternal.lambda$start$4ed02937$1(WorkflowClientInternal.java:273)
	at io.temporal.internal.sync.WorkflowClientInternal.lambda$start$c6190765$1(WorkflowClientInternal.java:263)
	at io.temporal.internal.sync.WorkflowClientInternal.start(WorkflowClientInternal.java:218)
	at io.temporal.internal.sync.WorkflowClientInternal.start(WorkflowClientInternal.java:261)
	at io.temporal.internal.sync.WorkflowClientInternal.start(WorkflowClientInternal.java:273)
	at io.temporal.client.WorkflowClient.start(WorkflowClient.java:372)
	at main.java.io.surfside.api.SingerController.el(SingerController.kt:120)
	at main.java.io.surfside.api.ServerMain$run$3$1$4$1.invoke(ServerMain.kt:52)
	at main.java.io.surfside.api.ServerMain$run$3$1$4$1.invoke(ServerMain.kt:16)
	at main.java.io.surfside.api.ServerMain$sam$io_javalin_http_Handler$0.handle(ServerMain.kt)
	at io.javalin.core.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:23)
	at io.javalin.http.JavalinServlet$addHandler$protectedHandler$1.handle(JavalinServlet.kt:119)
	at io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:45)
	at io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:24)
	at io.javalin.http.JavalinServlet$service$1.invoke(JavalinServlet.kt:126)
	at io.javalin.http.JavalinServlet$service$2.invoke(JavalinServlet.kt:40)
	at io.javalin.http.JavalinServlet.service(JavalinServlet.kt:78)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at io.javalin.websocket.JavalinWsServlet.service(JavalinWsServlet.kt:51)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:763)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)

Thanks for that info. For the first timeout you showed, the deadline is related to WorkflowServiceStubsOptions#rpcLongPollTimeout, which has a default of 70 seconds. There is a related issue that's been resolved here, but it's related to the test workflow environment, just FYI.
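If you need to adjust that deadline, it is configurable on the service stubs. A minimal sketch against the 1.7.x Java SDK, with the target copied from your logs and the default 70 s set explicitly just to show where the knob lives:

```java
import java.time.Duration;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

public class StubsConfigSketch {
  public static void main(String[] args) {
    // The 70 s deadline in the poller logs comes from rpcLongPollTimeout;
    // it is set explicitly here only to make the default visible.
    WorkflowServiceStubsOptions options =
        WorkflowServiceStubsOptions.newBuilder()
            // Target copied from the error logs above.
            .setTarget("temporalio-frontend-headless.temporalio-prod.svc:7233")
            .setRpcLongPollTimeout(Duration.ofSeconds(70)) // 70 s is the default
            .build();
    // newInstance(options) is the stubs factory available in SDK 1.7.x.
    WorkflowServiceStubs service = WorkflowServiceStubs.newInstance(options);
  }
}
```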

Regarding

DEADLINE_EXCEEDED: deadline exceeded after 9.999912507s
WorkflowServiceBlockingStub.startWorkflowExecution(WorkflowServiceGrpc.java:2614)

@alec **Update**: checking on this again, as I was apparently looking at a newer SDK version (not the 1.7.1 you specified).

Hi @tihomir, I am facing the same issue as alec. I am using Java SDK v1.18.1; can you give me some advice? Thanks.

Hi @Toan_Tran, were you able to resolve this? We are receiving the same error: the workflow start method blocks the API for 10 seconds, but it is able to start on retry.

Hi @sebin, in my case I deploy my app on OpenShift with an Istio sidecar. When I load test, the sidecar hits its resource limit; after increasing the sidecar's resources, the issue went away.
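In case it's useful to anyone hitting the same thing on Istio: the injected sidecar's resources can be raised per pod with Istio's resource annotations. A minimal sketch with hypothetical names and illustrative values (not the exact settings we used):

```yaml
# Sketch only: resource annotations for the injected istio-proxy sidecar.
# Deployment name, labels, and image are hypothetical; values are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: temporal-worker
  template:
    metadata:
      labels:
        app: temporal-worker
      annotations:
        sidecar.istio.io/proxyCPU: "500m"        # sidecar CPU request
        sidecar.istio.io/proxyMemory: "512Mi"    # sidecar memory request
        sidecar.istio.io/proxyCPULimit: "1000m"  # sidecar CPU limit
        sidecar.istio.io/proxyMemoryLimit: "1Gi" # sidecar memory limit
    spec:
      containers:
        - name: worker
          image: example.registry/temporal-worker:latest
```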

@alec @Toan_Tran @sebin were you able to resolve the io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED issue? If so, please suggest a solution.

Hi Mahith,
In our case, the Temporal cluster was sitting behind an AWS load balancer and Istio. We updated the frontend's keepAliveMaxConnectionAge dynamic config property to 2 hours (the default is 5 minutes). We had also initially set rpcTimeout to 500 ms, which was too low, so we reverted it to the default 10 seconds. This still left a few deadline errors, and we finally increased the capacity of our instances, since the RDS database Temporal uses had very little RAM. After that, everything has been working smoothly.
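For anyone else landing here, that keep-alive override lives in the server's dynamic config file. A minimal sketch (the file path is whatever your server's dynamicConfigClient points at, and the value format assumes Go-style duration strings):

```yaml
# Dynamic config override for the frontend gRPC keep-alive max connection age.
# Default is 5m; a longer age avoids frequent connection resets behind LBs/Istio.
frontend.keepAliveMaxConnectionAge:
  - value: "2h"
```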