We are observing this error every 2-4 minutes on the temporal-frontend pods, even when there are no active workflows (likely a poller-related error).
Initial conditions:
Temporal: 1.18.0
Client: Java SDK v1.17.0
temporal-frontend - autoscaling with min 2 pods / max 3 pods
client-worker-app(Java) - autoscaling with min 2 pods / max 3 pods
[namespace for both pods = “default”, queue-name = “my-queue-name”]
workflow-poller configuration - default (MaxConcurrentWorkflowTaskPollers = 5, MaxConcurrentActivityTaskPollers = 5)
- Error stack on the temporal-frontend service:
{"level":"error","ts":"2023-01-09T22:40:52.073Z","msg":"Unable to call matching.PollWorkflowTaskQueue.","service":"frontend","wf-task-queue-name":"/_sys/my-queue-name/1","timeout":"1m9.99963089s","error":"context deadline exceeded","logging-call-at":"workflow_handler.go:894","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollWorkflowTaskQueue\n\t/home/builder/temporal/service/frontend/workflow_handler.go:894\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue.func2\n\t
/home/builder/temporal/service/frontend/dcRedirectionHandler.go:598\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/home/builder/temporal/service/frontend/dcRedirectionPolicy.go:125\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue\n\t/home/builder/temporal/service/frontend/dcRedirectionHandler.go:594\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler.func1\n\t
/go/pkg/mod/go.temporal.io/api@v1.12.0/workflowservice/v1/service.pb.go:1516\ngo.temporal.io/server/common/rpc/interceptor.(*CallerInfoInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/caller_info.go:79\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1135\ngo.temporal.io/server/common/rpc/interceptor.(*SDKVersionInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/sdk_version.go:69\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/home/builder/temporal/common/authorization/interceptor.go:152\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/rate_limit.go:86\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/namespace_rate_limit.go:91\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/namespace_count_limit.go:99\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceValidatorInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/namespace_validator.go:109\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.temporal.io/server/common/rpc/i
nterceptor.(*TelemetryInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/telemetry.go:136\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.temporal.io/server/common/rpc/interceptor.(*RetryableInterceptor).Intercept.func1\n\t/home/builder/temporal/common/rpc/interceptor/retry.go:62\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:194\ngo.temporal.io/server/common/rpc/interceptor.(*RetryableInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/retry.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1\n\t/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.32.0/interceptor.go:325\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/home/builder/temporal/common/rpc/grpc.go:137\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/namespace_logger.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1140\ngo.temporal.io/api/workflowservice/v1._WorkflowServi
ce_PollWorkflowTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.12.0/workflowservice/v1/service.pb.go:1518\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1301\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1642\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:938"}
- Error stack on temporal-client-app (Java application using the Temporal Java SDK):
2023-01-09 23:52:47 INFO io.temporal.internal.worker.Poller [Workflow Poller taskQueue="my-queue-name", namespace="default": 2] DEADLINE_EXCEEDED in poller thread Workflow Poller taskQueue="my-queue-name", namespace="default": 2
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 69.999918873s. [closed=[], open=[[remote_addr=temporal-frontend.workflow-qa2.svc/172.20.249.50:7233]]]
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271) ~[grpc-stub-1.48.1.jar:1.48.1]
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252) ~[grpc-stub-1.48.1.jar:1.48.1]
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165) ~[grpc-stub-1.48.1.jar:1.48.1]
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.pollWorkflowTaskQueue(WorkflowServiceGrpc.java:3248) ~[temporal-serviceclient-1.16.0.jar:?]
at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:81) ~[temporal-sdk-1.16.0.jar:?]
at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:38) ~[temporal-sdk-1.16.0.jar:?]
at io.temporal.internal.worker.Poller$PollExecutionTask.run(Poller.java:269) ~[temporal-sdk-1.16.0.jar:?]
at io.temporal.internal.worker.Poller$PollLoopTask.run(Poller.java:234) [temporal-sdk-1.16.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
- Both pods for the temporal-frontend service show traffic
grpc-health-probe was successful for WorkflowService
Note:
I cross-referenced this issue against similar existing issues, and the problem in our setup seems to be different from both of these:
a. DEADLINE_EXCEEDED: deadline exceeded after 9.999933037s - solved by clearing up the database
b. Temporal Application is not establishing connection to Temporal Server inside Kubernetes - #7 by Martin_Lercher
Are there any specific poller configurations or metrics we should look at that might be helpful in this case?
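
To make the comparison with the other threads concrete, here is a minimal sketch that parses the durations from both log formats above and checks whether they sit at the long-poll deadline. It is only an illustration: the class name, the helper names, the 1-second tolerance, and the 70-second long-poll deadline are assumptions (the 70 s value is inferred from the "1m9.99963089s" and "69.999918873s" values in the logs), not something taken from the Temporal API.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Temporal SDK.
class PollTimeoutClassifier {
    // Assumption: the long-poll RPC deadline is about 70 s, as the logs suggest.
    static final Duration ASSUMED_LONG_POLL_TIMEOUT = Duration.ofSeconds(70);
    // Assumption: anything within 1 s of that deadline counts as "ran the full window".
    static final Duration TOLERANCE = Duration.ofSeconds(1);

    private static final Pattern JAVA_MSG =
            Pattern.compile("deadline exceeded after ([0-9.]+)s");

    // Parses a Go-style duration such as "1m9.99963089s" (h/m/s units only).
    static Duration parseGoDuration(String s) {
        return Duration.parse("PT" + s.toUpperCase());
    }

    // Extracts the duration from a grpc-java message such as
    // "DEADLINE_EXCEEDED: deadline exceeded after 69.999918873s."
    static Duration parseJavaMessage(String msg) {
        Matcher m = JAVA_MSG.matcher(msg);
        if (!m.find()) {
            throw new IllegalArgumentException("no deadline duration in: " + msg);
        }
        return Duration.parse("PT" + m.group(1) + "S");
    }

    // True when the call ran for roughly the whole assumed long-poll window,
    // false when it failed much earlier (for example after about 10 s).
    static boolean ranFullLongPollWindow(Duration d) {
        return d.minus(ASSUMED_LONG_POLL_TIMEOUT).abs().compareTo(TOLERANCE) <= 0;
    }

    public static void main(String[] args) {
        Duration frontend = parseGoDuration("1m9.99963089s");
        Duration client = parseJavaMessage(
                "DEADLINE_EXCEEDED: deadline exceeded after 69.999918873s.");
        Duration otherThread = parseJavaMessage(
                "DEADLINE_EXCEEDED: deadline exceeded after 9.999933037s");

        System.out.println("frontend:     " + ranFullLongPollWindow(frontend));
        System.out.println("client:       " + ranFullLongPollWindow(client));
        System.out.println("other thread: " + ranFullLongPollWindow(otherThread));
    }
}
```

With the values quoted in this post, both of our errors fall at the full ~70 s window, while the 9.999933037s case from thread (a) does not, which is why that thread's fix may not apply here.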