In PROD, we are scaling out our Temporal usage. We raised the poller count to 30 instead of using the default, and we are suddenly seeing a lot of DEADLINE_EXCEEDED errors in the poller threads.
Increasing the poller timeout to 90s didn’t help.
We tried setting the poller count back to 5 and the errors went away. I’m wondering if there’s a setting on the Temporal cluster side that we need to change to support more concurrent pollers.
I’d be very grateful if someone could point me in the right direction :).
In general you don’t have to alert on deadline exceeded for poll operations, since it is typically intermittent. In this case, however, the rate is pretty high for a long duration of time.
Can you show the following?
resource exhausted issues:
sum(rate(service_errors_resource_exhausted{}[1m])) by (operation, resource_exhausted_cause)
service errors:
sum(rate(service_errors{service_name="frontend"}[1m]) or on () vector(0))
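If it is easier to pull these two series straight from Prometheus rather than from a dashboard, here is a minimal sketch that builds the instant-query URLs for the Prometheus HTTP API (`/api/v1/query`). The base URL and the dictionary keys are placeholders I made up for illustration, so adjust them to your setup; the PromQL expressions are the two above, unchanged.

```python
from urllib.parse import urlencode

# Hypothetical Prometheus address (an assumption); replace with your own.
PROMETHEUS_BASE_URL = "http://localhost:9090"

# The two PromQL expressions from this reply, keyed by illustrative names.
QUERIES = {
    "resource_exhausted": (
        "sum(rate(service_errors_resource_exhausted{}[1m])) "
        "by (operation, resource_exhausted_cause)"
    ),
    "frontend_service_errors": (
        'sum(rate(service_errors{service_name="frontend"}[1m]) '
        "or on () vector(0))"
    ),
}


def build_query_url(promql: str, base_url: str = PROMETHEUS_BASE_URL) -> str:
    """Return the instant-query URL for a PromQL expression."""
    # urlencode percent-escapes braces, quotes and spaces in the expression.
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    for name, promql in QUERIES.items():
        print(name, build_query_url(promql))
```

Each printed URL can be opened in a browser or fetched with any HTTP client; the response is JSON with one result per label combination.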
Hi @tihomor, thanks for jumping in to help.
We haven’t had many Resource Exhausted errors in the past 7 days, except for one moment when we ran some performance tests.
As for the FE errors, the first one is the overall chart,
which mainly consists of the following errors, but I don’t think they are related to pollers.