Handling DEADLINE_EXCEEDED error for closed connections

Hi Team,

We’ve been experiencing intermittent but recurring DEADLINE_EXCEEDED errors in our workflow, accompanied by the following error trace:

DEADLINE_EXCEEDED: deadline exceeded after 9.999893737s. [closed=[], open=[[remote_addr=xxx.temporal-frontend.xxx.com/...202.9.22:443]]]

As a temporary workaround, we’ve increased the ‘KeepAliveMaxConnectionAge’ parameter from 5 minutes to 2 hours. We’ve also enabled keep-alive and reduced the RPC timeout from the default of 10s to 1s.
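
For context, here is a minimal sketch of the client-side part of these settings. It assumes the Temporal Java SDK (the trace format looks like grpc-java) and a recent 1.x release that provides `newServiceStubs`. The class name `TemporalClientConfig`, the `connect` helper, and the keep-alive intervals are illustrative and not taken from our actual setup; only the 1s RPC timeout matches the value above. TLS setup for the port 443 endpoint is left out.

```java
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;
import java.time.Duration;

public class TemporalClientConfig {

    // Hypothetical helper: builds service stubs with keep-alive enabled
    // and a shortened timeout for regular (non-long-poll) RPCs.
    public static WorkflowServiceStubs connect(String target) {
        WorkflowServiceStubsOptions options =
            WorkflowServiceStubsOptions.newBuilder()
                .setTarget(target)                         // host:port of the frontend
                .setEnableKeepAlive(true)                  // send HTTP/2 keep-alive pings
                .setKeepAliveTime(Duration.ofSeconds(30))  // illustrative ping interval
                .setKeepAliveTimeout(Duration.ofSeconds(15)) // illustrative ping ack timeout
                .setKeepAlivePermitWithoutStream(true)     // ping even when no RPC is in flight
                .setRpcTimeout(Duration.ofSeconds(1))      // reduced from the 10s default
                .build();
        return WorkflowServiceStubs.newServiceStubs(options);
    }
}
```

Note that ‘KeepAliveMaxConnectionAge’ is not part of this sketch, since it is set on the server side rather than in the client options.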

Our workflow involves short-lived transactions where clients wait for responses. To further improve our system’s reliability, we’re exploring ways to proactively listen for HTTP/2 ‘GoAway’ frames on the client side. This would allow us to renew connections as soon as the server signals a shutdown, instead of waiting for RPC timeouts.

Your insights and suggestions on how to do this would be greatly appreciated.

Best regards,