Rolling Update Downtime Issue with Temporal Deployment:

Hello everyone,
We’re facing a persistent issue with our Temporal deployment that’s causing brief downtime during rolling updates, and we’re hoping to get some insights and suggestions from the community.

Problem Description:
Our Temporal cluster, deployed from the official Git repository, is generally working well. However, rolling updates don't complete seamlessly: each one causes a brief but noticeable period of downtime.
We suspect our rolling update strategy might be the cause, and we’d appreciate any guidance on how to optimize it.

Our Current Configuration:
We’re using a standard Kubernetes Deployment with the following strategy:
YAML

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # Max number of pods that can be created above the desired number of pods
    maxUnavailable: 1  # Max number of pods that can be unavailable during the update
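
For context, the direction we are currently experimenting with is sketched below: keep maxUnavailable at 0 so an old pod is only terminated once its replacement reports Ready, and add a readiness probe against the frontend's gRPC endpoint. This is only a sketch of what we are trying, not a confirmed fix; the frontend port 7233, the native grpc probe type (Kubernetes 1.24+), and the health service name are assumptions on our side.

YAML

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # Allow one extra pod during the rollout
    maxUnavailable: 0  # Never remove an old pod before its replacement is Ready

# Container-level probe (sketch): assumes the frontend listens on 7233 and
# exposes the standard gRPC health service under this service name.
readinessProbe:
  grpc:
    port: 7233
    service: temporal.api.workflowservice.v1.WorkflowService
  initialDelaySeconds: 10
  periodSeconds: 5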

Questions for the Community:
Is this a common issue, and what are the typical causes of downtime during Temporal rolling updates?
Based on our configuration, what changes would you recommend to achieve a zero-downtime rolling update?
What’s the best-practice deployment process for a production Temporal cluster? Are there specific health checks, pre-stop hooks, or readiness probes we should be using? (A rough sketch of what we’re currently considering on that front is just below.)
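
On the pre-stop question, the kind of lifecycle handling we have been looking at is roughly the following. The sleep duration and grace period are placeholder values we have not validated against Temporal’s shutdown behavior, so please treat this as a sketch rather than something we are running.

YAML

# Container spec (sketch): delay SIGTERM so in-flight gRPC requests can drain
# and the pod can drop out of Service endpoints first.
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]   # Placeholder drain delay

# Pod spec (sketch): give the server time to shut down cleanly after SIGTERM.
terminationGracePeriodSeconds: 60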

Any help or a pointer in the right direction would be greatly appreciated! Thanks in advance!

Regards,

Nagendra Amuri