Temporal client failure when server restarts

Benedito_Marques · November 21, 2023, 3:46pm

Hi everyone.
I’m using the temporalio Python SDK version 0.1b2, and I see that if the Temporal Server restarts, the error below appears on the client, and pending workflows are no longer executed by this client:

So, how can I restart the client/application, or retry the connection automatically, if the connection with the server is lost at any moment?
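Whatever the SDK version, the general pattern for retrying an operation such as connecting is to wrap the async call in a loop with capped exponential backoff. The helper below is a generic sketch, not part of the temporalio SDK; `retry_with_backoff` and its parameters are hypothetical names chosen for illustration, and retrying on any `Exception` by default is an assumption you would want to narrow for real use.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    *,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    initial_delay: float = 0.5,
    max_delay: float = 10.0,
    multiplier: float = 2.0,
) -> T:
    """Await op(), retrying failures listed in retry_on with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await op()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # "Full jitter": sleep a random time up to the current delay so that
            # many clients do not all retry at the same instant
            await asyncio.sleep(random.uniform(0, delay))
            delay = min(delay * multiplier, max_delay)
    raise AssertionError("unreachable")
```

For example, `client = await retry_with_backoff(lambda: Client.connect("localhost:7233"))` would attempt the connect up to five times before giving up. The jitter spreads out reconnect attempts when many clients lose the same server at once.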

craigd · December 7, 2023, 6:02pm

The SDK is at 1.4.0. You should upgrade!

I’d be interested to know the recommended way of auto-reconnecting with the latest client SDK, as the client does not auto-reconnect if the Temporal server is restarted.

Through a bit of trial and error, I found that calling check_health() on the Temporal service client revives the connection after the Temporal server restarts, so I am using this:

import asyncio
import logging
from typing import Optional

from temporalio.client import Client

TEMPORAL_SERVER = "localhost:7233"
TEMPORAL_NAMESPACE = "default"

logger = logging.getLogger(__name__)


class TemporalClientManager:
    def __init__(self) -> None:
        self.client: Optional[Client] = None
        self.client_health_check_period = 10  # seconds
        # Keep a reference so the background task is not garbage-collected
        self._keep_alive_task: Optional[asyncio.Task] = None

    async def connect(self) -> None:
        # Make the initial connection first so the health check has a client to use
        logger.info(f"Connecting to Temporal server {TEMPORAL_SERVER}")
        self.client = await Client.connect(
            target_host=TEMPORAL_SERVER,
            namespace=TEMPORAL_NAMESPACE,
        )
        # Start a task that periodically checks health to revive a dropped connection
        self._keep_alive_task = asyncio.create_task(self._keep_alive())

    async def _keep_alive(self) -> None:
        while True:
            await asyncio.sleep(self.client_health_check_period)
            # If disconnected, the act of checking health appears to reconnect the client
            if await self.is_connected():
                logger.debug(
                    f"Connection to Temporal server '{TEMPORAL_SERVER}' is alive"
                )
            else:
                logger.error(
                    f"Connection to Temporal server '{TEMPORAL_SERVER}' failed"
                )

    async def is_connected(self) -> bool:
        if self.client is None:
            return False
        try:
            return await self.client.service_client.check_health()
        except Exception as e:
            logger.warning(f"Failed to check Temporal health: {e}")
            return False


temporal_client_mgr = TemporalClientManager()

I am not confident this is a good solution, as I periodically get a “transport error” when running the health check:

craigd · December 7, 2023, 6:09pm

It looks like there is an open issue for the client not auto-reconnecting:

Chad_Retz · December 7, 2023, 7:12pm

High-level client calls should automatically retry if they fail because a connection is no longer available (as opposed to low-level calls on one of the service fields, though you can set retry on those too). 1.4.0 now has keep-alive built in by default (30s interval, 15s timeout). In which situations are you making a call after a server restart that fails? Is it a high-level call like start workflow?

EDIT: I just updated/closed the GH issue with a test I performed confirming that the worker and client both recover.
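As a rough model of what a keep-alive with a 30s interval and 15s timeout means: a ping goes out every 30 seconds, and if no reply arrives within 15 seconds the connection is treated as dead, so it can be replaced instead of a later call hanging on it. The asyncio sketch below only illustrates that timing logic; it is not the SDK's own keep-alive code, and `ping`, `on_dead`, and `max_pings` (a knob to bound the loop in tests) are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def keep_alive(
    ping: Callable[[], Awaitable[None]],
    on_dead: Callable[[], Awaitable[None]],
    *,
    interval: float = 30.0,
    timeout: float = 15.0,
    max_pings: Optional[int] = None,
) -> None:
    """Ping every `interval` seconds; call on_dead if a ping is not answered within `timeout`."""
    sent = 0
    while max_pings is None or sent < max_pings:
        await asyncio.sleep(interval)
        sent += 1
        try:
            await asyncio.wait_for(ping(), timeout=timeout)
        except asyncio.TimeoutError:
            # No answer in time: treat the connection as dead so it can be
            # torn down and re-established before the next real call
            await on_dead()
```

Because `ping` is wrapped in `asyncio.wait_for`, an unanswered ping is cancelled at the timeout rather than blocking the loop indefinitely.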

craigd · December 8, 2023, 9:43am

@Chad_Retz - please accept my apologies. I added my workaround when I was running 1.3.0 but didn’t retest on 1.4.0 because I found #397 open.

I only require auto-reconnect to work for high-level calls, so after reading your reply I removed my workaround code and found that version 1.4.0 does reconnect automatically.

Thanks for the quick response.
