Temporal tests don't run in air-gapped environments

I reported this as a bug here, but I think this needs wider visibility, so I am cross-posting it in the forums. Thank you for any attention you can give to it.

What are you really trying to do?

I’m trying to run the sample unit tests behind a firewall that does not allow downloading the Temporal test server binary.

Describe the bug

I don’t think this bug has to do with this repo, specifically. I think it’s actually deep in the guts of Temporal itself.

When I try to run the unit tests, Temporal is unable to download a test server. This happens when I am behind a firewall that blocks access to certain sites.

I am able to run temporal server start-dev in a separate terminal (using a Temporal server I installed using brew install temporal). I found that start_time_skipping() supports a test_server_existing_path argument, so I tried adding that to the test code (here in this repo). Unfortunately, it appears that Temporal doesn’t pass that argument all the way down to EphemeralServer, so that deep code tries to download a fresh copy of the test server binary. And that’s where my firewall steps in and says “nope.”

(More output is included later, for context.) The specific output that shows the binary cannot be downloaded is here:

    @staticmethod
    async def start_test_server(
        runtime: temporalio.bridge.runtime.Runtime, config: TestServerConfig
    ) -> EphemeralServer:
        """Start a test server instance."""
        return EphemeralServer(
>           await temporalio.bridge.temporal_sdk_bridge.start_test_server(
                runtime._ref, config
            )
        )
E       RuntimeError: Failed starting test server: error sending request for url (https://temporal.download/temporal-test-server/default?arch=arm64&platform=darwin&sdk-name=sdk-python&sdk-version=1.2.0): error trying to connect: invalid peer certificate contents: invalid peer certificate: UnknownIssuer

Minimal Reproduction

I have erred on the side of full details and included a full (sanitized) traceback.

source env/bin/activate
cd exercises/testing-code/solution
python -m pytest
============================= test session starts ==============================
platform darwin -- Python 3.11.6, pytest-7.4.2, pluggy-1.3.0
rootdir: /code/3rd/temporalio/edu-102-python-code/exercises/testing-code/practice
plugins: asyncio-0.21.1
asyncio: mode=Mode.STRICT
collected 2 items

tests/test_activities.py F                                               [ 50%]
tests/test_workflow.py F                                                 [100%]

=================================== FAILURES ===================================
_________ test_success_translate_activity_hello_german[input0-output0] _________

self = <aiohttp.connector.TCPConnector object at 0x102f18ed0>
req = <aiohttp.client_reqrep.ClientRequest object at 0x102f19950>
timeout = ClientTimeout(total=300, connect=None, sock_read=None, sock_connect=None)
client_error = <class 'aiohttp.client_exceptions.ClientConnectorError'>
args = (functools.partial(<class 'aiohttp.client_proto.ResponseHandler'>, loop=<_UnixSelectorEventLoop running=False closed=False debug=False>), '127.0.0.1', 9999)
kwargs = {'family': <AddressFamily.AF_INET: 2>, 'flags': <AddressInfo.AI_NUMERICHOST|AI_NUMERICSERV: 4100>, 'local_addr': None, 'proto': 6, ...}

    async def _wrap_create_connection(
        self,
        *args: Any,
        req: "ClientRequest",
        timeout: "ClientTimeout",
        client_error: Type[Exception] = ClientConnectorError,
        **kwargs: Any,
    ) -> Tuple[asyncio.Transport, ResponseHandler]:
        try:
            async with ceil_timeout(timeout.sock_connect):
>               return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa

../../../env/lib/python3.11/site-packages/aiohttp/connector.py:980:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py:1085: in create_connection
    raise exceptions[0]
/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py:1069: in create_connection
    sock = await self._connect_sock(
/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py:973: in _connect_sock
    await self.sock_connect(sock, address)
/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/selector_events.py:634: in sock_connect
    return await fut
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_UnixSelectorEventLoop running=False closed=False debug=False>
fut = None, sock = <socket.socket [closed] fd=-1, family=2, type=1, proto=6>
address = ('127.0.0.1', 9999)

    def _sock_connect_cb(self, fut, sock, address):
        if fut.done():
            return

        try:
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err != 0:
                # Jump to any except clause below.
>               raise OSError(err, f'Connect call failed {address}')
E               ConnectionRefusedError: [Errno 61] Connect call failed ('127.0.0.1', 9999)

/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/selector_events.py:674: ConnectionRefusedError

The above exception was the direct cause of the following exception:

input = TranslationActivityInput(term='hello', language_code='de')
output = TranslationActivityOutput(translation='Hallo')

    @pytest.mark.asyncio
    @pytest.mark.parametrize(
        "input, output",
        [
            (
                TranslationActivityInput(term="hello", language_code="de"),
                TranslationActivityOutput("Hallo"),
            ),
            # TODO add a second test cases input and output here
        ],
    )
    async def test_success_translate_activity_hello_german(input, output):
        async with aiohttp.ClientSession() as session:
            activity_environment = ActivityEnvironment()
            activities = TranslationActivities(session)
>           assert output == await activity_environment.run(
                activities.translate_term, input
            )

tests/test_activities.py:23:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../env/lib/python3.11/site-packages/temporalio/testing/_activity.py:177: in run_async
    return await self.task
activities.py:20: in translate_term
    async with self.session.get(url) as response:
../../../env/lib/python3.11/site-packages/aiohttp/client.py:1141: in __aenter__
    self._resp = await self._coro
../../../env/lib/python3.11/site-packages/aiohttp/client.py:536: in _request
    conn = await self._connector.connect(
../../../env/lib/python3.11/site-packages/aiohttp/connector.py:540: in connect
    proto = await self._create_connection(req, traces, timeout)
../../../env/lib/python3.11/site-packages/aiohttp/connector.py:901: in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
../../../env/lib/python3.11/site-packages/aiohttp/connector.py:1209: in _create_direct_connection
    raise last_exc
../../../env/lib/python3.11/site-packages/aiohttp/connector.py:1178: in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <aiohttp.connector.TCPConnector object at 0x102f18ed0>
req = <aiohttp.client_reqrep.ClientRequest object at 0x102f19950>
timeout = ClientTimeout(total=300, connect=None, sock_read=None, sock_connect=None)
client_error = <class 'aiohttp.client_exceptions.ClientConnectorError'>
args = (functools.partial(<class 'aiohttp.client_proto.ResponseHandler'>, loop=<_UnixSelectorEventLoop running=False closed=False debug=False>), '127.0.0.1', 9999)
kwargs = {'family': <AddressFamily.AF_INET: 2>, 'flags': <AddressInfo.AI_NUMERICHOST|AI_NUMERICSERV: 4100>, 'local_addr': None, 'proto': 6, ...}

    async def _wrap_create_connection(
        self,
        *args: Any,
        req: "ClientRequest",
        timeout: "ClientTimeout",
        client_error: Type[Exception] = ClientConnectorError,
        **kwargs: Any,
    ) -> Tuple[asyncio.Transport, ResponseHandler]:
        try:
            async with ceil_timeout(timeout.sock_connect):
                return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
        except cert_errors as exc:
            raise ClientConnectorCertificateError(req.connection_key, exc) from exc
        except ssl_errors as exc:
            raise ClientConnectorSSLError(req.connection_key, exc) from exc
        except OSError as exc:
            if exc.errno is None and isinstance(exc, asyncio.TimeoutError):
                raise
>           raise client_error(req.connection_key, exc) from exc
E           aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host localhost:9999 ssl:default [Connect call failed ('127.0.0.1', 9999)]

../../../env/lib/python3.11/site-packages/aiohttp/connector.py:988: ClientConnectorError
_________________________ test_successful_translation __________________________

    @pytest.mark.asyncio
    async def test_successful_translation():
>       async with await WorkflowEnvironment.start_time_skipping() as env:

tests/test_workflow.py:12:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../env/lib/python3.11/site-packages/temporalio/testing/_workflow.py:292: in start_time_skipping
    server = await temporalio.bridge.testing.EphemeralServer.start_test_server(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

runtime = <temporalio.bridge.runtime.Runtime object at 0x1031faad0>
config = TestServerConfig(existing_path=None, sdk_name='sdk-python', sdk_version='1.2.0', download_version='default', download_dest_dir=None, port=None, extra_args=[])

    @staticmethod
    async def start_test_server(
        runtime: temporalio.bridge.runtime.Runtime, config: TestServerConfig
    ) -> EphemeralServer:
        """Start a test server instance."""
        return EphemeralServer(
>           await temporalio.bridge.temporal_sdk_bridge.start_test_server(
                runtime._ref, config
            )
        )
E       RuntimeError: Failed starting test server: error sending request for url (https://temporal.download/temporal-test-server/default?arch=arm64&platform=darwin&sdk-name=sdk-python&sdk-version=1.2.0): error trying to connect: invalid peer certificate contents: invalid peer certificate: UnknownIssuer

../../../env/lib/python3.11/site-packages/temporalio/bridge/testing.py:67: RuntimeError
=========================== short test summary info ============================
FAILED tests/test_activities.py::test_success_translate_activity_hello_german[input0-output0]
FAILED tests/test_workflow.py::test_successful_translation - RuntimeError: Fa...
============================== 2 failed in 0.46s ===============================

Environment/Versions

  • OS and processor: M1 Mac
  • Temporal Version: temporal version 0.10.7 (server 1.22.2) (ui 2.21.3)
  • Python SDK version: temporalio==1.2.0
  • Are you using Docker or Kubernetes or building Temporal from source? None of those; local Python virtualenv

Additional context

NOTE: I tried this without my firewall, and Temporal did download the binary to my $TMPDIR. Subsequent runs seemed to try to use that binary (but there was another issue, which I’ll try to resolve/report elsewhere). I had to rm temporal-test-server-sdk-python-1.2.0 and turn my firewall back on to get the traceback details.

I also tried to find a workaround by running my local dev server on --port 9999 and --http-port 9999, but Temporal did not seem to find this server during testing; it kept trying to download the binary.

Also, can we add a tag like air-gapped or something similar, please? Many users will have networking restrictions and a tag like this would help us find relevant topics. Thanks.

start_time_skipping accepts a test_server_existing_path argument that, when set, prevents any download attempt. You can download the test server from our Java SDK releases page and point test_server_existing_path to the extracted executable. (Note: on an M1 you’ll have to use the Intel translation/emulation, because we don’t currently offer an ARM-based version of the test server.)
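A minimal sketch of that workaround, for anyone landing here from a restricted network. The environment variable name TEMPORAL_TEST_SERVER_PATH and both helper functions are hypothetical names chosen for illustration; the only SDK pieces used are WorkflowEnvironment.start_time_skipping and its test_server_existing_path argument mentioned above. The path check runs first so that a wrong path fails with a clear local error rather than an error from the server startup.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical variable name for this sketch; it is not defined by Temporal.
TEST_SERVER_ENV_VAR = "TEMPORAL_TEST_SERVER_PATH"


def resolve_test_server_path(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the path to a pre-downloaded test server binary, or None if unset."""
    env = os.environ if env is None else env
    raw = env.get(TEST_SERVER_ENV_VAR)
    if not raw:
        # Nothing configured: the SDK falls back to its default (download) behavior.
        return None
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{TEST_SERVER_ENV_VAR} points to a missing file: {path}")
    if not os.access(path, os.X_OK):
        raise PermissionError(f"Test server binary is not executable: {path}")
    return str(path)


async def start_offline_environment():
    """Start a time-skipping environment from the local binary, if one is configured."""
    # Imported lazily so the path helper can be used without the SDK installed.
    from temporalio.testing import WorkflowEnvironment

    return await WorkflowEnvironment.start_time_skipping(
        test_server_existing_path=resolve_test_server_path(),
    )
```

In a test, this would replace the direct call, for example: async with await start_offline_environment() as env. On an M1 the extracted binary is the Intel build, so it still needs the emulation layer noted above.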

Thanks @Chad_Retz . However, let me highlight a relevant bit from my initial post:

Even when I point test_server_existing_path to an actual binary that I downloaded, it doesn’t work. I looked through the code, and I cannot see where this path is passed down to EphemeralServer. I think it should be passed down, but it doesn’t appear to be. Do you know the code well enough to assess whether it is indeed being passed down properly?

Thanks!

Hrmm, it is set on the TestServerConfig, which is passed through to the Rust implementation. That implementation should be using the ExistingPath enum instead of the CachedDownload enum, but it’s possible we missed something.

I will try to set some time aside in the near future to confirm that providing the existing exe path works and it isn’t getting lost somewhere (or if you find where it’s getting lost let us know).

Thank you. Now that you mention it, the Rust boundary is where I stopped digging. (I’d like to learn some Rust, but Python project deliverables are taking all my time these days.)

I have confirmed that test_server_existing_path does try to use that executable.