WorkflowEnvironment.start_time_skipping() results in "Failed starting test server..."

yo yo!

I don’t know how to move forward with this error I’m getting while running a Workflow’s TestCase with pytest. I’m using Python3.11 (same results with Python3.10), temporalio==1.3.0, and the latest temporalio/auto-setup.

...
tevo_tbd_axs_repo_app_dev  | FAILED [100%]
test_update_listings_workflow.py:93 (UpdateListingsWorkflowTestCase.test_listings_workflow_simple)
self = <tests.workflows.managelistings.test_update_listings_workflow.UpdateListingsWorkflowTestCase testMethod=test_listings_workflow_simple>

    async def test_listings_workflow_simple(self):
        """Test for UpdateListingsWorkflow"""
        offer_id = '12345'
        item_ids = [1234, 1235]
        result = {'item_ids': [1234, 1235], 'result': '2 tickets updated'}
    
        @activity.defn
        async def update_listings(offer_id: str, item_ids: List[int]) -> List[int]:
            return [1234, 1235]
    
        activities = [
            update_listings,
        ]
    
>       async with await WorkflowEnvironment.start_time_skipping() as env:

test_update_listings_workflow.py:108: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.11/site-packages/temporalio/testing/_workflow.py:291: in start_time_skipping
    server = await temporalio.bridge.testing.EphemeralServer.start_test_server(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

runtime = <temporalio.bridge.runtime.Runtime object at 0xffff9d82a310>
config = TestServerConfig(existing_path=None, sdk_name='sdk-python', sdk_version='1.3.0', download_version='default', download_dest_dir=None, port=None, extra_args=[])

    @staticmethod
    async def start_test_server(
        runtime: temporalio.bridge.runtime.Runtime, config: TestServerConfig
    ) -> EphemeralServer:
        """Start a test server instance."""
        return EphemeralServer(
>           await temporalio.bridge.temporal_sdk_bridge.start_test_server(
                runtime._ref, config
            )
        )
E       RuntimeError: Failed starting test server: error decoding response body: expected value at line 1 column 1

/usr/local/lib/python3.11/site-packages/temporalio/bridge/testing.py:67: RuntimeError

I don’t see how we’ve botched this test. I get this error on my local dev machine (M1 macbook pro) and in our kubernetes testing & staging environments.

FWIW, this is the full failing test:

import datetime
import unittest

from temporalio import activity
from temporalio.client import WorkflowFailureError
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker

from app.workflows.managelistings.update_listings_workflow import UpdateListingsWorkflow


class UpdateListingsWorkflowTestCase(unittest.IsolatedAsyncioTestCase):
    """UpdateListingsWorkflowTestCase"""

    async def test_listings_workflow_simple(self):
        """Test for UpdateListingsWorkflow"""
        offer_id = '12345'
        item_ids = [1234, 1235]
        result = {'item_ids': [1234, 1235], 'result': '2 tickets updated'}

        @activity.defn
        async def update_listings(_: str, __: list[int]) -> list[int]:
            return [1234, 1235]

        activities = [
            update_listings,
        ]

        async with await WorkflowEnvironment.start_time_skipping() as env:
            async with Worker(
                env.client,
                task_queue="tq1",
                workflows=[UpdateListingsWorkflow],
                activities=activities
            ):
                workflow_params = {'offer_id': offer_id, 'item_ids': item_ids}
                if result == 'fail':
                    with self.assertRaises(WorkflowFailureError) as wfe:
                        await env.client.execute_workflow(
                            UpdateListingsWorkflow.run,
                            args=[workflow_params],
                            id="wf1",
                            task_queue="tq1",
                            run_timeout=datetime.timedelta(seconds=30),
                        )
                    self.assertEqual('Workflow execution failed', wfe.exception.args[0])
                else:
                    response = await env.client.execute_workflow(
                        UpdateListingsWorkflow.run,
                        args=[workflow_params],
                        id="wf1",
                        task_queue="tq1",
                        run_timeout=datetime.timedelta(seconds=30),
                    )
                    self.assertEqual(result, response)

Are there values I can pass to test_server_extra_args that would cause more verbose output from the test server binary? I’m sure these are documented somewhere but I haven’t found them yet. Any other suggested solutions (or things to look into) would be very appreciated.

I have seen this when the lazy downloader's HTTP lookup fails. We use a CDN server that returns JSON containing the URL of the proper download. In this case, it’s trying to visit https://temporal.download/temporal-test-server/default?platform=darwin&arch=arm64&sdk-name=sdk-python&sdk-version=1.3.0 which may have been failing at the time (it then uses the URL from that response to try to download from GitHub releases). Can you retry? Worst case scenario, you can download a version of the test server from our Java releases area and set test_server_existing_path to point to the extracted executable.
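For reference, a minimal sketch of that worst-case workaround, assuming the test server executable has already been downloaded and extracted by hand. The TEMPORAL_TEST_SERVER_PATH environment variable and both helper names are made up for illustration; only test_server_existing_path comes from the SDK. The temporalio import sits inside the function so the path helper can be tried without the SDK installed.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, chosen for this sketch only.
ENV_VAR = "TEMPORAL_TEST_SERVER_PATH"


def existing_test_server_path(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured executable path, or None to keep the lazy download."""
    return env.get(ENV_VAR) or None


async def start_env():
    """Start the time-skipping environment, preferring a local executable."""
    # Imported here so the helper above works without temporalio installed.
    from temporalio.testing import WorkflowEnvironment

    return await WorkflowEnvironment.start_time_skipping(
        test_server_existing_path=existing_test_server_path(),
    )
```

In the test this would replace the direct call, e.g. `async with await start_env() as env:`. When the variable is unset, None is passed and the SDK falls back to its usual download.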

I think this is failing on the download step, before it even starts the test server.
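To check the download step by hand, something like the sketch below may help. It rebuilds the lookup URL from the query parameters quoted above and reports whether the response body parses as JSON. The function names are made up, and the platform/arch values the SDK actually sends for a given machine are an assumption here, so try the combinations you care about.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://temporal.download/temporal-test-server/default"


def lookup_url(platform: str, arch: str, sdk_version: str,
               sdk_name: str = "sdk-python") -> str:
    """Build the lookup URL in the shape quoted in this thread."""
    query = urllib.parse.urlencode({
        "platform": platform,
        "arch": arch,
        "sdk-name": sdk_name,
        "sdk-version": sdk_version,
    })
    return f"{BASE_URL}?{query}"


def probe(url: str, timeout: float = 10.0):
    """Return (http_status, parsed_json), with None if the body is not JSON."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise; keep the status and body for inspection.
        status, body = err.code, err.read()
    try:
        return status, json.loads(body)
    except ValueError:
        # A non-JSON body would be consistent with the "expected value at
        # line 1 column 1" decoding error in the traceback.
        return status, None
```

For example, `probe(lookup_url("linux", "arm64", "1.3.0"))` shows the status and whether any JSON came back for that platform.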

Thanks for the info! After retrying, I get the same result.

I’m a little surprised at ...platform=darwin&arch=arm64.... My dev machine is an M1 macbook pro, however the run environment (locally) is a Linux docker container with uname -m returning aarch64. I don’t know what our testing and staging envs are running but presumably x86-based. Also interesting is the temporal.download response:

{
	"archiveUrl": "https://temporal.download/assets/temporalio/sdk-java/releases/download/v1.17.0/temporal-test-server_1.17.0_macOS_amd64.tar.gz",
	"fileToExtract": "temporal-test-server_1.17.0_macOS_amd64/temporal-test-server"
}

I assume (haven’t checked) this would work if my macbook were my runtime environment, but is this request+payload expected for an aarch64 Linux setup?

Tomorrow morning I’ll try something like the “worst case scenario” you described. Thanks again for the help, I (obviously) wasn’t getting anywhere on my own (@unittest.skip ftw!).

We don’t have an arm build for the test server, so no, aarch64/arm64 Linux will not work with our test server currently. This is likely the error you are running into: https://temporal.download/temporal-test-server/default?platform=linux&arch=arm64&sdk-name=sdk-python&sdk-version=1.3.0 returns a 404. On M1 Macs, many people run tests directly on the machine, where the Mac Intel translation layer (Rosetta 2) works with x86/amd64 binaries. But if you’re running Docker with arm Linux, our test server won’t work. See this issue (and this one).
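Until an arm build exists, one option is to skip the time-skipping tests on arm64 Linux with a clear reason, rather than a bare @unittest.skip. This is a rough sketch using only the standard library; the helper and decorator names are made up, and the set of machine strings is an assumption based on the uname -m output mentioned above.

```python
import platform
import unittest
from typing import Optional


def lacks_test_server_build(system: Optional[str] = None,
                            machine: Optional[str] = None) -> bool:
    """True on arm64 Linux, where no test server build is published."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    return system == "linux" and machine in {"aarch64", "arm64"}


# Decorator for test classes or methods that need the time-skipping server.
requires_test_server = unittest.skipIf(
    lacks_test_server_build(),
    "no time-skipping test server build for arm64 Linux",
)
```

Applied as `@requires_test_server` on UpdateListingsWorkflowTestCase, the tests still run on x86 CI and directly on an M1 Mac, and are reported as skipped inside the arm Linux container.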

I have an issue open to document this better, and I have just opened an issue to make this error clearer.