How to properly use time skipping env.sleep

I am developing test cases using time skipping test envrionment and I get the following error when on second sleep. How should I interpret this error?
Thanks for helping

   await workflow_env.sleep(td)
.venv/lib/python3.9/site-packages/temporalio/testing/_workflow.py:425: in sleep
    await self._client.test_service.unlock_time_skipping_with_sleep(req)
.venv/lib/python3.9/site-packages/temporalio/service.py:658: in __call__
    return await self.service_client._rpc_call(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <temporalio.service._BridgeServiceClient object at 0x1071222e0>
rpc = 'unlock_time_skipping_with_sleep', req = duration {
  seconds: 5400
}

resp_type = <class 'temporal.api.testservice.v1.request_response_pb2.SleepResponse'>

    async def _rpc_call(
        self,
        rpc: str,
        req: google.protobuf.message.Message,
        resp_type: Type[ServiceResponse],
        *,
        service: str,
        retry: bool,
        metadata: Mapping[str, str],
        timeout: Optional[timedelta],
    ) -> ServiceResponse:
        global LOG_PROTOS
.venv/lib/python3.9/site-packages/temporalio/testing/_workflow.py:425: in sleep
    await self._client.test_service.unlock_time_skipping_with_sleep(req)
.venv/lib/python3.9/site-packages/temporalio/service.py:658: in __call__
    return await self.service_client._rpc_call(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <temporalio.service._BridgeServiceClient object at 0x1071222e0>
rpc = 'unlock_time_skipping_with_sleep', req = duration {
  seconds: 5400
}

resp_type = <class 'temporal.api.testservice.v1.request_response_pb2.SleepResponse'>

    async def _rpc_call(
        self,
        rpc: str,
        req: google.protobuf.message.Message,
        resp_type: Type[ServiceResponse],
        *,
        service: str,
        retry: bool,
        metadata: Mapping[str, str],
        timeout: Optional[timedelta],
    ) -> ServiceResponse:
        global LOG_PROTOS
        if LOG_PROTOS:
            logger.debug("Service %s request to %s: %s", service, rpc, req)
        try:
            client = await self._connected_client()
            resp = await client.call(
                service=service,
                rpc=rpc,
                req=req,
                resp_type=resp_type,
                retry=retry,
                metadata=metadata,
                timeout=timeout,
            )
            if LOG_PROTOS:
                logger.debug("Service %s response from %s: %s", service, rpc, resp)
            return resp
        except temporalio.bridge.client.RPCError as err:
            # Intentionally swallowing the cause instead of using "from"
            status, message, details = err.args
>           raise RPCError(message, RPCStatusCode(status), details)
E           temporalio.service.RPCError: Timeout expired

At first glance it looks like a timeout when calling the test server which makes me think it may have crashed. Is it possible to replicate easily with standalone code you can provide?

Hello Chad,
Here is case that fails.
A TopWorkflow starts a SubWorkflow. The test passes if code that starts SubWorkflow is commented out.

import asyncio
from datetime import timedelta

from temporalio import workflow
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker


@workflow.defn
class SubWorkflow:
    def __init__(self) -> None:
        self.signal_received = False

    @workflow.run
    async def run(self) -> bool:
        try:
            await workflow.wait_condition(
                lambda: self.signal_received, timeout=timedelta(days=2)
            )
            return True
        except asyncio.TimeoutError:
            return False

    @workflow.signal
    def some_signal(self) -> None:
        self.signal_received = True


@workflow.defn
class TopWorkflow:
    def __init__(self) -> None:
        self.signal_received = False

    @workflow.run
    async def run(self) -> bool:
        try:
           # test passes if child workflow is not started.
            await workflow.start_child_workflow(
                SubWorkflow.run, id="swf1", task_queue="tq1"
            )
            await workflow.wait_condition(
                lambda: self.signal_received, timeout=timedelta(days=3)
            )
            return True
        except asyncio.TimeoutError:
            return False

    @workflow.signal
    def some_signal(self) -> None:
        self.signal_received = True


async def test_signal_workflow_timeout():
    async with await WorkflowEnvironment.start_time_skipping() as env:
        with env.auto_time_skipping_disabled():
            async with Worker(
                env.client,
                task_queue="tq1",
                workflows=[TopWorkflow, SubWorkflow],
                debug_mode=True,
            ):
                # Start workflow, advance time past timeout, check result
                handle = await env.client.start_workflow(
                    TopWorkflow.run, id="wf1", task_queue="tq1"
                )
                await handle.describe()
                await env.sleep(timedelta(days=1))
                await handle.signal(TopWorkflow.some_signal)
                await env.sleep(timedelta(days=1))
                await env.sleep(timedelta(days=2))
                await env.sleep(timedelta(days=3))

Error:

                    await env.sleep(timedelta(days=1))
>                   await env.sleep(timedelta(days=2))

backend/tests/test_sleep_error/test_sleep_error.py:69: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.venv/lib/python3.9/site-packages/temporalio/testing/_workflow.py:425: in sleep
    await self._client.test_service.unlock_time_skipping_with_sleep(req)
.venv/lib/python3.9/site-packages/temporalio/service.py:658: in __call__
    return await self.service_client._rpc_call(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <temporalio.service._BridgeServiceClient object at 0x10dfbe3d0>
rpc = 'unlock_time_skipping_with_sleep', req = duration {
  seconds: 172800
}
....
            # Intentionally swallowing the cause instead of using "from"
            status, message, details = err.args
>           raise RPCError(message, RPCStatusCode(status), details)
E           temporalio.service.RPCError: Timeout expired

Thanks! I will investigate. There may be a bug with child workflows when auto-skipping is disabled.

ok. Thanks Chad.

This seems to be an issue with our time-skipping test server in this situation. I have opened Test server with child workflow and signal is timing out when attemping unlock-with-sleep · Issue #1618 · temporalio/sdk-java · GitHub.

Hello Chad,
Thanks for opening the bug to address this issue. Generally, how long would it take to address this kind of issue? Meanwhile, would it be possible to update the testing documentation to suggest of this issue? May be it will help others who stumble on this situation.
Thanks,
Pompa

I cannot give an exact time scale on the bugs in the testing server though we will try to get to them within a reasonable amount of time. We don’t generally add applicable bugs to the documentation sections they are about (there are a couple of other outstanding bugs on the test server too I believe). You can follow the issue for updates though.

OK. Fixing this issue will help writing top level test cases else each workflow test would need extensive mocking. I really love Temporal and thanks for creating such wonderful software. I will wait for the fix.
Thanks,
Pompa