Hi Temporal expert,
I’m running a Temporal workflow in a Kubernetes environment using the Python SDK. I’ve encountered an issue where the workflow gets stuck after an activity completes. Here’s the situation:
- My worker pod was drained and rescheduled to a new node.
- The previous activity completed successfully on the new pod.
- However, the workflow did not proceed to the next activity (ClearActivity), and the workflow remains in a running state.
From temporal workflow show, I see:
[25] ActivityTaskCompleted
eventTime: 2025-09-24T18:51:42.195080007Z
identity: 1@cmswfcnos-7687dd9976-bmj54
result[0]: {"StatusCode": 0}
Then:
[26] WorkflowTaskScheduled
[27] WorkflowTaskStarted
[28] WorkflowTaskCompleted
So the workflow task was scheduled and completed, but no new activity was started.
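To make that gap concrete, here is a minimal sketch in plain Python that flags workflow tasks which completed without recording a follow-up command event. The event IDs and type names are the ones from my history above. The helper `tasks_without_commands` and the `COMMAND_EVENTS` set are hypothetical names of my own, not part of the Temporal SDK or CLI. The set is an illustrative subset, and the check assumes that events produced by commands are appended directly after the WorkflowTaskCompleted event.

```python
# Event types that appear in history when a completed workflow task issued
# a command. Illustrative subset only (assumption), not an exhaustive list.
COMMAND_EVENTS = {
    "ActivityTaskScheduled",
    "TimerStarted",
    "StartChildWorkflowExecutionInitiated",
    "WorkflowExecutionCompleted",
    "WorkflowExecutionFailed",
    "WorkflowExecutionContinuedAsNew",
}


def tasks_without_commands(events):
    """Return IDs of WorkflowTaskCompleted events not directly followed by a command event.

    `events` is a list of (event_id, event_type) tuples in history order.
    """
    idle = []
    for index, (event_id, event_type) in enumerate(events):
        if event_type != "WorkflowTaskCompleted":
            continue
        # Look at the event immediately after the completed workflow task, if any.
        following = events[index + 1] if index + 1 < len(events) else None
        if following is None or following[1] not in COMMAND_EVENTS:
            idle.append(event_id)
    return idle


# Tail of the history from this post.
history_tail = [
    (25, "ActivityTaskCompleted"),
    (26, "WorkflowTaskScheduled"),
    (27, "WorkflowTaskStarted"),
    (28, "WorkflowTaskCompleted"),
]

print(tasks_without_commands(history_tail))  # [28]
```

For my history this reports event 28: the workflow task completed, but no ActivityTaskScheduled event for ClearActivity follows it.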
From temporal workflow describe, I see:
Pending Activities: 0
Pending Child Workflows: 0
And from temporal workflow query -w --name __stack_trace, I get:
QueryResult:
Stack for <Task pending name='query: __stack_trace (workflow: NodeEnterMaintWorkflow, id: node-enter-maint-workflow-56a57ff3dfffc8b0, run: 01997d00-f6f6-70eb-b497-f98bbb40942b)' coro=<_WorkflowInstanceImpl._apply_query_workflow.<locals>.run_query() running at /usr/lib64/python3.10/site-packages/temporalio/worker/_workflow_instance.py:451> cb=[set.remove()]> (most recent call last):
  File "/usr/lib64/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.10/concurrent/futures/thread.py", line 83, in _worker
    work_item.run()
  File "/usr/lib64/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib64/python3.10/site-packages/temporalio/worker/workflow_sandbox/_runner.py", line 145, in activate
    self._run_code(
  File "/usr/lib64/python3.10/site-packages/temporalio/worker/workflow_sandbox/_runner.py", line 160, in _run_code
    exec(code, self.globals_and_locals, self.globals_and_locals)
  File "<string>", line 2, in <module>
  File "/usr/lib64/python3.10/site-packages/temporalio/worker/workflow_sandbox/_in_sandbox.py", line 51, in activate
    return self.instance.activate(act)
  File "/usr/lib64/python3.10/site-packages/temporalio/worker/_workflow_instance.py", line 306, in activate
    self._run_once(check_conditions=index == 1 or index == 2)
  File "/usr/lib64/python3.10/site-packages/temporalio/worker/_workflow_instance.py", line 1327, in _run_once
    handle._run()
  File "/usr/lib64/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/lib64/python3.10/site-packages/temporalio/worker/_workflow_instance.py", line 451, in run_query
    success = await self._inbound.handle_query(input)

Stack for <_ActivityHandle pending name='activity: ClearActivity (workflow: NodeEnterMaintWorkflow, id: node-enter-maint-workflow-56a57ff3dfffc8b0, run: 01997d00-f6f6-70eb-b497-f98bbb40942b)' coro=<_WorkflowInstanceImpl._outbound_schedule_activity.<locals>.run_activity() running at /usr/lib64/python3.10/site-packages/temporalio/worker/_workflow_instance.py:1067> wait_for=<Future pending cb=[shield.<locals>._outer_done_callback() at /usr/lib64/python3.10/asyncio/tasks.py:864, _ActivityHandle.task_wakeup()]> cb=[set.remove(), Task.task_wakeup()]> (most recent call last):
  File "/usr/lib64/python3.10/site-packages/temporalio/worker/_workflow_instance.py", line 1067, in run_activity
    return await asyncio.shield(handle._result_fut)

Stack for <Task pending name='run (workflow: NodeEnterMaintWorkflow, id: node-enter-maint-workflow-56a57ff3dfffc8b0, run: 01997d00-f6f6-70eb-b497-f98bbb40942b)' coro=<_WorkflowInstanceImpl._run_top_level_workflow_function() running at /usr/lib64/python3.10/site-packages/temporalio/worker/_workflow_instance.py:1350> wait_for=<_ActivityHandle pending name='activity: ClearActivity (workflow: NodeEnterMaintWorkflow, id: node-enter-maint-workflow-56a57ff3dfffc8b0, run: 01997d00-f6f6-70eb-b497-f98bbb40942b)' coro=<_WorkflowInstanceImpl._outbound_schedule_activity.<locals>.run_activity() running at /usr/lib64/python3.10/site-packages/temporalio/worker/_workflow_instance.py:1067> wait_for=<Future pending cb=[shield.<locals>._outer_done_callback() at /usr/lib64/python3.10/asyncio/tasks.py:864, _ActivityHandle.task_wakeup()]> cb=[set.remove(), Task.task_wakeup()]> cb=[set.remove()]> (most recent call last):
  File "/usr/lib64/python3.10/site-packages/temporalio/worker/_workflow_instance.py", line 1350, in _run_top_level_workflow_function
    await coro
This seems to suggest the workflow is awaiting ClearActivity, but it hasn’t been started.
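For anyone who wants to read the trace more easily: the raw CLI output escapes `<`, `>`, and newlines as `\u003c`, `\u003e`, and `\n`. Below is a rough sketch, using only the Python standard library, that undoes those escapes and lists the pending tasks named in the "Stack for" headers. The function name `pending_tasks` and the regular expression are my own hypothetical helpers, written against the header format seen in this one trace; other SDK versions may format it differently. The sample is abbreviated from my output to the header prefixes only.

```python
import re

# Matches headers such as: Stack for <Task pending name='run (workflow: ...
# Format assumed from the trace in this post, not from SDK documentation.
HEADER = re.compile(r"Stack for <(\w+) pending name='([^'(]+?)\s*\(workflow:")


def pending_tasks(raw: str) -> list[tuple[str, str]]:
    """Return (object kind, task name) for each 'Stack for' header in the query result."""
    # Undo the escapes the CLI applied to the query result string.
    text = raw.replace("\\u003c", "<").replace("\\u003e", ">").replace("\\n", "\n")
    return HEADER.findall(text)


# Header prefixes abbreviated from the query result in this post.
sample = (
    r"Stack for \u003cTask pending name='query: __stack_trace (workflow: NodeEnterMaintWorkflow, "
    r"id: node-enter-maint-workflow-56a57ff3dfffc8b0)'\n"
    r"Stack for \u003c_ActivityHandle pending name='activity: ClearActivity (workflow: NodeEnterMaintWorkflow, "
    r"id: node-enter-maint-workflow-56a57ff3dfffc8b0)'\n"
    r"Stack for \u003cTask pending name='run (workflow: NodeEnterMaintWorkflow, "
    r"id: node-enter-maint-workflow-56a57ff3dfffc8b0)'\n"
)

for kind, name in pending_tasks(sample):
    print(kind, "->", name)
```

On my output this lists the query task itself, the `activity: ClearActivity` handle, and the top-level `run` task, which matches my reading that the workflow code is blocked on ClearActivity.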
I tried deleting the pod to spawn a new one, but that did not move the workflow forward.
The Temporal version is 1.27.1, the backend database is PostgreSQL, and the Python version is 3.10.14.
Could you help shed some light on what is going on? Any help is highly appreciated!