Hello!
I’m currently trying out Workflow::awaitWithTimeout()
feature to implement “Watchful Timers” (e.g. timers that keep up with the date time value stored in the property of the workflow class).
In basic scenario it works just fine. I can start workflow with one fire date, then send update method to change this date so that previous timer gets cancelled, and new one is started.
Yet, in case of race conditions and multiple updates, it results in stuck workflow:
Workflow code basically looks like this:
#[WorkflowMethod]
public function execute(): Generator
{
yield new WatchfulTimer(fn () => $this->placementDate)();
yield $this->activity->placeOrder($this->orderId);
}
This WatchfulTimer
is implemented based on samples’ UpdatableTimer example.
class WatchfulTimer
{
public function __construct(
/** @var Closure(): CarbonImmutable */
private readonly Closure $waitUntil,
) {
}
public function __invoke(): Generator
{
$waitUntil = ($this->waitUntil)();
$now = $this->getCurrentDateTime();
$waitInterval = $now->diff($waitUntil);
if ($waitInterval->totalMilliseconds <= 1) {
$isTimerFired = true; // already timed-out
} else {
$isTimerFired = !yield Workflow::awaitWithTimeout(
$waitInterval,
fn (): bool => ($this->waitUntil)()->notEqualTo($waitUntil),
);
}
if (!$isTimerFired) {
return yield $this();
}
}
private function getCurrentDateTime(): CarbonImmutable
{
return CarbonImmutable::instance(Workflow::now());
}
}
Thus, algorithm is following:
- get target date
- get current date
- set timer for the interval as diff from current date to target date
- in case if workflow receives any updates for the target date, cancel the previous timer, and repeat the algorithm from the p.1.
- if the timer was fired, then continue execution of the main workflow
Now, this all was working fine until I tried to simulate multiple competing changes to the date. Looking at Temporal UI, the last scheduled timer is correct - 12 seconds. Yet, when it was fired, nothing has happened.
In logs I only see the following:
backend-1 | 2025-06-06T14:11:14+0000 DEBUG temporal validate request callback {"RunID": "0197456b-4f7d-7a7b-9f40-612ceb0bdbfa", "name": "changePlacementDate", "id": "31aabcc0-89c4-4b9e-8556-be5d97cacf8e", "is_replaying": true, "result": {"id":0,"command":{"id":"31aabcc0-89c4-4b9e-8556-be5d97cacf8e"}}}
backend-1 | 2025-06-06T14:11:14+0000 DEBUG temporal update request callback {"RunID": "0197456b-4f7d-7a7b-9f40-612ceb0bdbfa", "name": "changePlacementDate", "id": "31aabcc0-89c4-4b9e-8556-be5d97cacf8e", "result": {"id":0,"command":{"id":"31aabcc0-89c4-4b9e-8556-be5d97cacf8e"},"payloads":{"payloads":[{"metadata":{"encoding":"anNvbi9wbGFpbg=="},"data":"bnVsbA=="}]}}}
backend-1 | 2025-06-06T14:11:14+0000 DEBUG temporal update request received {"RunID": "0197456b-4f7d-7a7b-9f40-612ceb0bdbfa", "name": "changePlacementDate", "id": "b0d68f07-67df-4d59-894f-d15b4e13dbbe"}
backend-1 | 2025-06-06T14:11:14+0000 DEBUG temporal validate request callback {"RunID": "0197456b-4f7d-7a7b-9f40-612ceb0bdbfa", "name": "changePlacementDate", "id": "b0d68f07-67df-4d59-894f-d15b4e13dbbe", "is_replaying": true, "result": {"id":0,"command":{"id":"b0d68f07-67df-4d59-894f-d15b4e13dbbe"}}}
backend-1 | 2025-06-06T14:11:14+0000 DEBUG temporal update request callback {"RunID": "0197456b-4f7d-7a7b-9f40-612ceb0bdbfa", "name": "changePlacementDate", "id": "b0d68f07-67df-4d59-894f-d15b4e13dbbe", "result": {"id":0,"command":{"id":"b0d68f07-67df-4d59-894f-d15b4e13dbbe"},"payloads":{"payloads":[{"metadata":{"encoding":"anNvbi9wbGFpbg=="},"data":"bnVsbA=="}]}}}
backend-1 | 2025-06-06T14:11:14+0000 DEBUG temporal update request received {"RunID": "0197456b-4f7d-7a7b-9f40-612ceb0bdbfa", "name": "changePlacementDate", "id": "586f4167-9f75-416c-a85b-c31c5b2df125"}
backend-1 | 2025-06-06T14:11:14+0000 DEBUG temporal validate request callback {"RunID": "0197456b-4f7d-7a7b-9f40-612ceb0bdbfa", "name": "changePlacementDate", "id": "586f4167-9f75-416c-a85b-c31c5b2df125", "is_replaying": true, "result": {"id":0,"command":{"id":"586f4167-9f75-416c-a85b-c31c5b2df125"}}}
backend-1 | 2025-06-06T14:11:14+0000 DEBUG temporal update request callback {"RunID": "0197456b-4f7d-7a7b-9f40-612ceb0bdbfa", "name": "changePlacementDate", "id": "586f4167-9f75-416c-a85b-c31c5b2df125", "result": {"id":0,"command":{"id":"586f4167-9f75-416c-a85b-c31c5b2df125"},"payloads":{"payloads":[{"metadata":{"encoding":"anNvbi9wbGFpbg=="},"data":"bnVsbA=="}]}}}
backend-1 | 2025-06-06T14:11:14+0000 DEBUG temporal query request {"RunID": "0197456b-4f7d-7a7b-9f40-612ceb0bdbfa", "name": "__temporal_workflow_metadata"}
backend-1 | 2025-06-06T14:11:14+0000 DEBUG temporal close workflow {"RunID": "0197456b-4f7d-7a7b-9f40-612ceb0bdbfa"}
And lastly, temporal server logs contain only one line about this workflow:
time=2025-06-06T17:00:25.150 level=ERROR msg="service failures" operation=QueryWorkflow wf-namespace=default grpc_code=Unavailable wf-id=333:schedule wf-run-id=0197456b-4f7d-7a7b-9f40-612ceb0bdbfa error="sticky worker unavailable, please use original task queue."
Also, looking further down event history, it seems that some updates haven’t been fully applied (I mean timers weren’t canceled)