Deadlock for no obvious reason

So I do see:

[TMPRL1101] Workflow with ID 019c2c51-3ebf-7f2a-a011-ed14fe4028ac deadlocked after 00:00:02

Ah, so this was indeed a deadlock. Something prevented the workflow from reaching its next await point within 2s. This usually means the workflow is spinning the CPU somewhere, or some unsafe code is delegating to the default scheduler without being caught by the tracing event listener (for example, because the listener is disabled in options or in code). Since it was transient, I think a code problem is unlikely, but it is still possible. It could be something in how logging or payload conversion is implemented, if either of those has been customized.
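To make "failed to reach its next await point within 2s" concrete, here is a minimal, generic sketch of that kind of check. It is an illustration only, not how the Temporal SDK actually implements deadlock detection (that logic is internal to the SDK and differs in detail). The names `run_step_with_deadlock_check`, `DeadlockDetected`, `well_behaved_step`, and `cpu_spinning_step` are hypothetical. The idea is that each chunk of workflow code between two await points is run while a watchdog checks the clock, and a chunk that spins the CPU past the limit gets flagged.

```python
import threading
import time
from typing import Callable


class DeadlockDetected(Exception):
    """Hypothetical error raised when a step fails to yield in time."""


def run_step_with_deadlock_check(step: Callable[[], None], timeout_s: float = 2.0) -> None:
    # Run one chunk of workflow code (the work between two await points) on a
    # daemon thread so the caller can keep watching the clock.
    worker = threading.Thread(target=step, daemon=True)
    worker.start()
    # If the step has not returned (i.e. reached its next await point) within
    # timeout_s, treat it as deadlocked. Note that this sketch only reports the
    # problem; it cannot forcibly stop the still-running thread.
    worker.join(timeout_s)
    if worker.is_alive():
        raise DeadlockDetected(f"step did not yield after {timeout_s}s")


def well_behaved_step() -> None:
    # Does a small amount of work and returns promptly.
    sum(range(1000))


def cpu_spinning_step() -> None:
    # Busy-loops without yielding (bounded to 1s here so the demo thread ends).
    end = time.monotonic() + 1.0
    while time.monotonic() < end:
        pass
```

With this sketch, `run_step_with_deadlock_check(well_behaved_step)` returns normally, while `run_step_with_deadlock_check(cpu_spinning_step, timeout_s=0.2)` raises `DeadlockDetected`. Because a transient CPU stall (for example, a starved worker host) can trip the same timeout as genuinely broken code, a one-off failure like this one is hard to attribute to any particular line.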

If you are able to replicate it, that would be ideal; otherwise, I am afraid there is not enough to go on to know which code caused it. You can try replaying the history with the replayer, but it is unlikely to fail given that the workflow succeeded on another worker (which means Temporal already replayed it successfully there).

I see from your past post, "OpenTelemetry locate workflow from error", that you are using OTel logging. If you can replicate this somehow, I wonder whether you could disable OTel logging and see if the error stops occurring. That would suggest OTel logging is doing something with tasks that doesn't work with our deterministic scheduler.