Workflow with promises is not recovering gracefully when restarting client

This exception is not a problem as it is part of the normal SDK function when the workflow is removed from the worker cache.

You said that the workflow task is failing. When it fails, the failure should be logged in the workflow history or the worker log.

OK good to know. Then I suppose the problem is just that the workflow never completes. At least it remains stuck until it times out. Given that the workflow is only executing activity methods with a thread sleep of a relatively short number of seconds, I’m led to think that it is malfunctioning.

It looks like your activities never complete. Look at the list of pending ones. They have a 10-minute timeout, so after any failure it will take 10 minutes before they are retried. I don’t think you have a problem with the workflow itself.

This shows that the workflow is waiting on running activities:

That’s correct, the workflow reports a pending activity. So is it the case that I need to wait at least 10 mins after restarting for it to complete? And if it doesn’t complete after that, then I can conclude that it’s malfunctioning? If so, I’ll test this and wait 10 mins to see if it resolves itself.

Why are you specifying a 10-minute timeout? Change it to a much shorter one if this activity is expected to run faster.

And only specify the StartToClose timeout.
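For reference, here is a minimal sketch of what that could look like, assuming the Temporal Java SDK (`io.temporal:temporal-sdk` on the classpath). The names `SleepActivities`, `SleepWorkflow`, `sleepFor`, and the 30-second value are hypothetical and not taken from the original code:

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

// Hypothetical activity interface standing in for the real one.
@ActivityInterface
interface SleepActivities {
    void sleepFor(int seconds);
}

// Hypothetical workflow interface.
@WorkflowInterface
interface SleepWorkflow {
    @WorkflowMethod
    void run();
}

class SleepWorkflowImpl implements SleepWorkflow {
    // Only StartToClose is set. Pick a value slightly above the longest
    // expected run time of a single attempt (30 s is an assumed value).
    private final ActivityOptions options =
        ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofSeconds(30))
            .build();

    private final SleepActivities activities =
        Workflow.newActivityStub(SleepActivities.class, options);

    @Override
    public void run() {
        // If the worker dies mid-attempt, the next attempt is scheduled
        // only once the StartToClose timeout of this attempt has elapsed.
        activities.sleepFor(5);
    }
}
```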

OK, this time I reduced the startToClose timeout and, after restarting the client, the workflow completed after roughly that amount of time had passed. So, if I understand correctly, when an activity method is interrupted by the client shutting down, the workflow will, upon restart, wait for the duration of the startToClose timeout before continuing?

Yes, this is the correct understanding. I would recommend reading up on activity timeouts, as they are essential for designing Temporal applications.
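To make the timing concrete, here is a small plain-Java sketch (no Temporal dependency) of a simplified model: a lost activity attempt is only noticed once a timeout fires, so the upper bound on the wait is the StartToClose timeout, or the heartbeat timeout if one is configured and is shorter. The class and method names are made up for illustration, and retry backoff is ignored:

```java
import java.time.Duration;

class RetryDelaySketch {
    // Upper bound on how long a crashed attempt can go unnoticed.
    // A null or zero heartbeat timeout means "no heartbeat timeout configured".
    static Duration detectionDelay(Duration startToClose, Duration heartbeatTimeout) {
        if (heartbeatTimeout == null || heartbeatTimeout.isZero()) {
            return startToClose;
        }
        // Whichever timeout is shorter fires first.
        return heartbeatTimeout.compareTo(startToClose) < 0 ? heartbeatTimeout : startToClose;
    }

    public static void main(String[] args) {
        // The original setup: 10-minute StartToClose, no heartbeat.
        System.out.println(detectionDelay(Duration.ofMinutes(10), null)); // PT10M
        // After shortening StartToClose to an assumed 30 seconds.
        System.out.println(detectionDelay(Duration.ofSeconds(30), null)); // PT30S
        // Long StartToClose combined with an assumed 5-second heartbeat timeout.
        System.out.println(detectionDelay(Duration.ofMinutes(10), Duration.ofSeconds(5))); // PT5S
    }
}
```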

To be precise, the workflow is not blocked unless it is waiting for an activity to complete.

Thank you. I was wrongly assuming that, if the activity had failed due to the client going down, this would be immediately detected on resuming, as opposed to requiring the timeout to elapse. I’ll explore the heartbeat functionality as a way to recover more quickly following a crash. Thanks again for your time.
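For anyone landing here later, a rough sketch of the heartbeat approach, assuming the Temporal Java SDK. The activity reports a heartbeat after each one-second step, and the options pair a longer StartToClose timeout with a short heartbeat timeout so that a crashed worker is noticed quickly. `SteppedSleepActivities`, `sleepInSteps`, and both timeout values are hypothetical:

```java
import io.temporal.activity.Activity;
import io.temporal.activity.ActivityExecutionContext;
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import java.time.Duration;

// Hypothetical activity interface.
@ActivityInterface
interface SteppedSleepActivities {
    void sleepInSteps(int seconds);
}

class SteppedSleepActivitiesImpl implements SteppedSleepActivities {
    @Override
    public void sleepInSteps(int seconds) {
        ActivityExecutionContext ctx = Activity.getExecutionContext();
        for (int i = 0; i < seconds; i++) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw Activity.wrap(e);
            }
            // Report liveness; the step index is recorded as heartbeat details.
            ctx.heartbeat(i);
        }
    }
}

class HeartbeatOptionsSketch {
    // Assumed values: the attempt may run up to 2 minutes, but if no
    // heartbeat arrives within 5 seconds the attempt times out and
    // becomes eligible for retry.
    static ActivityOptions options() {
        return ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofMinutes(2))
            .setHeartbeatTimeout(Duration.ofSeconds(5))
            .build();
    }
}
```

Note that a heartbeat timeout only helps if the activity actually calls `heartbeat` more often than that interval; otherwise healthy attempts would time out too.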