Workflow with promises is not recovering gracefully when restarting client

maxim · February 13, 2023, 9:14pm

This exception is not a problem. It is part of normal SDK behavior when the workflow is removed from the worker cache.

You said that the workflow task is failing. When it fails it should be logged in the workflow history or the worker log.

jkrch · February 13, 2023, 9:20pm

OK good to know. Then I suppose the problem is just that the workflow never completes. At least it remains stuck until it times out. Given that the workflow is only executing activity methods with a thread sleep of a relatively short number of seconds, I’m led to think that it is malfunctioning.

maxim · February 13, 2023, 9:22pm

It looks like your activities never complete. Look at the list of pending ones. They have a 10-minute timeout, so on any failure they will take 10 minutes to be retried. I don’t think you have a problem with the workflow itself.

maxim · February 13, 2023, 9:23pm

This shows that the workflow is waiting on running activities:

jkrch:

  "pendingActivities": [
    {
      "activityId": "3ab474ee-fef1-3c9a-854d-e18bd27d5866",
      "activityType": {
        "name": "Test"
      },
      "state": "Started",
      "lastStartedTime": "2023-02-08T20:27:31.118983925Z",
      "attempt": 1,
      "expirationTime": "2023-02-08T20:37:31.099028607Z",
      "lastWorkerIdentity": "16028@test-1094"
    },
    {
      "activityId": "26b4d106-d8dd-36ec-919d-97d34ac63ef0",
      "activityType": {
        "name": "Test"
      },
      "state": "Started",
      "lastStartedTime": "2023-02-08T20:27:31.135965466Z",
      "attempt": 1,
      "expirationTime": "2023-02-08T20:37:31.099190386Z",
      "lastWorkerIdentity": "16028@test-1094"
    }
  ]
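As a quick check on the 10-minute figure, the gap between `lastStartedTime` and `expirationTime` in the output above can be computed directly. This is a minimal sketch using only `java.time`; the class and method names are made up for illustration and are not part of any SDK.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of the Temporal SDK.
final class PendingActivityWindow {
    // Time between an attempt's start and the expiration reported for it.
    static Duration window(String lastStartedTime, String expirationTime) {
        return Duration.between(Instant.parse(lastStartedTime), Instant.parse(expirationTime));
    }

    public static void main(String[] args) {
        // Values copied from the first pending activity above.
        Duration w = window("2023-02-08T20:27:31.118983925Z",
                            "2023-02-08T20:37:31.099028607Z");
        System.out.println("Whole seconds until expiration: " + w.getSeconds());
    }
}
```

Both pending activities come out just under 600 seconds, which matches the 10-minute timeout discussed here.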

jkrch · February 13, 2023, 9:24pm

That’s correct, the workflow reports a pending activity. So is it the case that I need to wait at least 10 mins after restarting for it to complete? And if it doesn’t complete after that, then I can conclude that it’s malfunctioning? If so, I’ll test this and wait 10 mins to see if it resolves itself.

maxim · February 13, 2023, 9:25pm

Why are you specifying a 10-minute timeout? Change it to a much shorter one if this activity is expected to run faster.

maxim · February 13, 2023, 9:25pm

And only specify the StartToClose timeout.
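For reference, a minimal sketch of what setting only the StartToClose timeout might look like with the Java SDK. The interface and class names and the 30-second value are assumptions for illustration (the thread only shows an activity type named "Test"), and the snippet needs the Temporal Java SDK on the classpath.

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

// Hypothetical activity interface, named for illustration only.
@ActivityInterface
interface TestActivities {
    void test();
}

// Hypothetical workflow interface.
@WorkflowInterface
interface TestWorkflow {
    @WorkflowMethod
    void run();
}

class TestWorkflowImpl implements TestWorkflow {
    // Only StartToClose is set. 30 seconds is an assumed value:
    // pick something slightly above the longest expected run of one attempt.
    private final ActivityOptions options = ActivityOptions.newBuilder()
        .setStartToCloseTimeout(Duration.ofSeconds(30))
        .build();

    private final TestActivities activities =
        Workflow.newActivityStub(TestActivities.class, options);

    @Override
    public void run() {
        // If the worker dies mid-attempt, the attempt times out after
        // StartToClose elapses and is then retried per the retry policy.
        activities.test();
    }
}
```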

jkrch · February 13, 2023, 9:35pm

OK, this time I reduced the startToClose timeout, and after restarting the client, the workflow completed after roughly that amount of time had passed. So, if I understand correctly: when an activity method is interrupted by the client shutting down, then after a restart the workflow waits for the duration of the startToClose timeout before continuing?

maxim · February 13, 2023, 9:39pm

Yes, this is the correct understanding. I would recommend reading up on the activity timeouts, as they are essential for designing Temporal applications.

To be precise, the workflow is not blocked unless it is waiting for an activity to complete.

jkrch · February 13, 2023, 10:02pm

Thank you. I was wrongly assuming that, if the activity had failed due to the client going down, this would be immediately detected on resuming, as opposed to requiring the timeout to elapse. I’ll explore the heartbeat functionality as a way to recover more quickly following a crash. Thanks again for your time.
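To illustrate why heartbeating shortens recovery, here is a simplified model, not SDK code: it assumes a dead attempt is timed out at the earlier of start + StartToClose or last heartbeat + heartbeat timeout. The class and method names are hypothetical. In the Java SDK the corresponding settings would be `ActivityOptions.Builder.setHeartbeatTimeout(...)` together with calls to `Activity.getExecutionContext().heartbeat(...)` inside the activity.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of when a crashed activity attempt is detected.
final class TimeoutModel {
    // Returns the earliest instant at which the attempt would time out.
    // heartbeatTimeout may be null, meaning no heartbeat timeout is configured.
    static Instant detectionTime(Instant started, Duration startToClose,
                                 Instant lastHeartbeat, Duration heartbeatTimeout) {
        Instant byStartToClose = started.plus(startToClose);
        if (heartbeatTimeout == null) {
            return byStartToClose;
        }
        Instant byHeartbeat = lastHeartbeat.plus(heartbeatTimeout);
        return byHeartbeat.isBefore(byStartToClose) ? byHeartbeat : byStartToClose;
    }

    public static void main(String[] args) {
        Instant started = Instant.parse("2023-02-08T20:27:31Z");
        Instant lastBeat = started.plusSeconds(5); // worker crashes right after this heartbeat

        Instant without = detectionTime(started, Duration.ofMinutes(10), started, null);
        Instant with = detectionTime(started, Duration.ofMinutes(10), lastBeat, Duration.ofSeconds(10));

        System.out.println("Seconds to detect without heartbeat: "
            + Duration.between(started, without).getSeconds());
        System.out.println("Seconds to detect with heartbeat: "
            + Duration.between(started, with).getSeconds());
    }
}
```

Under these assumed values, a 10-second heartbeat timeout lets the failure surface within seconds of the crash instead of after the full StartToClose window.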
