Older workflow not re-executed and new workflows not initiated after stop/start temporal server

Team,

I ran the HelloActivity sample and added a 20-second Thread.sleep to my activity method to test Temporal server failover.

  1. I started the Temporal server.
  2. I ran the worker and started my workflow (say wf1).
  3. I quit the Temporal server while my activity was still running.
  4. After a couple of minutes, I started the Temporal server again.

My previous workflow (wf1) is still in Running status. Shouldn't it replay the workflow history and run my activity again? It does not run it.

I tried creating new workflows (say N-wf2, N-wf3, etc.), but they are not executed and are not even shown in the web UI.

Then I terminated my previous workflow (wf1), which was still in Running status, via the web UI. New workflows created after that (say AT-wf1, AT-wf2, etc.) work as expected.

Two questions:

  1. My first workflow (wf1) should have completed successfully, but it did not.
  2. The new workflows (N-wf2, N-wf3, etc.) created after restarting my Temporal server should have been executed, but they were not even initiated.

Kindly give your inputs. I hope I have conveyed my issue clearly.

Would you post your workflow execution history here?

Please find the attachment

Would you show the “Full Details” view or the JSON of the history (Export button)? Please also include the summary view of the workflow.

Please refer to the image for the summary.


The JSON history is below:

[{"eventId":"1","eventTime":{"seconds":"1621600664","nanos":382172168},"eventType":"EVENT_TYPE_WORKFLOW_EXECUTION_STARTED","version":"0","taskId":"10485817","workflowExecutionStartedEventAttributes":{"workflowType":{"name":"GreetingWorkflow"},"parentWorkflowNamespace":"","parentInitiatedEventId":"0","taskQueue":{"name":"HelloActivityTaskQueue","kind":"TASK_QUEUE_KIND_NORMAL"},"input":{"payloads":[{"metadata":{"encoding":"anNvbi9wbGFpbg=="},"data":"IldvcmxkIg=="}]},"workflowExecutionTimeout":{"seconds":"0","nanos":0},"workflowRunTimeout":{"seconds":"0","nanos":0},"workflowTaskTimeout":{"seconds":"10","nanos":0},"continuedExecutionRunId":"","initiator":"CONTINUE_AS_NEW_INITIATOR_UNSPECIFIED","originalExecutionRunId":"75d09ba5-fd7e-42c2-b993-64a988770741","identity":"800@Digitals-MacBook-Pro.local","firstExecutionRunId":"75d09ba5-fd7e-42c2-b993-64a988770741","attempt":1,"cronSchedule":"","firstWorkflowTaskBackoff":{"seconds":"0","nanos":0},"header":{"fields":{}}},"attributes":"workflowExecutionStartedEventAttributes"},
{"eventId":"2","eventTime":{"seconds":"1621600664","nanos":382224558},"eventType":"EVENT_TYPE_WORKFLOW_TASK_SCHEDULED","version":"0","taskId":"10485818","workflowTaskScheduledEventAttributes":{"taskQueue":{"name":"HelloActivityTaskQueue","kind":"TASK_QUEUE_KIND_NORMAL"},"startToCloseTimeout":{"seconds":"10","nanos":0},"attempt":1},"attributes":"workflowTaskScheduledEventAttributes"},
{"eventId":"3","eventTime":{"seconds":"1621600664","nanos":429544912},"eventType":"EVENT_TYPE_WORKFLOW_TASK_STARTED","version":"0","taskId":"10485824","workflowTaskStartedEventAttributes":{"scheduledEventId":"2","identity":"800@Digitals-MacBook-Pro.local","requestId":"0f595c91-5edf-40ef-8b0c-c6584ccb4c3e"},"attributes":"workflowTaskStartedEventAttributes"},
{"eventId":"4","eventTime":{"seconds":"1621600664","nanos":628374141},"eventType":"EVENT_TYPE_WORKFLOW_TASK_COMPLETED","version":"0","taskId":"10485827","workflowTaskCompletedEventAttributes":{"scheduledEventId":"2","startedEventId":"3","identity":"800@Digitals-MacBook-Pro.local","binaryChecksum":""},"attributes":"workflowTaskCompletedEventAttributes"},
{"eventId":"5","eventTime":{"seconds":"1621600664","nanos":628467531},"eventType":"EVENT_TYPE_ACTIVITY_TASK_SCHEDULED","version":"0","taskId":"10485828","activityTaskScheduledEventAttributes":{"activityId":"818c7e2d-99fe-3b47-851e-57d9b74a7b24","activityType":{"name":"ComposeGreeting"},"namespace":"","taskQueue":{"name":"HelloActivityTaskQueue","kind":"TASK_QUEUE_KIND_NORMAL"},"header":{"fields":{}},"input":{"payloads":[{"metadata":{"encoding":"anNvbi9wbGFpbg=="},"data":"IkhlbCI="},{"metadata":{"encoding":"anNvbi9wbGFpbg=="},"data":"IldvcmxkIg=="}]},"scheduleToCloseTimeout":{"seconds":"0","nanos":0},"scheduleToStartTimeout":{"seconds":"0","nanos":0},"startToCloseTimeout":{"seconds":"1200","nanos":0},"heartbeatTimeout":{"seconds":"0","nanos":0},"workflowTaskCompletedEventId":"4","retryPolicy":{"nonRetryableErrorTypes":[],"initialInterval":{"seconds":"1","nanos":0},"backoffCoefficient":2,"maximumInterval":{"seconds":"100","nanos":0},"maximumAttempts":0}},"attributes":"activityTaskScheduledEventAttributes"},
{"eventId":"6","eventTime":{"seconds":"1621600933","nanos":17768158},"eventType":"EVENT_TYPE_WORKFLOW_EXECUTION_TERMINATED","version":"0","taskId":"11534341","workflowExecutionTerminatedEventAttributes":{"reason":"","identity":""},"attributes":"workflowExecutionTerminatedEventAttributes"}]

Your activity is scheduled with a StartToClose timeout of 20 minutes (1200 seconds in the history). So when the worker goes down, the activity times out after 20 minutes and is then retried. Nothing is expected to happen until those 20 minutes have passed. If the activity is expected to complete faster, reduce the timeout value so that it is retried sooner. If the activity really does take a long time, specify a shorter heartbeat timeout and make the activity heartbeat periodically. This way the worker failure will be detected faster.
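
To make the timing concrete, here is a rough sketch of the arithmetic using the values from the posted history. It is only a simplified model: the class ActivityTimeoutSketch and the method worstCaseDetection are hypothetical helpers written for this illustration, not part of the Temporal SDK, and the "tuned" values at the end are made-up examples. In the Java SDK the two timeouts themselves are set through ActivityOptions.newBuilder() with setStartToCloseTimeout(...) and setHeartbeatTimeout(...), and an activity reports progress with Activity.getExecutionContext().heartbeat(...).

```java
import java.time.Duration;

// Illustrative model only: this class and its method are hypothetical helpers,
// not part of the Temporal SDK.
class ActivityTimeoutSketch {

    // Upper bound on how long a lost activity attempt can go unnoticed,
    // measured from its last contact with the server (start or last heartbeat).
    // A zero heartbeat timeout means no heartbeat timeout is configured.
    static Duration worstCaseDetection(Duration startToClose, Duration heartbeat) {
        if (heartbeat.isZero() || heartbeat.compareTo(startToClose) >= 0) {
            return startToClose;
        }
        return heartbeat;
    }

    public static void main(String[] args) {
        // Values taken from the posted history: startToCloseTimeout of event 5,
        // and the eventTime seconds of events 5 (scheduled) and 6 (terminated).
        Duration startToClose = Duration.ofSeconds(1200);
        Duration elapsed = Duration.ofSeconds(1621600933L - 1621600664L);

        System.out.println("Timeout without heartbeat (min): "
                + worstCaseDetection(startToClose, Duration.ZERO).toMinutes());
        System.out.println("Elapsed before termination (s): " + elapsed.getSeconds());
        System.out.println("Timeout already reached: "
                + (elapsed.compareTo(startToClose) >= 0));

        // Hypothetical tuned settings: 60 s StartToClose, 5 s heartbeat timeout.
        Duration tuned = worstCaseDetection(Duration.ofSeconds(60), Duration.ofSeconds(5));
        System.out.println("Tuned detection bound (s): " + tuned.getSeconds());
    }
}
```

Under this model, the workflow in the history was terminated 269 seconds after the activity was scheduled, well short of the 1200-second timeout, so no retry had been due yet. With a heartbeat timeout configured and the activity heartbeating, the bound drops to the heartbeat timeout instead.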

Okay, let me reduce the timeout and check it.

Thanks @maxim for your immediate response