Hi,
Recently our DevOps team has been updating infrastructure (terminating nodes), so the Temporal worker was being restarted/redeployed as well. One workflow failed with:
Caused By: java.lang.RuntimeException: WorkflowTask: failure executing SCHEDULED->WORKFLOW_TASK_STARTED, transition history is [CREATED->WORKFLOW_TASK_SCHEDULED]
Potential deadlock detected: workflow thread “workflow-root” didn’t yield control for over a second.
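For reference, here is a minimal, hypothetical model of a "didn't yield control for over a second" check. It is not the Temporal SDK's implementation; the class YieldWatchdog and its methods are made-up names. It only illustrates the idea of comparing the time since a workflow thread last yielded against a one-second threshold. Because this model looks purely at elapsed time, a stalled process (for example one being shut down together with its node) would look the same to it as workflow code that genuinely never yields.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of a "didn't yield control" check.
// NOT the Temporal SDK's implementation; names are illustrative only.
class YieldWatchdog {
    // Threshold mirroring the "over a second" wording in the error message.
    static final long THRESHOLD_NANOS = TimeUnit.SECONDS.toNanos(1);

    private long lastYieldNanos;

    YieldWatchdog(long startNanos) {
        this.lastYieldNanos = startNanos;
    }

    // Called whenever workflow code hands control back (e.g. blocks waiting on a result).
    void onYield(long nowNanos) {
        lastYieldNanos = nowNanos;
    }

    // True when the thread has held control for longer than the threshold.
    // Only elapsed time is checked, so a paused process trips it too.
    boolean isPotentialDeadlock(long nowNanos) {
        return nowNanos - lastYieldNanos > THRESHOLD_NANOS;
    }

    public static void main(String[] args) {
        YieldWatchdog w = new YieldWatchdog(0);
        System.out.println(w.isPotentialDeadlock(TimeUnit.MILLISECONDS.toNanos(500)));  // false
        System.out.println(w.isPotentialDeadlock(TimeUnit.MILLISECONDS.toNanos(1500))); // true
        w.onYield(TimeUnit.MILLISECONDS.toNanos(1400));
        System.out.println(w.isPotentialDeadlock(TimeUnit.MILLISECONDS.toNanos(1500))); // false
    }
}
```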
If I understand the flow correctly:
- Temporal scheduled a workflow task
- the worker took it from the task queue but did not have enough time to execute any activity
- the worker went down due to an infrastructure event
- Temporal saw a timeout, recognized it as a deadlock, marked the workflow as failed, and did not retry it because no retry policy was set (RetryPolicyNotSet)
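The transition names in the error can be read as a small state machine. The sketch below is a hypothetical, simplified model (WorkflowTaskModel, its State and Event enums, and handle are made-up names, not Temporal SDK classes) showing how a message of the form "failure executing SCHEDULED->WORKFLOW_TASK_STARTED, transition history is [CREATED->WORKFLOW_TASK_SCHEDULED]" comes about: the CREATED->WORKFLOW_TASK_SCHEDULED step completed, and the failure happened while handling the first WORKFLOW_TASK_STARTED. That is consistent with the log's event 3 being the very first workflow task, before the workflow had scheduled any activity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a workflow-task state machine that records its
// "transition history" and reports a failed transition. Illustrative names only.
class WorkflowTaskModel {
    enum State { CREATED, SCHEDULED, STARTED }
    enum Event { WORKFLOW_TASK_SCHEDULED, WORKFLOW_TASK_STARTED }

    // Allowed transitions: (current state, event) -> next state.
    private static final Map<State, Map<Event, State>> TRANSITIONS = Map.of(
        State.CREATED, Map.of(Event.WORKFLOW_TASK_SCHEDULED, State.SCHEDULED),
        State.SCHEDULED, Map.of(Event.WORKFLOW_TASK_STARTED, State.STARTED));

    private State state = State.CREATED;
    private final List<String> history = new ArrayList<>();

    // Applies an event; 'action' stands in for the work done on that transition
    // (for WORKFLOW_TASK_STARTED, running the workflow code).
    void handle(Event event, Runnable action) {
        State next = TRANSITIONS.getOrDefault(state, Map.of()).get(event);
        String transition = state + "->" + event;
        if (next == null) {
            throw new IllegalStateException("invalid transition " + transition);
        }
        try {
            action.run();
        } catch (RuntimeException e) {
            // Only successfully completed transitions are in the history.
            throw new RuntimeException("WorkflowTask: failure executing " + transition
                + ", transition history is " + history, e);
        }
        history.add(transition);
        state = next;
    }

    public static void main(String[] args) {
        WorkflowTaskModel task = new WorkflowTaskModel();
        task.handle(Event.WORKFLOW_TASK_SCHEDULED, () -> {});
        try {
            task.handle(Event.WORKFLOW_TASK_STARTED, () -> {
                throw new IllegalStateException("workflow code did not yield");
            });
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints the same message text as the "Caused By" line in the log.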
I’m looking for a way to configure retries only for DevOps-related failures. For example, it would be great to retry the workflow when the worker was not able to start executing it (e.g. because the worker was being redeployed at that exact moment), but not when an activity or the workflow's business logic fails.
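One possible approximation, sketched under assumptions: give the workflow a retry policy that retries by default but marks the business exception types as non-retryable. In the Temporal Java SDK the relevant builders are RetryOptions (with setMaximumAttempts and setDoNotRetry, which takes exception type names) and WorkflowOptions.Builder.setRetryOptions; whether your workflow failures actually carry the type names you expect is worth verifying for your SDK version. The code below does not use the SDK. It is a hypothetical, stand-alone model of the decision such a policy makes; WorkflowRetryDecision and com.example.orders.PaymentDeclinedException are invented names.

```java
import java.util.Set;

// Hypothetical, SDK-free model of a "retry unless it is a known business error" policy.
// It mimics the decision a workflow-level retry policy with a do-not-retry list would
// make; it is not Temporal code. Business exception type names here are made up.
class WorkflowRetryDecision {
    private final int maximumAttempts;          // total attempts allowed, including the first
    private final Set<String> doNotRetryTypes;  // business failures that must never be retried

    WorkflowRetryDecision(int maximumAttempts, Set<String> doNotRetryTypes) {
        if (maximumAttempts < 1) {
            throw new IllegalArgumentException("maximumAttempts must be >= 1");
        }
        this.maximumAttempts = maximumAttempts;
        this.doNotRetryTypes = Set.copyOf(doNotRetryTypes);
    }

    // attempt is 1-based: the first run of the workflow is attempt 1.
    boolean shouldRetry(int attempt, String failureType) {
        if (doNotRetryTypes.contains(failureType)) {
            return false; // business logic failed: surface the failure immediately
        }
        return attempt < maximumAttempts; // anything else (e.g. infrastructure) gets another attempt
    }

    public static void main(String[] args) {
        String business = "com.example.orders.PaymentDeclinedException"; // hypothetical type
        WorkflowRetryDecision policy = new WorkflowRetryDecision(3, Set.of(business));
        String infra = "io.temporal.internal.sync.PotentialDeadlockException";
        System.out.println(policy.shouldRetry(1, infra));    // true
        System.out.println(policy.shouldRetry(3, infra));    // false: attempts exhausted
        System.out.println(policy.shouldRetry(1, business)); // false: business failure
    }
}
```

The design choice here is a deny-list: anything not explicitly marked as a business failure is treated as transient. That matches the goal of retrying infrastructure hiccups, at the cost of also retrying unexpected bugs, so a small attempt limit keeps that bounded.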
Here is the failure event and the exception for this case:
Event #5 WorkflowExecutionFailed - failure
io.temporal.internal.replay.InternalWorkflowTaskException: Failure handling event 3 of ‘EVENT_TYPE_WORKFLOW_TASK_STARTED’ type. IsReplaying=false, PreviousStartedEventId=3, workflowTaskStartedEventId=3, Currently Processing StartedEventId=3
io.temporal.internal.statemachines.WorkflowStateMachines.createEventProcessingException(WorkflowStateMachines.java:221)
io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:201)
io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:175)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:177)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:146)
io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithEmbeddedQuery(ReplayWorkflowTaskHandler.java:201)
io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:114)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:319)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:279)
io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
java.base/java.lang.Thread.run(Thread.java:833)
Caused By: java.lang.RuntimeException: WorkflowTask: failure executing SCHEDULED->WORKFLOW_TASK_STARTED, transition history is [CREATED->WORKFLOW_TASK_SCHEDULED]
io.temporal.internal.statemachines.StateMachine.executeTransition(StateMachine.java:151)
io.temporal.internal.statemachines.StateMachine.handleHistoryEvent(StateMachine.java:101)
io.temporal.internal.statemachines.EntityStateMachineBase.handleEvent(EntityStateMachineBase.java:67)
io.temporal.internal.statemachines.WorkflowStateMachines.handleSingleEvent(WorkflowStateMachines.java:233)
io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:199)
io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:175)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:177)
io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:146)
io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithEmbeddedQuery(ReplayWorkflowTaskHandler.java:201)
io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:114)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:319)
io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:279)
io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
java.base/java.lang.Thread.run(Thread.java:833)
Caused By: io.temporal.internal.sync.PotentialDeadlockException: Potential deadlock detected: workflow thread “workflow-root” didn’t yield control for over a second. Other workflow threads:
java.base@17.0.5/jdk.internal.misc.Unsafe.park(Native Method)
java.base@17.0.5/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
java.base@17.0.5/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
java.base@17.0.5/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:938)
java.base@17.0.5/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base@17.0.5/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
io.temporal.internal.sync.WorkflowThreadContext.setStatus(WorkflowThreadContext.java:172)
io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:128)
java.base@17.0.5/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
java.base@17.0.5/java.util.concurrent.FutureTask.run(FutureTask.java:264)
java.base@17.0.5/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
java.base@17.0.5/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
java.base@17.0.5/java.lang.Thread.run(Thread.java:833)