History is out of order error

Hi,

I have multiple workflows running in production. A few of them are failing with the error below.
Any pointers on how to debug this?

{
  "message": "History is out of order. There is a gap between the last event workflow state machine observed and currently handling event. Last processed eventId: 52, handling eventId: 107",
  "source": "JavaSDK",
  "stackTrace": "com.google.common.base.Preconditions.checkState(Preconditions.java:749)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:197)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.applyServerHistory(ReplayWorkflowRunTaskHandler.java:208)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:192)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:147)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:132)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:97)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handleTask(WorkflowWorker.java:336)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:246)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:188)\nio.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93)\njava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\njava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\njava.lang.Thread.run(Thread.java:750)\n",
  "encodedAttributes": null,
  "cause": null,
  "applicationFailureInfo": {
    "type": "java.lang.IllegalStateException",
    "nonRetryable": false,
    "details": null
  }
}
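For anyone reading the message above: it says the worker last handled event 52 and was then handed event 107, so the events in between never reached it. Below is a minimal sketch of that kind of contiguity check, assuming the rule is that each event ID must be exactly one greater than the previous one. `HistoryGapCheck` and its methods are hypothetical names for illustration and are not part of the Temporal Java SDK.

```java
import java.util.List;

// Simplified stand-in for the check the error message describes.
// NOT the Temporal SDK's code; all names here are hypothetical.
final class HistoryGapCheck {
    private long lastProcessedEventId;

    HistoryGapCheck(long lastProcessedEventId) {
        this.lastProcessedEventId = lastProcessedEventId;
    }

    // Accepts an event only if its ID directly follows the last one handled
    // (assumed rule: IDs are contiguous and increase by one).
    void handleEvent(long eventId) {
        if (eventId != lastProcessedEventId + 1) {
            throw new IllegalStateException(
                "History is out of order. Last processed eventId: "
                    + lastProcessedEventId + ", handling eventId: " + eventId);
        }
        lastProcessedEventId = eventId;
    }

    // Returns the first event ID that breaks the sequence, or -1 if none does.
    static long firstGap(long startAfter, List<Long> eventIds) {
        HistoryGapCheck check = new HistoryGapCheck(startAfter);
        for (long id : eventIds) {
            try {
                check.handleEvent(id);
            } catch (IllegalStateException e) {
                return id;
            }
        }
        return -1;
    }
}
```

Calling `HistoryGapCheck.firstGap(50, List.of(51L, 52L, 107L))` returns `107`, mirroring the IDs in the error. Running a check like this over the event IDs of an exported history could help tell whether the gap is in the stored history itself or only in what the worker received.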

Would you check if these workflows are indeed failing or if this is just an intermittent message a worker logged?

Yes, the workflow failed.

Did you configure the workflow to fail on any unexpected exception?

No, @maxim. That's my confusion as well: why did Temporal retry? The default retry is 48 hours, right?

What do you mean by “did temporal retry”? You said that the workflow had failed.