Temporal failure: scheduled event referenced by duplicate task started events with the exact same timestamp

Hi Temporal Team,

We are facing a severe issue in production, and it looks like a Temporal server issue:
Event ID 170, an activity task scheduled event, is already referenced by event ID 177 (task started) and event ID 178 (task completed), but somehow it was referenced again by event ID 249 (task started), and this time the workflow failed with the error: "Failure handling event 249 of type 'EVENT_TYPE_ACTIVITY_TASK_STARTED' during execution. {WorkflowTaskStartedEventId=253, CurrentStartedEventId=247}". The interesting thing is that event 177, event 249, and several later started events all share the exact same event time: "eventTime": "2025-11-25T02:50:25.088612114Z".

This is a new workflow, and no non-determinism error is reported. We are using Temporal server version 1.27.1, SDK version 1.27.1, and GCP Kubernetes to deploy separate workflow/activity workers, with Cassandra running in GCP as well.
Although this production issue could be fixed by resetting the activity, we do want to know why it happened, and we definitely need to prevent it from occurring again.
Please help with this issue. Thank you!
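For context, the recovery we applied can be sketched with the `temporal` CLI. Note that resets operate on the workflow history, rewinding it to a prior workflow task event; the workflow ID below is a placeholder, and event 169 is the workflow task completed event that preceded the affected activity schedule in our history:

```shell
# Reset the workflow to the workflow task completed before the bad
# activity events, discarding the corrupted tail of the history.
# --workflow-id is a placeholder; substitute the affected execution.
temporal workflow reset \
  --workflow-id "my-workflow-id" \
  --event-id 169 \
  --reason "recover from duplicate ACTIVITY_TASK_STARTED events for scheduled event 170"
```

This requires a reachable Temporal frontend, so it is shown here only as a sketch of the remediation we used.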

The workflow history logs:

{
      "eventId": "170",
      "eventTime": "2025-11-25T02:50:25.067551966Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_SCHEDULED",
      "taskId": "222067538",
      "activityTaskScheduledEventAttributes": {
        "activityId": "xx-xx-xx-xx-xx",
        "activityType": {
          "name": "xx-xx-xx-xx-xx"
        },
        "taskQueue": {
          "name": "xx-xx-xx-ACTIVITY",
          "kind": "TASK_QUEUE_KIND_NORMAL"
        },
        "header": {},
        "input": {
          "payloads": [
            {
              "metadata": {
                "encoding": "abcd=="
              },
              "data": "abcd"
            },
            {
              "metadata": {
                "encoding": "abcd=="
              },
              "data": "abcd"
            }
          ]
        },
        "scheduleToCloseTimeout": "31536000s",
        "scheduleToStartTimeout": "31536000s",
        "startToCloseTimeout": "120s",
        "heartbeatTimeout": "0s",
        "workflowTaskCompletedEventId": "169",
        "retryPolicy": {
          "initialInterval": "2s",
          "backoffCoefficient": 1.2,
          "maximumInterval": "300s",
          "nonRetryableErrorTypes": [
            "java.lang.IllegalArgumentException",
            "java.lang.NullPointerException",
            "javax.validation.ConstraintViolationException",
            "com.db.cashmgmt.payments.temporal.exception.ApiNonRetryableException",
            "com.db.cashmgmt.payments.temporal.exception.RepairedException"
          ]
        }
      }
    },
{
      "eventId": "177",
      "eventTime": "2025-11-25T02:50:25.088612114Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_STARTED",
      "taskId": "222067555",
      "activityTaskStartedEventAttributes": {
        "scheduledEventId": "170",
        "identity": "1@xx-xx-xx-activity-xx-xx",
        "requestId": "xx-xx-xx-xx-xx",
        "attempt": 1,
        "workerVersion": {}
      }
    },
    {
      "eventId": "178",
      "eventTime": "2025-11-25T02:50:25.174842970Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_COMPLETED",
      "taskId": "222067556",
      "activityTaskCompletedEventAttributes": {
        "result": {
          "payloads": [
            {
              "metadata": {
                "encoding": "abcd=="
              },
              "data": "abcd=="
            }
          ]
        },
        "scheduledEventId": "170",
        "startedEventId": "177",
        "identity": "xx-xx-xx-activity-xx-xx"
      }
    },

{
      "eventId": "249",
      "eventTime": "2025-11-25T02:50:25.088612114Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_STARTED",
      "taskId": "222074019",
      "activityTaskStartedEventAttributes": {
        "scheduledEventId": "170",
        "identity": "xx-xx-xx-activity-xx-xx",
        "requestId": "xx-xx-xx-xx-xx",
        "attempt": 1,
        "workerVersion": {}
      }
    },
{
      "eventId": "254",
      "eventTime": "2025-11-25T04:08:09.518858189Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_FAILED",
      "taskId": "222074030",
      "workflowTaskFailedEventAttributes": {
        "scheduledEventId": "252",
        "startedEventId": "253",
        "failure": {
          "message": "Failure handling event 249 of type 'EVENT_TYPE_ACTIVITY_TASK_STARTED' during execution. {WorkflowTaskStartedEventId=253, CurrentStartedEventId=247}",
          "source": "JavaSDK",
          "stackTrace": "io.temporal.internal.statemachines.WorkflowStateMachines.createEventProcessingException(WorkflowStateMachines.java:426)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:334)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:293)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.applyServerHistory(ReplayWorkflowRunTaskHandler.java:249)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:231)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:165)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:135)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:100)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handleTask(WorkflowWorker.java:476)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:367)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:307)\nio.temporal.internal.worker.PollTaskExecutor.lambda$process$1(PollTaskExecutor.java:106)\njava.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\njava.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\njava.base/java.lang.Thread.run(Thread.java:840)\n",
          "cause": {
            "message": "Unexpected event:event_id: 249\nevent_time {\n  seconds: 1764039025\n  nanos: 88612114\n}\nevent_type: EVENT_TYPE_ACTIVITY_TASK_STARTED\ntask_id: 222074019\nactivity_task_started_event_attributes {\n  scheduled_event_id: 170\n  identity: \"xx-xx-xx-activity-xx-xx\"\n  request_id: \"xx-xx-xx-xx-xx\"\n  attempt: 1\n  worker_version {\n  }\n}\n",
            "source": "JavaSDK",
            "stackTrace": "io.temporal.internal.statemachines.WorkflowStateMachines.handleNonStatefulEvent(WorkflowStateMachines.java:767)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleSingleEvent(WorkflowStateMachines.java:482)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:332)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:293)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.applyServerHistory(ReplayWorkflowRunTaskHandler.java:249)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:231)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:165)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:135)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:100)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handleTask(WorkflowWorker.java:476)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:367)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:307)\nio.temporal.internal.worker.PollTaskExecutor.lambda$process$1(PollTaskExecutor.java:106)\njava.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\njava.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\njava.base/java.lang.Thread.run(Thread.java:840)\n",
            "applicationFailureInfo": {
              "type": "java.lang.IllegalArgumentException"
            }
          },
          "applicationFailureInfo": {
            "type": "io.temporal.internal.statemachines.InternalWorkflowTaskException"
          }
        },
        "identity": "xx-xx-xx-xx-workflow-xx-xx"
      }
    },
...
{
      "eventId": "255",
      "eventTime": "2025-11-25T02:50:25.088612114Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_STARTED",
      "taskId": "222076509",
      "activityTaskStartedEventAttributes": {
        "scheduledEventId": "170",
        "identity": "1@xxxxxxx-txn-merge-activity-695b848f64-8m42h",
        "requestId": "fb2ba85d-8f78-410a-af2c-e638838c72f7",
        "attempt": 1,
        "workerVersion": {}
      }
    },
    {
      "eventId": "260",
      "eventTime": "2025-11-25T02:50:25.088612114Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_STARTED",
      "taskId": "222089634",
      "activityTaskStartedEventAttributes": {
        "scheduledEventId": "170",
        "identity": "1@xxxxxxx-txn-merge-activity-695b848f64-8m42h",
        "requestId": "fb2ba85d-8f78-410a-af2c-e638838c72f7",
        "attempt": 1,
        "workerVersion": {}
      }
    },
{
      "eventId": "265",
      "eventTime": "2025-11-25T02:50:25.088612114Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_STARTED",
      "taskId": "222092460",
      "activityTaskStartedEventAttributes": {
        "scheduledEventId": "170",
        "identity": "1@xxxxxxx-txn-merge-activity-695b848f64-8m42h",
        "requestId": "fb2ba85d-8f78-410a-af2c-e638838c72f7",
        "attempt": 1,
        "workerVersion": {}
      }
    },

What Cassandra version is being used? Is it 5?

Hi Tihomir,

Yes, it is 5; we are using Cassandra 5.0.0.
It happened on one activity, while we concurrently had other async functions running in the workflow.

Best regards,
Jx

Thanks for confirming. Temporal does not currently support Cassandra 5.

We do have an open issue, Cassandra 5 support · Issue #6618 · temporalio/temporal · GitHub,
which does not have a lot of specifics, I know, but the behavior you are describing (event resurrection from the DLQ) has been reported by the community with Cassandra 5.

Our team is working on full support for Cassandra 5, but it is just not there yet. I would strongly recommend using Cassandra 4.x instead, which is supported.
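For anyone checking their own cluster against this, the version a node actually reports can be read straight from Cassandra's system table (a minimal sketch; it assumes cqlsh can reach the node with your usual host and credentials):

```shell
# Ask the node which Cassandra release it is running.
# A 5.x result falls outside Temporal's supported matrix; 4.x is the
# recommended target at the time of this thread.
cqlsh -e "SELECT release_version FROM system.local;"
```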