Got panic [TMPRL1100] lookup failed for scheduledEventID to activityID

Hi, I use Temporal to execute workflows that apply configuration to external systems. The external system broadcasts its result over a message broker; my consumer forwards the response as a signal to the waiting workflow with WorkflowClient.SignalWorkflow, and on the other side the workflow receives it on a channel via ReceiveChannel.ReceiveWithTimeout. Everything worked fine until yesterday, when I got the panic [TMPRL1100] lookup failed for scheduledEventID to activityID.
I've read that this happens when a non-deterministic error occurs, but I can't understand where it comes from.
Maybe you can help me understand.
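
For context, the consumer side is roughly this (a minimal sketch; the Response type, the workflow-ID-equals-request-ID mapping, and the helper fields are illustrative, not the exact code):

package consumer

import (
	"context"

	"go.temporal.io/sdk/client"
)

// Response mirrors the payload the workflow later receives on the signal channel.
type Response struct {
	RequestID string
	Status    string
	Error     string
}

// Consumer forwards broker messages to the waiting workflow as signals.
// signalName stands in for kafka.GetSignalName.
type Consumer struct {
	temporal   client.Client
	topic      string
	signalName func(topic, requestID string) string
}

// handleMessage signals the workflow identified by the request ID
// (workflow ID == request ID is an assumption of this sketch).
func (c *Consumer) handleMessage(ctx context.Context, msg Response) error {
	return c.temporal.SignalWorkflow(ctx, msg.RequestID, "", c.signalName(c.topic, msg.RequestID), msg)
}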

Here is the part of the workflow code that receives the signals:

	updateChan := workflow.GetSignalChannel(ctx, kafka.GetSignalName(cfg.ResponseTopic, request.ID))
	for {
		var event utils.Response
		ok, _ := updateChan.ReceiveWithTimeout(ctx, cfg.ResponseTimeout, &event)
		if !ok {
			return fmt.Errorf("timeout waiting for update events from %s for requestID: %s", cfg.ExternalSystem, request.ID)
		}

		if event.Status == string(configurationModels.ConfiguratorStatusFailed) {
			return fmt.Errorf("got error from %s: requestID=%s, error=%s", cfg.ExternalSystem, event.RequestID, event.Error)
		}

		if event.Status == string(configurationModels.ConfiguratorStatusAccepted) {
			continue
		}

		if event.Status == string(configurationModels.ConfiguratorStatusInProgress) {

			if err := activities.PublishKafkaMessage(ctx, request, configurationModels.ConfigurationStatusInProgress, "", updateTopic); err != nil {
				return err
			}

			continue
		}

		if event.Status == string(configurationModels.ConfiguratorStatusCompleted) {
			break
		}

		logger.Errorf("unknown status from %s: requestID=%s, status=%s", cfg.ExternalSystem, event.RequestID, event.Status)
	}
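
For completeness, the only activity call in this loop is activities.PublishKafkaMessage, which wraps workflow.ExecuteActivity roughly like this (the Request type, the registered activity name, and the timeout below are illustrative, not the real values):

package activities

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// Request is a placeholder for the real request type in this sketch.
type Request struct {
	ID string
}

// PublishKafkaMessage schedules the publish activity from workflow code.
// The activity is referenced by its registered name here; the real code may
// pass the activity function itself.
func PublishKafkaMessage(ctx workflow.Context, request Request, status, errMsg, topic string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute, // illustrative value
	})
	return workflow.ExecuteActivity(ctx, "PublishKafkaMessage", request, status, errMsg, topic).Get(ctx, nil)
}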

The panic trace:

{
  "message": "[TMPRL1100] lookup failed for scheduledEventID to activityID: scheduleEventID: 61, activityID: 61",
  "source": "GoSDK",
  "stackTrace": "[panic]:\ngo.temporal.io/sdk/internal.panicIllegalState(...)\n\t/go/pkg/mod/go.temporal.io/sdk@v1.27.0/internal/internal_command_state_machine.go:440\ngo.temporal.io/sdk/internal.(*commandsHelper).handleActivityTaskScheduled(0xc000d6b240, {0xc0005ef5d8, 0x2}, 0x5?)\n\t/go/pkg/mod/go.temporal.io/sdk@v1.27.0/internal/internal_command_state_machine.go:998 +0xfb\ngo.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc000a4d200, 0xc0002d9110, 0x3?, 0x0)\n\t/go/pkg/mod/go.temporal.io/sdk@v1.27.0/internal/internal_event_handlers.go:1152 +0x290\ngo.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc001080b40, 0xc000d6e1e0)\n\t/go/pkg/mod/go.temporal.io/sdk@v1.27.0/internal/internal_task_handlers.go:1149 +0x193d\ngo.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc0002c4680, 0xc000d6e1e0, 0xc001080b40, 0xc000d6eb10)\n\t/go/pkg/mod/go.temporal.io/sdk@v1.27.0/internal/internal_task_handlers.go:914 +0x3c5\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc00057d560, 0xc000d6e1e0)\n\t/go/pkg/mod/go.temporal.io/sdk@v1.27.0/internal/internal_task_pollers.go:357 +0x3a3\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc00057d560, {0x1237920?, 0xc000d6e1e0?})\n\t/go/pkg/mod/go.temporal.io/sdk@v1.27.0/internal/internal_task_pollers.go:321 +0x87\ngo.temporal.io/sdk/internal.(*baseWorker).processTask(0xc000668420, {0x12384e0?, 0xc000d5c850})\n\t/go/pkg/mod/go.temporal.io/sdk@v1.27.0/internal/internal_worker_base.go:518 +0x15b\ngo.temporal.io/sdk/internal.(*baseWorker).processTaskAsync.func1()\n\t/go/pkg/mod/go.temporal.io/sdk@v1.27.0/internal/internal_worker_base.go:369 +0x45\ncreated by go.temporal.io/sdk/internal.(*baseWorker).processTaskAsync in goroutine 132\n\t/go/pkg/mod/go.temporal.io/sdk@v1.27.0/internal/internal_worker_base.go:365 +0xb8",
  "encodedAttributes": null,
  "cause": null,
  "applicationFailureInfo": {
    "type": "PanicError",
    "nonRetryable": true,
    "details": null
  }
}

Here is the execution history of the failed workflow.
Server version: v1.20.3
SDK version: v1.27.0

The issue can be in a different part of the workflow code.

But when I replayed it from the history, I got the panic from exactly this place.
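
For reference, the replay is done roughly like this (a sketch; the workflow function and history file names are illustrative):

package workflows

import (
	"testing"

	"go.temporal.io/sdk/worker"
)

// Replays the exported event history against the current workflow code.
// ConfigurationWorkflow and history.json are illustrative names.
func TestReplayFromHistory(t *testing.T) {
	replayer := worker.NewWorkflowReplayer()
	replayer.RegisterWorkflow(ConfigurationWorkflow)
	if err := replayer.ReplayWorkflowHistoryFromJSONFile(nil, "history.json"); err != nil {
		t.Fatal(err)
	}
}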

It means that this is the place where the discrepancy was first detected, but the source of the discrepancy could be somewhere else.

I've noticed that the panic happens when the workflow hits the 10s WorkflowTaskScheduled timeout, but I can't understand why the timeout is reached.
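
For context, 10s is the workflow task timeout (the SDK default). If it were set explicitly, it would come from StartWorkflowOptions when starting the workflow, roughly like this (workflow name, ID scheme, and task queue are illustrative):

package main

import (
	"context"
	"time"

	"go.temporal.io/sdk/client"
)

// Sketch of where an explicit 10s workflow task timeout would be configured.
func startConfigurationWorkflow(ctx context.Context, c client.Client, requestID string) error {
	_, err := c.ExecuteWorkflow(ctx, client.StartWorkflowOptions{
		ID:                  requestID,
		TaskQueue:           "configuration-task-queue",
		WorkflowTaskTimeout: 10 * time.Second, // matches the SDK default when left unset
	}, "ConfigurationWorkflow", requestID)
	return err
}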

By the way, in my case Temporal is deployed standalone on Kubernetes.

When this error happens, I see errors like the following in the temporal-frontend and temporal-history logs:

{
    "level": "error",
    "ts": "2024-11-13T06:48:16.937Z",
    "msg": "Fail to process task",
    "shard-id": 301,
    "address": "10.42.52.104:7234",
    "component": "transfer-queue-processor",
    "wf-namespace-id": "cef03ff3-bea6-4c6b-97cd-522ddafe1a05",
    "wf-id": "bc7ff9a1-c586-4d91-954d-c975fa581fff",
    "wf-run-id": "dcefb334-17c5-4904-a459-4a7b9f73210e",
    "queue-task-id": 678428924,
    "queue-task-visibility-timestamp": "2024-11-13T06:48:14.855Z",
    "queue-task-type": "TransferWorkflowTask",
    "queue-task": {
        "NamespaceID": "cef03ff3-bea6-4c6b-97cd-522ddafe1a05",
        "WorkflowID": "bc7ff9a1-c586-4d91-954d-c975fa581fff",
        "RunID": "dcefb334-17c5-4904-a459-4a7b9f73210e",
        "VisibilityTimestamp": "2024-11-13T06:48:14.855741595Z",
        "TaskID": 678428924,
        "TaskQueue": "exposure-service-individual-worker-648b58cdc-sjdvx:a6dc0b6d-e2e4-4e30-93c4-8217cdc04402",
        "ScheduledEventID": 76,
        "Version": 0
    },
    "wf-history-event-id": 76,
    "error": "context deadline exceeded",
    "lifecycle": "ProcessingFailed",
    "logging-call-at": "lazy_logger.go:68",
    "stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/builder/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/builder/temporal/service/history/queues/executable.go:320\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:232\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:175\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:199\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:176\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:241\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:217"
}
{
    "level": "info",
    "ts": "2024-11-13T06:48:25.868Z",
    "msg": "Non-Deterministic Error",
    "service": "frontend",
    "wf-namespace-id": "cef03ff3-bea6-4c6b-97cd-522ddafe1a05",
    "wf-id": "bc7ff9a1-c586-4d91-954d-c975fa581fff",
    "wf-run-id": "dcefb334-17c5-4904-a459-4a7b9f73210e",
    "logging-call-at": "workflow_handler.go:1059"
}