Deadlock for no obvious reason

Hello,

I’m looking for effective ways to debug deadlock issues. This morning, I encountered a deadlock in a workflow that occurred before any real code had run. I’m not using CPU-heavy or non-deterministic code when this happens—it deadlocks right before the first activity execution.

All I have is a log message from the workflow. The pod wasn’t under heavy load; this is a scheduled workflow that runs once every 10 minutes.

Our only guess is that we had a lot of TCP resets on one of our Kubernetes nodes. Could the deadlock be caused by the pod losing its connection to Temporal?

Has anyone experienced this before, or do you have tips for debugging such early-stage deadlocks?

[WorkflowRun]
public async Task Run(FormsReceiptWorkflowArguments arguments)
{
    Workflow.Logger.LogInformation("Checking if there are any new forms responses to handle");
    Payload<List<FormResponsesAttribute>>? responses = await Workflow.ExecuteActivityAsync((FormsReceiptActivities a) => a.GetDocumentsWithPendingResponses(arguments.PackageKey), DefaultOptions);
    if (responses != null)
    {
        await CreateReceipts(arguments.Directory, responses);
    }
    Workflow.Logger.LogInformation("Finished receipt processing of forms responses");
}
{
  "events": [
    {
      "eventId": "1",
      "eventTime": "2026-02-05T05:40:47.935994473Z",
      "eventType": "EVENT_TYPE_WORKFLOW_EXECUTION_STARTED",
      "taskId": "80742225",
      "workflowExecutionStartedEventAttributes": {
        "workflowType": {
          "name": "customer-xyz-forms-receipt-workflow"
        },
        "taskQueue": {
          "name": "customer-xyz",
          "kind": "TASK_QUEUE_KIND_NORMAL"
        },
        "input": {
          "payloads": [
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "PackageKey": {
                  "Value": "019c2b76-685d-73d4-8d26-4e985f727c34"
                },
                "Directory": "/data/customers/019aca94-339d-76b5-adc0-f5dbebc08cad/019c2b76-685d-73d4-8d26-4e985f727c34/formsResponses"
              }
            }
          ]
        },
        "workflowTaskTimeout": "10s",
        "lastCompletionResult": {
          "payloads": [
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {}
            }
          ]
        },
        "originalExecutionRunId": "019c2c51-3ebf-7f2a-a011-ed14fe4028ac",
        "identity": "temporal-scheduler-production-customer-xyz-forms-receipt-019c2b76-685d-73d4-8d26-4e985f727c34",
        "firstExecutionRunId": "019c2c51-3ebf-7f2a-a011-ed14fe4028ac",
        "attempt": 1,
        "firstWorkflowTaskBackoff": "0s",
        "searchAttributes": {
          "indexedFields": {
            "TemporalScheduledById": {
              "metadata": {
                "encoding": "json/plain",
                "type": "Keyword"
              },
              "data": "customer-xyz-forms-receipt-019c2b76-685d-73d4-8d26-4e985f727c34"
            },
            "TemporalScheduledStartTime": {
              "metadata": {
                "encoding": "json/plain",
                "type": "Datetime"
              },
              "data": "2026-02-05T05:40:00Z"
            }
          }
        },
        "workflowId": "customer-xyz-forms-receipt-019c2b76-685d-73d4-8d26-4e985f727c34-2026-02-05T05:40:00Z"
      }
    },
    {
      "eventId": "2",
      "eventTime": "2026-02-05T05:40:47.936098731Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_SCHEDULED",
      "taskId": "80742226",
      "workflowTaskScheduledEventAttributes": {
        "taskQueue": {
          "name": "customer-xyz",
          "kind": "TASK_QUEUE_KIND_NORMAL"
        },
        "startToCloseTimeout": "10s",
        "attempt": 1
      }
    },
    {
      "eventId": "3",
      "eventTime": "2026-02-05T05:40:47.952091263Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_STARTED",
      "taskId": "80742231",
      "workflowTaskStartedEventAttributes": {
        "scheduledEventId": "2",
        "identity": "1@customer-xyz-deployment-597b5d74d6-n5ndc",
        "requestId": "0c9b3b44-7802-4e89-8ebc-80d705c68766",
        "historySizeBytes": "879",
        "workerVersion": {
          "buildId": "0f2bb25d-407e-432a-b268-d3450e1c291f"
        }
      }
    },
    {
      "eventId": "4",
      "eventTime": "2026-02-05T05:40:57.955572429Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_TIMED_OUT",
      "taskId": "80742235",
      "workflowTaskTimedOutEventAttributes": {
        "scheduledEventId": "2",
        "startedEventId": "3",
        "timeoutType": "TIMEOUT_TYPE_START_TO_CLOSE"
      }
    },
    {
      "eventId": "5",
      "eventTime": "2026-02-05T05:40:57.955578162Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_SCHEDULED",
      "taskId": "80742240",
      "workflowTaskScheduledEventAttributes": {
        "taskQueue": {
          "name": "customer-xyz",
          "kind": "TASK_QUEUE_KIND_NORMAL"
        },
        "startToCloseTimeout": "10s",
        "attempt": 2
      }
    },
    {
      "eventId": "6",
      "eventTime": "2026-02-05T05:40:57.964060578Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_STARTED",
      "taskId": "80742241",
      "workflowTaskStartedEventAttributes": {
        "scheduledEventId": "5",
        "identity": "1@customer-xyz-deployment-597b5d74d6-429hd",
        "requestId": "c82dbef6-875a-4fc4-a2a0-01fd468f23df",
        "historySizeBytes": "1072",
        "workerVersion": {
          "buildId": "0f2bb25d-407e-432a-b268-d3450e1c291f"
        }
      }
    },
    {
      "eventId": "7",
      "eventTime": "2026-02-05T05:40:57.973612461Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_COMPLETED",
      "taskId": "80742242",
      "workflowTaskCompletedEventAttributes": {
        "scheduledEventId": "5",
        "startedEventId": "6",
        "identity": "1@customer-xyz-deployment-597b5d74d6-429hd",
        "workerVersion": {
          "buildId": "0f2bb25d-407e-432a-b268-d3450e1c291f"
        },
        "sdkMetadata": {
          "coreUsedFlags": [
            3,
            2,
            1
          ],
          "langUsedFlags": [
            2
          ],
          "sdkName": "temporal-dotnet",
          "sdkVersion": "1.9.0.0"
        },
        "meteringMetadata": {}
      }
    },
    {
      "eventId": "8",
      "eventTime": "2026-02-05T05:40:57.973732832Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_SCHEDULED",
      "taskId": "80742243",
      "activityTaskScheduledEventAttributes": {
        "activityId": "1",
        "activityType": {
          "name": "GetDocumentsWithPendingResponses"
        },
        "taskQueue": {
          "name": "customer-xyz",
          "kind": "TASK_QUEUE_KIND_NORMAL"
        },
        "header": {
          "fields": {
            "_tracer-data": {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "traceparent": "00-da80c9d5af390e77740923e41790f181-87e5b83d2741e57b-01"
              }
            }
          }
        },
        "input": {
          "payloads": [
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "Value": "019c2b76-685d-73d4-8d26-4e985f727c34"
              }
            }
          ]
        },
        "scheduleToCloseTimeout": "0s",
        "scheduleToStartTimeout": "0s",
        "startToCloseTimeout": "90s",
        "heartbeatTimeout": "0s",
        "workflowTaskCompletedEventId": "7",
        "retryPolicy": {
          "initialInterval": "10s",
          "backoffCoefficient": 2,
          "maximumInterval": "1000s"
        },
        "useWorkflowBuildId": true
      }
    },
    {
      "eventId": "9",
      "eventTime": "2026-02-05T05:40:57.983388627Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_STARTED",
      "taskId": "80742250",
      "activityTaskStartedEventAttributes": {
        "scheduledEventId": "8",
        "identity": "1@customer-xyz-deployment-597b5d74d6-429hd",
        "requestId": "215331fa-de43-400b-97f0-48675b2f4d96",
        "attempt": 1,
        "workerVersion": {
          "buildId": "0f2bb25d-407e-432a-b268-d3450e1c291f"
        }
      }
    },
    {
      "eventId": "10",
      "eventTime": "2026-02-05T05:41:14.265036647Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_COMPLETED",
      "taskId": "80742251",
      "activityTaskCompletedEventAttributes": {
        "result": {
          "payloads": [
            {
              "metadata": {
                "encoding": "binary/null"
              },
              "data": null
            }
          ]
        },
        "scheduledEventId": "8",
        "startedEventId": "9",
        "identity": "1@customer-xyz-deployment-597b5d74d6-429hd"
      }
    },
    {
      "eventId": "11",
      "eventTime": "2026-02-05T05:41:14.265043387Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_SCHEDULED",
      "taskId": "80742252",
      "workflowTaskScheduledEventAttributes": {
        "taskQueue": {
          "name": "1@customer-xyz-deployment-597b5d74d6-429hd-dddb6ce891a8406a8bd95b0d7dbe7a17",
          "kind": "TASK_QUEUE_KIND_STICKY",
          "normalName": "customer-xyz"
        },
        "startToCloseTimeout": "10s",
        "attempt": 1
      }
    },
    {
      "eventId": "12",
      "eventTime": "2026-02-05T05:41:14.274525846Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_STARTED",
      "taskId": "80742256",
      "workflowTaskStartedEventAttributes": {
        "scheduledEventId": "11",
        "identity": "1@customer-xyz-deployment-597b5d74d6-429hd",
        "requestId": "8d83238d-af63-4369-923d-039d9657a314",
        "historySizeBytes": "2171",
        "workerVersion": {
          "buildId": "0f2bb25d-407e-432a-b268-d3450e1c291f"
        }
      }
    },
    {
      "eventId": "13",
      "eventTime": "2026-02-05T05:41:14.284294903Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_COMPLETED",
      "taskId": "80742260",
      "workflowTaskCompletedEventAttributes": {
        "scheduledEventId": "11",
        "startedEventId": "12",
        "identity": "1@customer-xyz-deployment-597b5d74d6-429hd",
        "workerVersion": {
          "buildId": "0f2bb25d-407e-432a-b268-d3450e1c291f"
        },
        "sdkMetadata": {},
        "meteringMetadata": {}
      }
    },
    {
      "eventId": "14",
      "eventTime": "2026-02-05T05:41:14.284332485Z",
      "eventType": "EVENT_TYPE_WORKFLOW_EXECUTION_COMPLETED",
      "taskId": "80742261",
      "workflowExecutionCompletedEventAttributes": {
        "result": {
          "payloads": [
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {}
            }
          ]
        },
        "workflowTaskCompletedEventId": "13"
      }
    }
  ]
}

The history shows that this isn’t what we traditionally call a “deadlock” (where the workflow doesn’t reach a yield point within 2s), but rather a workflow task timeout. The task timeout defaults to 10s, and it means that for some reason the worker received the task but did not respond within those 10s. Usually this takes only milliseconds, so something caused the worker not to respond in this case.
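
To spot this pattern quickly in an exported history, a small console program can pair each timed-out workflow task with the worker that started it. This is only a sketch: it assumes the history was saved to a file in the JSON shape shown above (an `events` array with string event IDs), takes the file path as a command-line argument, and uses nothing beyond System.Text.Json.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Assumption: args[0] is the path to a history export shaped like the JSON above.
if (args.Length != 1)
{
    Console.Error.WriteLine("usage: <program> <history.json>");
    return;
}

using JsonDocument doc = JsonDocument.Parse(File.ReadAllText(args[0]));

// Maps the event ID of each WORKFLOW_TASK_STARTED event to the worker identity.
var startedBy = new Dictionary<string, string>();

foreach (JsonElement ev in doc.RootElement.GetProperty("events").EnumerateArray())
{
    string id = ev.GetProperty("eventId").GetString() ?? "";
    string type = ev.GetProperty("eventType").GetString() ?? "";

    if (type == "EVENT_TYPE_WORKFLOW_TASK_STARTED")
    {
        JsonElement attrs = ev.GetProperty("workflowTaskStartedEventAttributes");
        startedBy[id] = attrs.TryGetProperty("identity", out JsonElement identity)
            ? identity.GetString() ?? "unknown"
            : "unknown";
    }
    else if (type == "EVENT_TYPE_WORKFLOW_TASK_TIMED_OUT")
    {
        JsonElement attrs = ev.GetProperty("workflowTaskTimedOutEventAttributes");
        string startedId = attrs.GetProperty("startedEventId").GetString() ?? "";
        string timeoutType = attrs.GetProperty("timeoutType").GetString() ?? "";
        string worker = startedBy.TryGetValue(startedId, out string? w) ? w : "unknown";

        // One line per timed-out workflow task, naming the worker that held it.
        Console.WriteLine($"event {id}: {timeoutType} on worker {worker}");
    }
}
```

Run against the history in this thread, it should report a single line for event 4, a START_TO_CLOSE timeout on the worker that started event 3.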

The server retried the task, as it is expected to, and the workflow then proceeded as normal, which suggests this may have been a transient issue. The retried task was picked up by another worker (1@customer-xyz-deployment-597b5d74d6-n5ndc is the worker that had the timeout; 1@customer-xyz-deployment-597b5d74d6-429hd is the worker that ran the code successfully and continued to do so for successive tasks). A task failure is definitely meant to be recovered by the system, as it was here.

Are there any logs? Were there any restarts or system issues at the time? Are there any customizations via interceptors, logging adapters, payload converters, payload codecs, or similar that may be out of band and causing some issue? Since this is transient, I assume there may be no easy or obvious way to replicate it?

Here are some Elastic logs that I hope are readable. It looks like the workflow was started on two workers. There were no restarts at the time, but we are investigating some other issues on our nodes currently.

[
    {
        "_index": ".ds-logs-apm.app.Customer_Xyz-devprod-2026.01.14-000008",
        "_id": "7cNRLJwBqb_djJBbskpV",
        "_version": 1,
        "_score": null,
        "fields": {
            "labels.WorkflowType": [
                "customer-xyz-forms-receipt-workflow"
            ],
            "labels.FirstExecutionRunId": [
                "019c2c51-3ebf-7f2a-a011-ed14fe4028ac"
            ],
            "host.name.text": [
                "customer-xyz-deployment-597b5d74d6-429hd"
            ],
            "service.node.name": [
                "1cb9538b-699b-44ed-8a0c-13ca8254052f"
            ],
            "host.hostname": [
                "customer-xyz-deployment-597b5d74d6-429hd"
            ],
            "service.language.name": [
                "dotnet"
            ],
            "container.id": [
                "4f9a73bd966d1c3b50b8a3c4f34951395a2d77d5a0ea45f7e08c27b263eab397"
            ],
            "service.node.name.text": [
                "1cb9538b-699b-44ed-8a0c-13ca8254052f"
            ],
            "agent.name.text": [
                "opentelemetry/dotnet"
            ],
            "labels.WorkflowId": [
                "customer-xyz-forms-receipt-019c2b76-685d-73d4-8d26-4e985f727c34-2026-02-05T05:40:00Z"
            ],
            "log.level": [
                "Information"
            ],
            "agent.name": [
                "opentelemetry/dotnet"
            ],
            "host.name": [
                "customer-xyz-deployment-597b5d74d6-429hd"
            ],
            "event.severity": [
                9
            ],
            "service.environment": [
                "Production"
            ],
            "numeric_labels.Attempt": [
                1
            ],
            "service.name": [
                "Customer_Xyz"
            ],
            "service.framework.name": [
                "Temporalio.Workflow:customer-xyz-forms-receipt-workflow"
            ],
            "data_stream.namespace": [
                "devprod"
            ],
            "labels.TaskQueue": [
                "customer-xyz"
            ],
            "service.language.name.text": [
                "dotnet"
            ],
            "labels.RunId": [
                "019c2c51-3ebf-7f2a-a011-ed14fe4028ac"
            ],
            "message": [
                "Finished receipt processing of forms responses"
            ],
            "data_stream.type": [
                "logs"
            ],
            "observer.hostname": [
                "edge-apm-devprod"
            ],
            "service.framework.name.text": [
                "Temporalio.Workflow:customer-xyz-forms-receipt-workflow"
            ],
            "@timestamp": [
                "2026-02-05T05:41:14.283Z"
            ],
            "observer.type": [
                "apm-server"
            ],
            "observer.version": [
                "9.2.4"
            ],
            "service.name.text": [
                "Customer_Xyz"
            ],
            "data_stream.dataset": [
                "apm.app.Customer_Xyz"
            ],
            "labels.Namespace": [
                "production"
            ],
            "agent.version": [
                "1.14.0"
            ],
            "event.dataset": [
                "apm.app.Customer_Xyz"
            ],
            "labels.{OriginalFormat}": [
                "Finished receipt processing of forms responses"
            ]
        },
        "highlight": {
            "labels.RunId": [
                "@kibana-highlighted-field@019c2c51-3ebf-7f2a-a011-ed14fe4028ac@/kibana-highlighted-field@"
            ]
        },
        "sort": [
            "2026-02-05T05:41:14.283Z",
            4234
        ]
    },
    {
        "_index": ".ds-logs-apm.error-devprod-2026.01.14-000014",
        "_id": "9mRSLJwBC1mF5u7OlcdF",
        "_version": 1,
        "_score": null,
        "fields": {
            "error.grouping_key": [
                "bc8b1c3a150a08ab"
            ],
            "error.exception.type": [
                "InvalidOperationException"
            ],
            "host.name.text": [
                "customer-xyz-deployment-597b5d74d6-n5ndc"
            ],
            "service.node.name": [
                "c82e3865-3746-4fe3-8315-1f1ea5004335"
            ],
            "host.hostname": [
                "customer-xyz-deployment-597b5d74d6-n5ndc"
            ],
            "service.language.name": [
                "dotnet"
            ],
            "container.id": [
                "b4795a57af0ff70399f7e9e174b2b96087816c76a8b5d42fc663e5071801e9ef"
            ],
            "service.node.name.text": [
                "c82e3865-3746-4fe3-8315-1f1ea5004335"
            ],
            "agent.name.text": [
                "opentelemetry/dotnet"
            ],
            "processor.event": [
                "error"
            ],
            "log.level": [
                "Error"
            ],
            "agent.name": [
                "opentelemetry/dotnet"
            ],
            "host.name": [
                "customer-xyz-deployment-597b5d74d6-n5ndc"
            ],
            "event.kind": [
                "event"
            ],
            "event.severity": [
                17
            ],
            "service.environment": [
                "Production"
            ],
            "service.name": [
                "Customer_Xyz"
            ],
            "service.framework.name": [
                "Temporalio.Worker.WorkflowWorker"
            ],
            "data_stream.namespace": [
                "devprod"
            ],
            "labels.TaskQueue": [
                "customer-xyz"
            ],
            "service.language.name.text": [
                "dotnet"
            ],
            "error.exception.handled": [
                true
            ],
            "labels.RunId": [
                "019c2c51-3ebf-7f2a-a011-ed14fe4028ac"
            ],
            "message": [
                "Failed handling activation on workflow with run ID 019c2c51-3ebf-7f2a-a011-ed14fe4028ac"
            ],
            "data_stream.type": [
                "logs"
            ],
            "observer.hostname": [
                "edge-apm-devprod"
            ],
            "service.framework.name.text": [
                "Temporalio.Worker.WorkflowWorker"
            ],
            "timestamp.us": [
                1770270073004577
            ],
            "@timestamp": [
                "2026-02-05T05:41:13.004Z"
            ],
            "observer.type": [
                "apm-server"
            ],
            "observer.version": [
                "9.2.4"
            ],
            "service.name.text": [
                "Customer_Xyz"
            ],
            "data_stream.dataset": [
                "apm.error"
            ],
            "error.id": [
                "cbcb5ce365f62be1ef48ac3affe4e483"
            ],
            "event.type": [
                "error"
            ],
            "agent.version": [
                "1.14.0"
            ],
            "error.grouping_name": [
                "[TMPRL1101] Workflow with ID 019c2c51-3ebf-7f2a-a011-ed14fe4028ac deadlocked after 00:00:02"
            ],
            "error.stack_trace": [
                "System.InvalidOperationException: [TMPRL1101] Workflow with ID 019c2c51-3ebf-7f2a-a011-ed14fe4028ac deadlocked after 00:00:02\n   at Temporalio.Worker.WorkflowWorker.HandleActivationAsync(WorkflowActivation act)"
            ],
            "error.stack_trace.text": [
                "System.InvalidOperationException: [TMPRL1101] Workflow with ID 019c2c51-3ebf-7f2a-a011-ed14fe4028ac deadlocked after 00:00:02\n   at Temporalio.Worker.WorkflowWorker.HandleActivationAsync(WorkflowActivation act)"
            ],
            "event.dataset": [
                "apm.error"
            ],
            "error.exception.message": [
                "[TMPRL1101] Workflow with ID 019c2c51-3ebf-7f2a-a011-ed14fe4028ac deadlocked after 00:00:02"
            ],
            "labels.{OriginalFormat}": [
                "Failed handling activation on workflow with run ID {RunId}"
            ]
        },
        "highlight": {
            "labels.RunId": [
                "@kibana-highlighted-field@019c2c51-3ebf-7f2a-a011-ed14fe4028ac@/kibana-highlighted-field@"
            ]
        },
        "sort": [
            "2026-02-05T05:41:13.004Z",
            736
        ]
    },
    {
        "_index": ".ds-logs-apm.app.Customer_Xyz-devprod-2026.01.14-000008",
        "_id": "EGRRLJwBC1mF5u7Opakt",
        "_version": 1,
        "_score": null,
        "fields": {
            "labels.WorkflowType": [
                "customer-xyz-forms-receipt-workflow"
            ],
            "labels.FirstExecutionRunId": [
                "019c2c51-3ebf-7f2a-a011-ed14fe4028ac"
            ],
            "host.name.text": [
                "customer-xyz-deployment-597b5d74d6-n5ndc"
            ],
            "service.node.name": [
                "c82e3865-3746-4fe3-8315-1f1ea5004335"
            ],
            "host.hostname": [
                "customer-xyz-deployment-597b5d74d6-n5ndc"
            ],
            "service.language.name": [
                "dotnet"
            ],
            "container.id": [
                "b4795a57af0ff70399f7e9e174b2b96087816c76a8b5d42fc663e5071801e9ef"
            ],
            "service.node.name.text": [
                "c82e3865-3746-4fe3-8315-1f1ea5004335"
            ],
            "agent.name.text": [
                "opentelemetry/dotnet"
            ],
            "labels.WorkflowId": [
                "customer-xyz-forms-receipt-019c2b76-685d-73d4-8d26-4e985f727c34-2026-02-05T05:40:00Z"
            ],
            "log.level": [
                "Information"
            ],
            "agent.name": [
                "opentelemetry/dotnet"
            ],
            "host.name": [
                "customer-xyz-deployment-597b5d74d6-n5ndc"
            ],
            "event.severity": [
                9
            ],
            "service.environment": [
                "Production"
            ],
            "numeric_labels.Attempt": [
                1
            ],
            "service.name": [
                "Customer_Xyz"
            ],
            "service.framework.name": [
                "Temporalio.Workflow:customer-xyz-forms-receipt-workflow"
            ],
            "data_stream.namespace": [
                "devprod"
            ],
            "labels.TaskQueue": [
                "customer-xyz"
            ],
            "service.language.name.text": [
                "dotnet"
            ],
            "labels.RunId": [
                "019c2c51-3ebf-7f2a-a011-ed14fe4028ac"
            ],
            "message": [
                "Checking if there are any new forms responses to handle"
            ],
            "data_stream.type": [
                "logs"
            ],
            "observer.hostname": [
                "edge-apm-devprod"
            ],
            "service.framework.name.text": [
                "Temporalio.Workflow:customer-xyz-forms-receipt-workflow"
            ],
            "@timestamp": [
                "2026-02-05T05:41:10.640Z"
            ],
            "observer.type": [
                "apm-server"
            ],
            "observer.version": [
                "9.2.4"
            ],
            "service.name.text": [
                "Customer_Xyz"
            ],
            "data_stream.dataset": [
                "apm.app.Customer_Xyz"
            ],
            "labels.Namespace": [
                "production"
            ],
            "agent.version": [
                "1.14.0"
            ],
            "event.dataset": [
                "apm.app.Customer_Xyz"
            ],
            "labels.{OriginalFormat}": [
                "Checking if there are any new forms responses to handle"
            ]
        },
        "highlight": {
            "labels.RunId": [
                "@kibana-highlighted-field@019c2c51-3ebf-7f2a-a011-ed14fe4028ac@/kibana-highlighted-field@"
            ]
        },
        "sort": [
            "2026-02-05T05:41:10.640Z",
            4241
        ]
    },
    {
        "_index": ".ds-logs-apm.app.Customer_Xyz-devprod-2026.01.14-000008",
        "_id": "y3BRLJwB2qJcSzX_dyR7",
        "_version": 1,
        "_score": null,
        "fields": {
            "labels.WorkflowType": [
                "customer-xyz-forms-receipt-workflow"
            ],
            "labels.FirstExecutionRunId": [
                "019c2c51-3ebf-7f2a-a011-ed14fe4028ac"
            ],
            "host.name.text": [
                "customer-xyz-deployment-597b5d74d6-429hd"
            ],
            "service.node.name": [
                "1cb9538b-699b-44ed-8a0c-13ca8254052f"
            ],
            "host.hostname": [
                "customer-xyz-deployment-597b5d74d6-429hd"
            ],
            "service.language.name": [
                "dotnet"
            ],
            "container.id": [
                "4f9a73bd966d1c3b50b8a3c4f34951395a2d77d5a0ea45f7e08c27b263eab397"
            ],
            "service.node.name.text": [
                "1cb9538b-699b-44ed-8a0c-13ca8254052f"
            ],
            "agent.name.text": [
                "opentelemetry/dotnet"
            ],
            "labels.WorkflowId": [
                "customer-xyz-forms-receipt-019c2b76-685d-73d4-8d26-4e985f727c34-2026-02-05T05:40:00Z"
            ],
            "log.level": [
                "Information"
            ],
            "agent.name": [
                "opentelemetry/dotnet"
            ],
            "host.name": [
                "customer-xyz-deployment-597b5d74d6-429hd"
            ],
            "event.severity": [
                9
            ],
            "service.environment": [
                "Production"
            ],
            "numeric_labels.Attempt": [
                1
            ],
            "service.name": [
                "Customer_Xyz"
            ],
            "service.framework.name": [
                "Temporalio.Workflow:customer-xyz-forms-receipt-workflow"
            ],
            "data_stream.namespace": [
                "devprod"
            ],
            "labels.TaskQueue": [
                "customer-xyz"
            ],
            "service.language.name.text": [
                "dotnet"
            ],
            "labels.RunId": [
                "019c2c51-3ebf-7f2a-a011-ed14fe4028ac"
            ],
            "message": [
                "Checking if there are any new forms responses to handle"
            ],
            "data_stream.type": [
                "logs"
            ],
            "observer.hostname": [
                "edge-apm-devprod"
            ],
            "service.framework.name.text": [
                "Temporalio.Workflow:customer-xyz-forms-receipt-workflow"
            ],
            "@timestamp": [
                "2026-02-05T05:40:57.971Z"
            ],
            "observer.type": [
                "apm-server"
            ],
            "observer.version": [
                "9.2.4"
            ],
            "service.name.text": [
                "Customer_Xyz"
            ],
            "data_stream.dataset": [
                "apm.app.Customer_Xyz"
            ],
            "labels.Namespace": [
                "production"
            ],
            "agent.version": [
                "1.14.0"
            ],
            "event.dataset": [
                "apm.app.Customer_Xyz"
            ],
            "labels.{OriginalFormat}": [
                "Checking if there are any new forms responses to handle"
            ]
        },
        "highlight": {
            "labels.RunId": [
                "@kibana-highlighted-field@019c2c51-3ebf-7f2a-a011-ed14fe4028ac@/kibana-highlighted-field@"
            ]
        },
        "sort": [
            "2026-02-05T05:40:57.971Z",
            4246
        ]
    }
]

So I do see:

[TMPRL1101] Workflow with ID 019c2c51-3ebf-7f2a-a011-ed14fe4028ac deadlocked after 00:00:02

Ah, so this was indeed a deadlock. Something prevented the workflow from reaching its next await point within 2s. This usually means the code is spinning the CPU somewhere, or there is some unsafe code that delegates to the default scheduler and is not caught by the tracing event listener (or the listener is disabled in options or in code). Since it was transient, I think a code problem is unlikely, but still possible. It could be something in how logging or payload conversion is implemented, if there are customizations there.
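
For anyone reading along, the kinds of calls meant here look roughly like the commented-out lines below. This is a sketch, not code from the poster's project: the class name `DeadlockProneWorkflow` is hypothetical, and it assumes the Temporalio NuGet package is referenced.

```csharp
using System;
using System.Threading.Tasks;
using Temporalio.Workflows;

// Hypothetical workflow, for illustration only.
[Workflow]
public class DeadlockProneWorkflow
{
    [WorkflowRun]
    public async Task RunAsync()
    {
        // Blocks the workflow thread, so the activation cannot reach an await
        // point in time and the worker reports a deadlock:
        // System.Threading.Thread.Sleep(TimeSpan.FromSeconds(5));

        // Leaves the workflow's deterministic scheduler for the default one
        // (thread pool); Workflow.RunTaskAsync is the workflow-safe alternative:
        // await Task.Run(() => DoWork());

        // Uses a system timer instead of a durable workflow timer:
        // await Task.Delay(TimeSpan.FromSeconds(1));

        // Allows the continuation to resume off the workflow scheduler:
        // await SomeTask().ConfigureAwait(false);

        // Workflow-safe timer, recorded in history and replayed deterministically.
        await Workflow.DelayAsync(TimeSpan.FromSeconds(1));
    }
}
```

The same applies to code that runs inside the workflow context on your behalf, such as a custom logger or payload converter, which is why those customizations are worth checking.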

If you are able to replicate it, that would be ideal; otherwise I am afraid there is not enough to go on to know which code caused it. You can try replaying the history via the replayer, but it is unlikely to fail given that the workflow succeeded on another worker (and Temporal therefore replayed the history successfully there).
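
For reference, a replay with the .NET SDK can be sketched roughly as follows. The workflow class name `FormsReceiptWorkflow` and the command-line arguments are assumptions, since neither appears in this thread; the history file is the JSON export shown earlier.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Temporalio.Common;
using Temporalio.Worker;

public static class ReplayCheck
{
    public static async Task Main(string[] args)
    {
        // Assumed inputs: args[0] is the workflow ID, args[1] is the path to
        // the exported history JSON.
        string historyJson = await File.ReadAllTextAsync(args[1]);

        // FormsReceiptWorkflow is a placeholder for the class that contains
        // the [WorkflowRun] method posted above; it is not defined here.
        var replayer = new WorkflowReplayer(
            new WorkflowReplayerOptions().AddWorkflow<FormsReceiptWorkflow>());

        // By default this throws if the workflow code does not match the
        // recorded history.
        await replayer.ReplayWorkflowAsync(
            WorkflowHistory.FromJson(args[0], historyJson));

        Console.WriteLine("Replay completed without errors.");
    }
}
```

A clean replay only rules out nondeterminism in the workflow code. It says nothing about what the original worker's host was doing at the time.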

I see from a past post, “OpenTelemetry locate workflow from error”, that you are using OTel logging. If you are able to replicate the problem somehow, I wonder if you could stop OTel logging and see whether the error no longer occurs (which would imply that OTel logging may be doing something with tasks that doesn’t work with our deterministic scheduler).

We got an update from our infra team. We scaled up our worker nodes to use more memory, but this ended up exceeding a physical host–level memory limit that isn’t visible to Kubernetes. Because of that, the backend enforced memory reclaim/swapping instead of triggering pod OOMKills, so we never saw any restarts.

Given that, it seems likely that the timeouts/deadlock we observed were a side effect of this memory pressure scenario.
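
In case it helps anyone who lands here with a similar symptom, one cheap way to make this kind of host-level stall visible in logs is a small drift monitor in the worker process. The sketch below is an assumption-laden illustration, not something we ran: the class name `StallMonitor` and the 100 ms interval are made up, and it must run in the host process (for example as a background task next to the worker), never inside workflow code.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: logs whenever a short sleep wakes up much later than
// requested, which can indicate the process was paused (swapping, CPU
// starvation, long GC pauses, thread pool starvation).
public static class StallMonitor
{
    public static async Task RunAsync(TimeSpan threshold, CancellationToken cancellationToken)
    {
        TimeSpan interval = TimeSpan.FromMilliseconds(100);
        Stopwatch stopwatch = Stopwatch.StartNew();

        while (!cancellationToken.IsCancellationRequested)
        {
            TimeSpan before = stopwatch.Elapsed;
            try
            {
                await Task.Delay(interval, cancellationToken);
            }
            catch (OperationCanceledException)
            {
                break; // Normal shutdown.
            }

            // How much longer than the requested interval the sleep took.
            TimeSpan overshoot = stopwatch.Elapsed - before - interval;
            if (overshoot > threshold)
            {
                Console.Error.WriteLine(
                    $"{DateTimeOffset.UtcNow:O} process stalled for about {overshoot.TotalMilliseconds:F0} ms");
            }
        }
    }
}
```

Started with something like `_ = StallMonitor.RunAsync(TimeSpan.FromMilliseconds(500), stoppingToken);`, stall lines that cluster around a TMPRL1101 error would point at the host and away from the workflow code.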

Thanks for the quick replies!
