TMPRL1101 “deadlock” error under high parallelism causes worker hang

We encountered a TMPRL1101 deadlock error in one of our Python worker deployments. On encountering the error, the worker process hung and no subsequent workflows were processed. The worker had to be brought back to a healthy state by restarting it (by recreating the k8s pod).

Environment Details

  • Workflow Structure
initial set of activities
...
...
a group of 4 parallel activities that run for (n) different inputs in parallel
...
...
final set of activities
  • Resource Utilization
    • When we encountered the deadlock error, CPU utilization was at 100% of the k8s limit and memory utilization was at 70%
  • Workflow State
    • The number of parallel inputs (n) was ~120
    • When we encountered the deadlock error, the in-flight activities belonged to the parallel group
  • Temporal Python SDK version: v1.6.0

What we’ve tried

  1. To pin down the root cause (high concurrency / high resource utilization), we ran two experiments: a) increased pod resources to avoid hitting the resource limits; b) limited the number of parallel inputs to ~30. In both cases we encountered fewer WorkflowTaskTimedOut and TMPRL1101 errors, and the worker continued processing even after encountering the deadlock error.
  2. Since TMPRL1101 is caused by a workflow task executing for too long (> 2s), we moved the activity and workflow workers into separate processes and were able to avoid the timeout and deadlock errors.

Questions for the community

  1. Expected behavior understanding: With higher resources (or lower concurrency), the worker was able to recover after encountering the deadlock error, but it hung in the other case. What is the expected recovery behaviour of a worker that encounters the deadlock error?

  2. Prevention: Is there a recommended config or SDK option to disable or mitigate TMPRL1101 entirely?

  3. High-parallelism best practices: For hundreds of concurrent activities in one workflow, what patterns (e.g. child workflows, worker tuning / separation) does the community recommend?

Thanks in advance for any tips or pointers!

What is the expected recovery behaviour of the worker if it encounters the deadlock error?

The worker reports the workflow task failure to the service, and the service keeps retrying the workflow task. The workflow execution remains in the Running status.

Is there a recommended config or SDK option to disable or mitigate TMPRL1101 entirely?

There are ways, but they are not really recommended in production: set debug_mode to True when creating the worker (you can also set the TEMPORAL_DEBUG environment variable to 1), which disables the deadlock detector.
It is better to tune the worker options, especially max_concurrent_activities, to a number where your worker CPU does not go over ~70%, then scale the number of worker pods to achieve the needed activity execution parallelism.
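As a configuration sketch of the two options above (the task queue name and the `MyWorkflow` / `my_activity` names are hypothetical placeholders, and the server address is assumed to be a local default):

```python
from temporalio.client import Client
from temporalio.worker import Worker


async def run_worker():
    client = await Client.connect("localhost:7233")  # assumed address
    worker = Worker(
        client,
        task_queue="my-task-queue",   # hypothetical name
        workflows=[MyWorkflow],       # your workflow classes
        activities=[my_activity],     # your activity functions
        # Cap concurrent activities so CPU stays below ~70%, then scale
        # out with more worker pods for additional parallelism.
        max_concurrent_activities=30,
        # debug_mode=True disables the deadlock detector -- useful for
        # local debugging only, not recommended in production:
        # debug_mode=True,
    )
    await worker.run()
```

The cap on `max_concurrent_activities` is the knob that directly limits how much activity work competes with workflow task processing on a single worker.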

For hundreds of concurrent activities in one workflow, what patterns (e.g. child workflows, worker tuning / separation) does the community recommend?

It depends on your workers and their capacity in terms of compute resources, and on your desired parallelism.
One way to limit load is to run smaller batches of parallel activities, wait for each batch to complete, then start the next batch.
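The batching pattern above can be sketched as plain asyncio control flow. In a real workflow, `run_one` would wrap a `workflow.execute_activity(...)` call; here a trivial coroutine (`double`, hypothetical) stands in so the example is self-contained:

```python
import asyncio


async def process_in_batches(inputs, batch_size, run_one):
    """Run `run_one` for all inputs, at most `batch_size` at a time."""
    results = []
    for i in range(0, len(inputs), batch_size):
        batch = inputs[i:i + batch_size]
        # Wait for the whole batch before starting the next one,
        # capping how many activities are in flight at once.
        results.extend(await asyncio.gather(*(run_one(x) for x in batch)))
    return results


# Stand-in for an activity call (hypothetical):
async def double(x):
    await asyncio.sleep(0)
    return 2 * x


print(asyncio.run(process_in_batches(list(range(10)), 4, double)))
# → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

A semaphore-based variant keeps a steady number of activities in flight instead of draining each batch fully, but the batch version is simpler to reason about in workflow code.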


One thing we found on further investigation is that within this activity we call an AWS service via boto3. This is a synchronous, I/O-bound operation inside an asynchronous activity definition. Could this be an issue?
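It can be: a synchronous call inside an `async def` activity blocks the worker's shared event loop, which can delay workflow task processing on the same process and surface as timeouts. One mitigation (a sketch, with `fetch_blocking` as a hypothetical stand-in for the boto3 call) is to offload the blocking call to a thread so the loop stays free:

```python
import asyncio
import time


# Stand-in for a blocking boto3 call (hypothetical):
def fetch_blocking(key):
    time.sleep(0.01)  # simulates synchronous network I/O
    return f"value-for-{key}"


# Inside an async activity (decorated with @activity.defn in real code),
# run the blocking call in a worker thread instead of on the event loop:
async def fetch_activity(key):
    return await asyncio.to_thread(fetch_blocking, key)


print(asyncio.run(fetch_activity("a")))  # → value-for-a
```

Alternatively, define the activity as a plain `def` and pass an `activity_executor` (e.g. a `concurrent.futures.ThreadPoolExecutor`) to the Worker, so the SDK runs synchronous activities off the event loop for you.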