WorkerTuner causes workers to not poll queues

I’m using WorkerTuner for one of the task queues in a worker process that listens to two queues. The queue in question processes long-running activities. Over time, the worker stops polling tasks from this queue and disappears from the server UI’s worker list for that queue.

To rule out potential WorkerTuner configuration issues, I’ve set both targetMemoryUsage and targetCPUUsage in ResourceBasedControllerOptions to 1.0 (per the documentation, this should effectively disable CPU/RAM-based scaling influence). At the same time, I have minimumSlots in ResourceBasedSlotOptions set to 2 and maximumSlots set to 16.
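For reference, here is a minimal sketch of how the tuner is wired up. Class names are from the Java SDK's io.temporal.worker.tuning package, but treat the exact builder signatures as approximate for your SDK version; this is a simplified reconstruction, not my exact code:

```java
import java.time.Duration;

import io.temporal.worker.WorkerOptions;
import io.temporal.worker.tuning.ResourceBasedControllerOptions;
import io.temporal.worker.tuning.ResourceBasedSlotOptions;
import io.temporal.worker.tuning.ResourceBasedTuner;

public class TunerConfigSketch {
  static WorkerOptions buildWorkerOptions() {
    // Target 100% memory and CPU usage; per the docs this should
    // effectively disable resource-based throttling of slot allocation.
    ResourceBasedControllerOptions controllerOptions =
        ResourceBasedControllerOptions.newBuilder(1.0, 1.0).build();

    // Activity slots for the long-running-activity queue: min 2, max 16.
    ResourceBasedSlotOptions activitySlots =
        ResourceBasedSlotOptions.newBuilder()
            .setMinimumSlots(2)
            .setMaximumSlots(16)
            .setRampThrottle(Duration.ZERO)
            .build();

    return WorkerOptions.newBuilder()
        .setWorkerTuner(
            ResourceBasedTuner.newBuilder()
                .setControllerOptions(controllerOptions)
                .setActivitySlotOptions(activitySlots)
                .build())
        .build();
  }
}
```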

The other queue served by the same worker application (without WorkerTuner, using the same WorkflowClient) does not exhibit this problem: its worker continues polling tasks even while the WorkerTuner-enabled worker is stuck.

In the worker metrics, when polling stops, I see temporal_worker_task_slots_used for the affected queue gradually drop to 0 and never rise again — meaning the activity code itself is not hanging. Moreover, during this period (while the slot count stays at 0), already-running activities from this worker complete normally: they send heartbeats and finish successfully. The worker still does not pick up new tasks, however.

Unfortunately, I haven’t found other useful metrics when using WorkerTuner (e.g., temporal_worker_task_slots_available is not exposed for such queues). I’m not sure where else to look to understand what’s happening.

Can you look at the SDK metrics temporal_request_failure and especially temporal_long_request_failure, broken down per operation and status_code?

temporal_request_failure_total is 1.0 for that queue (and there is no such metric for other queues)

temporal_request_failure_total{activity_type="***",exception="none",namespace="***",operation="RecordActivityTaskHeartbeat",query_type="none",signal_name="none",status_code="NOT_FOUND",task_queue="***",worker_type="ActivityWorker",workflow_type="***"} 1.0
temporal_request_failure_total{activity_type="***",exception="none",namespace="***",operation="RespondActivityTaskFailed",query_type="none",signal_name="none",status_code="NOT_FOUND",task_queue="***",worker_type="ActivityWorker",workflow_type="***"} 1.0

temporal_long_request_failure is not present for that queue (though it does exist for the other queues).

I turned on all TRACE logs for the io.temporal package, and the last log line related to the queue before the worker stops polling new tasks from it is the following:

2025-12-24T17:39:10.937Z TRACE 1 --- [="***": 13] i.t.i.w.ActivityPollTask                 : poll request begin: namespace: "***"
task_queue {
  name: "***"
}
identity: "***"
worker_version_capabilities {
}

After that, I see no messages related to that queue. However, already running activities of that queue heartbeat and finish with no problems.

At the same time, for other queues (which do not use WorkerTuner) I see messages like this:

2025-12-24T17:39:59.291Z TRACE 1 --- [="***": 11] i.t.i.w.ActivityPollTask                 : poll request begin: namespace: "***"
task_queue {
  name: "***"
}
identity: "***"
worker_version_capabilities {
}
2025-12-24T17:39:59.478Z TRACE 1 --- [e="***": 6] i.t.s.GrpcTracingInterceptor             : Returned "temporal.api.workflowservice.v1.WorkflowService" of "temporal.api.workflowservice.v1.WorkflowService/PollActivityTaskQueue" with output:

Also, everything works with no issues when I replace WorkerTuner with plain max concurrent activity options for that queue's worker.
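For completeness, the working replacement is just a fixed-size activity slot pool via the standard WorkerOptions knob (sketch; the value 16 matches the maximumSlots I had configured in the tuner):

```java
import io.temporal.worker.WorkerOptions;

public class FixedSlotsConfigSketch {
  static WorkerOptions buildWorkerOptions() {
    // Fixed-size activity slot pool instead of WorkerTuner;
    // with this configuration the worker never stops polling.
    return WorkerOptions.newBuilder()
        .setMaxConcurrentActivityExecutionSize(16)
        .build();
  }
}
```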