Is there a way to check the health of workers?

Hi, we are using the Java SDK and currently have workers deployed. We are wondering whether there is a way to check that these workers are running. I saw some posts about this, and most of the answers said there was currently no way to do it. Those posts are at least 6 months old, so I'm wondering whether there has been an update since then that would allow us to perform a health check on the workers.

Here are the posts:

The only check that kind of works is

tctl --ns default taskqueue desc -tq "Task Queue Name"

The problem is that it sometimes returns no result even when the worker is running. I believe this is because it only checks task queues that process workflows. In our setup, one activity runs on its own worker, even though it belongs to the same workflow, because it consumes a lot of resources and we gave that activity a maximum number of executions.

Please let me know how we can perform a health check on our workers, or whether it is possible at all. Thanks in advance.

DescribeTaskQueue is still the way to get the list and count of pollers on a specific task queue.
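To illustrate how a poller list could be turned into a health signal, here is a minimal sketch in plain Java. It assumes you have already fetched the pollers for the task queue (DescribeTaskQueue reports an identity and a last access time per poller) and copied them into the hypothetical Poller class below. PollerHealth, livePollers, isHealthy and the one-minute staleness window are illustrative names and values, not part of the Temporal SDK. Note also that a DescribeTaskQueue request targets either the workflow or the activity task queue type, so a task queue served only by an activity worker may need to be described with the activity type. That might explain the empty result mentioned above, though I have not verified it against your setup.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decides whether a task queue still has live pollers. */
final class PollerHealth {

    /** Stand-in for the identity and last access time reported per poller. */
    static final class Poller {
        final String identity;
        final Instant lastAccessTime;

        Poller(String identity, Instant lastAccessTime) {
            this.identity = identity;
            this.lastAccessTime = lastAccessTime;
        }
    }

    /** Returns the identities of pollers last seen within maxAge of now. */
    static List<String> livePollers(List<Poller> pollers, Instant now, Duration maxAge) {
        List<String> live = new ArrayList<>();
        for (Poller p : pollers) {
            Duration age = Duration.between(p.lastAccessTime, now);
            if (age.compareTo(maxAge) <= 0) {
                live.add(p.identity);
            }
        }
        return live;
    }

    /** A task queue counts as healthy when at least one poller is live. */
    static boolean isHealthy(List<Poller> pollers, Instant now, Duration maxAge) {
        return !livePollers(pollers, now, maxAge).isEmpty();
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Sample data only; real values would come from the task queue description.
        List<Poller> pollers = List.of(
            new Poller("worker-a", now.minusSeconds(5)),
            new Poller("worker-b", now.minusSeconds(600)));
        Duration maxAge = Duration.ofMinutes(1); // assumed staleness window
        System.out.println("live pollers: " + livePollers(pollers, now, maxAge));
        System.out.println("healthy: " + isHealthy(pollers, now, maxAge));
    }
}
```

Running the check separately for the workflow and the activity task queue would let each worker type be monitored on its own.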

You can also use SDK metrics and alert on high workflow_task_schedule_to_start_latency and activity_schedule_to_start_latency. High values could indicate that workers are not responding or are down.
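In practice the alert would normally be configured in your monitoring system, but the condition itself can be sketched in a few lines of Java. This assumes schedule-to-start latency samples in milliseconds have already been collected. The class name, the nearest-rank percentile method, and the 1000 ms threshold are all illustrative assumptions, not recommendations from Temporal.

```java
import java.util.Arrays;

/** Hypothetical alert condition on schedule-to-start latency samples. */
final class ScheduleToStartAlert {

    /** Nearest-rank percentile (0 < pct <= 100) of the given samples. */
    static long percentile(long[] samplesMs, double pct) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Fires when the chosen percentile exceeds the threshold. */
    static boolean shouldAlert(long[] samplesMs, double pct, long thresholdMs) {
        return percentile(samplesMs, pct) > thresholdMs;
    }

    public static void main(String[] args) {
        // Sample latencies only; one task waited four seconds for a worker.
        long[] samplesMs = {20, 35, 40, 50, 4000};
        long thresholdMs = 1000; // assumed threshold, tune for your workload
        System.out.println("p95 = " + percentile(samplesMs, 95) + " ms");
        System.out.println("alert: " + shouldAlert(samplesMs, 95, thresholdMs));
    }
}
```

Using a high percentile rather than the average keeps a few slow tasks visible even when most tasks are picked up quickly.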

Another thing that could help is the server metric service_pending_requests, which is emitted when a poller is polling the frontend service and waiting for a task. A value of 0 would mean there are no pollers, so your workers could be down.

Thanks. We have a Datadog integration and will try to use the SDK to send those metrics via StatsD.