Autoscaling workers based on custom Prometheus metrics for one specific activity in the queue

Hey guys, I am currently trying to deploy my workers in autoscaling mode, but autoscaling based on resource usage is not that helpful.
In my workflow, I have multiple activities, of which only one (let’s call it BigActivity) takes a long time and is resource-intensive. So I want to scale my workers based on the number of BigActivity tasks in the queue (1 BigActivity per 1 worker). But in the Prometheus metrics I don’t see anything related to this, or maybe something like schedule latency for a specific activity type.

Part 2: I am also planning to run this activity in separate workers using a separate queue, but the question remains the same: which metric should I autoscale on?

Yes, I would separate this BigActivity onto its own task queue and have dedicated activity workers for it.
One thing you can consider scaling on is the activity worker task slots available metric (worker_task_slots_available, see the Temporal SDK metrics reference in the Temporal Documentation), with worker_type=ActivityWorker and filtered on this activity's task queue.
If this metric depletes or goes to 0, all your workers are processing the maximum configured number of activity tasks, which could mean you need to scale up if you want to process more.
CPU usage on those activity workers, in combination with this metric, can also be a good indication to scale on.
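
To make the scaling rule above concrete, here is a minimal Python sketch of the decision only. It assumes you already collect, per worker, the available activity slots and CPU utilization (for example from Prometheus). The names WorkerSample and desired_replicas and all thresholds are hypothetical, not part of any Temporal SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkerSample:
    """One scrape of a dedicated BigActivity worker (hypothetical shape)."""
    slots_available: int     # free activity task slots reported by the worker
    cpu_utilization: float   # 0.0 .. 1.0


def desired_replicas(
    samples: List[WorkerSample],
    max_replicas: int = 10,         # assumption: upper bound for the deployment
    cpu_threshold: float = 0.8,     # assumption: scale up at 80% average CPU
    idle_slots_to_shrink: int = 2,  # assumption: shrink once 2+ slots sit free
) -> int:
    """Return the replica count suggested by slot depletion and CPU."""
    current = len(samples)
    if current == 0:
        return 1  # always keep one worker polling the task queue

    free_slots = sum(s.slots_available for s in samples)
    avg_cpu = sum(s.cpu_utilization for s in samples) / current

    # No free slots means every worker is at its configured maximum;
    # high CPU is treated as a second, independent scale-up signal.
    if free_slots == 0 or avg_cpu >= cpu_threshold:
        return min(current + 1, max_replicas)

    # Enough idle capacity: remove one worker, but never go below one.
    if free_slots >= idle_slots_to_shrink and current > 1:
        return current - 1

    return current
```

With one BigActivity per worker (a single activity slot each), two busy workers such as desired_replicas([WorkerSample(0, 0.5), WorkerSample(0, 0.6)]) would suggest three replicas. In practice the same rule is usually expressed in your autoscaler's own configuration rather than in application code.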

If you have service metrics available, one more indication of task backlog is:

sum(rate(persistence_requests{operation="CreateTask"}[1m]))

CreateTask is recorded only when a task could not be dispatched to your worker right away and had to be persisted because no pollers were available at that time.
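
If you want to check this signal from a script, a rough Python sketch against the standard Prometheus HTTP API (/api/v1/query) could look like the following. The helper names, the zero threshold, and the Prometheus base URL are assumptions, and the metric name may carry a different prefix depending on how your service metrics are exported. Note that the query as written is not filtered to a single task queue.

```python
import json
import urllib.parse
import urllib.request

# Query from the reply above; adjust the metric name to your setup if needed.
BACKLOG_QUERY = 'sum(rate(persistence_requests{operation="CreateTask"}[1m]))'


def parse_instant_value(payload: dict) -> float:
    """Extract the scalar from a Prometheus instant-query JSON response."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload}")
    result = payload["data"]["result"]
    if not result:
        return 0.0  # no series returned: nothing was persisted in the window
    # Each vector sample is [timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def fetch_backlog_rate(prometheus_url: str) -> float:
    """Run BACKLOG_QUERY against a Prometheus server (URL is an assumption)."""
    params = urllib.parse.urlencode({"query": BACKLOG_QUERY})
    url = f"{prometheus_url}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_instant_value(json.load(resp))


def has_backlog(rate: float, threshold: float = 0.0) -> bool:
    """Treat any persisted-task rate above the threshold as a backlog signal."""
    return rate > threshold
```

For example, has_backlog(fetch_backlog_rate("http://localhost:9090")) would report whether tasks were persisted during the last minute, assuming Prometheus is reachable at that address.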