I have a use case where an activity is executed relatively infrequently (about once an hour) but requires a lot of CPU and RAM. Currently we deploy a static pool of workers (1-2 replicas) that sit unused 90% of the time. This is obviously very wasteful, especially when we have many such activities.
What I would like to have: workers scale down to 0 when there are no tasks for that activity type, and scale up to the number of replicas needed to process the tasks in the queue.
How would you do this with Temporal? Is there a plan to add native support for k8s workers to enable such capabilities?
I think there are plans to add a worker scaling feature, but there is no ETA on that yet.
One idea for now could be to look at the task_schedule_to_start_latency server metric for the Activity task_type. Sample query:
histogram_quantile(0.95, sum(rate(task_schedule_to_start_latency_bucket{namespace="$namespace", task_type="Activity"}[$__rate_interval])) by (task_type, le))
If this latency goes up and you don’t have any activity pollers, that would mean you need to start your activity worker(s) to process the pending activity tasks. Sample query for the pollers:
sum(rate(service_pending_requests{namespace="$namespace", operation="PollActivityTaskQueue", service_name="frontend"}[1m])) or vector(0)
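To make the idea concrete, here is a minimal Python sketch of the decision described above. It assumes you run the two queries against the Prometheus HTTP API (/api/v1/query) and get back the standard instant-query JSON. The function names first_sample and should_start_workers and the 5 second threshold are made up for illustration and are not part of Temporal; how you actually start the workers (for example by changing the replica count of a k8s Deployment) is left out.

```python
import math


def first_sample(response: dict) -> float:
    """Return the first value of a Prometheus instant-query response.

    Falls back to 0.0 when the result is empty or NaN (histogram_quantile
    returns NaN when there are no samples in the window).
    """
    result = response.get("data", {}).get("result", [])
    if not result:
        return 0.0
    # Each vector sample looks like {"metric": {...}, "value": [ts, "1.5"]}.
    value = float(result[0]["value"][1])
    return 0.0 if math.isnan(value) else value


def should_start_workers(
    p95_latency_s: float,
    pollers: float,
    latency_threshold_s: float = 5.0,  # assumed threshold, tune for your workload
) -> bool:
    """True when schedule-to-start latency is high and nothing is polling."""
    return pollers == 0 and p95_latency_s > latency_threshold_s


# Example with hand-written responses in the Prometheus response shape.
latency_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"task_type": "Activity"}, "value": [1700000000, "12.5"]}
        ],
    },
}
pollers_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {}, "value": [1700000000, "0"]}],
    },
}

start = should_start_workers(
    first_sample(latency_response), first_sample(pollers_response)
)
```

You would run something like this on a timer and act on the result. Scaling back down to 0 needs its own rule (for example, no pending tasks for some time), which this sketch does not cover.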