KEDA allows Kubernetes deployments to be scaled based on workloads.
A scaler can be created that monitors a particular resource (in this case, possibly the length of a Temporal worker queue) and scales a deployment based on that information, even down to zero. There are many examples of scalers for other specific systems.
A KEDA-based Temporal scaler would be a great way to scale resources and maximize the cost effectiveness of infrastructure in which some services are used only occasionally or have inconsistent resource requirements over time.
A CPU scaler can be used as a stand-in, but it will never scale to zero, whereas a Temporal-specific scaler could scale to zero when no entries are pending processing in a worker queue.
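To make the idea concrete, here is a rough sketch of the kind of scaling rule I have in mind. It is not KEDA's actual code and it does not call any Temporal API: the function name, the `target_per_replica` and `max_replicas` parameters, and the backlog numbers are all made up for illustration. It only assumes that the number of pending entries in the queue can be obtained somehow.

```python
import math


def desired_replicas(backlog: int, target_per_replica: int, max_replicas: int) -> int:
    """Hypothetical queue-length rule: one replica per `target_per_replica` pending entries.

    `backlog` is assumed to be the number of entries waiting in the worker queue;
    how that number is obtained is outside the scope of this sketch.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if backlog <= 0:
        # Nothing is pending, so the deployment can be scaled to zero.
        return 0
    # Round up so that a small backlog still gets at least one worker,
    # and cap the result at the configured maximum.
    return min(max_replicas, math.ceil(backlog / target_per_replica))


if __name__ == "__main__":
    for backlog in (0, 1, 25, 1000):
        print(backlog, desired_replicas(backlog, target_per_replica=10, max_replicas=20))
```

The point is that the rule needs a direct backlog count as its input. A CPU reading cannot tell an idle worker apart from an empty queue, which is why it cannot drive the zero case.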
Is there any plan to provide such functionality out of the box? If not, what would be the best way to go about implementing one? I see various latency metrics in the Temporal docs that could potentially be used, but inferring a queue size from them would be tricky and error-prone, since a higher latency could just as easily mean that more workers are required as that fewer are.