Automatically scale server components, e.g. using a `HorizontalPodAutoscaler`

In the provided Helm chart for deploying Temporal on Kubernetes, a fixed-size replica set is deployed for each server component. Is it recommended to keep the same number of instances running for longer periods, or is it fine to autoscale on short time scales, for example by employing a `HorizontalPodAutoscaler`?
If so, what is the recommended metric to autoscale the different server components on (frontend, matching, history)?

Autoscaling is a great idea!

We don’t include that in the Helm chart, but you can totally deploy an HPA and scale on metrics from k8s-prometheus-adapter.
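For reference, a k8s-prometheus-adapter rule maps a Prometheus series onto a custom metric the HPA can consume. This is a minimal sketch, assuming a histogram named `service_latency_bucket` is scraped from the server pods; the series and exposed metric name are placeholders to adapt to whatever your deployment actually exports:

```yaml
# prometheus-adapter rule sketch (assumptions: series name, label layout)
rules:
- seriesQuery: 'service_latency_bucket{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    as: "frontend_latency_p95"   # hypothetical metric name exposed to the HPA
  metricsQuery: >-
    histogram_quantile(0.95,
      sum(rate(<<.Series>>[5m])) by (le, <<.GroupBy>>))
```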

As for how to scale, that really depends on your workload. It’s fairly likely, though, that your persistence layer will be the main bottleneck in overall system performance. You could tune things like the number of workers based on sync match rates. As for the other components, I haven’t had a chance to try autoscaling them on a metric yet.
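To make the sync match rate idea concrete, a query along these lines computes the fraction of task polls that were matched synchronously. The counter names below are assumptions based on the matching service's `poll_success`/`poll_success_sync` metrics; verify against your deployment's actual `/metrics` output, since reporters may prefix or suffix the names:

```promql
# Fraction of polls matched synchronously over the last 5 minutes
# (assumed metric names — check your scrape output)
sum(rate(poll_success_sync[5m])) / sum(rate(poll_success[5m]))
```

A rate near 1.0 suggests workers are keeping up; a falling rate is a hint to add workers or scale the matching service.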

If you go this route, we’d love to hear what you find out!

Cool, I think then I’ll start with something dead simple like:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: temporal-frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: temporal-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
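If CPU turns out to be a poor signal, the same HPA can target a custom metric served by k8s-prometheus-adapter instead. A sketch of the `metrics` stanza, where `frontend_latency_p95` is a hypothetical metric name that has to match whatever your adapter rules expose:

```yaml
# Custom-metric variant (assumption: adapter exposes frontend_latency_p95
# as a per-pod metric, in seconds)
metrics:
- type: Pods
  pods:
    metric:
      name: frontend_latency_p95
    target:
      type: AverageValue
      averageValue: "150m"   # scale out when p95 latency exceeds ~150ms
```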

From there I have to run load tests (I’ll check out this: link) to see where my bottleneck is and which metric I should scale on, but it’s good to know there’s no reason not to autoscale.

If I have any significant findings I’ll report here.