Suggested metrics to autoscale Temporal workers on

Hi, what are the general metrics you recommend autoscaling workers on? I currently have 3 workers, each working on a separate task queue. 2 of them only execute activities, while 1 only executes workflows. I saw posts that seem to recommend autoscaling on schedule_to_start latency, but I see multiple schedule_to_start latency metrics, namely:

temporal.task_schedule_to_start_latency
temporal_activity_schedule_to_start_latency
temporal_workflow_task_schedule_to_start_latency

Do you have any recommendations on which metrics we should be autoscaling on, and for which workers (given that some are activities only and some workflows only)? Can you also explain in layman’s terms what task_schedule_to_start measures?

Also curious, if there are multiple retries on an activity, does schedule_to_start indicate the time until the successful retry?

Let’s say I have an activity that runs 5 times (because of failures, but it is retried each time) and is scheduled first at 12:00:00. The first attempt is at 12:00:02, indicating a schedule-to-start latency of 2s. Does this metric get bumped on each subsequent retry, or does it stay at 2s?

what are the general metrics you recommend autoscaling workers on?

I would use a combination of SDK and server metrics.

SDK:
A starting point can be the worker tuning guide in the docs and these metrics:

  1. worker_task_slots_available: Gauge metric, shows how many task slots are available for your workers to process tasks. It should be > 0; otherwise your workers are not able to keep up with processing tasks.
    Sample Prometheus query:
    avg_over_time(temporal_worker_task_slots_available{namespace="default",worker_type="WorkflowWorker"}[10m])
    (or for the current value)
    temporal_worker_task_slots_available{namespace="default", worker_type="WorkflowWorker", task_queue="<your_tq_name>"}
    Note worker_type can be WorkflowWorker, ActivityWorker, or LocalActivityWorker.

  2. workflow_task_schedule_to_start_latency: Histogram metric, the latency from when a workflow task is placed on the task queue by the server to when your worker picks it up for processing.
    Sample Prometheus query:
    sum by (namespace, task_queue) (rate(temporal_workflow_task_schedule_to_start_latency_seconds_bucket[5m]))
    You would have to define your own alert/scaling latency number here per your performance requirements; you want this latency to be as small as possible (see the p95 sketch after this list).

  3. activity_schedule_to_start_latency: Histogram metric, the latency from when an activity task is placed on the task queue by the server to when your activity worker (note it can be the same worker that processes your workflows if they are on the same task queue) picks it up for processing.
    Sample Prometheus query:
    sum by (namespace, task_queue) (rate(temporal_activity_schedule_to_start_latency_seconds_bucket[5m]))
    Same here, you want this to be as low as possible, with the acceptable upper bound set by your performance requirements (again, see the sketch after this list).

  4. sticky_cache_size: Gauge metric, reports the size of your worker’s in-memory workflow execution cache. If an execution is in the cache, your workers do not have to replay the whole workflow history to continue executing workflow code when they receive a “go” from the server to do so.
    You don’t want this value to go over the WorkflowCacheSize configured for your workers.
    Sample Prometheus query:
    max_over_time(temporal_sticky_cache_size{namespace="default"}[10m])
    Along with this you could look at the temporal_sticky_cache_total_forced_eviction_total counter over time; it’s OK if this is > 0, but you might want to alert if this number jumps over a predefined threshold over a period of time (see the eviction sketch after this list).

  5. workflow_active_thread_count (note this is relevant only in the Java SDK): Gauge metric, the number of cached workflow threads. You could alert if this number gets close to the configured maxWorkflowThreadCount in the worker factory options.
    Sample Prometheus query:
    max_over_time(temporal_workflow_active_thread_count{namespace="default"}[10m])
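
For the two latency metrics above, a percentile query is often easier to alert and autoscale on than the raw bucket rate. A minimal sketch, assuming the Prometheus histogram naming used in the samples above (the namespace value, the 5m window, and the 0.95 quantile are placeholders to adjust to your setup):

    # p95 workflow task schedule-to-start latency per task queue
    histogram_quantile(0.95, sum by (namespace, task_queue, le) (rate(temporal_workflow_task_schedule_to_start_latency_seconds_bucket{namespace="default"}[5m])))
    # p95 activity schedule-to-start latency per task queue
    histogram_quantile(0.95, sum by (namespace, task_queue, le) (rate(temporal_activity_schedule_to_start_latency_seconds_bucket{namespace="default"}[5m])))

Since your workers poll separate task queues, you could feed the activity query to the autoscaling logic for the two activity-only workers and the workflow query to the logic for the workflow-only worker.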

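For the forced eviction counter mentioned in point 4, a sketch of an alert expression could look like this (the 10m window and the 100 threshold are placeholders you would pick per your setup):

    # forced sticky cache evictions over the last 10 minutes
    increase(temporal_sticky_cache_total_forced_eviction_total{namespace="default"}[10m]) > 100
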
If you had to pick two SDK metrics to include in your autoscaling logic, it should be the two schedule-to-start latency metrics for activity and workflow tasks.

Will include server metrics info in the next reply.

Server metrics:

  1. Sync match rate (Matching service metrics). Measures the rate of tasks that can be delivered to workers without having to be persisted (workers are up and available to pick them up) relative to the rate of all delivered tasks. You want this to be > 95%, ideally 99%. If the sync match rate is low, you should consider increasing worker capacity (see the alert sketch after this list).
    Metrics:
    poll_success_sync - tasks delivered to workers via sync match (without being persisted)
    poll_success - total tasks delivered to workers
    Sample query:
    sum(rate(poll_success_sync{namespace="<namespace_name>"}[1m])) / sum(rate(poll_success{namespace="<namespace_name>"}[1m]))

  2. task_schedule_to_start_latency - timer metric, the latency between when a task is scheduled and when it is delivered to your workers. If this latency increases, it’s a good indication to add/scale up your workers.
    You can use the mentioned SDK metrics workflow_task_schedule_to_start_latency and activity_schedule_to_start_latency instead of this one if you want.
    Sample query:
    histogram_quantile(0.95, sum(rate(task_schedule_to_start_latency_bucket{namespace="default", taskqueue="metricsqueue"}[10m])) by (task_type, le))
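
As a sketch of how you could turn the sync match rate into an alert or scale-up signal (the namespace value and the 0.95 threshold are placeholders, and the exact metric names depend on how your server metrics are exported):

    # fires when less than 95% of tasks are sync matched
    (
      sum(rate(poll_success_sync{namespace="default"}[5m]))
      /
      sum(rate(poll_success{namespace="default"}[5m]))
    ) < 0.95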

Worker container/pod CPU utilization:
In addition to Temporal metrics you should also measure your worker containers’/pods’ CPU utilization. This, together with the schedule-to-start latencies (SDK metrics), can give you a pretty good indication of when to scale workers up (for example when your CPU utilization is above a defined threshold or schedule-to-start latencies are high) or down (when CPU utilization is low).
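
As a rough sketch, assuming your workers run on Kubernetes and your Prometheus also scrapes cAdvisor and kube-state-metrics (the "temporal-worker-.*" pod name pattern is hypothetical, and metric/label names vary a bit between versions), CPU utilization against the configured limit could look like:

    # worker pod CPU usage as a fraction of the CPU limit
    sum(rate(container_cpu_usage_seconds_total{pod=~"temporal-worker-.*", container!=""}[5m]))
    /
    sum(kube_pod_container_resource_limits{pod=~"temporal-worker-.*", resource="cpu"})

Scaling up when either this ratio or the schedule-to-start latencies cross your thresholds covers both CPU-bound activity load and cases where pollers/slots rather than CPU are the bottleneck.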

Hope this helps you get started.

Gotcha, appreciate the reply. I still have the question: if there are multiple retries on an activity, does schedule_to_start indicate the time until the successful/last retry, or until the first attempt?

Let’s say I have an activity that is retried 5 times and is scheduled first at 12:00:00. The first attempt is at 12:00:02, indicating a schedule-to-start latency of 2s. Does this metric get bumped on each subsequent retry (in theory resulting in a metric of 10s), or does it stay at 2s?