Strategies for Scaling AWS Services

keelerm84 · February 23, 2021, 4:20pm

I have several separate microservices deployed, all of which will connect to Temporal and act as activity workers. I’m curious what strategies people would recommend for deciding when to scale those services.

In a previous iteration of this project, these microservices polled AWS SQS, and our scaling strategy was based on the number of jobs in the queue.
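For comparison, a common way to turn queue depth into a worker count is "backlog per worker": divide the number of waiting jobs by how many jobs one worker can be allowed to have waiting. A minimal sketch, assuming a hypothetical target_backlog_per_worker tuning value and illustrative min/max bounds (none of these are AWS or Temporal settings):

```python
import math


def desired_workers_from_backlog(
    backlog: int,
    target_backlog_per_worker: int,
    min_workers: int = 1,
    max_workers: int = 50,
) -> int:
    """Backlog-per-worker scaling for a queue-depth signal such as SQS.

    All parameters are hypothetical tuning values for illustration.
    """
    if target_backlog_per_worker <= 0:
        raise ValueError("target_backlog_per_worker must be positive")
    # Enough workers that each one owns at most target_backlog_per_worker jobs.
    needed = math.ceil(backlog / target_backlog_per_worker)
    # Clamp to the allowed fleet size.
    return max(min_workers, min(max_workers, needed))


print(desired_workers_from_backlog(backlog=430, target_backlog_per_worker=100))  # 5
```

This only works when the queue reports its depth, which is the input the question below asks about.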

Is there an equivalent metric we can use in Temporal? Or does each worker instead need to track custom metrics for its processing duration and the rate at which it consumes jobs?

Any suggestions would be welcome. Thank you!

maxim · February 23, 2021, 5:42pm

Temporal doesn’t expose the number of tasks in a task queue. The reason is that it is very nontrivial to calculate, as each task has an independent queuing timeout.

Temporal exposes a schedule-to-start task latency metric, which can be used for autoscaling. Without a backlog, its value is close to 0; when tasks start to accumulate, the latency goes up.
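Because the latency only rises once a backlog forms, one way to act on it is a threshold policy with a dead band. A minimal sketch, assuming you already read a recent p95 of schedule-to-start latency from your metrics backend; the thresholds, step factor, and bounds are hypothetical values to tune, not Temporal settings:

```python
import math
from dataclasses import dataclass


@dataclass
class LatencyScalingPolicy:
    """Hypothetical thresholds for illustration; not Temporal defaults."""
    scale_up_above_s: float = 0.5     # tasks are waiting: a backlog is forming
    scale_down_below_s: float = 0.05  # effectively no backlog
    step_up_factor: float = 1.5
    min_workers: int = 1
    max_workers: int = 50


def next_worker_count(p95_latency_s: float, current: int, policy: LatencyScalingPolicy) -> int:
    """Derive a worker count from schedule-to-start latency.

    Near-zero latency means tasks are picked up as soon as they are scheduled;
    rising latency means tasks are sitting in the task queue.
    """
    if p95_latency_s > policy.scale_up_above_s:
        target = math.ceil(current * policy.step_up_factor)  # grow proportionally
    elif p95_latency_s < policy.scale_down_below_s:
        target = current - 1  # shrink cautiously, one worker at a time
    else:
        target = current  # inside the dead band: hold steady
    return max(policy.min_workers, min(policy.max_workers, target))


print(next_worker_count(2.0, 4, LatencyScalingPolicy()))  # 6
```

The asymmetry is deliberate: latency stays near 0 whether the fleet is slightly or heavily over-provisioned, so it says little about how many workers can safely be removed.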

keelerm84 · February 24, 2021, 2:36pm

Is this the stats_counts.temporal.schedule_to_start_timeout.* metric (using statsd as an example)?

If so, is that metric recorded only when a task has timed out, or also when it was merely slow to start? Can it be reported per task queue, or only at the namespace level?
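The difference between a timeout counter and a latency measurement can be shown with a toy model. Assuming, as the name suggests, that a schedule_to_start_timeout counter moves only when a task expires before any worker picks it up, a task that starts late but in time appears in the latency and never in that counter. The TaskRecord type below is hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TaskRecord:
    scheduled_at: float          # seconds
    started_at: Optional[float]  # None: the task hit its schedule-to-start timeout


def latency_and_timeouts(tasks: List[TaskRecord]) -> Tuple[List[float], int]:
    """Split tasks into observed start latencies and a timeout count."""
    latencies: List[float] = []
    timeouts = 0
    for task in tasks:
        if task.started_at is None:
            timeouts += 1  # only these would move a timeout counter
        else:
            latencies.append(task.started_at - task.scheduled_at)
    return latencies, timeouts


tasks = [
    TaskRecord(0.0, 0.02),  # picked up immediately
    TaskRecord(1.0, 9.0),   # slow, but started before timing out
    TaskRecord(2.0, None),  # timed out
]
latencies, timeouts = latency_and_timeouts(tasks)
print(latencies, timeouts)  # [0.02, 8.0] 1
```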

shineirvin · July 20, 2021, 10:22am

Any updates on this? @keelerm84, have you solved this?

Yimin_Chen · July 20, 2021, 5:57pm

We record schedule-to-start latency on the SDK side. For example, sdk-go records it for activities:

The workflow-task counterpart is also available as WorkflowTaskScheduleToStartLatency in the same file.
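If latency samples can be attributed to a task queue, they can be reduced to one number per queue for a scaler to act on. A minimal sketch, assuming you can obtain (task_queue, latency_seconds) pairs from your own metrics pipeline; the queue names are made up, and p95 is just one reasonable percentile:

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def p95_by_task_queue(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group (task_queue, schedule_to_start_seconds) samples; return each group's p95."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for task_queue, latency_s in samples:
        grouped[task_queue].append(latency_s)
    result: Dict[str, float] = {}
    for task_queue, values in grouped.items():
        if len(values) < 2:
            result[task_queue] = values[0]  # quantiles() needs two points on older Pythons
        else:
            # n=20 gives 19 cut points; the last is the 95th percentile.
            # "inclusive" keeps the estimate within the observed min/max.
            result[task_queue] = statistics.quantiles(values, n=20, method="inclusive")[-1]
    return result


samples = [("orders", 0.0)] * 19 + [("orders", 10.0), ("emails", 0.3)]
print(p95_by_task_queue(samples))  # {'orders': 0.5, 'emails': 0.3}
```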

shineirvin · July 22, 2021, 4:38am

In Prometheus, the only workflow_type values I see on the temporal_activity_schedule_to_start_latency_bucket metric are “archivalWorkflow” and “temporal_sys_tq_scanner_workflow”.

Yimin_Chen · July 22, 2021, 3:49pm

The metrics I referenced above are SDK-side metrics. I think what you are looking at are server-side metrics.
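Series ending in _bucket, such as temporal_activity_schedule_to_start_latency_bucket, are Prometheus histograms: cumulative counts of observations at or below each upper bound. A scaler that reads raw buckets has to turn them into a percentile. A minimal sketch of linear interpolation within a bucket, similar in spirit to PromQL's histogram_quantile() but not a reimplementation of all its edge cases; the bucket bounds and counts are made up:

```python
from typing import List, Tuple


def quantile_from_buckets(q: float, buckets: List[Tuple[float, int]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) histogram buckets.

    Buckets must be sorted by upper bound, with cumulative counts as in
    Prometheus `_bucket{le="..."}` series; the last bound may be float("inf").
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")  # no observations yet
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # Cannot interpolate into an unbounded bucket; report its lower edge.
                return lower_bound
            # Assume observations are spread evenly inside this bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


buckets = [(0.25, 90), (0.5, 95), (1.0, 100), (float("inf"), 100)]
print(quantile_from_buckets(0.95, buckets))  # 0.5
```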

shineirvin · July 23, 2021, 3:25am

Ah, I see. Can you point me to the schedule-to-start latency metrics in the PHP SDK? I can’t find them.

Bo_Gao · October 1, 2021, 7:43pm

@maxim Similar question: does the Go SDK support schedule-to-start latency metrics?

I only see service-side metrics, not SDK-side ones, and I’m not sure whether they are unsupported or my setup has issues.

maxim · October 1, 2021, 8:25pm

Yes, it is supported. You have to initialize the reporter when creating a worker. See the metrics sample.
