Strategies for Scaling AWS Services

I have several separate microservices deployed, all of which will connect to Temporal and act as activity workers. I’m curious what strategies people might recommend for knowing when to scale those services?

In a previous iteration of this project, these microservices were polling from AWS SQS. Our scaling strategy was based on the number of jobs in the queue.

Is there an equivalent metric we can use in Temporal? Or does each worker instead need to track custom metrics for its processing duration and rate of jobs consumed?

Any suggestions would be welcome. Thank you!


Temporal doesn’t expose the number of tasks in a task queue. The reason is that this number is far from trivial to calculate, because each task has an independent queuing timeout.

Temporal does expose a schedule-to-start task latency metric, which can be used to autoscale. Without a backlog, its value is close to 0. When tasks start to accumulate, the latency goes up.

Is this the stats_counts.temporal.schedule_to_start_timeout.* metric (using statsd as an example)?

If so, does that metric only get logged when the request has timed out vs. was just slow to start? Can this metric be reported on a per task queue basis or is it only at the namespace level?

Any updates on this? @keelerm84, have you solved this?

Schedule-to-start latency is recorded by the SDK. For example, sdk-go records it for activities:

The workflow version is also available as WorkflowTaskScheduleToStartLatency in the same file.

I only found workflow_type="archivalWorkflow" and workflow_type="temporal_sys_tq_scanner_workflow" on the temporal_activity_schedule_to_start_latency_bucket metric in my Prometheus.

The metric I referenced above is an SDK-side metric. I think what you are looking at are server-side metrics.

Ah, I see. Can you point me to the schedule-to-start latency metrics in php-sdk? I can’t find them.

@maxim similar question: does the Go SDK support schedule-to-start latency metrics?

I only see service-side metrics but no SDK-side metrics. I’m not sure whether they are unsupported or my setup has issues.

Yes, it is supported. You have to initialize the reporter when creating a worker. See the metrics sample.
