I have several separate microservices deployed, all of which will connect to Temporal and act as activity workers. What strategies would people recommend for knowing when to scale those services?
In a previous iteration of this project, these microservices were polling from AWS SQS. Our scaling strategy was based on the number of jobs in the queue.
Is there an equivalent metric we can use in Temporal? Or does each worker instead need to track custom metrics for its processing duration and the rate of jobs consumed?
Temporal doesn't expose the number of tasks in a task queue. The reason is that the count is nontrivial to calculate, because each task has an independent queuing timeout.
Temporal exposes a schedule-to-start task latency metric, which can be used to autoscale. Without a backlog, its value is close to 0. When tasks start to accumulate, the latency goes up.
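To make the idea concrete, here is a minimal sketch of a scaling decision driven by that latency. Everything in it is an assumption for illustration: the function name, the thresholds, and the worker limits are hypothetical, and it does not call any Temporal API. It assumes you already read a recent schedule-to-start latency value (for example a high percentile, in seconds) from your metrics backend.

```python
def desired_workers(
    current: int,
    latency_s: float,
    *,
    scale_up_above_s: float = 5.0,    # hypothetical threshold: a backlog is forming
    scale_down_below_s: float = 0.5,  # hypothetical threshold: workers are mostly idle
    min_workers: int = 1,
    max_workers: int = 20,
) -> int:
    """Return a target worker count from schedule-to-start latency.

    latency_s is the observed schedule-to-start latency in seconds.
    It stays near 0 without a backlog and grows as tasks accumulate.
    """
    if latency_s > scale_up_above_s:
        target = current + 1  # tasks are waiting too long: add a worker
    elif latency_s < scale_down_below_s:
        target = current - 1  # tasks are picked up almost at once: remove a worker
    else:
        target = current      # inside the comfortable band: hold steady
    # Clamp the result to the allowed range.
    return max(min_workers, min(max_workers, target))


if __name__ == "__main__":
    print(desired_workers(3, 12.0))
    print(desired_workers(3, 0.1))
```

The gap between the two thresholds is deliberate, so the worker count doesn't flap when the latency hovers around a single cutoff. Stepping by one worker per evaluation is the simplest choice; a proportional step would react faster to a large backlog.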
Is this the stats_counts.temporal.schedule_to_start_timeout.* metric (using statsd as an example)?
If so, does that metric only get logged when the request has timed out, as opposed to when it was just slow to start? Can this metric be reported on a per-task-queue basis, or is it only available at the namespace level?
In my Prometheus, the only label values I found on the temporal_activity_schedule_to_start_latency_bucket metric were workflow_type="archivalWorkflow" and workflow_type="temporal_sys_tq_scanner_workflow".