And here were the additional questions (and answers!) that came up during Q&A:
Is it okay for one Go process to have multiple workers polling different queues?
Yes! Very common.
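To illustrate, several workers in one Go process typically share a single client (one connection), each polling its own task queue. Here is a minimal sketch; the queue names and the commented-out registrations are placeholders, not anything from the talk:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// One client (one gRPC connection) shared by every worker in this process.
	c, err := client.Dial(client.Options{}) // assumes a Temporal server on the default address
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	// Two workers, each polling a different task queue.
	ordersWorker := worker.New(c, "orders-queue", worker.Options{})
	billingWorker := worker.New(c, "billing-queue", worker.Options{})

	// Register workflows/activities on each worker as needed, e.g.:
	// ordersWorker.RegisterWorkflow(OrderWorkflow)
	// billingWorker.RegisterActivity(ChargeCard)

	// Start the first worker without blocking, then run the second until interrupted.
	if err := ordersWorker.Start(); err != nil {
		log.Fatalln("unable to start orders worker", err)
	}
	defer ordersWorker.Stop()

	if err := billingWorker.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("unable to run billing worker", err)
	}
}
```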
How expensive is it to spin up many (let’s say 1-10k) workers in a single Go process using client.NewFromExisting? What resources does this consume on the worker process and on the Temporal server (from polling/etc.)?
It’s very cheap to spin up a lot of workers: most of a worker’s overhead is just two or three long-running gRPC polling calls. What’s expensive is the code the worker runs, and that may affect your decision about how many workers you collocate alongside each other.
Thomir mentioned that the Go/TypeScript/.NET SDKs dynamically tune the non-sticky-to-sticky poll ratio. What is the Python SDK behaviour?
.NET and Python work the same way in that regard.
When a Go process has multiple workers running, how is it decided which worker will get to process a workflow/activity next? I’m curious if starvation can happen if one task queue is very busy (i.e. would other task queues still get a chance to run)?
It’s just goroutines like any other Go code. There are per-worker settings to limit concurrency, and you should set them to values that keep the process’s resources from being overloaded (you may have to benchmark, because everyone’s workflows/activities are different); see the sketch below. Otherwise, as long as slots are available and the process is not overloaded, all workers will continually ask for more work and won’t starve each other.
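For example, the per-worker concurrency knobs live on worker.Options in the Go SDK. This is only a sketch: the queue name and numbers are placeholders you would replace after benchmarking your own workload.

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	w := worker.New(c, "busy-queue", worker.Options{
		// Cap on activities this worker executes at once.
		MaxConcurrentActivityExecutionSize: 100,
		// Cap on workflow tasks this worker processes at once.
		MaxConcurrentWorkflowTaskExecutionSize: 50,
		// Optional rate limit applied across the whole task queue, not just this worker.
		TaskQueueActivitiesPerSecond: 200,
	})

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("unable to run worker", err)
	}
}
```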
Is temporal_worker_task_slots_available the best metric to implement worker auto-scaling?
Yes, mostly; see https://docs.temporal.io/develop/worker-performance. We are also going to add a way to get more accurate task queue information in the very near future to help drive scalers, so stay tuned!
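For reference, a rough Go sketch of where that metric comes from: wiring the SDK’s tally/Prometheus integration (go.temporal.io/sdk/contrib/tally) into the client exposes temporal_worker_task_slots_available for a scaler to scrape. The listen address and timer type below are assumptions for illustration, not recommendations.

```go
package main

import (
	"log"
	"time"

	"github.com/uber-go/tally/v4"
	"github.com/uber-go/tally/v4/prometheus"
	"go.temporal.io/sdk/client"
	sdktally "go.temporal.io/sdk/contrib/tally"
)

func main() {
	// Serve SDK metrics on http://localhost:9090/metrics for Prometheus to scrape.
	reporter, err := prometheus.Configuration{
		ListenAddress: "0.0.0.0:9090", // illustrative address
		TimerType:     "histogram",
	}.NewReporter(prometheus.ConfigurationOptions{})
	if err != nil {
		log.Fatalln("unable to create Prometheus reporter", err)
	}
	scope, _ := tally.NewRootScope(tally.ScopeOptions{
		CachedReporter:  reporter,
		Separator:       prometheus.DefaultSeparator,
		SanitizeOptions: &sdktally.PrometheusSanitizeOptions,
	}, time.Second)
	scope = sdktally.NewPrometheusNamingScope(scope)

	c, err := client.Dial(client.Options{
		MetricsHandler: sdktally.NewMetricsHandler(scope),
	})
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	// ...create and run workers with this client as usual; an autoscaler can then
	// act on temporal_worker_task_slots_available, broken down by task queue and worker type.
}
```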
What are the most common failure modes when the workers are not configured properly?
Depends on which option is not set reasonably. Too large a max-concurrent value means an overloaded/slow process and potentially high latencies. Too low a max-concurrent value may mean a growing task queue backlog and bad schedule-to-start latencies. Too small a workflow cache means more CPU/network work on each workflow task; too large a cache means potential memory overuse. And so on.
https://docs.temporal.io/develop/worker-performance may help a bit here too.
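As one concrete example of the cache trade-off: in the Go SDK the sticky workflow cache is a single process-wide setting shared by all workers. A tiny sketch (4096 is an arbitrary illustrative value, not a recommendation):

```go
package main

import "go.temporal.io/sdk/worker"

func main() {
	// Call before starting any workers: the sticky workflow cache is shared
	// by every worker in this process.
	// Too small => more full-history replays (extra CPU/network per workflow task).
	// Too large => more memory pinned by cached workflow executions.
	worker.SetStickyWorkflowCacheSize(4096)

	// ...then create the client and workers as in the earlier sketches.
}
```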
Is there any provision in the product to do auto-scaling?
We’re actively working on it. See this recent announcement: https://temporal.io/change-log/announcing-auto-tuning-for-workers-in-pre-release