And here were the additional questions (and answers!) that came up during Q&A:
Is it okay for one Go process to have multiple workers polling different queues?
Yes! Very common.
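To illustrate, several workers in one Go process typically share a single client (one connection), each polling its own task queue. Here is a minimal sketch; the queue names and the commented-out registrations are placeholders, not anything from the talk:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// One client (one gRPC connection) shared by every worker in this process.
	c, err := client.Dial(client.Options{}) // assumes a Temporal server on the default address
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	// Two workers, each polling a different task queue.
	ordersWorker := worker.New(c, "orders-queue", worker.Options{})
	billingWorker := worker.New(c, "billing-queue", worker.Options{})

	// Register workflows/activities on each worker as needed, e.g.:
	// ordersWorker.RegisterWorkflow(OrderWorkflow)
	// billingWorker.RegisterActivity(ChargeCard)

	// Start the first worker without blocking, then run the second until interrupted.
	if err := ordersWorker.Start(); err != nil {
		log.Fatalln("unable to start orders worker", err)
	}
	defer ordersWorker.Stop()

	if err := billingWorker.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("unable to run billing worker", err)
	}
}
```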
How expensive is it to spin up many (let’s say 1-10k) workers in a single Go process using client.NewFromExisting? What resources does this consume on the worker process and on the Temporal server (from polling/etc.)?
It’s very cheap to spin up a lot of workers: most of a worker’s overhead is just two or three long-running gRPC polling calls. What’s expensive is the code the worker runs, and that may affect your decision about how many workers you collocate alongside each other.
Thomir mentioned that the Go/TypeScript/.NET SDKs dynamically tune the non-sticky-to-sticky poll ratio. What is the Python SDK behaviour?
.NET and Python work the same way in that regard.
When a Go process has multiple workers running, how is it decided which worker will get to process a workflow/activity next? I’m curious if starvation can happen if one task queue is very busy (i.e. would other task queues still get a chance to run)?
It’s just goroutines like any other Go code. There are per-worker settings to limit concurrency, and you should set them to values that keep the process’s resources from being overloaded (you may have to benchmark, because everyone’s workflows/activities are different); see the sketch below. Otherwise, as long as slots are available and the process is not overloaded, all workers will continually ask for more work and won’t starve each other.
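For example, the per-worker concurrency knobs live on worker.Options in the Go SDK. This is only a sketch: the queue name and numbers are placeholders you would replace after benchmarking your own workload.

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	w := worker.New(c, "busy-queue", worker.Options{
		// Cap on activities this worker executes at once.
		MaxConcurrentActivityExecutionSize: 100,
		// Cap on workflow tasks this worker processes at once.
		MaxConcurrentWorkflowTaskExecutionSize: 50,
		// Optional rate limit applied across the whole task queue, not just this worker.
		TaskQueueActivitiesPerSecond: 200,
	})

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("unable to run worker", err)
	}
}
```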
Is temporal_worker_task_slots_available the best metric to implement worker auto-scaling?
Yes, mostly; see https://docs.temporal.io/develop/worker-performance. We are also going to add a way to get more accurate task queue information in the very near future to help drive scalers, so stay tuned!
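For reference, a rough Go sketch of where that metric comes from: wiring the SDK’s tally/Prometheus integration (go.temporal.io/sdk/contrib/tally) into the client exposes temporal_worker_task_slots_available for a scaler to scrape. The listen address and timer type below are assumptions for illustration, not recommendations.

```go
package main

import (
	"log"
	"time"

	"github.com/uber-go/tally/v4"
	"github.com/uber-go/tally/v4/prometheus"
	"go.temporal.io/sdk/client"
	sdktally "go.temporal.io/sdk/contrib/tally"
)

func main() {
	// Serve SDK metrics on http://localhost:9090/metrics for Prometheus to scrape.
	reporter, err := prometheus.Configuration{
		ListenAddress: "0.0.0.0:9090", // illustrative address
		TimerType:     "histogram",
	}.NewReporter(prometheus.ConfigurationOptions{})
	if err != nil {
		log.Fatalln("unable to create Prometheus reporter", err)
	}
	scope, _ := tally.NewRootScope(tally.ScopeOptions{
		CachedReporter:  reporter,
		Separator:       prometheus.DefaultSeparator,
		SanitizeOptions: &sdktally.PrometheusSanitizeOptions,
	}, time.Second)
	scope = sdktally.NewPrometheusNamingScope(scope)

	c, err := client.Dial(client.Options{
		MetricsHandler: sdktally.NewMetricsHandler(scope),
	})
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	// ...create and run workers with this client as usual; an autoscaler can then
	// act on temporal_worker_task_slots_available, broken down by task queue and worker type.
}
```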
What are the most common failure modes when the workers are not configured properly?
Depends on which option is not set reasonably. Too large a max-concurrent value means an overloaded/slow process and potentially high latencies. Too low a max-concurrent value may mean a growing task queue backlog and bad schedule-to-start latencies. Too small a workflow cache means more CPU/network work on each workflow task; too large a cache means potential memory overuse. And so on.
https://docs.temporal.io/develop/worker-performance may help a bit here too.
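As one concrete example of the cache trade-off: in the Go SDK the sticky workflow cache is a single process-wide setting shared by all workers. A tiny sketch (4096 is an arbitrary illustrative value, not a recommendation):

```go
package main

import "go.temporal.io/sdk/worker"

func main() {
	// Call before starting any workers: the sticky workflow cache is shared
	// by every worker in this process.
	// Too small => more full-history replays (extra CPU/network per workflow task).
	// Too large => more memory pinned by cached workflow executions.
	worker.SetStickyWorkflowCacheSize(4096)

	// ...then create the client and workers as in the earlier sketches.
}
```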
Is there any provision in the product to do auto-scaling?
We’re actively working on it. See this recent announcement: https://temporal.io/change-log/announcing-auto-tuning-for-workers-in-pre-release