Hello Team,
It has been really interesting to explore Temporal. We have a distributed application where we deploy the workers in Kubernetes across multiple pods. We want to understand how polling is done, how a worker is picked to process a request, and how load balancing works.
Also, what are the best practices and approaches for deploying Temporal in production?
You mean like if you have multiple workers listening to the same task queue, which one will receive the workflow task?
Short answer: it’s random. A task queue has multiple partitions (4 by default). When a worker polls a task queue, one of the partitions is picked at random and the worker waits for tasks on that partition.
If a worker waits too long for a task, the Temporal server's matching service will poll the root partition.
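To make the partition behavior above more concrete, here is a minimal toy simulation. It is not Temporal's actual matching-service code; the class name `ToyTaskQueue`, the choice of partition 0 as the "root", and the immediate fallback (instead of a real wait timeout) are all assumptions made for illustration.

```python
import random
from collections import deque

ROOT = 0  # assumption: partition 0 stands in for the root partition


class ToyTaskQueue:
    """Toy model of a partitioned task queue (hypothetical, not Temporal code)."""

    def __init__(self, num_partitions=4, rng=None):
        self.partitions = [deque() for _ in range(num_partitions)]
        self.rng = rng or random.Random()

    def add_task(self, task):
        # In this model, new tasks also land on a randomly chosen partition.
        self.rng.choice(self.partitions).append(task)

    def poll(self):
        """Pick a random partition; if it is empty, fall back to the root."""
        index = self.rng.randrange(len(self.partitions))
        if self.partitions[index]:
            return index, self.partitions[index].popleft()
        # Stand-in for "waited too long": try the root partition instead.
        if self.partitions[ROOT]:
            return ROOT, self.partitions[ROOT].popleft()
        return index, None


def simulate(num_workers=3, num_tasks=3000, seed=7):
    """Let workers take turns polling until every task has been handled."""
    rng = random.Random(seed)
    queue = ToyTaskQueue(rng=rng)
    for task_id in range(num_tasks):
        queue.add_task(task_id)

    handled = [0] * num_workers
    while sum(handled) < num_tasks:
        for worker in range(num_workers):
            _, task = queue.poll()
            if task is not None:
                handled[worker] += 1
    return handled


if __name__ == "__main__":
    # Per-worker task counts; no worker is preferred by the queue itself.
    print(simulate())
```

The point of the sketch is only that no worker is "chosen" by the server: whichever worker happens to be polling a partition that holds a task gets that task, so the load spreads out across workers without explicit coordination.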
Regarding production deployments, see our docs page, and also the forum links mentioned in this post. Let us know if you have specific questions.
These links have been really helpful.
Currently we are targeting 50-60 instances running concurrently. So is one Temporal server instance sufficient?
Also, if there are multiple instances:
Let’s say we have two instances of Temporal server running. Would the scheduling be distributed among them, or would one act as a standby?
Currently we are targeting 50-60 instances running concurrently. So is one Temporal server instance sufficient?
A single Temporal server instance (k8s pod) is probably not enough. For a prod environment, try to have 3 or more pods per Temporal service (frontend/matching/history), since at least one k8s pod is being replaced during a deployment.
50-60 instances running concurrently
With that many instances you will most likely (depending on your target task queue load) have to either increase the number of pollers or increase the size of your cluster. It’s hard to tell in advance, though: you will have to do load testing to determine what works best for your deployment and load.
Also if multiple instances are there
assuming here we are talking about the same Temporal cluster,
Let’s say we have two instances of Temporal server running. Would the scheduling be distributed among them, or would one act as a standby?
Yes, Elasticsearch is optional. It is needed for advanced visibility. If you don’t use it, Temporal will just use standard visibility on top of your primary database instance (in a separate database, though). Cassandra, MySQL, and PostgreSQL support standard visibility. The main advantage of advanced visibility is custom search attributes, which you can attach to any workflow and then filter on. Also, with advanced visibility you can write complex SQL-like queries in the Web UI (“Advanced” button there).
Just to add regarding the Java SDK: some of the client-side APIs also depend on Elasticsearch being enabled.
For example, you will not be able to use ListWorkflowExecutions without it (you can still use ListOpenWorkflowExecutions and ListClosedWorkflowExecutions, though).
I believe the same applies to similar tctl commands (list with query).