I think there is some terminology confusion. The Temporal history service divides the workflow ID space into numHistoryShards shards. Each shard contains a transfer queue. This queue implements an outbox pattern: tasks are inserted into it in the same transaction that updates the workflow state.
Task queues are maintained by the matching service. They are partitioned independently. By default, each task queue uses 4 partitions.
When an activity or workflow task is scheduled, it is first added transactionally to the shard's transfer queue. The task is then consumed from that queue and pushed to the matching engine. The choice of which partition to push a task to is based on the task ID and is completely independent of the history service sharding logic.
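The flow above can be sketched in Go. This is a minimal illustration, not Temporal's actual persistence code: the types, field names, and the modulo partition choice are assumptions; the point is that the state update and the transfer task commit together (the outbox pattern), and that partition selection depends only on the task ID.

```go
package main

import "fmt"

// TransferTask is a hypothetical stand-in for a history transfer task.
type TransferTask struct {
	TaskID     int64
	WorkflowID string
}

// Shard models one history shard: workflow state plus its transfer queue.
type Shard struct {
	WorkflowState map[string]string // workflowID -> current state
	TransferQueue []TransferTask    // outbox: drained and pushed to matching
	nextTaskID    int64
}

// UpdateWorkflowTx simulates a single transaction that both updates the
// workflow state and enqueues the transfer task. A real implementation
// would use the database transaction of the shard's persistence layer,
// so the task can never be lost or orphaned relative to the state.
func (s *Shard) UpdateWorkflowTx(workflowID, newState string) TransferTask {
	task := TransferTask{TaskID: s.nextTaskID, WorkflowID: workflowID}
	s.nextTaskID++
	s.WorkflowState[workflowID] = newState
	s.TransferQueue = append(s.TransferQueue, task)
	return task
}

// partitionFor picks a matching partition from the task ID alone,
// independent of numHistoryShards (modulo is an assumption here).
func partitionFor(taskID int64, numPartitions int64) int64 {
	return taskID % numPartitions
}

func main() {
	shard := &Shard{WorkflowState: map[string]string{}}
	task := shard.UpdateWorkflowTx("wf-1", "ActivityTaskScheduled")
	fmt.Println(len(shard.TransferQueue), partitionFor(task.TaskID, 4))
}
```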
- Is shard = partition? (shards as in the value set via numHistoryShards)
No. A shard is a history service concept; a partition is a matching engine concept.
- How would more taskQueuePartitions help during high load?
As a rule of thumb, a task queue partition sustains about 100 tasks per second on average (this depends heavily on the service and DB hardware), so more partitions raise the total throughput a single task queue can sustain.
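That rule of thumb makes sizing a simple division. A sketch, where the 100 tasks/second constant and the helper name are assumptions to validate against your own hardware:

```go
package main

import (
	"fmt"
	"math"
)

// tasksPerSecPerPartition is the rule-of-thumb capacity from above;
// measure it on your own service and DB hardware before relying on it.
const tasksPerSecPerPartition = 100.0

// partitionsFor returns a rough partition count for a task queue's
// expected peak rate, never going below the default of 4 partitions.
func partitionsFor(peakTasksPerSec float64) int {
	n := int(math.Ceil(peakTasksPerSec / tasksPerSecPerPartition))
	if n < 4 {
		n = 4
	}
	return n
}

func main() {
	fmt.Println(partitionsFor(250))  // fits within the default 4 partitions
	fmt.Println(partitionsFor(1000)) // needs roughly 10 partitions
}
```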
- What would be the reasonable value for read and write partitions for task queues, if the numHistoryShards=4096? do numHistoryShards and taskQueuePartitions need to be proportional?
They are absolutely independent.
- Can a single instance of matching service handle a large taskQueuePartitions value? or should number of matching service hosts be proportional to taskQueuePartitions as well?
A matching host can handle a certain throughput. As long as the partitions sharing a host don't overload it, you are fine. The number of matching hosts should be proportional to the total traffic across all task queues, not to the partition count.
- Can taskQueuePartitions be different for different task queues, depending on load per task queue?
Yes. Only task queues that need to sustain high traffic (over roughly 400 tasks per second, i.e. the 4 default partitions at ~100 tasks per second each) need an increased number of task queue partitions.
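Per-task-queue partition counts are set through the server's dynamic config. A sketch of what such an override could look like, assuming the matching read/write partition keys and their constraint fields; the task queue name is hypothetical, and read partitions should not be set below write partitions:

```yaml
# dynamic config fragment (assumed keys; verify against your server version)
matching.numTaskqueueWritePartitions:
  - constraints:
      taskQueueName: "high-traffic-queue"
    value: 8
matching.numTaskqueueReadPartitions:
  - constraints:
      taskQueueName: "high-traffic-queue"
    value: 8
```

Task queues without a matching constraint keep the default of 4 partitions.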