I think there is some terminology confusion. The Temporal history service divides the workflow ID space into numHistoryShards shards. Each shard contains a transfer queue. This queue implements an outbox pattern: tasks are inserted into it in the same transaction that updates the workflow state.
Task queues are maintained by the matching service. They are partitioned independently. By default, each task queue uses 4 partitions.
When an activity or workflow task is scheduled, it is first added transactionally to the shard's transfer queue. The task is then consumed from that queue and pushed to the matching engine. The choice of which partition to push a task to is based on the task ID and is completely independent of the history service sharding logic.
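The flow above can be sketched in Go. This is a minimal illustration, not Temporal's actual persistence code: the types, field names, and the modulo partition choice are assumptions; the point is that the state update and the transfer task commit together (the outbox pattern), and that partition selection depends only on the task ID.

```go
package main

import "fmt"

// TransferTask is a hypothetical stand-in for a history transfer task.
type TransferTask struct {
	TaskID     int64
	WorkflowID string
}

// Shard models one history shard: workflow state plus its transfer queue.
type Shard struct {
	WorkflowState map[string]string // workflowID -> current state
	TransferQueue []TransferTask    // outbox: drained and pushed to matching
	nextTaskID    int64
}

// UpdateWorkflowTx simulates a single transaction that both updates the
// workflow state and enqueues the transfer task. A real implementation
// would use the database transaction of the shard's persistence layer,
// so the task can never be lost or orphaned relative to the state.
func (s *Shard) UpdateWorkflowTx(workflowID, newState string) TransferTask {
	task := TransferTask{TaskID: s.nextTaskID, WorkflowID: workflowID}
	s.nextTaskID++
	s.WorkflowState[workflowID] = newState
	s.TransferQueue = append(s.TransferQueue, task)
	return task
}

// partitionFor picks a matching partition from the task ID alone,
// independent of numHistoryShards (modulo is an assumption here).
func partitionFor(taskID int64, numPartitions int64) int64 {
	return taskID % numPartitions
}

func main() {
	shard := &Shard{WorkflowState: map[string]string{}}
	task := shard.UpdateWorkflowTx("wf-1", "ActivityTaskScheduled")
	fmt.Println(len(shard.TransferQueue), partitionFor(task.TaskID, 4))
}
```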
- Is shard = partition? (shards as in the value set via numHistoryShards)
No. A shard is a history service concept; a partition is a matching engine concept.
- How would more taskQueuePartitions help during high load?
As a rule of thumb, a task queue partition sustains about 100 tasks per second on average (this depends heavily on the service and DB hardware), so more partitions raise the total throughput a single task queue can sustain.
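That rule of thumb makes sizing a simple division. A sketch, where the 100 tasks/second constant and the helper name are assumptions to validate against your own hardware:

```go
package main

import (
	"fmt"
	"math"
)

// tasksPerSecPerPartition is the rule-of-thumb capacity from above;
// measure it on your own service and DB hardware before relying on it.
const tasksPerSecPerPartition = 100.0

// partitionsFor returns a rough partition count for a task queue's
// expected peak rate, never going below the default of 4 partitions.
func partitionsFor(peakTasksPerSec float64) int {
	n := int(math.Ceil(peakTasksPerSec / tasksPerSecPerPartition))
	if n < 4 {
		n = 4
	}
	return n
}

func main() {
	fmt.Println(partitionsFor(250))  // fits within the default 4 partitions
	fmt.Println(partitionsFor(1000)) // needs roughly 10 partitions
}
```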
- What would be the reasonable value for read and write partitions for task queues, if the numHistoryShards=4096? do numHistoryShards and taskQueuePartitions need to be proportional?
They are absolutely independent.
- Can a single instance of matching service handle a large taskQueuePartitions value? or should number of matching service hosts be proportional to taskQueuePartitions as well?
A matching host can handle a certain throughput. As long as the partitions sharing a host don't overload it, you are fine. The number of matching hosts should be proportional to the total traffic across all task queues, not to the partition count.
- Can taskQueuePartitions be different for different task queues, depending on load per task queue?
Yes. Only task queues that need to sustain high traffic (over roughly 400 tasks per second, i.e. the 4 default partitions at ~100 tasks per second each) need an increased number of task queue partitions.
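Per-task-queue partition counts are set through the server's dynamic config. A sketch of what such an override could look like, assuming the matching read/write partition keys and their constraint fields; the task queue name is hypothetical, and read partitions should not be set below write partitions:

```yaml
# dynamic config fragment (assumed keys; verify against your server version)
matching.numTaskqueueWritePartitions:
  - constraints:
      taskQueueName: "high-traffic-queue"
    value: 8
matching.numTaskqueueReadPartitions:
  - constraints:
      taskQueueName: "high-traffic-queue"
    value: 8
```

Task queues without a matching constraint keep the default of 4 partitions.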