Hi,
We’re hosting Temporal on our own k8s cluster. Since the number of history shards can’t be changed after initial startup, we’re running load tests to determine scalability and balance it against idle resource usage. Our load test covers various scenarios: an echo activity, a sleep activity, workflow sleeps, child workflows in a tree topology, etc.
We tested with various history shard counts and can see the number of similar workflows completed per second increase as the shard count grows, but it plateaus at some point (around 2k history shards, at roughly 200 workflows/s with ~20 echo activities each). We’ve tried increasing taskQueuePartitions, tuning the various RPS limits on each service, scaling up Temporal server instances, adding more workers, etc.
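For reference, we applied these knobs through the server’s dynamic config. A minimal sketch of the kind of overrides we experimented with (the values below are illustrative from our test runs, not recommendations):

```yaml
# Temporal dynamic config sketch; values are examples only.
matching.numTaskqueueReadPartitions:
  - value: 8
    constraints: {}
matching.numTaskqueueWritePartitions:
  - value: 8
    constraints: {}
frontend.rps:
  - value: 2400
    constraints: {}
history.rps:
  - value: 3000
    constraints: {}
matching.rps:
  - value: 1200
    constraints: {}
```

None of these changed the plateau meaningfully once we were past ~2k shards.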
Is there a metric we can use to determine whether the number of history shards is the bottleneck (e.g., delay in acquiring a shard, or workflows waiting on a shard to write to persistence), or whether the bottleneck comes from some other limit?
What is the general guideline for determining the number of history shards? Say we want to run 200 workflows/s, each with ~20 activities.
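To give a sense of the write load we’re targeting, here is a rough back-of-envelope estimate. It assumes ~3 history events per activity (the ActivityTaskScheduled/Started/Completed events we see in our workflow histories), and ignores workflow task events, so the real event rate would be somewhat higher:

```python
# Back-of-envelope estimate of history event throughput at our target load.
# The 3-events-per-activity figure is an assumption based on the
# Scheduled/Started/Completed events visible in activity histories.

workflows_per_sec = 200
activities_per_workflow = 20
events_per_activity = 3  # ActivityTaskScheduled, Started, Completed

activity_events_per_sec = (
    workflows_per_sec * activities_per_workflow * events_per_activity
)
print(activity_events_per_sec)  # 12000 history events/s from activities alone
```

Is there a rule of thumb for mapping that kind of event rate onto a shard count?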
Thanks,