Update history shards

Hi,

I have a Temporal setup running in production and numHistoryShards is 4 (the default), which as far as I know is immutable. I want to update it to 1024, so I need to find a way to migrate the running workflows from the old v1 server to the new v2 server. The problem is that in my case the running workflows return `NewContinueAsNewError`. What is the best and safest way to make this numHistoryShards update in such a scenario? For your information, there are approximately 1,000 running workflows and about 700,000 workflows in the ContinueAsNew state.

You have two options.

  1. Use multi-cluster replication to migrate the namespace from one cluster to another.

  2. Change your workflow to call an activity to start the workflow on the new cluster instead of calling continue-as-new. This logic can be implemented generically as an interceptor (see the sketch below).
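
If you go with option 2, a minimal sketch in Go could look like the following. The endpoint, namespace, task queue, workflow name, and input type are placeholders for your setup, not anything Temporal-specific: the activity simply dials the new (v2) cluster and starts the next run there, so the current run on v1 can complete instead of continuing as new.

```go
package migrate

import (
	"context"

	"go.temporal.io/sdk/client"
)

// MigrationInput is a placeholder for whatever state your workflow would
// normally carry over via continue-as-new.
type MigrationInput struct {
	Cursor string
}

// StartOnNewClusterActivity dials the v2 cluster (hypothetical endpoint) and
// starts the next run there instead of continuing-as-new on the v1 cluster.
// For a real worker you would typically reuse a single client instead of
// dialing inside the activity.
func StartOnNewClusterActivity(ctx context.Context, workflowID string, input MigrationInput) error {
	c, err := client.Dial(client.Options{
		HostPort:  "temporal-v2-frontend:7233", // new cluster with numHistoryShards=1024
		Namespace: "default",
	})
	if err != nil {
		return err
	}
	defer c.Close()

	// Reuse the same workflow ID so the execution effectively "moves" to the new cluster.
	_, err = c.ExecuteWorkflow(ctx, client.StartWorkflowOptions{
		ID:        workflowID,
		TaskQueue: "my-task-queue",
	}, "MyWorkflow", input)
	return err
}
```

In the workflow itself you would then execute this activity and return nil where you previously returned `workflow.NewContinueAsNewError(...)`. Wrapping that decision in a worker interceptor (inspecting the error returned from workflow execution) lets you apply it to every workflow type without editing each one.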


Thanks for your reply! Can having only 4 history shards cause performance bottlenecks or issues with horizontal scalability? I noticed many errors related to shard locks in the logs when I tried running 3 server instances.

There were different kinds of errors in the logs:

{"level":"info","ts":"2025-09-22T10:33:44.535Z","msg":"history client encountered error","service":"matching","error":"Workflow is busy.","service-error-type":"serviceerror.ResourceExhausted","logging-call-at":"metric_client.go:104"}
{"level":"info","ts":"2025-09-22T10:47:10.509Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlrea

I’m trying to understand whether the low history shard count is the main thing preventing our servers from scaling horizontally, and how I can manage the change with minimal risk.

All updates to a shard share a single lock. So having only 4 shards does limit the cluster throughput.

This might be useful: Choosing the Number of Shards in Temporal History Service | Mikhail Shilkov
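
For reference, numHistoryShards lives in the persistence section of the server's static config, so it has to be set when the new cluster is first created and cannot be changed afterwards. A minimal sketch, assuming the standard development.yaml layout; the store names and datastore settings are placeholders:

```yaml
# static config of the new (v2) cluster -- numHistoryShards is immutable once set
persistence:
  numHistoryShards: 1024
  defaultStore: default
  visibilityStore: visibility
  datastores:
    default:
      # ... your datastore settings ...
```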

Thanks for your reply @maxim. I’d like to ask whether the history shard count can be a bottleneck while we run 3 server instances. I’m asking because I want to understand if updating the history shards is a must to solve the scalability issue, or whether there is another way to do it. In my scenario, just increasing the server count from 1 to 3 worked in the staging environment, but it caused errors like the ones above in production.

Yes, four shards should never be used for production. Increasing the number of hosts is not going to help as the bottleneck is a DB lock per shard.
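
If it helps, once the new cluster is up you can sanity-check its shard count through the Go SDK's raw workflow service client; in recent server/API versions GetClusterInfo reports the configured shard count. The endpoint below is a placeholder:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"go.temporal.io/api/workflowservice/v1"
	"go.temporal.io/sdk/client"
)

func main() {
	// Placeholder endpoint for the new cluster's frontend.
	c, err := client.Dial(client.Options{HostPort: "temporal-v2-frontend:7233"})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// GetClusterInfo exposes the history shard count configured for the cluster.
	resp, err := c.WorkflowService().GetClusterInfo(context.Background(), &workflowservice.GetClusterInfoRequest{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("history shards:", resp.GetHistoryShardCount())
}
```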
