I have a Temporal setup running in production with numHistoryShards set to 4 (the default), which as far as I know is immutable. I want to update it to 1024, so I need a way to migrate the running workflows from the old v1 server to the new v2 server. The problem is that in my case the running workflows return `NewContinueAsNewError` (they continue-as-new). What is the best and safest way to make this numHistoryShards update in such a scenario? For context, there are approximately 1,000 running workflows and about 700,000 workflows in the ContinueAsNew state.
Use multi-cluster replication to migrate the namespace from one cluster to another.
Change your workflow to call an activity to start the workflow on the new cluster instead of calling continue-as-new. This logic can be implemented generically as an interceptor.
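A minimal Go SDK sketch of the second approach, assuming the new cluster is reachable through a separate client; the names here (`NextRunInput`, `MigrationActivities`, `StartOnNewCluster`) are illustrative, not part of the Temporal SDK:

```go
package migration

import (
	"context"

	"go.temporal.io/sdk/client"
)

// NextRunInput carries what the workflow would otherwise pass to continue-as-new.
type NextRunInput struct {
	WorkflowID   string
	TaskQueue    string
	WorkflowType string
	State        interface{} // carried-over workflow state; round-trips through the data converter
}

// MigrationActivities holds a client connected to the new (v2) cluster.
type MigrationActivities struct {
	NewCluster client.Client
}

// StartOnNewCluster starts the "next run" of a workflow on the new cluster.
// The workflow calls this activity where it would previously have returned
// workflow.NewContinueAsNewError, then completes normally on the old cluster.
func (a *MigrationActivities) StartOnNewCluster(ctx context.Context, in NextRunInput) error {
	_, err := a.NewCluster.ExecuteWorkflow(ctx, client.StartWorkflowOptions{
		ID:        in.WorkflowID, // keep the same workflow ID across clusters
		TaskQueue: in.TaskQueue,
	}, in.WorkflowType, in.State)
	return err
}
```

On the worker side you would dial the new cluster once (e.g. `client.Dial(client.Options{HostPort: "<new-cluster-frontend>:7233"})`) and register `&MigrationActivities{NewCluster: c}`. The generic interceptor variant maxim mentions wraps this same logic so individual workflows don't each need to be edited by hand.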
Thanks for your reply! Can having only 4 history shards cause performance bottlenecks or issues with horizontal scalability? I noticed many errors related to shard locks in the logs when I tried running 3 server instances.
There were different kinds of errors in the logs, for example:
{"level":"info","ts":"2025-09-22T10:33:44.535Z","msg":"history client encountered error","service":"matching","error":"Workflow is busy.","service-error-type":"serviceerror.ResourceExhausted","logging-call-at":"metric_client.go:104"}
I’m trying to understand whether the low history shard count is the main thing preventing our servers from scaling horizontally, and how I can manage this with minimal risk.
Thanks for your reply @maxim. I’d like to ask whether the history shard count can be a bottleneck when we run 3 server instances. I’m asking because I want to understand if increasing the history shards is a must to solve the scalability issue, or whether there is another way to do it. In my scenario, simply increasing the server count from 1 to 3 worked in the staging environment, but caused errors like the one above in production.