Temporal shards (configured in the 'shards' table)

Hi,

Could you please provide information on how many concurrent workflows a single Temporal shard can support? What metric is used to compute this, and is this configurable via Helm charts? Additionally, I would like to know if it’s possible to increase the number of shards post-deployment without affecting existing workflows.

Thanks.

how many concurrent workflows a single Temporal shard can support

This will depend heavily on your db; max concurrency is going to be bounded by your db's partition size limit and its processing capacity.

Processing capacity can be measured via shard lock latency.
You can look at the server metric semaphore_latency (available since server release 1.23.0; before that, use lock_latency), for example at p99:

    histogram_quantile(0.99, sum(rate(semaphore_latency_bucket[5m])) by (le))

Another thing you can look at is adjusting the concurrency of persistence operations in the shard context via the dynamic config history.shardIOConcurrency; the default is 1 (note that for Cassandra persistence it currently cannot be set to > 1).
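
As a sketch (the file path and the value 2 are illustrative, not recommendations), a file-based dynamic config entry would look like:

    # dynamic config file, e.g. config/dynamicconfig/development.yaml (path illustrative)
    history.shardIOConcurrency:
      - value: 2        # example value; default is 1
        constraints: {} # applies cluster-wide

The file-based dynamic config client polls the file for changes, so this can typically be adjusted without a server restart.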

I would like to know if it’s possible to increase the number of shards post-deployment without affecting existing workflows.

No, it’s not possible unless you also start with a “fresh” db (you would have to re-index and would lose your current data).
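
Regarding the Helm part of your question: the shard count is set once, at first deployment. As a sketch, assuming the temporalio/helm-charts layout (verify against your chart version), it lives under server.config.numHistoryShards in values.yaml:

    # values.yaml for temporalio/helm-charts (layout assumed; check your chart version)
    server:
      config:
        numHistoryShards: 512  # fixed at first deployment; cannot be changed afterwards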

Since server release 1.20, however, you can use multi-cluster replication to replicate between clusters with different shard counts (they have to be multiples of each other, e.g. 8 and 32). This makes it possible to replicate to a cluster with a much larger shard count, allowing you to eventually fail over to the larger cluster.
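
For illustration only (the shard counts here are made up, and the snippets are sketches of the static server config): each cluster pins its own shard count via numHistoryShards under persistence, and the counts must be multiples of each other for replication to work:

    # static server config, one YAML document per cluster (sketch; counts are examples)
    ---
    # cluster A (source)
    persistence:
      numHistoryShards: 8
    ---
    # cluster B (target): 32 is a multiple of 8, so replication between them is allowed
    persistence:
      numHistoryShards: 32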

@tihomir I’m on Temporal version 1.23 and I have a P95 lock_latency of under 1ms, but semaphore_latency_bucket is typically 10ms and peaks at 90ms. The “Scaling Temporal: the basics” post made me think everything is good due to the low lock latency, but reading your post I’m wondering if I should ignore lock latency and only pay attention to semaphore latency. What would you advise? Also, what is a tolerable semaphore latency?

For extra context, I’m on a cluster with only 8 shards (a silly historical choice), which I’m worried is causing me problems, but I want to make sure I’m using the right metric to measure the issue.

but semaphore_latency_bucket is typically 10ms and peaks at 90ms

Correct, semaphore_latency is the new metric to use for monitoring processing capacity. lock_latency still exists, but it’s used for something else now.

I’m wondering if I should ignore lock latency and only pay attention to semaphore latency

Correct.

Thank you @tihomir! Should semaphore_latency be less than 5ms, ideally less than 1ms, just like the guidance for lock_latency?

Also, any guidance on what we can do with history.shardIOConcurrency? For example, how high can we set it, and what metrics should we watch to identify whether we have it set optimally? I cannot find any documentation on this setting. Some guidance on how to think about what it does would be much appreciated. I’m using Postgres.

Update: I’ve been varying history.shardIOConcurrency (tried values like 2 and 3) and have not noticed any effect on semaphore_latency.