Hey @Akshay_Goyal,
I can provide some high-level recommendations so you have a starting point, but ultimately you have to run a bench test against your use case to figure out the right sizing for your workload. Here are some of the things to consider when sizing your cluster:
Number Of Shards
Based on your future load characteristics I would recommend starting with at least 4k history shards. The number of history shards is a setting that cannot be changed after the cluster is provisioned. For every other parameter you can start small and scale the cluster as needed over time, but for this one you have to think upfront about your maximum load.
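To see why the shard count has to be fixed up front: each workflow execution is routed to a shard by hashing its identity modulo the number of history shards, so changing that number would map existing workflows to different shards and strand their state. Here is a rough conceptual sketch of that routing (the hash function and names here are purely illustrative, not the server's actual internals):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardID illustrates the idea behind workflow-to-shard routing: a stable
// hash of the workflow identity, taken modulo the number of history shards
// that was configured when the cluster was provisioned.
func shardID(namespaceID, workflowID string, numHistoryShards int) int {
	h := fnv.New32a() // illustrative hash; the server's actual hash function differs
	h.Write([]byte(namespaceID + "_" + workflowID))
	return int(h.Sum32()) % numHistoryShards
}

func main() {
	// The same workflow lands on a different shard if the shard count changes,
	// which is why the shard count cannot be updated after provisioning.
	fmt.Println(shardID("ns-1", "order-12345", 4096))
	fmt.Println(shardID("ns-1", "order-12345", 8192))
}
```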
Core Database
Temporal is write intensive and uses Cassandra's lightweight transaction (LWT) feature; roughly 50% of the writes are LWTs. I expect your workload to be mostly bounded by database capacity. The nice thing about Cassandra is that it scales horizontally: you just increase the size of the cluster for Temporal to scale on throughput. We recommend a replication factor of 3 for the core Temporal keyspace. You want to start with 5 Cassandra hosts, each with 16 cores and 32 GB of memory.
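To give a feel for what an LWT is (and why it costs more than a plain write), here is a minimal gocql sketch of a conditional, compare-and-set style update. The contact point, keyspace, table, and column names are placeholders for illustration, not Temporal's actual schema:

```go
package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1") // placeholder contact point
	cluster.Keyspace = "temporal"            // placeholder keyspace
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// A lightweight transaction: the write only applies if the IF condition
	// holds. Cassandra runs a Paxos round for this, which is why LWTs are
	// significantly more expensive than regular writes.
	var prevRangeID int64
	applied, err := session.Query(
		`UPDATE shards SET range_id = ? WHERE shard_id = ? IF range_id = ?`,
		11, 42, 10,
	).ScanCAS(&prevRangeID)
	if err != nil {
		log.Fatal(err)
	}
	// If not applied, prevRangeID holds the value that caused the condition to fail.
	fmt.Println("applied:", applied, "previous range_id:", prevRangeID)
}
```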
Visibility Database
I don’t think the Cassandra implementation of the visibility store would scale to the numbers you are looking at. You should start with Elasticsearch as the visibility store.
Temporal Roles
The Temporal server consists of 4 roles. Although you can run all roles within the same process, we highly recommend running them separately, as they have completely different concerns and scaling characteristics. It also makes it operationally much simpler to isolate problems in production. All of the roles are completely stateless, and the system scales horizontally: once you identify a bottleneck, you just spin up more instances of that role. Here are some recommendations to use as a starting point:
Frontend: Hosts all the service APIs. All client interaction goes through the frontend, and it mostly scales with the RPS of the cluster. Start with 3 instances, each with 4 cores and 4 GB of memory.
History: Hosts the workflow state transition logic. Each history host runs a shard controller, which is responsible for activating and passivating shards on that host. If you provision a cluster with 4k shards, the shard controller distributes them across all available history hosts in the cluster. If the history hosts become the scalability bottleneck, you just add more history hosts to the cluster. The history hosts form their own membership ring, and shards are distributed among the available nodes in the hash ring (see the sketch after this list). They are quite memory intensive, as they host the mutable state and event caches. Start with 5 history instances, each with 8 cores and 8 GB of memory.
Matching: Responsible for hosting the TaskQueues in the system. Each TaskQueue partition is placed independently across the available matching hosts. Matching usually scales with the number of workers polling for workflow or activity tasks, the throughput of workflow/activity/query tasks, and the total number of active TaskQueues in the system. Start with 3 matching instances, each with 4 cores and 4 GB of memory.
Worker: Runs various pieces of background logic: the Elasticsearch Kafka processor, cross-DC consumers, and some system workflows (archival, batch processing, etc.). You can start with 2 instances, each with 4 cores and 4 GB of memory.
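On the shard distribution mentioned under History above, here is a simplified sketch of the idea: shard ownership is a function of the shard ID and the current set of history hosts, so adding hosts spreads the same fixed set of shards more thinly. The real server resolves ownership through a consistent-hash membership ring driven by the shard controller; this plain modulo-hash version is only meant to illustrate the rebalancing effect:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownerFor picks which history host owns a given shard. This is a
// simplification: the actual server uses a consistent-hash ring, but the
// key property is the same, ownership follows from (shard ID, current hosts).
func ownerFor(shard int, hosts []string) string {
	h := fnv.New32a()
	fmt.Fprintf(h, "shard-%d", shard)
	return hosts[int(h.Sum32())%len(hosts)]
}

func main() {
	fiveHosts := []string{"history-0", "history-1", "history-2", "history-3", "history-4"}
	sevenHosts := append(fiveHosts, "history-5", "history-6")

	for _, hosts := range [][]string{fiveHosts, sevenHosts} {
		counts := map[string]int{}
		for shard := 0; shard < 4096; shard++ {
			counts[ownerFor(shard, hosts)]++
		}
		// With 4k shards, each host ends up owning roughly 4096/len(hosts) shards.
		fmt.Printf("%d hosts -> shards per host: %v\n", len(hosts), counts)
	}
}
```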
Hope this gives you a good starting point. Let’s keep updating this thread as you have more specific questions.
Our Cassandra setup:
We are running our own Cassandra setup on Kubernetes. Our primary Cassandra cluster has 5 nodes (hosted in 1 AZ), and we have a backup cluster of 3 nodes which acts as DR (in another AZ).
A few follow-up questions on setting up Cassandra for Cadence/Temporal:
On Cassandra, what should be the read and write level?
What latency can we expect on the client side and does Cadence/Temporal expose any metrics for the same?
How does tombstoning impact performance?
What is the recommended backup frequency?
Can you please share the number of reads per second and writes per second that Cadence does against Cassandra? This will help us a lot in provisioning the infra for our Cassandra cluster (at optimal cost).
Follow-up question on the memory and CPU cores: does each of the 5 instances have to have 8 cores and 8 GB of memory, or can I split the total provisioned cores and memory across a larger number of history pods?
Hi @samar
Our scale requirement is 20-30 wps for now, growing to maybe ~500 wps over the next 10-12 months.
We’re also very conservative about our infra costs and want to optimise them as much as possible without impacting uptime.
Q6: Now if we were to set up 4k shards, does that have any implications on the infra we provision, for either the Cadence components or Cassandra?
Q7: What is the minimum setup that I can run to support 30-50 tps on day 1, with the ability to scale it up to 500 tps when required?
Q8: Also, we would love to understand which metrics we should look at to identify which component needs to scale.
On Cassandra, what should be the read and write level?
I guess you mean read/write consistency. The gocql driver used for Cassandra defaults to LOCAL_QUORUM and LOCAL_SERIAL.
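If you are curious what those consistency levels look like when configuring gocql yourself, here is a minimal sketch (contact points and keyspace are placeholders):

```go
package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("10.0.0.1", "10.0.0.2", "10.0.0.3") // placeholder contact points
	cluster.Keyspace = "temporal"                                   // placeholder keyspace

	// Regular reads/writes use LOCAL_QUORUM; the Paxos (LWT) phase uses LOCAL_SERIAL.
	cluster.Consistency = gocql.LocalQuorum
	cluster.SerialConsistency = gocql.LocalSerial

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
}
```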
What latency can we expect on the client side and does Cadence/Temporal expose any metrics for the same?
Temporal emits tons of metrics, both on the server and the client. Look at existing threads on the forum on this topic. Here are some to get you started:
How does tombstoning impact performance?
We have tested the Cassandra schema very carefully, including under sustained load for long periods of time. Cassandra’s tombstone characteristics should not have an impact on the performance characteristics of the server. Watch my architecture talk, which touches on this topic.
What is the recommended backup frequency?
Ideally we recommend running the server with cross-data-center replication. We have our own replication stack and do not rely on database-level replication, for various reasons. The backup frequency for disaster scenarios is completely up to you; it comes down to your business requirements rather than anything enforced by Temporal. I would rely on the best practices recommended by Cassandra.
Can you please share the number of reads per second and writes per second that Cadence does against Cassandra? This will help us a lot in provisioning the infra for our Cassandra cluster (at optimal cost).
We don’t have any benchmarks to share at this point. I would recommend running a bench test specifically for your use case to generate those numbers.
Now if we were to set up 4k shards, does that have any implications on the infra we provision, for either the Cadence components or Cassandra?
Shards are very lightweight; there are no real implications on the cost of the cluster. We have tested the system with up to 16k shards.
Q7: What is the minimum setup that I can run to support 30-50 tps on day 1, with the ability to scale it up to 500 tps when required?
30-50 tps is pretty low scale from our perspective. More than performance, the size of your cluster will be driven by availability requirements. The rough numbers I provided earlier are a good place to start; going below them may have implications on the availability of the cluster during certain failure scenarios.
Q8: Also, we would love to understand which metrics we should look at to identify which component needs to scale.
Please refer to the metrics posts I linked earlier.
We send out a bi-weekly update on our stabilization testing which covers details on the kind of failure testing we do. Here is one that gives you an idea of the failure scenarios we are testing for our V1 production release.