I can provide some high-level recommendations as a starting point, but ultimately you have to benchmark your use case to figure out the right sizing for your workload. Here are some things to consider when sizing your cluster:
Number Of Shards
Based on your future load characteristics, I would recommend starting with at least 4K history shards. The number of history shards is a setting that cannot be changed after the cluster is provisioned. For all other parameters you can start small and scale your cluster over time as needed, but for this one you have to think upfront about your maximum load.
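For reference, this is set in the server's static configuration. A minimal sketch (the exact keys can vary by server version, and the `default` store name here is just an example):

```yaml
persistence:
  # numHistoryShards is fixed for the lifetime of the cluster;
  # pick it based on your projected peak load, not current load.
  numHistoryShards: 4096
  defaultStore: default
```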
Temporal is write-intensive and uses Cassandra's lightweight transaction (LWT) feature; roughly 50% of the writes are LWTs. I expect your workload to be mostly bounded by database capacity. The nice thing about Cassandra is that it scales horizontally: you just increase the size of the cluster for Temporal to scale its throughput. We recommend a replication factor of 3 for the core Temporal keyspace. You want to start with 5 Cassandra hosts, each with 16 cores and 32 GB of memory.
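As a sketch of the Cassandra side of the static config, assuming the keyspace is named `temporal` (the hostnames below are placeholders, and key names may differ slightly between versions):

```yaml
persistence:
  datastores:
    default:
      cassandra:
        hosts: "cassandra-seed-1,cassandra-seed-2"  # placeholder seed hosts
        keyspace: "temporal"
        # Replication factor 3 for the core keyspace, as recommended above.
        replicationFactor: 3
```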
I don’t think the Cassandra implementation of the visibility store will scale to the numbers you are looking at. You should start with Elasticsearch as the visibility store. This means you will also need Kafka, since ES visibility relies on it to write visibility records to the ES store in batches.
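Pointing the visibility store at Elasticsearch looks roughly like the sketch below. The host and index name are placeholders, and the Kafka wiring lives in its own config section whose exact shape depends on your server version, so check the config reference for the release you deploy:

```yaml
persistence:
  visibilityStore: es-visibility
  datastores:
    es-visibility:
      elasticsearch:
        url:
          scheme: http
          host: "127.0.0.1:9200"   # placeholder ES endpoint
        indices:
          visibility: temporal_visibility_v1  # example index name
```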
Temporal server consists of 4 roles. Although you can run all roles within the same process, we highly recommend running them separately, as they have completely different concerns and scaling characteristics. It also makes it operationally much simpler to isolate problems in production. All of the roles are completely stateless, and the system scales horizontally as you spin up more instances of a role once you identify a bottleneck. Here are some recommendations to use as a starting point:
Frontend: Responsible for hosting the service APIs. All client interaction goes through the frontend, and it mostly scales with the RPS of the cluster. Start with 3 instances, each with 4 cores and 4 GB of memory.
History: Hosts the workflow state transition logic. Each history host runs a shard controller, which is responsible for activating and passivating shards on that host. If you provision a cluster with 4K shards, the shard controller distributes them across all available history hosts in the cluster. If history hosts become the scalability bottleneck, you just add more history hosts. All history hosts form their own membership ring, and shards are distributed among the available nodes in the hash ring. They are quite memory-intensive, as they host the mutable state and event caches. Start with 5 history instances, each with 8 cores and 8 GB of memory.
Matching: Responsible for hosting TaskQueues within the system. Each TaskQueue partition is placed separately across the available matching hosts. They usually scale with the number of workers polling for workflow or activity tasks, the throughput of workflow/activity/query tasks, and the total number of active TaskQueues in the system. Start with 3 matching instances, each with 4 cores and 4 GB of memory.
Worker: Runs various background logic: the Elasticsearch/Kafka processor, cross-DC consumers, and some system workflows (archival, batch processing, etc.). You can start with 2 instances, each with 4 cores and 4 GB of memory.
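To run the roles separately, each deployment typically loads the same static config but starts only its own service. A sketch of the per-service RPC section (the ports below are the common development defaults, not requirements; adjust for your environment):

```yaml
services:
  frontend:
    rpc:
      grpcPort: 7233
      membershipPort: 6933
  history:
    rpc:
      grpcPort: 7234
      membershipPort: 6934
  matching:
    rpc:
      grpcPort: 7235
      membershipPort: 6935
  worker:
    rpc:
      grpcPort: 7239
      membershipPort: 6939
```

Which role a given process actually runs is selected at startup (e.g. a `--service`-style flag on the server binary); check the flags for the server version you deploy.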
Hope this gives you a good starting point. Let’s keep updating this thread as you run into more specific questions.