Need help: Recommended Cluster Configuration in production

frontend 8c 32G x 1; the large memory is needed to handle the volume of long-polling requests.
history 1c 2G x 20, using horizontal scaling to avoid spending too much work on dynamicconfig tuning…
matching 1c 2G x 20,
worker 1c 2G x 3
cassandra 4c 8G x 5
bench-worker 2c 4G x 20; the throughput of the aggregator worker depends on its poller concurrency, so I set the worker options to:
MaxConcurrentActivityTaskPollers: 16
MaxConcurrentDecisionTaskPollers: 32
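For reference, these two settings are fields on the Cadence Go SDK's worker options struct. A minimal self-contained sketch (using a local stand-in struct for illustration; the real fields live on `worker.Options` in `go.uber.org/cadence/worker`):

```go
package main

import "fmt"

// WorkerOptions is a local stand-in mirroring the two poller fields
// being tuned above; in a real worker these would be set on
// worker.Options from go.uber.org/cadence/worker.
type WorkerOptions struct {
	MaxConcurrentActivityTaskPollers int
	MaxConcurrentDecisionTaskPollers int
}

func main() {
	opts := WorkerOptions{
		MaxConcurrentActivityTaskPollers: 16,
		MaxConcurrentDecisionTaskPollers: 32,
	}
	fmt.Printf("activity pollers=%d decision pollers=%d\n",
		opts.MaxConcurrentActivityTaskPollers,
		opts.MaxConcurrentDecisionTaskPollers)
}
```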

Just use basic-load-test-workflow, 16 tasklists and each start a basic workflow using the config below:
"useBasicVisibilityValidation": true,
"contextTimeoutInSeconds": 10,
"failureThreshold": 0.01,
"totalLaunchCount": 10000,
"routineCount": 8,
"waitTimeBufferInSeconds": 30,
"chainSequence": 1,
"concurrentCount": 1,
"payloadSizeBytes": 256,
"executionStartToCloseTimeoutInSeconds": 600
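Assuming each of the 16 tasklists runs this same config (totalLaunchCount of 10000 per tasklist), the overall launch volume works out as:

```go
package main

import "fmt"

func main() {
	// Numbers from the bench setup above: 16 tasklists, each starting
	// a basic workflow test with totalLaunchCount = 10000.
	taskLists := 16
	launchesPerTaskList := 10000
	total := taskLists * launchesPerTaskList
	fmt.Println(total, "workflow launches in total") // 160000
}
```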

Cassandra CPU usage is 80%+, and I'm getting about 600 TPS in total, which is under 50 TPS per core. I don't think that is good enough, so I'm asking for help.
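(For context, the per-core figure follows from the Cassandra sizing above: 5 nodes at 4 cores each.)

```go
package main

import "fmt"

func main() {
	// Cassandra tier from the cluster config above: 4c 8G x 5.
	totalTPS := 600.0
	nodes, coresPerNode := 5, 4
	perCore := totalTPS / float64(nodes*coresPerNode)
	fmt.Printf("%.0f TPS per Cassandra core\n", perCore) // 30, consistent with "< 50"
}
```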

My apologies, we are using Cadence v0.23.2, but I think the tuning method is similar since the architecture is the same.

Not sure whether we can provide specific tuning tips for Cadence; were you able to reach out to their community?

Were you able to use metrics to try to pinpoint possible bottleneck(s)? Useful ones include server metrics (persistence latencies, sync match rate, workflow/shard lock contention) and SDK metrics (activity and workflow task schedule-to-start latencies).
Also, I'm not sure how many history shards you have configured.

The sync match rate is almost 1.0, and the history shard count is 8192.
What TPS should one Cassandra core be expected to achieve under basic-load-test-workflow? E.g., is it 100 with replication factor 1 and 80 with replication factor 3?