Cassandra always consuming max heap memory

Hi,

I have deployed Cassandra on a 3-node GKE cluster with 16 cores and 64GB memory per node. I have configured Cassandra (3 pods) with a max heap size of 16GB.

As you can see in the screenshot above, all the pods are using their max heap memory (16GB * 3) even when no workflows are being executed, and we don't see any GC occurring either.
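One possible explanation (an assumption, since the screenshot isn't attached here): Cassandra's stock jvm.options pins the initial heap size to the max heap size, so the JVM commits the full 16GB at startup and container-level memory metrics report it as "used" regardless of how much live data the heap actually holds. A typical configuration looks like this (the 16G values are illustrative, matching your stated heap size):

```
# Fragment of a typical Cassandra conf/jvm.options.
# Setting -Xms equal to -Xmx commits the entire heap at startup,
# so pod-level memory metrics will sit at ~16GB even when idle,
# and you won't see the sawtooth pattern GC normally produces there.
-Xms16G
-Xmx16G
```

If that's the case, GC may well be happening normally; checking `nodetool gcstats` or the GC logs inside a pod would confirm it, rather than relying on the pod memory graph.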

For our workflow runs (worker execution time is a fixed 120 ms):
30 tps: 278.56 ms
50 tps: 340.49 ms

But at anything higher than that, response times climb into seconds. Hence we suspect Cassandra is the bottleneck.

Any suggestions would be appreciated.


Do you have server metrics and Prometheus + Grafana set up? We need to look at your persistence latencies:

p95:
histogram_quantile(0.95, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))

p99:
histogram_quantile(0.99, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))

and focus on the CreateWorkflowExecution, UpdateWorkflowExecution, and UpdateShard operations.
That would help us understand what's going on.
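For anyone unfamiliar with how those queries work: histogram_quantile estimates a percentile by linearly interpolating within the cumulative "le" buckets of a Prometheus histogram. A minimal Python sketch of that interpolation (the bucket bounds and counts below are made-up illustrative numbers, not real persistence metrics):

```python
# Sketch of Prometheus's histogram_quantile interpolation over
# cumulative ("le") buckets. Each entry is (upper_bound_seconds,
# cumulative_count). Values are illustrative only.

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from sorted cumulative buckets."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket, as Prometheus does.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# 100 observations spread across latency buckets (seconds):
buckets = [(0.05, 10), (0.1, 40), (0.25, 80), (0.5, 95), (1.0, 100)]
print(histogram_quantile(0.95, buckets))  # p95 lands in the 0.25-0.5s bucket
```

The takeaway for interpreting the dashboards: the reported p95/p99 is only as precise as the bucket boundaries, so a p99 of "1s" may really mean "somewhere in the top bucket".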