Temporal performance with golang microservice, Cassandra & Elasticsearch

Is there any other configuration that can be tuned to improve the throughput?

One thing you can check is your persistence latencies. Try this Prometheus query:

histogram_quantile(0.95, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))
for the CreateWorkflowExecution, UpdateWorkflowExecution, and UpdateShard operations. You will have to establish your own baseline for these latencies, but it is a good thing to look at.

Another thing to consider is worker capacity (number of pollers).
See this post and our worker tuning guide for more info.
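For reference, worker capacity in the Go SDK is controlled through worker.Options when you construct a worker. Below is a minimal, hedged sketch assuming a recent Go SDK (older versions use client.NewClient instead of client.Dial); the host, namespace, task queue, and all numeric values are placeholders to tune against your own metrics:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Placeholder endpoint and namespace; replace with your own.
	c, err := client.Dial(client.Options{
		HostPort:  "temporal-frontend.example.com:7233",
		Namespace: "default",
	})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	w := worker.New(c, "my-task-queue", worker.Options{
		// Number of goroutines polling the task queue. Raise these when the
		// pollers, rather than the executors, are the bottleneck.
		MaxConcurrentWorkflowTaskPollers: 8,
		MaxConcurrentActivityTaskPollers: 8,
		// Upper bounds on tasks executing in parallel on this worker.
		MaxConcurrentWorkflowTaskExecutionSize: 256,
		MaxConcurrentActivityExecutionSize:     256,
	})

	// Register your workflows and activities here, then start the worker.
	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker exited:", err)
	}
}
```

Which of these to raise depends on where the bottleneck is: the poller counts limit how fast tasks are pulled from the task queue, while the execution sizes limit how many tasks run concurrently on the worker.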

Regarding shard count, try looking at shard lock contention:

histogram_quantile(0.99, sum(rate(lock_latency_bucket{operation="ShardInfo"}[1m])) by (le))

If this latency is too high, it indicates that you did not set enough shards. Keep in mind that the shard count (numHistoryShards) is part of the static cluster configuration and cannot be changed after the cluster is created.

How many frontend servers do you have handling client requests? Typically you want an L7 load balancer to round-robin client load among the available frontend servers.
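For example (a hedged sketch; the hostname is a placeholder), every SDK client and worker would then dial the load balancer's address rather than an individual frontend instance:

```go
package example

import (
	"log"

	"go.temporal.io/sdk/client"
)

func newClientViaLoadBalancer() client.Client {
	// HostPort points at the L7 load balancer (placeholder hostname), which
	// spreads gRPC requests round-robin across the available frontend servers.
	c, err := client.Dial(client.Options{
		HostPort:  "temporal-frontend-lb.example.internal:7233",
		Namespace: "default",
	})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	return c
}
```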

Another thing you could look at is resource-exhausted errors.
Prometheus query:
sum(rate(service_errors_resource_exhausted{}[1m])) by (resource_exhausted_cause)

Look for error causes: RpsLimit, ConcurrentLimit, and SystemOverloaded. These are per service instance.
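On the client side these show up as gRPC ResourceExhausted errors. As a rough, hedged sketch (the helper, workflow name, and task queue are made up for illustration), this is one way to detect them with the Go SDK:

```go
package example

import (
	"context"
	"errors"
	"log"

	"go.temporal.io/api/serviceerror"
	"go.temporal.io/sdk/client"
)

// startWorkflowLoggingThrottles is a hypothetical helper that starts a workflow
// and logs when the request is rejected because the service is throttling.
func startWorkflowLoggingThrottles(ctx context.Context, c client.Client) {
	opts := client.StartWorkflowOptions{
		ID:        "example-workflow-id", // placeholder
		TaskQueue: "my-task-queue",       // placeholder
	}

	_, err := c.ExecuteWorkflow(ctx, opts, "ExampleWorkflow")

	var resourceExhausted *serviceerror.ResourceExhausted
	if errors.As(err, &resourceExhausted) {
		// Cause corresponds to the resource_exhausted_cause label in the
		// query above, e.g. RpsLimit, ConcurrentLimit, SystemOverloaded.
		log.Printf("request throttled: cause=%v", resourceExhausted.Cause)
	} else if err != nil {
		log.Printf("failed to start workflow: %v", err)
	}
}
```

If RpsLimit dominates, the per-instance RPS limits (set via dynamic config) may be too low for your load; SystemOverloaded usually means the cluster itself needs more capacity.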