Low TPS (~60) Despite High Concurrency Settings - Need Help Identifying Bottleneck

Hi Temporal Community,

I’m experiencing unexpectedly low throughput (approximately 60 TPS) in my Temporal deployment, and scaling up pods doesn’t seem to improve performance. I’d appreciate any guidance on identifying the bottleneck.

Workflow Details:

  • Simple test workflow with 2 sequential activities (one logs “hello”, the other logs “world”)
  • Using the Temporal Go SDK v1.38.0

Deployment Architecture:

  • Running Temporal services in separate pods (not all-in-one)
  • 2 pods per service type: Frontend, History, Matching, Worker, UI, and worker-app (the worker-app pods hold the activity and workflow code)
  • Each service has its own dedicated pods for better isolation

Temporal Server Configuration (per service pod):

Frontend Service:

  • No extra configuration, just port-related settings (membershipPort, grpcPort, etc.)

History Service:

  • persistenceMaxQPS: 3000
  • persistenceGlobalMaxQPS: 0
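These are set through Temporal's dynamic config file; roughly like this (a sketch, assuming the standard dynamic config key names):

```yaml
# dynamic config sketch; key names follow Temporal's standard dynamic config
history.persistenceMaxQPS:
  - value: 3000
    constraints: {}
history.persistenceGlobalMaxQPS:
  - value: 0        # 0 = global limit disabled
    constraints: {}
```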

Matching Service:

  • No extra configuration, just port-related settings (membershipPort, grpcPort, etc.)

Persistence Layer:

  • History shards: 64
  • Database: PostgreSQL 12
  • Default store connections: maxConns=50, maxIdleConns=20, maxConnLifetime=1h
  • Visibility store connections: maxConns=400, maxIdleConns=100, maxConnLifetime=1h
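For context, the persistence block of our server config looks roughly like this (a sketch; the plugin name and store keys may differ slightly in your setup):

```yaml
persistence:
  numHistoryShards: 64
  defaultStore: default
  visibilityStore: visibility
  datastores:
    default:
      sql:
        pluginName: "postgres12"   # assumed plugin name for PostgreSQL 12+
        maxConns: 50
        maxIdleConns: 20
        maxConnLifetime: "1h"
    visibility:
      sql:
        pluginName: "postgres12"
        maxConns: 400
        maxIdleConns: 100
        maxConnLifetime: "1h"
```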

Worker Configuration (per worker pod):

  • MaxConcurrentWorkflowTaskExecutionSize: 1000
  • MaxConcurrentActivityExecutionSize: 1000
  • MaxConcurrentLocalActivityExecutionSize: 1000
  • MaxConcurrentWorkflowTaskPollers: 64
  • MaxConcurrentActivityTaskPollers: 64
  • WorkerActivitiesPerSecond: not configured
  • A single task queue is used and all workflows start on it; each of the two temporal-worker-app pods runs one worker with the configuration above
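Concretely, each worker pod creates its worker roughly like this (a sketch; the task queue name is a placeholder, and the option field names are from go.temporal.io/sdk/worker):

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Dials the frontend service (address/namespace options omitted here).
	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// "test-task-queue" is a placeholder; all workflows start on this one queue.
	w := worker.New(c, "test-task-queue", worker.Options{
		MaxConcurrentWorkflowTaskExecutionSize:  1000,
		MaxConcurrentActivityExecutionSize:      1000,
		MaxConcurrentLocalActivityExecutionSize: 1000,
		MaxConcurrentWorkflowTaskPollers:        64,
		MaxConcurrentActivityTaskPollers:        64,
		// WorkerActivitiesPerSecond left at its default (no limit)
	})

	// The workflow and activities are registered here in the real code:
	// w.RegisterWorkflow(...); w.RegisterActivity(...)

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker exited:", err)
	}
}
```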

Issue:
Despite these high concurrency settings and having 2 pods for each service type, I’m only achieving ~60 TPS. When I scale up the number of pods (tried increasing worker pods), the TPS remains the same, suggesting I’ve hit some kind of ceiling or bottleneck.

Questions:

  1. With 64 history shards, 2 History service pods, and these worker settings, what could be limiting throughput to 60 TPS?

  2. Could the History service’s persistenceMaxQPS of 3000 be throttling at a lower level?

  3. Is the database connection pool (50 connections for default store) the bottleneck? Should I increase this?

  4. With 2 pods per service, are there any service-level rate limits or configurations I’m missing?

  5. Should I increase history shards beyond 64 for better parallelism across the 2 History pods?

  6. Could the Matching service be the limiting factor with only 2 pods?

If any other details are required, please let me know.

Any insights would be greatly appreciated.

A few metric queries you may refer to:

```
histogram_quantile(0.99, sum(rate(cache_latency_bucket{operation="HistoryCacheGetOrCreate"}[1m])) by (le))

histogram_quantile(0.99, sum(rate(lock_latency_bucket{operation="ShardInfo"}[1m])) by (le))
```

Also, I see that GetTaskQueueUserData, PollActivityTaskQueue, and PollWorkflowTaskQueue are taking a long time. Is this expected?