Navigating through the internals of the workflow lifecycle

Abhishek_Chatterjee · April 7, 2024, 8:23am

I have set up a single-node Temporal server and a single-node worker (client) backed by Postgres. It's an 8-core machine with SSD storage. I want to reach 500 TPS on a workflow with 3 activities, each of which makes a REST API call (so the work is almost entirely I/O). The REST APIs are hosted on the same network but on a different machine, and on their own they can handle more than 700 TPS.
Below are my findings:

- The network is healthy (under 500 TPS load).
- The DB is healthy, but I see only 4-6 active connections, with 50 idle connections at all times.
- Irrespective of load (I tried 10, 100 and 500 concurrent requests), I consistently get about 20 TPS, with CPU utilization peaking at 300%.
- At a load of 10 requests/second the average response time is 300-400 ms, whereas at 500 requests/second it rises to about 4 s.
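One way to sanity-check these numbers is Little's law: the average number of requests in flight equals throughput multiplied by latency. The sketch below is only back-of-the-envelope arithmetic (the class and method names are my own, not part of any Temporal API), and it assumes the quoted response times are the end-to-end time of each workflow request.

```java
/** Little's law (L = lambda * W) relating throughput, latency and in-flight work. */
class LittlesLaw {

    /** Average number of requests in flight for a given throughput (per second) and latency (seconds). */
    static double inFlight(double throughputPerSec, double latencySec) {
        return throughputPerSec * latencySec;
    }

    /** Highest throughput achievable if at most maxInFlight requests can be in progress at once. */
    static double maxThroughput(double maxInFlight, double latencySec) {
        return maxInFlight / latencySec;
    }

    public static void main(String[] args) {
        // Sustaining 500 workflows/s at ~4 s each needs this many workflows open concurrently.
        System.out.println(inFlight(500, 4.0));
        // 500 concurrent requests that each take ~4 s can complete at most this many per second.
        System.out.println(maxThroughput(500, 4.0));
    }
}
```

For example, sustaining 500 workflows/s at ~4 s each would need about 2,000 workflows open at once, while 500 concurrent requests at ~4 s each could complete at most ~125 per second.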

Below is my configuration:

Server Level
- history.numTasklistPartitions: 64
- history.persistence.numHistoryShards: 32
- history.defaultWorkflowTaskTimeout: “10s”
- matching.numTasklistPartitions: 64
- worker.taskQueue.activitiesPerSecond: 500
- worker.taskQueue.activitiesPerTaskQueue: 500
- worker.maxConcurrentActivityExecutionSize: 1000
- worker.maxConcurrentWorkflowTaskExecutionSize: 1000
- persistence.sql.maxConns: 250
- persistence.sql.maxIdleConns: 10
- persistence.sql.maxOpenConns: 200
Worker (Client) Level (refer: community forum)
- WorkerOptions#workflowPollThreadCount: 40
- WorkerOptions#activityPollThreadCount: 80
- WorkerOptions#maxConcurrentWorkflowTaskExecutionSize: 20
- WorkerOptions#maxConcurrentActivityExecutionSize: 40
- WorkerFactoryOptions#maxWorkflowThreadCount: 200
- WorkerFactoryOptions#workflowCacheSize: 20
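To check whether the worker limits above could even in principle carry 500 TPS, here is a rough sizing sketch. It assumes the 3 activities run sequentially, that each workflow produces one workflow task at start plus one after every activity completes, and a hypothetical average REST call latency of 100 ms. `WorkerSizing` and its methods are illustrative names, not Temporal SDK APIs.

```java
import java.util.Locale;

/** Back-of-the-envelope sizing for a workflow whose activities run one after another. */
class WorkerSizing {

    /** Little's law: activity slots needed = activity arrival rate x time each activity holds a slot. */
    static long requiredActivitySlots(long workflowsPerSec, int activitiesPerWorkflow, long activityLatencyMillis) {
        long slotMillisPerSec = workflowsPerSec * activitiesPerWorkflow * activityLatencyMillis;
        // Round up: a fractional slot still needs a whole slot.
        return (slotMillisPerSec + 999) / 1000;
    }

    /** ASSUMPTION: one workflow task to start the workflow plus one after each activity completes. */
    static long workflowTasksPerSec(long workflowsPerSec, int activitiesPerWorkflow) {
        return workflowsPerSec * (activitiesPerWorkflow + 1L);
    }

    /** Upper bound on workflows/sec that a fixed number of activity slots can sustain. */
    static double maxWorkflowsPerSec(int activitySlots, int activitiesPerWorkflow, long activityLatencyMillis) {
        return activitySlots * 1000.0 / (activitiesPerWorkflow * (double) activityLatencyMillis);
    }

    public static void main(String[] args) {
        long target = 500;        // workflows per second (target from the post)
        int activities = 3;       // REST-calling activities per workflow (from the post)
        long latencyMs = 100;     // HYPOTHETICAL average REST call latency
        int configuredSlots = 40; // WorkerOptions#maxConcurrentActivityExecutionSize above

        System.out.println("activity slots needed: " + requiredActivitySlots(target, activities, latencyMs));
        System.out.println("workflow tasks/sec: " + workflowTasksPerSec(target, activities));
        System.out.printf(Locale.ROOT, "max workflows/sec with %d slots: %.1f%n",
                configuredSlots, maxWorkflowsPerSec(configuredSlots, activities, latencyMs));
    }
}
```

Under those assumptions, 500 workflows/s needs roughly 150 concurrent activity slots and about 2,000 workflow tasks per second, while 40 activity slots at 100 ms per call top out near 133 workflows/s. Substituting the measured activity latency would give a more realistic estimate.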

It seems the latency is being introduced somewhere between the Temporal server and the worker (at the task queue level?).

Is my configuration enough to support 500 TPS, or am I missing something? Ultimately I want to measure how much TPS I can get with this single-node setup (after appropriate tuning) so that I can extrapolate accordingly.
Why is the number of active DB connections always 4-6, with 50 idle connections, regardless of load, when I have configured 32 history shards?

Please advise.
