Temporal throughput not improving

Uma · May 28, 2021, 7:14am

We are considering Temporal as a solution for our firm. As part of that evaluation we are benchmarking Temporal with the following workflow design: each parent workflow has 50 child workflows, and each child workflow calls 2 synchronous activities sequentially. Together, the two activities take around 30 ms.

We will be benchmarking workloads of up to 600K parent workflows.

Deployed resources:
Temporal services

History - 8 instances with 8 cores, 8GB memory each
Frontend - 8 instances with 6 cores, 4GB memory each
Matching - 6 instances with 4 cores, 4GB memory each
No Server workers
Cassandra as persistence store - Multi-node cluster with sufficient resources
ES for visibility - Multi-node cluster with sufficient resources

Application workers are separate for workflows and activities (the parent and child workflow workers run on the same instance):

Workflow services - 10 instances with 3 cores, 4GB memory each
Activity services - 15 instances with 3 cores, 5GB memory each

We tried multiple workload volumes; two of them are:

14K parent workflows, each with 50 child workflows
Results/findings:

Parent workflow throughput ~12 wf/sec.
Child workflow throughput ~ 150 wf/sec.
Total run time : 70 minutes to complete all parents

100K parent workflows, each with 50 child workflows
Results/findings:

Parent workflow throughput ~2 wf/sec.
Child workflow throughput ~145 wf/sec. (Only around 470K completed)
Total run time: ~9 hours to complete all parents
We observed that throughput drops towards the end of the run, and some workflows take a very long time to complete.
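As a rough cross-check of the run times above, the sketch below estimates how long the child workflows alone would take at a given child throughput. It uses only the figures quoted in this post (50 children per parent, ~150 and ~145 child wf/sec, the 4000 wf/sec target). The class and method names are made up for illustration, and the post does not say how the rates were measured, so treat the output as an order-of-magnitude estimate.

```java
import java.util.Locale;

// Back-of-the-envelope estimate: time to finish all child workflows
// at a steady child-workflow throughput. Names here are illustrative only.
final class RunTimeEstimate {

    /** Hours needed to complete parents * childrenPerParent child workflows at the given rate. */
    static double hoursToDrain(long parents, int childrenPerParent, double childrenPerSecond) {
        double totalChildren = (double) parents * childrenPerParent;
        return totalChildren / childrenPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        int childrenPerParent = 50; // from the post

        // Observed rates from the two runs described in the post.
        print("14K parents @ 150 child wf/sec", hoursToDrain(14_000, childrenPerParent, 150.0));
        print("100K parents @ 145 child wf/sec", hoursToDrain(100_000, childrenPerParent, 145.0));

        // Planned 600K-parent workload at the observed rate and at the target rate.
        print("600K parents @ 145 child wf/sec", hoursToDrain(600_000, childrenPerParent, 145.0));
        print("600K parents @ 4000 child wf/sec", hoursToDrain(600_000, childrenPerParent, 4000.0));
    }

    private static void print(String label, double hours) {
        System.out.println(String.format(Locale.ROOT, "%s: %.1f h", label, hours));
    }
}
```

This works out to roughly 1.3 h, 9.6 h, 57.5 h and 2.1 h respectively, so at the observed child throughput the planned 600K-parent workload would take more than two days.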

Expectation :
Parent workflow throughput ~ 170 wf/sec
Child workflow throughput ~ 4000 wf/sec

Queries:

Is the expected throughput achievable?
Is there any workflow/activity design change that would help achieve such throughput? Would converting the parent-child workflows into a single workflow that just calls activities help?
We have tried multiple combinations of vertical and horizontal scaling of the Temporal services and of the workflow and activity services, but have not seen a significant throughput improvement. Is there a known bottleneck?
What is a desirable combination of the number of task queues and their partitions, and the number of workers per task queue, for workflows and activities?
We have tried changing the following default configuration:
WorkerOptions.setMaxConcurrentActivityExecutionSize - 200 to 2000
WorkerOptions.setMaxConcurrentWorkflowTaskExecutionSize - 200 to 2000
WorkerOptions.setWorkflowPollThreadCount - 2 to 10
WorkerOptions.setActivityPollThreadCount - 5 to 20
but have not seen a significant throughput improvement. What is a suitable configuration for such a load?
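One way to reason about the activity slot setting is Little's law (average concurrency = arrival rate × time in service). The sketch below applies it to the numbers in this post. It assumes the ~30 ms is pure execution time split evenly across the two activities, and it ignores scheduling and polling latency. The class name is hypothetical and no Temporal API is involved.

```java
// Little's law sketch: average in-flight work = arrival rate x time in service.
// Illustrative only; not part of any Temporal SDK.
final class SlotEstimate {

    /** Average number of activities executing at once for a given rate and duration. */
    static double concurrentActivities(double activitiesPerSecond, double secondsPerActivity) {
        return activitiesPerSecond * secondsPerActivity;
    }

    public static void main(String[] args) {
        double childRate = 4000.0;         // target child workflows/sec (from the post)
        int activitiesPerChild = 2;        // from the post
        double secondsPerActivity = 0.015; // ASSUMPTION: the ~30 ms splits evenly over both activities
        int activityWorkers = 15;          // activity service instances (from the post)
        int slotsPerWorker = 200;          // MaxConcurrentActivityExecutionSize before the change

        double needed = concurrentActivities(childRate * activitiesPerChild, secondsPerActivity);
        long available = (long) activityWorkers * slotsPerWorker;

        System.out.println("activities executing at once (approx): " + Math.round(needed));
        System.out.println("activity slots available: " + available);
    }
}
```

Under these assumptions about 120 activities would be executing at once against 3000 available slots, even at the target rate. That may be why raising the slot count to 2000 made no visible difference. It says nothing about server-side limits such as persistence or shard contention.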

Wenquan_Xing · May 28, 2021, 7:51am

1. What is the size of your Cassandra cluster?
2. A child workflow is comparatively more expensive than an activity.
3. Make sure numHistoryShards is large enough; try 16K as a start. Ref: temporal/development.yaml at v1.9.2 · temporalio/temporal · GitHub
4. Closely monitor the CPU/memory utilization of your setup. I guess the existing capacity is not enough (after changing the number of shards above, see 3).
5. The parent-to-child ratio is 1:50, so I would expect the Expectation section to follow the same ratio?
6. It may be worth coming to our Slack channel to talk about your workflow design.
7. We also have a cloud offering, if you are interested.
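The ratio point above can be checked with simple arithmetic. The sketch below uses only the figures from the original post (1:50 parent-to-child ratio, 170 parent wf/sec, 4000 child wf/sec); the class and method names are made up for illustration.

```java
// Consistency check between the parent and child throughput targets.
// Illustrative names; figures come from the original post.
final class RatioCheck {

    /** Child workflows/sec implied by a parent rate at a fixed fan-out. */
    static double impliedChildRate(double parentsPerSecond, int childrenPerParent) {
        return parentsPerSecond * childrenPerParent;
    }

    /** Parent workflows/sec implied by a child rate at a fixed fan-out. */
    static double impliedParentRate(double childrenPerSecond, int childrenPerParent) {
        return childrenPerSecond / childrenPerParent;
    }

    public static void main(String[] args) {
        int childrenPerParent = 50;
        System.out.println("170 parents/sec implies child wf/sec: "
                + impliedChildRate(170.0, childrenPerParent));
        System.out.println("4000 children/sec implies parent wf/sec: "
                + impliedParentRate(4000.0, childrenPerParent));
    }
}
```

With a fan-out of 50, 170 parent wf/sec corresponds to 8500 child wf/sec, while 4000 child wf/sec corresponds to 80 parent wf/sec, so the two stated targets cannot both hold in steady state.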

chaitanya · October 2, 2022, 6:53pm

Did you ever get this resolved? I am seeing similar numbers for my Temporal workflows, and I am wondering what setup and config worked for you.

Could you please share?
