What factors account for Workflow Schedule to Start Latency?

I did a performance test on a CentOS system with 8 processors and 64 GB memory.
My test case is a simple workflow which contains a single trivial activity.

I found that as concurrency increased, the throughput hit a limit of around 100 executions/second. A large workflow schedule-to-start latency was observed, and it increased with the number of concurrent requests, as shown below:

On the other hand, the activity latency looks acceptable.

I confirmed that none of the machine resources hit their limits, including CPU, memory, and disk I/O. Judging by CPU usage, none of the Temporal components seems to be overloaded.

And I tried tuning the parameters mentioned here, but none of them made much difference.

So I want to know what factors affect the Workflow Schedule to Start Latency. Any suggestions that may help to find the bottleneck in the test above would be appreciated.

And I tried tuning the parameters mentioned here, but none of them made much difference.

You can follow the full Worker tuning guide for a step-by-step walkthrough of the parameters and the corresponding metrics to check on the Worker side.

I would fully expect the default Worker config to start bumping into the maxConcurrentWorkflowTaskExecutionSize value at 100+ workflows/second if you have only one worker. So I would pay attention to this parameter and the corresponding worker_task_slots_available metric value.
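A rough way to see why a slot limit can cap throughput is Little's law (concurrency = throughput × time each task holds a slot). The sketch below is only a back-of-the-envelope model, not anything from the SDK: the function names are made up for illustration, and the slot count and hold time are hypothetical numbers, not measurements from this test.

```python
# Back-of-the-envelope model based on Little's law:
#   concurrency = throughput * average time a task holds a slot
# All numbers used here are illustrative assumptions, not measured values.


def max_throughput(slots: int, hold_seconds: float) -> float:
    """Upper bound on tasks/second when at most `slots` tasks run at once
    and each task holds its slot for `hold_seconds` on average."""
    if slots <= 0 or hold_seconds <= 0:
        raise ValueError("slots and hold_seconds must be positive")
    return slots / hold_seconds


def backlog_after(arrival_rate: float, capacity: float, seconds: float) -> float:
    """Tasks still waiting after `seconds` of steady arrivals.
    The backlog only grows when arrivals exceed capacity."""
    return max(0.0, (arrival_rate - capacity) * seconds)


if __name__ == "__main__":
    # Hypothetical: 50 slots, each held for 0.5 s on average.
    ceiling = max_throughput(slots=50, hold_seconds=0.5)
    print(f"throughput ceiling: {ceiling:.0f} tasks/s")
    # Hypothetical: 150 tasks/s arriving against that ceiling for 10 s.
    waiting = backlog_after(arrival_rate=150, capacity=ceiling, seconds=10)
    print(f"backlog after 10 s: {waiting:.0f} tasks")
```

In this model, once arrivals exceed the ceiling, the extra tasks wait in the queue, and that waiting time is what shows up as schedule-to-start latency. If worker_task_slots_available stays near zero under load, the worker is probably at such a ceiling.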

Somebody else can help better with the server side, but it's a good idea to check the DB latency metrics.

Sure, I will check the metrics you mentioned.
More information about my test:
I upgraded my Cassandra DB from 3.11 to 4.0.6 and got 2x the throughput, but the behavior pattern did not change.
I tried running more replicas of each Temporal component, but it did not make much of an improvement.