What factors affect Workflow Schedule to Start Latency?

whitecrow · August 29, 2022, 1:15am

I did a performance test on a CentOS system with 8 processors and 64 GB memory.
My test case is a simple workflow which contains a single trivial activity.

I found that as the concurrency was raised, the throughput hit a limit of around 100 executions/second. A large workflow schedule-to-start latency was observed, and it increased with the number of concurrent requests.

On the other hand, the activity latency looks acceptable.
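For readers comparing the two latencies: schedule-to-start latency is the time between a task being scheduled on a task queue and a worker picking it up. The sketch below shows one way to compute it and summarize it from raw timestamps. The function names and the sample timestamps are hypothetical and are not taken from this test.

```python
import math


def schedule_to_start(scheduled_at: float, started_at: float) -> float:
    """Seconds a task waited in the queue before a worker picked it up."""
    if started_at < scheduled_at:
        raise ValueError("a task cannot start before it is scheduled")
    return started_at - scheduled_at


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical (scheduled, started) timestamp pairs, in seconds.
samples = [(0.0, 0.5), (1.0, 1.25), (2.0, 4.0)]
latencies = [schedule_to_start(s, t) for s, t in samples]
print(percentile(latencies, 50), percentile(latencies, 95))
```

A high percentile that grows with load while the median stays low usually means tasks are queueing rather than executing slowly.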

I confirmed that none of the machine resources hit their limits, including CPU, memory, and disk I/O. Judging by the CPU usage, none of the Temporal components seems to be overloaded.

And I tried tuning the parameters mentioned here, but none of them made much difference.

So I want to know what factors affect the Workflow Schedule to Start Latency. Any suggestions that may help to find the bottleneck in the test above would be appreciated.

spikhalskiy · August 29, 2022, 1:53am

> And I tried tuning the parameters mentioned here, but none of them made much difference.

You can follow the full Worker tuning guide for a step-by-step walkthrough of the parameters and the corresponding metrics to check on the Worker side.

I would fully expect the default Worker config to start bumping into the maxConcurrentWorkflowTaskExecutionSize limit at 100+ workflows/second if you have only one worker. So I would pay attention to this parameter and the corresponding worker_task_slots_available metric value.
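To see why a fixed number of task slots caps throughput and makes schedule-to-start latency grow, here is a rough back-of-the-envelope sketch based on Little's law. It is not Temporal SDK code. The function names, the slot count, and the per-task time are illustrative assumptions, not measurements from this test.

```python
def max_throughput(slots: int, task_seconds: float) -> float:
    """Upper bound on tasks/second when at most `slots` tasks run at once."""
    if slots <= 0 or task_seconds <= 0:
        raise ValueError("slots and task_seconds must be positive")
    return slots / task_seconds


def backlog_wait(arrival_rate: float, slots: int,
                 task_seconds: float, duration_seconds: float) -> float:
    """Approximate queue wait (seconds) for a task arriving after
    `duration_seconds` of sustained load at `arrival_rate` tasks/second."""
    capacity = max_throughput(slots, task_seconds)
    excess = max(0.0, arrival_rate - capacity)  # tasks/second that pile up
    backlog = excess * duration_seconds         # tasks waiting in the queue
    return backlog / capacity                   # time needed to drain them


# Hypothetical numbers: 200 slots, each held for 2 seconds per task.
print(max_throughput(200, 2.0))           # capacity in tasks/second
print(backlog_wait(150, 200, 2.0, 60.0))  # wait after a minute of overload
print(backlog_wait(80, 200, 2.0, 60.0))   # no wait below capacity
```

Under these assumptions, any arrival rate above the capacity produces a wait that keeps growing for as long as the load lasts, which would match the pattern of latency rising with concurrency while CPU stays idle. If worker_task_slots_available sits near zero during the test, that points to this kind of limit.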

On the server side, somebody else will be able to help better, but it's a good idea to check the DB latency metrics.

whitecrow · August 29, 2022, 2:14am

Sure, I will check the metrics you mentioned.
More information about my test:
I previously upgraded my Cassandra DB from 3.11 to 4.0.6 and got 2x the throughput, but the behavior pattern did not change.
I tried running more replicas of each Temporal component, but that did not bring much improvement.
