Hi Team,
We are facing an issue executing our workflows in Temporal: temporal_workflow_task_schedule_to_start_latency has increased and ranges between 1s and 5000s. Our workflows are long-running, lasting from a few days to a week. Our activities are short-lived, taking 50ms or less on average, so we implemented them as Local Activities. We have tried numerous performance-tuning changes so far, without success. Our workflow count is fairly high: 1M to 1.5M workflows are in the running state at any given time.
- Our workflow_task_schedule_to_start_latency is abnormally high
- Worker CPU load is low (< 50%), but memory utilization is ~90-100%
- Free worker task slots are always available (consistently ~200)
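For reference, schedule-to-start latency is the time a workflow task waits on the task queue between being scheduled and being picked up by a worker; roughly, it is the gap between the WorkflowTaskScheduled and WorkflowTaskStarted events in a workflow's history. A minimal sketch of that arithmetic using only `java.time`, assuming illustrative timestamps; the class `ScheduleToStart` is hypothetical and not part of the Temporal SDK:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not Temporal SDK code): schedule-to-start latency is the
// time between a task being scheduled and a worker starting to process it.
final class ScheduleToStart {
    static Duration latency(Instant scheduled, Instant started) {
        return Duration.between(scheduled, started);
    }

    public static void main(String[] args) {
        Instant scheduled = Instant.parse("2024-01-01T00:00:00Z"); // illustrative timestamp
        // The two ends of the range reported above: 1s and 5000s.
        System.out.println(latency(scheduled, scheduled.plusSeconds(1)));    // PT1S
        System.out.println(latency(scheduled, scheduled.plusSeconds(5000))); // PT1H23M20S
    }
}
```

At the upper end, a task waits well over an hour before any worker starts it.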
So, following the Worker Performance documentation, we set the following values (in the Java SDK):
```java
import io.temporal.worker.WorkerOptions;

WorkerOptions workerOptions = WorkerOptions.newBuilder()
    .setMaxConcurrentActivityTaskPollers(100)
    .setMaxConcurrentWorkflowTaskPollers(100)
    .setDefaultDeadlockDetectionTimeout(30000) // milliseconds
    .setMaxConcurrentLocalActivityExecutionSize(1000)
    .setMaxConcurrentWorkflowTaskExecutionSize(1000)
    .build();
```
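As a rough mental model of what the `...ExecutionSize` settings above control, the sketch below caps concurrent execution with a `java.util.concurrent.Semaphore` whose permits play the role of task slots. It is a simplified illustration under that assumption, not how the Temporal SDK is implemented; `SlotModel` and `peakConcurrency` are hypothetical names:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model, NOT Temporal SDK code: a Semaphore with `slots` permits caps
// how many tasks execute at once, conceptually like the ...ExecutionSize options.
final class SlotModel {
    // Runs `tasks` short tasks on `threads` threads; each task holds one slot
    // permit while executing. Returns the peak number of tasks running at once.
    static int peakConcurrency(int threads, int slots, int tasks) {
        Semaphore slotPermits = new Semaphore(slots);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    slotPermits.acquire(); // wait for a free slot
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    Thread.sleep(5); // stand-in for a short task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    slotPermits.release(); // free the slot for the next task
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 8 threads competing for 2 slots: at most 2 tasks run at once.
        System.out.println(peakConcurrency(8, 2, 40));
        // 4 threads with 1000 slots: slots stay free, so the thread count is the cap.
        System.out.println(peakConcurrency(4, 1000, 40));
    }
}
```

In this model the slot count is only an upper bound on concurrent execution. When slots stay free, as the ~200 available slots reported above suggest, raising the `...ExecutionSize` values alone would not be expected to shorten how long tasks wait before a worker picks them up.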
However, we still haven't been able to fix the slow workers. Any help would be highly appreciated.
Thanks.