Cannot figure out how to reduce the activity start latency

We have an application that uses Temporal to manage many parallel activities that are called by one workflow - for example, we have one workflow that needs to kick off 5k activities simultaneously.

After many hours of changing settings, we cannot seem to reduce the latency before these activities start.

We’ve run a simple experiment - 1 workflow with 1k activities executed in parallel, each of which simply waits for 1 second - and it’s taking 1 minute on average to finish.
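To put numbers on that experiment, here is a rough back-of-envelope sketch in plain Java (no Temporal dependency). The class and method names are made up for illustration, and the figure of 200 activity slots is an assumed example, not a measured value from this setup. It only shows that 1k activities in 1 minute works out to roughly 16.7 activities per second, far below what concurrency alone would suggest.

```java
// Back-of-envelope numbers for the experiment described above.
// All names here are hypothetical helpers, not part of the Temporal SDK.
final class FanOutEstimate {

    // Activities completed per second of wall-clock time.
    static double observedThroughput(int activities, double wallClockSeconds) {
        return activities / wallClockSeconds;
    }

    // Best-case wall-clock time if the only limit were how many activities
    // can run at once across all workers (totalSlots is an assumed figure).
    static double idealSeconds(int activities, int totalSlots, double activitySeconds) {
        int waves = (activities + totalSlots - 1) / totalSlots; // ceiling division
        return waves * activitySeconds;
    }

    public static void main(String[] args) {
        // 1k activities finishing in about 1 minute, as reported above.
        System.out.printf("observed: %.1f activities/s%n", observedThroughput(1000, 60.0));
        // Assumed example: 200 concurrent activity slots in total.
        System.out.printf("ideal with 200 slots: %.1f s%n", idealSeconds(1000, 200, 1.0));
    }
}
```

If the ideal figure for the real slot count is a few seconds while the observed run takes a minute, the gap is probably in how fast activities are dispatched rather than in how many can run at once.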

Things we’ve tried without success:

  1. increasing workers
  2. increasing the number of pollers
  3. increasing history shards
  4. increasing concurrency limit on activities

No matter what we change, after setting up Grafana monitoring and looking at what’s happening, we see the 1k activities get picked up almost immediately but then wait a very long time to start while some bottleneck gets worked through.

Any help would be greatly appreciated :folded_hands: - I’ve now spent 2 days trying to figure out this bottleneck without success.

Not sure if you have looked at the setMaxTaskQueueActivitiesPerSecond option. It sets the rate limit on how many activities per second can be started from the task queue, across all workers polling it, so a value that is too low holds queued activities back.

You can set it when creating the Worker, as shown below:

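        // x, y and z are placeholders; pick values that suit your workload.
        Worker worker = factory.newWorker("RateLimitedQueue", WorkerOptions.newBuilder()
                        .setMaxConcurrentWorkflowTaskExecutionSize(x) // workflow tasks this worker runs at once
                        .setMaxWorkerActivitiesPerSecond(y)           // activity rate limit for this worker
                        .setMaxTaskQueueActivitiesPerSecond(z)        // activity rate limit for the whole task queue
                        .build());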

In my testing, I have found that increasing setMaxTaskQueueActivitiesPerSecond reduces the waiting time between activities.
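As a rough illustration of why these rate limits can matter, here is a small plain-Java sketch (no Temporal dependency). It assumes a deliberately simplified model in which activities are admitted at the smaller of the combined per-worker limit (workers times y) and the task-queue limit (z); the real server behaviour is more involved. The class name, method names, and the sample values are all hypothetical.

```java
// Simplified model of how per-second rate limits bound activity start times.
// Hypothetical helper, not part of the Temporal SDK.
final class RateLimitEstimate {

    // Assumed model: the admitted rate is the smaller of the combined
    // per-worker limit and the task-queue-wide limit.
    static double effectiveRate(int workers, double perWorkerRate, double taskQueueRate) {
        return Math.min(workers * perWorkerRate, taskQueueRate);
    }

    // Earliest time, in seconds, at which the last of `activities` can start
    // when they are admitted at a steady ratePerSecond.
    static double secondsToStartAll(int activities, double ratePerSecond) {
        return activities / ratePerSecond;
    }

    public static void main(String[] args) {
        // Hypothetical values: 4 workers, y = 50 per worker, z = 100 per task queue.
        double rate = effectiveRate(4, 50.0, 100.0);
        System.out.println(secondsToStartAll(1000, rate) + " s to start 1000 activities");
    }
}
```

Under this model, starting 1k activities within about a second would need an effective rate on the order of 1000 per second, so it is worth checking that neither limit is set well below that.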