Process scaling

Hello! We are designing a process that periodically collects metrics from our clients, performs some calculations, and stores the data in an external OLAP system. We feel that we are missing something in terms of scalability. Some key details: we have around 600-700k clients, the data lives in different subsystems, and the data itself consists of counters and some simple structures, though there are some calculations involved.

Currently the process looks roughly like this:

- Activity 1: fetch a chunk of 500 clients, launch a goroutine for each client, and process each client in its own goroutine. The process only completes when all goroutines in the chunk have finished, then reschedules itself with the offset for the next chunk.
- Inside each goroutine: activity 2 fetches metric 1, activity 3 fetches 10 more metrics, activity 4 performs the calculations.
- Activity 5 (common): writes the results to the OLAP for all clients at once.
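
A rough sketch of the current main workflow (heavily simplified, with illustrative names and timeouts):

```go
package metrics

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

const chunkSize = 500

// ProcessChunkWorkflow is roughly what we run today: fetch a chunk of clients,
// process each client in its own goroutine, wait for all of them, then
// reschedule ourselves with the next offset.
func ProcessChunkWorkflow(ctx workflow.Context, offset int) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute, // illustrative
	})

	// Activity 1: fetch the next chunk of up to 500 client IDs.
	var clients []string
	if err := workflow.ExecuteActivity(ctx, "FetchClients", offset, chunkSize).Get(ctx, &clients); err != nil {
		return err
	}

	wg := workflow.NewWaitGroup(ctx)
	for _, clientID := range clients {
		clientID := clientID // capture loop variable
		wg.Add(1)
		workflow.Go(ctx, func(gCtx workflow.Context) {
			defer wg.Done()
			// Activity 2: fetch metric 1; activity 3: fetch the other 10 metrics;
			// activity 4: run the calculations. Error handling omitted for brevity.
			_ = workflow.ExecuteActivity(gCtx, "FetchMetric1", clientID).Get(gCtx, nil)
			_ = workflow.ExecuteActivity(gCtx, "FetchMetrics", clientID).Get(gCtx, nil)
			_ = workflow.ExecuteActivity(gCtx, "Calculate", clientID).Get(gCtx, nil)
		})
	}
	wg.Wait(ctx)

	if len(clients) < chunkSize {
		return nil // last chunk; activity 5 later writes everything to the OLAP
	}
	// Reschedule ourselves for the next chunk (continue-as-new used here for illustration).
	return workflow.NewContinueAsNewError(ctx, ProcessChunkWorkflow, offset+chunkSize)
}
```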

The problem is that we are trying to scale by increasing the number of goroutines, but we have already reached the maximum concurrent activity execution parameter. Despite that, there is no improvement in performance: the load on the Temporal API is almost negligible, and the worker consumes a laughably small amount of resources.

What is the best way to scale such workloads? One option is to start a controller workflow first, which would split the entire client set into chunks and launch the main workflows as child workflows. In that case we could significantly reduce the chunk size and skip the goroutines, instead requesting metrics for all clients in a chunk at once. Are there any other options?

| we have already reached the maximum concurrent activity execution parameter

Assuming you are talking about worker.Options -> MaxConcurrentActivityExecutionSize (default 1000); if not, please let us know.

Yes, one part of scaling your use case would be worker tuning: optimizing your worker options such as MaxConcurrentActivityExecutionSize and the poller counts, as well as the worker's resources (CPU/memory). Once a single worker is optimized, you would scale out by the number of worker processes.
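
A rough sketch of what that tuning looks like in the Go SDK (the task queue name and the numbers below are illustrative, not recommendations):

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	c, err := client.Dial(client.Options{}) // connects to localhost:7233 by default
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	// Illustrative values only: tune these against your worker's CPU/memory
	// and the latency of your metric-fetching activities.
	w := worker.New(c, "metrics-task-queue", worker.Options{
		MaxConcurrentActivityExecutionSize: 1000, // default is 1000
		MaxConcurrentActivityTaskPollers:   10,   // raise if tasks sit in the queue while the worker is idle
		MaxConcurrentWorkflowTaskPollers:   10,
	})

	// w.RegisterWorkflow(...) / w.RegisterActivity(...) as usual.

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker exited", err)
	}
}
```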

Your worker resources as well as your activities will play a role in how many activities you can process concurrently. There are a number of ways you can approach your batch use case.
One way could be the iterator pattern, where you start a child workflow for a batch of clients; when it completes, the parent can continue-as-new and start the next batch, again handled by a child workflow.
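
A minimal sketch of that iterator pattern, assuming a hypothetical "ProcessClientBatch" child workflow and an illustrative batch size:

```go
package batch

import "go.temporal.io/sdk/workflow"

const batchSize = 500 // illustrative

// ProcessAllClientsWorkflow processes one batch via a child workflow, then
// continues-as-new with the next offset so the parent's history stays small.
func ProcessAllClientsWorkflow(ctx workflow.Context, offset, totalClients int) error {
	if offset >= totalClients {
		return nil // all batches processed
	}

	// "ProcessClientBatch" is a hypothetical child workflow that fetches
	// metrics and runs the calculations for clients [offset, offset+batchSize).
	child := workflow.ExecuteChildWorkflow(ctx, "ProcessClientBatch", offset, batchSize)
	if err := child.Get(ctx, nil); err != nil {
		return err
	}

	// Start the next batch in a fresh execution.
	return workflow.NewContinueAsNewError(ctx, ProcessAllClientsWorkflow, offset+batchSize, totalClients)
}
```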

Another option could be a long-running, heartbeating activity. This can be useful when you want to process a very large set of records and don't want to worry about workflow execution event-count limits, but it means you have to handle the parallelization yourself. Heartbeat details let you continue processing from where you left off, for example after an activity worker restart.
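
Something along these lines, where fetchAndProcess stands in for your own fetch/calculate/write logic and the chunk size is illustrative:

```go
package batch

import (
	"context"

	"go.temporal.io/sdk/activity"
)

// ProcessClientsActivity walks the full client set, heartbeating the current
// offset so a retried attempt can resume instead of starting over.
func ProcessClientsActivity(ctx context.Context, totalClients int) error {
	offset := 0

	// If this is a retry, pick up the offset recorded by the last heartbeat.
	if activity.HasHeartbeatDetails(ctx) {
		if err := activity.GetHeartbeatDetails(ctx, &offset); err != nil {
			return err
		}
	}

	const chunk = 500 // illustrative chunk size
	for offset < totalClients {
		if err := fetchAndProcess(ctx, offset, chunk); err != nil {
			return err
		}
		offset += chunk
		// Record progress; the heartbeat also lets the server detect a dead
		// worker via the heartbeat timeout and retry the activity elsewhere.
		activity.RecordHeartbeat(ctx, offset)
	}
	return nil
}

// fetchAndProcess is a placeholder for the real per-chunk work.
func fetchAndProcess(ctx context.Context, offset, count int) error {
	return nil
}
```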

Another option could be a "sliding window" approach, see the sample here: it lets you set the parallelization for a single execution, and you can also scale out by the number of these sliding-window workflows.
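
A heavily simplified sketch of the idea (not the sample itself): keep at most a window's worth of per-client child workflows in flight, and start the next client as soon as one finishes. The "ProcessSingleClient" child and the input type are hypothetical, and a real implementation would also continue-as-new periodically to keep the history bounded and would partition the 600-700k clients across several of these workflows.

```go
package batch

import "go.temporal.io/sdk/workflow"

// SlidingWindowInput describes the slice of clients this execution owns and
// how many children may run at once. Illustrative type and field names.
type SlidingWindowInput struct {
	Offset     int // first client index handled by this execution
	Count      int // number of clients handled by this execution
	WindowSize int // max child workflows in flight at any moment
}

// SlidingWindowWorkflow keeps at most WindowSize child workflows running;
// whenever one finishes, the next client is started.
func SlidingWindowWorkflow(ctx workflow.Context, in SlidingWindowInput) error {
	next := in.Offset
	end := in.Offset + in.Count
	running := 0
	var firstErr error

	selector := workflow.NewSelector(ctx)
	for next < end || running > 0 {
		// Top up the window with new child workflows.
		for running < in.WindowSize && next < end {
			// "ProcessSingleClient" is a hypothetical per-client child workflow.
			f := workflow.ExecuteChildWorkflow(ctx, "ProcessSingleClient", next)
			selector.AddFuture(f, func(fut workflow.Future) {
				running--
				if err := fut.Get(ctx, nil); err != nil && firstErr == nil {
					firstErr = err
				}
			})
			running++
			next++
		}
		// Block until any in-flight child completes, freeing a window slot.
		selector.Select(ctx)
	}
	return firstErr
}
```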

Either way, I think optimizing your workers so they can process as many activities concurrently as possible will play a big role in your overall processing time for all records.