Hello! We are designing a process that periodically collects metrics from our clients, performs some calculations, and stores the data in an external OLAP system, and we feel we are missing something in terms of scalability. Key details: we have around 600-700k clients, the source data lives in different subsystems, and the data itself is counters and some simple structures, but there are calculations involved.

Currently the process looks roughly like this: activity 1 fetches a chunk of 500 clients, then the workflow launches a goroutine per client, each client is processed in its own goroutine, and the workflow reschedules itself with the offset of the next chunk only once every goroutine in the chunk has finished. Inside each goroutine the activities are roughly: activity 2 fetches metric 1, activity 3 fetches 10 more metrics, activity 4 performs the calculations. Finally, a common activity 5 writes the results for all clients to the OLAP system at once.
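In Go SDK terms, the current shape is roughly the sketch below. The activity names, the Client/Metric/Result types, the timeout, and the continue-as-new at the end are placeholders and assumptions for illustration, not our actual code:

```go
// Rough sketch of the current per-chunk workflow (Temporal Go SDK).
// All names, types, and timeouts are illustrative placeholders.
package metrics

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// Placeholder types for illustration only.
type (
	Client struct{ ID string }
	Metric struct{ Value float64 }
	Result struct{ ClientID string }
)

const chunkSize = 500

func ProcessChunkWorkflow(ctx workflow.Context, offset int) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 5 * time.Minute,
	})

	// Activity 1: fetch the next chunk of 500 clients.
	var clients []Client
	if err := workflow.ExecuteActivity(ctx, "FetchClients", offset, chunkSize).Get(ctx, &clients); err != nil {
		return err
	}
	if len(clients) == 0 {
		return nil // no more clients
	}

	// One workflow coroutine per client, running activities 2-4.
	// Error handling is elided in this sketch.
	results := make([]Result, len(clients))
	wg := workflow.NewWaitGroup(ctx)
	for i, c := range clients {
		i, c := i, c
		wg.Add(1)
		workflow.Go(ctx, func(gctx workflow.Context) {
			defer wg.Done()
			var m1 Metric
			_ = workflow.ExecuteActivity(gctx, "FetchMetric1", c.ID).Get(gctx, &m1) // activity 2
			var rest []Metric
			_ = workflow.ExecuteActivity(gctx, "FetchTenMetrics", c.ID).Get(gctx, &rest) // activity 3
			_ = workflow.ExecuteActivity(gctx, "Calculate", m1, rest).Get(gctx, &results[i]) // activity 4
		})
	}
	wg.Wait(ctx)

	// Activity 5 (common): write the results to the OLAP store.
	if err := workflow.ExecuteActivity(ctx, "WriteToOLAP", results).Get(ctx, nil); err != nil {
		return err
	}

	// Reschedule with the offset of the next chunk
	// (shown here as continue-as-new; that part is an assumption).
	return workflow.NewContinueAsNewError(ctx, ProcessChunkWorkflow, offset+chunkSize)
}
```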
The problem: we have been trying to scale by increasing the number of goroutines, but we have already hit the maximum-concurrent-activity-execution limit. Despite that, throughput does not improve, the load on the Temporal API is almost negligible, and the worker itself consumes very little in the way of resources.
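The parameter in question is, presumably, the worker option MaxConcurrentActivityExecutionSize; a minimal sketch of the related Go SDK worker options, with an illustrative task queue name and values rather than our real configuration:

```go
// Worker-side concurrency knobs in the Temporal Go SDK.
// "metrics" and all numeric values are illustrative only.
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	c, err := client.Dial(client.Options{}) // defaults to localhost:7233
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	w := worker.New(c, "metrics", worker.Options{
		// Upper bound on activities running in parallel in this worker process.
		MaxConcurrentActivityExecutionSize: 2000,
		// Pollers pulling activity tasks from the task queue; too few pollers
		// can starve the executor even when the limit above is high.
		MaxConcurrentActivityTaskPollers: 20,
		// The same pair of knobs exists for workflow tasks.
		MaxConcurrentWorkflowTaskExecutionSize: 500,
		MaxConcurrentWorkflowTaskPollers:       10,
	})

	// w.RegisterWorkflow(...) / w.RegisterActivity(...) omitted here.

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatal(err)
	}
}
```

MaxConcurrentActivityExecutionSize caps how many activities a single worker process runs in parallel, and the poller counts control how quickly tasks are pulled off the task queue.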
What is the best way to scale a workload like this? One option we see is to start a controller workflow first, which splits the full client set into chunks and launches the main per-chunk workflows as child workflows. That would let us shrink the chunk size considerably and drop the goroutines entirely, requesting metrics for all clients in a chunk in a single activity call instead (sketched below). Are there any other options?
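Something like this, assuming the Go SDK; ControllerWorkflow, ChunkWorkflow, the activity names, and the placeholder types are all illustrative, not existing code:

```go
// Sketch of the controller / child-workflow fan-out option.
package metrics

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// Placeholder payload types for illustration only.
type (
	ChunkMetrics struct{ Raw []float64 }
	ChunkResult  struct{ Values []float64 }
)

// ControllerWorkflow splits the full client range into chunks and runs them
// as child workflows, at most `parallelism` chunks at a time.
func ControllerWorkflow(ctx workflow.Context, totalClients, chunkSize, parallelism int) error {
	inFlight := make([]workflow.ChildWorkflowFuture, 0, parallelism)
	for offset := 0; offset < totalClients; offset += chunkSize {
		inFlight = append(inFlight, workflow.ExecuteChildWorkflow(ctx, ChunkWorkflow, offset, chunkSize))
		if len(inFlight) == parallelism {
			// Wait for the current batch of children before starting more.
			for _, f := range inFlight {
				if err := f.Get(ctx, nil); err != nil {
					return err
				}
			}
			inFlight = inFlight[:0]
		}
	}
	for _, f := range inFlight {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}

// ChunkWorkflow processes one chunk without goroutines: one activity fetches
// the metrics for every client in the chunk at once, one does the
// calculations, and one writes the chunk's results to the OLAP store.
func ChunkWorkflow(ctx workflow.Context, offset, size int) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 10 * time.Minute, // illustrative timeout
	})

	var metrics ChunkMetrics
	if err := workflow.ExecuteActivity(ctx, "FetchMetricsForChunk", offset, size).Get(ctx, &metrics); err != nil {
		return err
	}
	var result ChunkResult
	if err := workflow.ExecuteActivity(ctx, "CalculateChunk", metrics).Get(ctx, &result); err != nil {
		return err
	}
	return workflow.ExecuteActivity(ctx, "WriteToOLAP", result).Get(ctx, nil)
}
```

One thing we are unsure about: with small chunks the controller would start thousands of children, so presumably it would itself need continue-as-new (or an intermediate fan-out layer) to keep its event history bounded.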