So I’m using the Python SDK, and I’m not sure if this is something that even makes sense… but I’m curious: is there a way to have Temporal workers pick up and handle a single workflow from start to finish before taking on another - or some variation of this?
The reason I ask is that we create one workflow per “job” - in our case, a job is about 5 activities that run synchronously, one after the other, to process a call recording file and ultimately insert a bunch of extra data into a database (see the sketch below). These jobs can take anywhere from 1-20 minutes depending on the size and complexity. We’re currently trying to process the past few months’ worth of data, so we load in, say, 50k jobs, each as its own workflow. When the workers start picking up activities, they seem to semi-randomly pick up the first activity across all of the workflows, then progress through each subsequent activity across the entire queue.
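For context, each job’s workflow looks roughly like this - a minimal sketch, where the activity names, `ProcessRecordingWorkflow`, and the 20-minute timeout are placeholders rather than our real code:

```python
from datetime import timedelta

from temporalio import activity, workflow


# Hypothetical stand-ins for the five real processing steps.
@activity.defn
async def download_recording(job_id: str) -> None: ...


@activity.defn
async def transcribe_recording(job_id: str) -> None: ...


@activity.defn
async def analyze_transcript(job_id: str) -> None: ...


@activity.defn
async def enrich_results(job_id: str) -> None: ...


@activity.defn
async def insert_into_database(job_id: str) -> None: ...


@workflow.defn
class ProcessRecordingWorkflow:
    @workflow.run
    async def run(self, job_id: str) -> None:
        # Each activity starts only after the previous one completes,
        # so a single "job" is a strictly sequential chain of five steps.
        for step in (
            download_recording,
            transcribe_recording,
            analyze_transcript,
            enrich_results,
            insert_into_database,
        ):
            await workflow.execute_activity(
                step,
                job_id,
                start_to_close_timeout=timedelta(minutes=20),
            )
```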
As a result, it takes a very long time before any single workflow completes, but /many/ jobs are 30-60% done all at the same time. I think I understand why this is happening, but I’m curious if there are other strategies that would help avoid this outcome. The only strategy I’ve come up with so far is to deliberately enqueue only 5,000 jobs at a time and wait for them to finish before queueing the rest (sketched below), but maybe there’s a better way?
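Roughly what that batching workaround would look like - again just a sketch, assuming an already-connected client; `run_in_batches`, the task queue name, and the workflow ID format are made up for illustration:

```python
import asyncio

from temporalio.client import Client


async def run_in_batches(
    client: Client, job_ids: list[str], batch_size: int = 5000
) -> None:
    for i in range(0, len(job_ids), batch_size):
        batch = job_ids[i : i + batch_size]
        # Start one workflow per job in this batch.
        handles = [
            await client.start_workflow(
                "ProcessRecordingWorkflow",
                job_id,
                id=f"process-recording-{job_id}",
                task_queue="recordings",
            )
            for job_id in batch
        ]
        # Block until every workflow in the batch completes
        # before submitting the next batch.
        await asyncio.gather(*(handle.result() for handle in handles))
```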