Hi Everyone,
I am using Temporal to query Gemini through its Batch API. The load comes in bursts: often several thousand workflows/activities are started simultaneously, followed by long periods of near-zero activity.
To keep the extraction within the Gemini Batch API limit of 100 concurrent batch requests, I am wondering what the best approach is. Because the limit is on concurrency rather than on requests per second, ‘MaxTaskQueueActivitiesPerSecond’ does not seem like a workable solution from what I understand about it (I might be wrong?). We currently start one workflow per user action (in our case e.g. a document upload), and each workflow then starts different child workflows and activities that call the Gemini (or other model provider) APIs. Based on this discussion, it seems that all Gemini API calls should in either case be done via activities.
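To illustrate why I think a per-second limit does not map cleanly onto a concurrency limit, here is a small back-of-the-envelope sketch in plain Python (no Temporal SDK involved). It uses Little's law: average in-flight work equals start rate times average duration. The function names and the 10- and 20-minute durations are made up for illustration; I do not know the real duration distribution of Gemini batch jobs.

```python
def steady_state_concurrency(starts_per_second: float, avg_duration_seconds: float) -> float:
    """Little's law: average number of in-flight requests for a given start rate."""
    return starts_per_second * avg_duration_seconds


def max_start_rate_for_cap(concurrency_cap: int, avg_duration_seconds: float) -> float:
    """Highest start rate that keeps the average in-flight count at or below the cap."""
    return concurrency_cap / avg_duration_seconds


if __name__ == "__main__":
    cap = 100  # concurrent batch requests allowed (from the post)

    # Hypothetical: batches take 10 minutes on average.
    rate = max_start_rate_for_cap(cap, 600)
    print("start rate tuned for 10-minute batches:", rate, "per second")

    # Same rate, but batches now take 20 minutes: the in-flight count doubles.
    print("in flight if batches take 20 minutes:", steady_state_concurrency(rate, 1200))
```

So a rate limit only bounds concurrency if the batch duration is known and stable, which is why I am looking for a limit on in-flight work instead.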
I hence currently see 2 ways of implementing this:
- Putting the batch requests into activities and then limiting the number of concurrently running activities in the worker (e.g. via a slot supplier). I can do this by scheduling the activities on a dedicated task queue and creating a worker that listens on that task queue.
  However, my concern here is whether this limit is applied across all workers or only per worker, and hence whether the API rate limits could be exceeded. My second question is how I need to tune the different workers to ensure the best overall performance (any direction would be helpful)?
  In this case, would the activities just be queued, so that I need to ensure a sufficiently high timeout for each activity?
- Using either a batch sliding window technique or a semaphore workflow to track the number of active queries and coordinate this globally. This seems like the more complicated solution but perhaps more scalable, depending on how well the first works.
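For the second option, the bookkeeping I have in mind for a semaphore could be sketched roughly as below. This is plain Python that only models the permit logic; it is not Temporal SDK code, and the class and method names (`BatchSemaphore`, `acquire`, `release`) are hypothetical. In a real setup I assume the acquire and release calls would arrive as signals or updates to a long-running workflow, which I have not implemented.

```python
from collections import deque


class BatchSemaphore:
    """Models a global permit counter with a FIFO wait queue (hypothetical sketch)."""

    def __init__(self, max_permits: int) -> None:
        self.max_permits = max_permits
        self.holders = set()    # request ids that currently hold a permit
        self.waiting = deque()  # request ids queued in arrival order

    def acquire(self, request_id: str) -> bool:
        """Return True if a permit is granted now, False if the request was queued."""
        if request_id in self.holders:
            return True  # idempotent: a repeated acquire does not take a second permit
        if not self.waiting and len(self.holders) < self.max_permits:
            self.holders.add(request_id)
            return True
        if request_id not in self.waiting:
            self.waiting.append(request_id)
        return False

    def release(self, request_id: str) -> list:
        """Free a permit (or cancel a queued request); return ids granted as a result."""
        self.holders.discard(request_id)
        if request_id in self.waiting:
            self.waiting.remove(request_id)
        granted = []
        while self.waiting and len(self.holders) < self.max_permits:
            next_id = self.waiting.popleft()
            self.holders.add(next_id)
            granted.append(next_id)
        return granted


if __name__ == "__main__":
    sem = BatchSemaphore(max_permits=2)
    print(sem.acquire("doc-1"), sem.acquire("doc-2"), sem.acquire("doc-3"))
    print(sem.release("doc-1"))  # ids that were waiting and now hold a permit
```

The ids returned by `release` are the callers that would need to be notified that they may now submit their batch request. What worries me with this design is a holder that crashes without releasing, so some kind of lease or timeout on permits would probably be needed on top.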
I wanted to ask for advice on which implementation is generally recommended by others who already have more Temporal experience.
Thank you in advance!