I am running a subscription management system on Temporal, where each subscription is a separate workflow. It has been in production for quite some time and is serving us very well.
Over time, we have come to need a way to do bulk cleanup of unused or underused subscriptions.
In my code, I have methods like
Now, when end is invoked, we have to do a massive cleanup of our infrastructure, which is presently handled by a set of activities.
This is all working fine…
Since each subscription is a separate workflow, when someone sends the end signal to, say, about 1000 workflows (subscriptions/plans), a massive cleanup is triggered across applications, and it overwhelms quite a few upstream applications.
Because the retry policies are similar, the retries for all those workflows also happen at almost the same time, which overwhelms the upstream systems again and again and causes downtime.
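To make the retry problem concrete, here is a small standalone sketch in plain Python (not Temporal code). The initial interval of 1 second, the backoff coefficient of 2, and the 4 attempts are illustrative numbers only, and `retry_times` is a hypothetical helper. It shows that when every workflow fails at the same moment and shares one policy, every retry wave lands on the same instants.

```python
from collections import Counter


def retry_times(initial_interval, backoff_coefficient, attempts):
    """Return the offsets (seconds after the first failure) of each retry."""
    times = []
    elapsed = 0.0
    interval = initial_interval
    for _ in range(attempts):
        elapsed += interval          # wait one interval, then retry
        times.append(elapsed)
        interval *= backoff_coefficient  # exponential backoff, no jitter
    return times


# Assumption: 1000 workflows all fail at t=0 and share the same policy.
schedule = Counter()
for _ in range(1000):
    for t in retry_times(1.0, 2.0, 4):
        schedule[t] += 1

# Number of retry calls hitting the upstream system at each instant.
for t in sorted(schedule):
    print(f"t={t:>5.1f}s  retries={schedule[t]}")
```

Every retry instant carries the full load of all the workflows, so the upstream systems never get a quieter wave to recover in.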
What is the best way to enhance my workflows so that no more than 20 or 30 cleanups happen at a time, without disturbing my downstream and upstream systems?
That is, I would still like to accept bulk end/cleanup signals, but the actual cleanup should happen in a phased manner without overwhelming the upstream systems.
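The behaviour I am after can be sketched outside Temporal with a semaphore. This is plain `asyncio`, not Temporal code, and every name in it (`cleanup`, `run_bulk_cleanup`, `MAX_CONCURRENT_CLEANUPS`) is hypothetical. The cap of 20 and the short sleep that stands in for the REST calls are assumptions. All 1000 requests are accepted at once, but only a bounded number of cleanups run at any moment.

```python
import asyncio

MAX_CONCURRENT_CLEANUPS = 20  # assumed cap, taken from the "20 or 30" above


async def cleanup(subscription_id, stats):
    """Placeholder for the upstream REST calls of one subscription."""
    stats["running"] += 1
    stats["peak"] = max(stats["peak"], stats["running"])
    await asyncio.sleep(0.001)  # stands in for the real network calls
    stats["running"] -= 1
    stats["done"] += 1


async def run_bulk_cleanup(subscription_ids, limit=MAX_CONCURRENT_CLEANUPS):
    """Accept every request immediately, but run at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)
    stats = {"running": 0, "peak": 0, "done": 0}

    async def guarded(subscription_id):
        # Requests beyond the limit wait here instead of hitting upstream.
        async with semaphore:
            await cleanup(subscription_id, stats)

    await asyncio.gather(*(guarded(s) for s in subscription_ids))
    return stats


stats = asyncio.run(run_bulk_cleanup(range(1000)))
print(stats)
```

What I am looking for is the idiomatic Temporal equivalent of that semaphore, one that works across many independent subscription workflows.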
Should I move the cleanup logic to a separate workflow with, say, something like
and let the subscription workflow await a cleanupAckowelgement signal?
Also, I would like to clarify that the actual cleanup work in the upstream systems is done by various other teams, and they do not use Temporal. So all of those are REST calls, and it is these REST calls that are overwhelming the upstream systems.