ResourceExhausted errors due to long-lived workers polling task queues with no pending tasks

Detailed Context and Problem Statement
We run close to 2,000 workers per namespace in a cluster (two namespaces, to be specific), with each worker polling its own unique task queue. These workers are long-lived, and we intentionally do not stop them after their workflows complete; a sketch of the pattern is below.
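For concreteness, here is a minimal Go SDK sketch of that pattern. The namespace, the queue-name format, and the single-process loop are illustrative (our real workers are spread across many hosts); the point is one long-lived worker per unique task queue:

```go
package main

import (
	"fmt"
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Namespace name is illustrative.
	c, err := client.Dial(client.Options{Namespace: "my-namespace"})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// Current pattern: ~2,000 long-lived workers, each polling its own
	// unique task queue, never stopped after its workflow completes.
	for i := 0; i < 2000; i++ {
		w := worker.New(c, fmt.Sprintf("task-queue-%d", i), worker.Options{})
		// w.RegisterWorkflow(...) / w.RegisterActivity(...) per queue.
		if err := w.Start(); err != nil { // Start is non-blocking
			log.Fatalln("unable to start worker:", err)
		}
	}
	select {} // keep every worker polling indefinitely
}
```

As we understand it, each of these workers keeps long-polling its task queue whether or not tasks are pending, which is why we suspect even the idle queues still generate load.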
However, this setup is causing some serious issues:

  • We’re frequently encountering ResourceExhausted errors in the namespaces with the higher worker and task-queue counts.
  • Our database writer instance is under heavy load, with CPU utilization spiking to 98–99% (instance type: db.r6g.large).

It seems like the combination of a very large number of long-running workers and their associated unique task queues may be contributing to this resource exhaustion and database strain.
Has anyone else faced similar challenges? Would love to hear suggestions on:

  • How to better manage a large number of long-lived workers/task queues (one consolidation idea we’re weighing is sketched after this list).
  • Best practices for reducing pressure on the DB and avoiding ResourceExhausted errors.
  • Whether our DB instance type is simply undersized for this scale (and if so, what instance class or scaling approach you’d recommend).
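On the first point, one direction we have been considering is consolidating onto a small number of shared task queues served by a worker pool. A hedged sketch of what we have in mind (the queue name and pool size are hypothetical):

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	c, err := client.Dial(client.Options{Namespace: "my-namespace"})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// Hypothetical consolidation: one shared task queue served by a
	// small pool of identical workers instead of one queue per worker.
	const taskQueue = "shared-queue" // made-up name
	const poolSize = 4               // far fewer pollers than 2,000 per-queue workers

	for i := 0; i < poolSize; i++ {
		// Poller concurrency can also be tuned via worker.Options
		// (e.g. MaxConcurrentWorkflowTaskPollers), if we’ve read the
		// Go SDK docs correctly.
		w := worker.New(c, taskQueue, worker.Options{})
		// Every worker registers the same workflows/activities, so any
		// of them can pick up any task on the shared queue.
		// w.RegisterWorkflow(...); w.RegisterActivity(...)
		if err := w.Start(); err != nil {
			log.Fatalln("unable to start worker:", err)
		}
	}
	select {} // keep the pool polling
}
```

If a handful of pollers on a shared queue can serve the same workflows, that should cut the poll-request volume dramatically compared to ~2,000 per-queue workers, but we’d welcome confirmation that this is the recommended direction.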

Any advice or architectural guidance would be greatly appreciated!