We have a use case where one activity is a long-running, GPU-based ML process. We run Temporal inside a Kubernetes (K8s) cluster, and the idea is to wrap the worker executing this activity in a Kubernetes Job and use a KEDA ScaledJob for autoscaling. The worker should therefore pick up a single task, process it (potentially for a long time), and then terminate.
How can this (or something similar) be achieved in Temporal with the Python SDK?
Right now, Temporal workers are meant to be always running, and scale-to-zero does not work well with them. There is currently no good, reliable way to know that there is a backlog of pending activities across a namespace.
When you schedule the activity from the workflow, you can also orchestrate whatever is needed to start a worker to handle it. Most users run an activity on a worker listening on the regular workflow task queue (not the resource-constrained task queue) that starts a worker on the resource-constrained task queue, which then runs the activity.