Hello everyone!
I’m just getting started with Temporal (which looks awesome), and I’m trying to solve a specific use case for a POC.
We run long ML training jobs in Docker/Python. I’d like to build a workflow that would:
1 - Spawn a cloud instance (B)
2 - Start a Docker/Python job on it (activity?)
3 - Surface the job logs in the Temporal UI (activity?)
4 - Kill the instance
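Ignoring Temporal specifics, the control flow of these four steps is basically the following (plain Go, hypothetical function names, no SDK calls). The point I care about is that step 4 must run even if the job fails; in a real workflow each function would be an activity with its own timeout/retry policy:

```go
package main

import (
	"errors"
	"fmt"
)

// killed counts teardowns, just to make the sketch observable.
var killed int

// Hypothetical stand-ins for the four steps.
func spawnInstance() (string, error) { return "gcp-instance-b", nil }        // step 1: GCP API call
func runTrainingJob(id string) error { return errors.New("training crashed") } // steps 2-3: docker run + logs
func killInstance(id string) error   { killed++; return nil }                 // step 4: delete the instance

// runJob mirrors the workflow shape: teardown (step 4) is deferred,
// so the instance is killed whether or not the job succeeds.
func runJob() error {
	id, err := spawnInstance()
	if err != nil {
		return err
	}
	defer killInstance(id)
	return runTrainingJob(id)
}

func main() {
	fmt.Println("job error:", runJob())
}
```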
Context:
- We use go-sdk
- We do not use Kubernetes yet…
- We are using GCP
Hypothesis:
- The Temporal service would run on instance A
- The main Temporal worker would run on instance A
- I would dynamically start a Temporal worker on instance B to wrap/monitor the Docker/Python job locally rather than remotely over SSH. But I suspect something is wrong with this idea, since I cannot guarantee that a job lands on its own instance’s worker when there are multiple jobs running in parallel.
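The binding I’m imagining (maybe wrongly) is a dedicated task queue per instance: the worker started on instance B polls only a queue named after that instance, so an activity scheduled on that queue can only run there. A toy sketch in plain Go, no SDK; `dispatch` just stands in for the server matching an activity to a polling worker, and all names are hypothetical:

```go
package main

import "fmt"

// taskQueueFor derives a dedicated task queue name for one spawned
// instance. The worker on that instance polls only this queue.
func taskQueueFor(instanceID string) string {
	return "ml-job-" + instanceID
}

// worker is a toy model of a Temporal worker: it polls one queue.
type worker struct{ queue string }

// dispatch stands in for the server: an activity scheduled on a queue
// goes to the worker polling that exact queue, or nowhere.
func dispatch(queue string, workers []worker) (int, bool) {
	for i, w := range workers {
		if w.queue == queue {
			return i, true
		}
	}
	return -1, false
}

func main() {
	// Two parallel jobs, each on its own instance with its own worker.
	workers := []worker{
		{queue: taskQueueFor("instance-b1")},
		{queue: taskQueueFor("instance-b2")},
	}
	// The job bound to b2 can only land on b2's worker.
	i, ok := dispatch(taskQueueFor("instance-b2"), workers)
	fmt.Println(i, ok)
}
```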
Any ideas on the best practice to achieve this? (I might be thinking about it the wrong way.)
Thank you very much, have a great day.