Can anyone recommend a simple system for spinning up short-lived cloud VMs with GPUs for inference? Something that can automatically boot the VM, start the inference services, and then shut it down when it’s no longer in use?
I’d like to be able to run my own models without censorship from the inference-as-a-service providers. I don’t mind paying a few dollars/hour while I’m actively using the LLMs, but I don’t want to forget to turn it off and buy myself 24/7 uptime for $thousands/mo. I’m comfortable with technical solutions (running scripts, Ansible playbooks, etc.), but I want something that’s as seamless and fast to start as reasonably possible and that also guards against me just forgetting to turn the damn thing off.
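For reference, the "guard against forgetting" part usually comes down to an idle watchdog running on the VM itself. Below is a minimal sketch of that logic, assuming a Linux guest where powering off from inside the VM stops the GPU billing (verify this for your provider; some keep charging for stopped disks or reserved IPs). `IdleWatchdog`, `poweroff`, and `run` are hypothetical names, and how activity gets reported (e.g. from a small proxy in front of the model server) is left open. It combines an idle timeout with a hard maximum uptime, so a stuck client that keeps sending requests can't keep the box alive forever.

```python
import subprocess
import time
from typing import Callable


class IdleWatchdog:
    """Hypothetical helper: decides when a GPU VM should power itself off."""

    def __init__(self, idle_limit_s: float, max_uptime_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_limit_s = idle_limit_s  # shut down after this much inactivity
        self.max_uptime_s = max_uptime_s  # hard cap, even if requests keep coming
        self.clock = clock                # injectable so the logic can be tested
        self.started_at = self.last_activity = clock()

    def record_activity(self) -> None:
        # Call whenever the model server handles a request.
        self.last_activity = self.clock()

    def should_shut_down(self) -> bool:
        now = self.clock()
        idle_for = now - self.last_activity
        up_for = now - self.started_at
        return idle_for >= self.idle_limit_s or up_for >= self.max_uptime_s


def poweroff() -> None:
    # Assumption: a Linux guest where an OS-level shutdown stops GPU billing.
    subprocess.run(["sudo", "shutdown", "-h", "now"], check=False)


def run(watchdog: IdleWatchdog, poll_s: float = 60.0,
        shutdown: Callable[[], None] = poweroff) -> None:
    # Poll until either limit trips, then call the shutdown hook once.
    while not watchdog.should_shut_down():
        time.sleep(poll_s)
    shutdown()
```

For example, `run(IdleWatchdog(idle_limit_s=15 * 60, max_uptime_s=4 * 3600))` would power the VM off after 15 idle minutes or 4 hours total, whichever comes first. Keeping the check on the VM (rather than on your laptop) means it still fires if your local machine sleeps or loses connectivity.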