I saw an idea about getting a big LLM (30/44 GB) running fast on a cloud server.

What if this server could scale in compute power, with the rental cost shared among a group of users?

Some sort of DAO to get it started? Personally, I would love to link advanced LLMs up to Stable Diffusion (SD) generation, etc. And OpenAI is too heavily filtered for my liking. What do you think?

  • georgejrjrjrB · 10 months ago

    Now that it is possible to host many LoRAs off of one base model, a scrappy intelligence cooperative might:

    • Standardize on a base model. Ideally one that can fit comfortably on a gaming card with room for batch inference and context.
    • Share LoRAs.
    • Host cooperatively, share load w/ DNS.
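
    A rough picture of that last point, assuming the co-op simply publishes several A records for one shared hostname, one per member node (the hostname and /generate endpoint below are placeholders):

    ```python
    # Minimal sketch of DNS-based load sharing: the cooperative publishes
    # several A records under one shared hostname, each pointing at a member's
    # inference box; plain round-robin DNS then spreads clients across them.
    import socket

    import requests

    SHARED_HOST = "api.example-coop.net"  # placeholder shared hostname

    # Show which member nodes the shared name currently resolves to.
    infos = socket.getaddrinfo(SHARED_HOST, 443, proto=socket.IPPROTO_TCP)
    member_ips = sorted({info[4][0] for info in infos})
    print(f"{SHARED_HOST} resolves to {len(member_ips)} member nodes: {member_ips}")

    # A client just talks to the shared name; DNS picks a node for it.
    response = requests.post(
        f"https://{SHARED_HOST}/generate",
        json={"prompt": "Hello from the co-op", "adapter": "my-shared-lora"},
        timeout=30,
    )
    print(response.json())
    ```
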
    • DanIngeniusOPB · 10 months ago

      That’s a great idea and approach. How would that work?

      • georgejrjrjrB · 10 months ago

        The broad outline:
        * You would need an easy way for people to throw their GPU idle time at a cluster, and a reason to do so (i.e., what do your hosts get out of the deal?).

        * You need an easy way to ingest datasets for training LoRAs.

        * You’d need an automated pipeline to turn those fine-tuning datasets into aligned LoRAs, to be propagated to your inference nodes (a rough training sketch follows this list).

        * You’d probably want to think about retrieval, and whether you would like that to be part of the story (and whether it puts you at additional legal risk).

        * You’d need a fast inference server with S-LoRA (or whatever the leading method for batch inference with LoRAs is next week); see the serving sketch after this list.

        * You would need an HTTPS server on the front end that terminates TLS for all your endpoints and routes API requests to the appropriate LoRA (a toy router follows this list).

        * You need a way to keep those certificates and inference server addresses up to date in spite of churn.

        * You need to figure out your cost model and, if applicable, a revenue-sharing model for your hosting providers, ideally one that doesn’t involve a cryptocurrency unless you have a limitless legal budget, are based in El Salvador, and are personal friends with the Bukele family.
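
        For the ingestion-and-fine-tuning step above, a minimal sketch with the Hugging Face datasets/peft/trl stack (dataset file, base model, and hyperparameters are placeholders, and the exact trl API shifts a bit between versions):

        ```python
        from datasets import load_dataset
        from peft import LoraConfig
        from trl import SFTConfig, SFTTrainer

        # Ingest a contributed fine-tuning dataset (JSONL with a "text" field here).
        dataset = load_dataset("json", data_files="contributed_dataset.jsonl", split="train")

        # Small-rank LoRA keeps the resulting adapter cheap to share and host.
        peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

        trainer = SFTTrainer(
            model="mistralai/Mistral-7B-v0.1",  # the co-op's standardized base model
            train_dataset=dataset,
            peft_config=peft_config,
            args=SFTConfig(output_dir="adapters/contributed-lora", num_train_epochs=1),
        )
        trainer.train()

        # Only the adapter weights are written out; this small folder is what
        # gets propagated to the inference nodes, not the full base model.
        trainer.save_model("adapters/contributed-lora")
        ```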
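
        For the batch-inference point, the one-base-model, many-adapters pattern looks roughly like this; vLLM's multi-LoRA support is used here purely as a stand-in for S-LoRA, and the model name and adapter paths are placeholders:

        ```python
        from vllm import LLM, SamplingParams
        from vllm.lora.request import LoRARequest

        # One base model held in GPU memory; adapters are applied per request.
        llm = LLM(model="mistralai/Mistral-7B-v0.1", enable_lora=True, max_loras=8)
        params = SamplingParams(max_tokens=128, temperature=0.7)

        # Each request names the adapter it wants; the engine batches across them.
        sql_out = llm.generate(
            ["Write a query for monthly revenue per region."],
            params,
            lora_request=LoRARequest("sql-lora", 1, "adapters/sql-lora"),
        )
        chat_out = llm.generate(
            ["Summarize the last co-op meeting."],
            params,
            lora_request=LoRARequest("chat-lora", 2, "adapters/chat-lora"),
        )
        print(sql_out[0].outputs[0].text)
        print(chat_out[0].outputs[0].text)
        ```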
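
        And a toy version of the front end: terminate TLS in the server runner (e.g. uvicorn with --ssl-certfile/--ssl-keyfile) and forward each request to whichever node currently hosts the requested adapter; adapter names and node addresses are placeholders:

        ```python
        import httpx
        from fastapi import FastAPI, HTTPException

        app = FastAPI()

        # In practice this table would be refreshed as member hosts join and leave.
        ADAPTER_ROUTES = {
            "sql-lora": "http://10.0.0.11:8000",
            "chat-lora": "http://10.0.0.12:8000",
        }

        @app.post("/v1/generate/{adapter}")
        async def generate(adapter: str, payload: dict):
            backend = ADAPTER_ROUTES.get(adapter)
            if backend is None:
                raise HTTPException(status_code=404, detail=f"unknown adapter {adapter!r}")
            async with httpx.AsyncClient(timeout=60) as client:
                upstream = await client.post(f"{backend}/generate", json={**payload, "adapter": adapter})
            return upstream.json()
        ```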

        From the generality of your question, your best bet would probably be to hire me ;-).

        • DanIngeniusOPB · 10 months ago

          Thanks for your detailed reply. I don’t think crowdsourcing GPUs is feasible or desirable, but the idea of only swapping different LoRAs is interesting. Can the LoRAs be loaded separately from the models? That is, could you load the model once and use two separate LoRAs?

          • georgejrjrjrB · 10 months ago

            One base model and dozens, maybe hundreds, of adapters would be the goal.
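
            The adapters do load separately from the base weights, so the base model is loaded once and adapters are attached and switched per request. A minimal sketch of that with Hugging Face peft (base model and adapter paths are placeholders):

            ```python
            import torch
            from peft import PeftModel
            from transformers import AutoModelForCausalLM

            # Load the large base model once; adapter paths/names are placeholders.
            base = AutoModelForCausalLM.from_pretrained(
                "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, device_map="auto"
            )

            # Attach two independently trained LoRA adapters to the same base weights.
            model = PeftModel.from_pretrained(base, "adapters/sql-lora", adapter_name="sql")
            model.load_adapter("adapters/chat-lora", adapter_name="chat")

            # Switch between them per request without ever reloading the base model.
            model.set_adapter("sql")
            # ... run SQL-flavoured generation ...
            model.set_adapter("chat")
            # ... run chat-flavoured generation ...
            ```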