I saw an idea about getting a big LLM (30/44 GB) running fast on a cloud server.

What if this server were scalable in computing power, with the rental cost shared among a group of users who band together?

Some sort of DAO to get it started? Personally, I would love to link advanced LLMs up to Stable Diffusion (SD) generation, etc. And OpenAI is too sensitive for my liking. What do you think?

  • DanIngenius (OP) · 1 year ago

    That’s a great idea and approach. How would that work?

    • georgejrjrjr · 1 year ago

      The broad outline:
      * You would need an easy way for people to throw their GPU idle time at a cluster, and a reason to do so (i.e., what do your hosts get out of the deal?).

      * You need an easy way to ingest datasets for training LoRAs.

      * You’d need an automated pipeline to turn those fine-tuning datasets into aligned LoRAs, to be propagated to your inference nodes.

      * You’d probably want to think about retrieval, and whether you would like that to be part of the story (and whether it puts you at additional legal risk).

      * You’d need a fast inference server with S-LoRA (or whatever the leading method for batch inference with LoRAs is next week).

      * You would need an HTTPS server on the front end that terminates TLS for all your endpoints, and routes API requests to the appropriate LoRA.

      * You need a way to keep those certificates and inference server addresses up to date in spite of churn.

      * You need to figure out your cost model, and revenue sharing model for your hosting providers if applicable, ideally one that doesn’t involve a cryptocurrency unless you have a limitless legal budget and you are based in El Salvador and personal friends with the Bukele family.
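      For the dataset-ingestion step, here is a minimal sketch of a validator for a JSONL fine-tuning file. It assumes a hypothetical schema in which each line is an object with non-empty `prompt` and `completion` strings; real LoRA trainers each define their own format, so treat the field names as placeholders.

```python
import json
from typing import Iterable

# Hypothetical schema for illustration; not a standard format.
REQUIRED_FIELDS = ("prompt", "completion")


def validate_jsonl(lines: Iterable[str]) -> tuple[list[dict], list[str]]:
    """Split raw JSONL lines into valid records and human-readable errors."""
    records: list[dict] = []
    errors: list[str] = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict):
            errors.append(f"line {lineno}: expected an object")
            continue
        # A field counts as missing if absent, not a string, or only whitespace.
        missing = [f for f in REQUIRED_FIELDS
                   if not isinstance(obj.get(f), str) or not obj[f].strip()]
        if missing:
            errors.append(f"line {lineno}: missing or empty {', '.join(missing)}")
            continue
        # Keep only the expected fields so downstream training sees a clean record.
        records.append({f: obj[f] for f in REQUIRED_FIELDS})
    return records, errors
```

      Rejecting bad rows with line numbers (rather than failing the whole upload) keeps ingestion easy for contributors while still protecting the automated training pipeline.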
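      If retrieval does become part of the story, its core is a nearest-neighbour lookup over document embeddings. A minimal cosine-similarity sketch in NumPy, assuming the embeddings were already produced by some embedding model (not shown) and are non-zero vectors; a production system would more likely use a vector index.

```python
import numpy as np


def top_k(query: np.ndarray, doc_vectors: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k documents most cosine-similar to the query.

    Assumes every vector is non-zero (zero vectors would divide by zero).
    """
    q = query / np.linalg.norm(query)
    D = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    scores = D @ q  # cosine similarity of each document to the query
    k = min(k, len(scores))
    if k <= 0:
        return []
    # argpartition gathers the k best in O(n); then sort just those k by score.
    best = np.argpartition(-scores, k - 1)[:k]
    return best[np.argsort(-scores[best])].tolist()
```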
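      The idea behind batched multi-LoRA serving (the problem S-LoRA targets) can be sketched in NumPy: the expensive product with the shared base weight is computed once for the whole batch, and each request then adds only its own low-rank correction (alpha / r) B_i A_i x. This is a conceptual sketch with made-up shapes and hypothetical adapter names; S-LoRA itself relies on custom GPU kernels and adapter memory management, none of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 8, 2

W = rng.standard_normal((d_out, d_in))  # shared, frozen base weight

# Hypothetical adapters: each is a pair (A: rank x d_in, B: d_out x rank).
adapters = {
    name: (rng.standard_normal((rank, d_in)), rng.standard_normal((d_out, rank)))
    for name in ("legal", "medical", "poetry")
}
scaling = 2.0  # alpha / rank, assumed identical for every adapter here


def batched_forward(X: np.ndarray, adapter_names: list[str]) -> np.ndarray:
    """X has one row per request; adapter_names[i] selects request i's LoRA."""
    Y = X @ W.T  # one big matmul against the base weight for the whole batch
    for name in set(adapter_names):
        rows = [i for i, n in enumerate(adapter_names) if n == name]
        A, B = adapters[name]
        # Low-rank correction applied only to the rows that use this adapter.
        Y[rows] += scaling * (X[rows] @ A.T) @ B.T
    return Y
```

      Each output row equals what a model with that adapter merged into its weights (W + scaling * B @ A) would produce, but the base weight is stored and multiplied only once for requests using different adapters.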
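      Behind the TLS-terminating front end, the routing decision itself can be small. A minimal sketch, assuming a hypothetical `base:adapter` naming convention in the request's model field, that every node serves the same base model, and a least-loaded policy among nodes that already have the requested adapter loaded (all illustrative choices, not a specific server's API):

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    url: str
    loaded_adapters: set[str] = field(default_factory=set)
    in_flight: int = 0  # requests currently being served


def route(model: str, backends: list[Backend]) -> tuple[Backend, str]:
    """Pick a backend for a 'base:adapter' model string (hypothetical convention)."""
    base, _, adapter = model.partition(":")
    if not base or not adapter:
        raise ValueError(f"expected 'base:adapter', got {model!r}")
    # Prefer nodes that already hold this adapter, to avoid a load on the hot path.
    candidates = [b for b in backends if adapter in b.loaded_adapters]
    if not candidates:
        # Fall back to any node; it would have to load the adapter on demand.
        candidates = backends
    if not candidates:
        raise LookupError("no inference backends available")
    chosen = min(candidates, key=lambda b: b.in_flight)  # least-loaded wins
    return chosen, adapter
```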
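      For keeping inference server addresses current despite churn, one common pattern (swapped in here as an illustration; the comment does not prescribe one) is heartbeat-based membership: nodes re-announce themselves periodically, and the front end forgets any node it has not heard from within a TTL. A minimal in-memory sketch; a real deployment would more likely use an existing service-discovery tool, and certificate renewal (e.g. via ACME) is a separate concern not shown here.

```python
import time
from typing import Callable


class NodeRegistry:
    """Tracks inference nodes that must re-announce themselves within a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, address: str) -> None:
        """Called by a node (e.g. every few seconds) to say it is still alive."""
        self._last_seen[address] = self.clock()

    def live_nodes(self) -> list[str]:
        """Drop nodes whose last heartbeat is older than the TTL; return the rest."""
        now = self.clock()
        expired = [a for a, t in self._last_seen.items() if now - t > self.ttl]
        for address in expired:
            del self._last_seen[address]
        return sorted(self._last_seen)
```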
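      A revenue-sharing model for hosts does not need a cryptocurrency; plain accounting works. A minimal sketch, assuming payouts proportional to GPU-seconds served and a hypothetical 20% platform cut for running the front end (both numbers are placeholders, not a recommendation). It uses the largest-remainder method so the cents paid out always sum exactly to the pool.

```python
def split_revenue(total_cents: int, gpu_seconds: dict[str, float],
                  platform_share: float = 0.2) -> dict[str, int]:
    """Split revenue among hosts in proportion to GPU-seconds served.

    platform_share is a hypothetical cut kept by the operator.
    """
    pool = round(total_cents * (1 - platform_share))
    total_work = sum(gpu_seconds.values())
    if total_work <= 0:
        return {host: 0 for host in gpu_seconds}
    exact = {h: pool * s / total_work for h, s in gpu_seconds.items()}
    payout = {h: int(v) for h, v in exact.items()}  # floor each share
    leftover = max(0, pool - sum(payout.values()))
    # Hand the remaining cents to the hosts with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - payout[h], reverse=True)
    for h in by_remainder[:leftover]:
        payout[h] += 1
    return payout
```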

      From the generality of your question, your best bet would probably be to hire me ;-).

      • DanIngenius (OP) · 1 year ago

        Thanks for your detailed reply. I don’t think crowdsourcing GPUs is feasible or desirable, but the idea of only swapping in different LoRAs is interesting. Can the LoRAs be loaded separately from the model? Could you load the model once and use two separate LoRAs?

        • georgejrjrjr · 1 year ago

          One base model, dozens maybe hundreds of adapters would be the goal.
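          A rough worked example of why one base model plus many adapters is cheap, using the standard LoRA formulation (the dimensions are illustrative, not tied to any particular model): each adapted weight matrix gets a low-rank update, so an adapter stores r(d + k) numbers per matrix instead of dk.

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\underbrace{d\,k}_{\text{full update}} \quad\text{vs.}\quad \underbrace{r\,(d + k)}_{\text{LoRA update}}

d = k = 4096,\; r = 8:\qquad
d\,k = 16{,}777{,}216,\qquad r\,(d + k) = 8 \cdot 8192 = 65{,}536 = \tfrac{1}{256}\, d\,k
```

          Total adapter size then depends on the rank and on how many matrices per layer are adapted, but it is typically a small fraction of the base model, which is what makes keeping many adapters around a single base plausible.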
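          With dozens or hundreds of adapters, not all of them need to sit in GPU memory at once. A minimal sketch of one way to manage that, assuming a least-recently-used eviction policy (an illustrative choice, not something stated in the thread) and a hypothetical `load` callback that fetches an adapter's weights from disk or object storage. Libraries such as Hugging Face PEFT similarly let one base model hold several named adapters and switch between them (e.g. `load_adapter` / `set_adapter`).

```python
from collections import OrderedDict
from typing import Callable


class AdapterCache:
    """Keep at most `capacity` adapters resident next to one shared base model."""

    def __init__(self, capacity: int, load: Callable[[str], object]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load = load  # hypothetical loader: adapter name -> weights object
        self._resident: OrderedDict = OrderedDict()

    def get(self, name: str) -> object:
        if name in self._resident:
            self._resident.move_to_end(name)  # mark as most recently used
            return self._resident[name]
        weights = self.load(name)  # cache miss: fetch from slower storage
        self._resident[name] = weights
        if len(self._resident) > self.capacity:
            self._resident.popitem(last=False)  # evict least recently used
        return weights

    def resident(self) -> list[str]:
        """Adapter names currently resident, least recently used first."""
        return list(self._resident)
```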