I have some server hardware. One machine I use for games: 18c/36t @ 3.2 GHz, 128 GB RAM (GTX 970, so GPU processing is a no-go I assume). The other is similar, but will have 256 GB. What's best for these?

I'm only starting out and don't understand the terms and measurements yet, but I'm in the process of learning and preparing the software to try. I'd like to focus on the best options available to me.

Thanks

  • candre23B · 1 year ago

    Yes, your GPU is too old to be useful for offloading, but you could still use it for prompt-processing acceleration at least.

    With your hardware, you want to use KoboldCpp, which runs models in GGML/GGUF format. With that much RAM you should have no issue loading models up to 120B, but large models will be incredibly slow (like 10+ minutes per response) running on CPU only. I recommend sticking to 13B models unless you're incredibly patient.
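    As a rough sanity check (a sketch only; the bits-per-weight figures are approximations for llama.cpp-style quant types, not exact values), you can estimate a quantized GGUF model's weight footprint in RAM from its parameter count:

    ```python
    # Rough RAM estimate for quantized GGUF model weights.
    # KV cache and runtime overhead add more on top of this.
    # Bits-per-weight values below are approximate, not exact.
    BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

    def model_ram_gb(params_billions: float, quant: str) -> float:
        """Approximate GiB needed to hold the quantized weights."""
        bits = BITS_PER_WEIGHT[quant]
        return params_billions * 1e9 * bits / 8 / 1024**3

    # A 13B model at Q5_K_M fits comfortably in 128 GB:
    print(f"13B Q5_K_M  ~ {model_ram_gb(13, 'Q5_K_M'):.1f} GiB")
    # Even a 120B model fits in RAM; it just runs very slowly on CPU:
    print(f"120B Q5_K_M ~ {model_ram_gb(120, 'Q5_K_M'):.1f} GiB")
    ```

    This is why RAM size gates which models you can *load*, while CPU speed gates how fast they *run*.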

    • andromediansOPB · 1 year ago

      Thanks, I was preparing a bit already. I have some variations of the software, and Wizard-Vicuna-13B-Uncensored.Q5_K_M, which stood out to me for some reason… also lzlv (Q5_K_M), a 70B model from a review in this sub (supposedly the best). I can alternate depending on need. Also all the models from GPT4All, plus some 7B (+unfiltered) Q4 LoRAs, which for some reason are the only ones hosted on torrent sites.

      • candre23B · 1 year ago

        70B models will be extremely slow on pure CPU, but you're welcome to try. There's no point in looking on "torrent sites" for LLMs - literally everything is hosted on Hugging Face.

        • andromediansOPB · 1 year ago

          I know now… what I was looking for was whether some disallowed, more powerful versions might be on there… nay… time to start exploring soon.