• 1 Post
  • 19 Comments
Joined 11 months ago
Cake day: October 30th, 2023


  • candre23 to LocalLLaMA@poweruser.forum • 55B Yi model merges
    10 months ago

    It’s a new foundation model, so some teething pains are to be expected. Yi is heavily based on Llama 2 (directly copied, for the most part), but there are just enough differences in the training parameters that default Llama 2 settings don’t get good results. KoboldCpp (KCPP) has already addressed the rope scaling, and I’m sure it’s only a matter of time before the other issues are hashed out.



  • Yes, your GPU is too old to be useful for offloading, but you could still use it for prompt processing acceleration at least.

    With your hardware, you want to use KoboldCpp, which runs models in GGML/GGUF format. You should have no issue running models up to 120b with that much RAM, but large models will be incredibly slow (like 10+ minutes per response) running on CPU only. I’d recommend sticking to 13b models unless you’re incredibly patient.
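
For a rough sense of why RAM is the limiting factor for model size, here is a back-of-the-envelope sketch. It assumes a 4-bit quant at about 4.5 bits per weight (the Q4_0 block layout) plus a flat allowance for context and runtime buffers. The function name and the 2 GB overhead figure are made up for illustration, not something KoboldCpp reports, and real usage will vary with the quant type and context length.

```python
def estimate_model_ram_gb(params_billion: float,
                          bits_per_weight: float = 4.5,
                          overhead_gb: float = 2.0) -> float:
    """Very rough RAM estimate for a quantized model, in decimal GB.

    bits_per_weight: 4.5 matches Q4_0 (assumption; other quants differ).
    overhead_gb: hypothetical flat allowance for context and buffers.
    """
    # billions of params * bits per weight / 8 bits per byte = GB of weights
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


if __name__ == "__main__":
    for size in (13, 34, 70, 120):
        print(f"{size}b: roughly {estimate_model_ram_gb(size):.1f} GB")
```

Fitting in RAM only tells you the model will load. It says nothing about speed, which on CPU only is the real bottleneck for the bigger sizes.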