As requested, this is the subreddit’s second megathread for model discussion. This thread will now be hosted at least once a month to keep the discussion current and help reduce duplicate posts.

I also saw that we hit 80,000 members recently! Thanks to every member for joining and making this happen.


Welcome to the r/LocalLLaMA Models Megathread

What models are you currently using and why? Do you use 7B, 13B, 33B, 34B, or 70B? Share any and all recommendations you have!

Examples of popular categories:

  • Assistant chatting

  • Chatting

  • Coding

  • Language-specific

  • Misc. professional use

  • Role-playing

  • Storytelling

  • Visual instruction


Have feedback or suggestions for other discussion topics? All suggestions are appreciated and can be sent to modmail.

^(P.S. LocalLLaMA is looking for someone who can manage Discord. If you have experience modding Discord servers, your help would be welcome. Send a message if interested.)


Previous Thread | New Models

  • ttkciarB
    10 months ago

    Mostly I’m still using slightly older models, with a few slightly newer ones now:

    • marx-3b-v3.Q4_K_M.gguf for “fast” RAG inference,

    • medalpaca-13B.ggmlv3.q4_1.bin for medical research,

    • mistral-7b-openorca.Q4_K_M.gguf for creative writing,

    • NousResearch-Nous-Capybara-3B-V1.9-Q4_K_M.gguf for creative writing, and probably for giving my IRC bots conversational capabilities (a work in progress),

    • puddlejumper-13b-v2.Q4_K_M.gguf for physics research, questions about society and philosophy, “slow” RAG inference, and translating between English and German,

    • refact-1_6b-Q4_K_M.gguf as a coding copilot, for fill-in-the-middle,

    • rift-coder-v0-7b-gguf.git as a coding copilot when I’m writing Python or trying to figure out my coworkers’ Python,

    • scarlett-33b.ggmlv3.q4_1.bin for creative writing, though less than I used to.

    I also have several models that I’ve downloaded but not yet had time to evaluate, and I’m downloading more as we speak (though even more slowly than usual; a couple of weeks ago my download rates from HF dropped to roughly a third of what they were, and I don’t know why).

    Some which seem particularly promising:

    • yi-34b-200k-llamafied.Q4_K_M.gguf

    • rocket-3b.Q4_K_M.gguf

    • llmware’s “bling” and “dragon” models. I’m downloading them all, though so far GGUFs are available for only three of them. I’m particularly intrigued by llmware-dragon-falcon-7b-v0-gguf, which is tuned specifically for RAG and is supposedly “hallucination-proofed”, and llmware-bling-stable-lm-3b-4e1t-v0-gguf, which might be a better IRC-bot conversational model.

    Of all of these, the one I use most frequently is PuddleJumper-13B-v2.