As requested, this is the subreddit’s second megathread for model discussion. This thread will now be hosted at least once a month to keep the discussion updated and help reduce identical posts.
I also saw that we hit 80,000 members recently! Thanks to every member for joining and making this happen.
Welcome to the r/LocalLLaMA Models Megathread
What models are you currently using and why? Do you use 7B, 13B, 33B, 34B, or 70B? Share any and all recommendations you have!
Examples of popular categories:
- Assistant chatting
- Chatting
- Coding
- Language-specific
- Misc. professional use
- Role-playing
- Storytelling
- Visual instruction
Have feedback or suggestions for other discussion topics? All suggestions are appreciated and can be sent to modmail.
^(P.S. LocalLLaMA is looking for someone who can manage Discord. If you have experience modding Discord servers, your help would be welcome. Send a message if interested.)
I’m really digging https://huggingface.co/TheBloke/PsyMedRP-v1-20B-GGUF for storytelling. I wish I could use a higher-bit GGUF quant, but it’s all I can manage atm.
OpenHermes 2.5 as an assistant.
Tiefighter for other uses.
Mostly I’m still using slightly older models, with a few slightly newer ones now:
- marx-3b-v3.Q4_K_M.gguf for “fast” RAG inference,
- medalpaca-13B.ggmlv3.q4_1.bin for medical research,
- mistral-7b-openorca.Q4_K_M.gguf for creative writing,
- NousResearch-Nous-Capybara-3B-V1.9-Q4_K_M.gguf for creative writing, and probably for giving my IRC bots conversational capabilities (a work in progress),
- puddlejumper-13b-v2.Q4_K_M.gguf for physics research, questions about society and philosophy, “slow” RAG inference, and translating between English and German,
- refact-1_6b-Q4_K_M.gguf as a coding copilot, for fill-in-the-middle (see the FIM sketch after this list),
- rift-coder-v0-7b-gguf.git as a coding copilot when I’m writing Python or trying to figure out my coworkers’ Python,
- scarlett-33b.ggmlv3.q4_1.bin for creative writing, though less than I used to.
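For anyone curious what the fill-in-the-middle use looks like in practice, here’s a minimal sketch using llama-cpp-python. The special tokens shown (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) are the StarCoder-style convention and are an assumption here; check the model card for the exact tokens your GGUF expects.

```python
# Minimal fill-in-the-middle (FIM) sketch with llama-cpp-python.
# Assumes StarCoder-style FIM tokens; verify against the model card.
from llama_cpp import Llama

llm = Llama(model_path="refact-1_6b-Q4_K_M.gguf", n_ctx=4096)

prefix = "def mean(values):\n    "
suffix = "\n    return total / len(values)\n"

# Ask the model to generate the code that belongs between prefix and suffix.
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

out = llm(prompt, max_tokens=64, temperature=0.2, stop=["<|endoftext|>"])
print(out["choices"][0]["text"])
```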
I also have several models which I’ve downloaded but not yet had time to evaluate, and am downloading more as we speak (though even more slowly than usual; a couple of weeks ago my download rates from HF dropped by roughly a third, and I don’t know why).
Some which seem particularly promising:
- yi-34b-200k-llamafied.Q4_K_M.gguf
- rocket-3b.Q4_K_M.gguf
- llmware’s “bling” and “dragon” models. I’m downloading them all, though so far there are only GGUFs available for three of them. I’m particularly intrigued by llmware-dragon-falcon-7b-v0-gguf, which is tuned specifically for RAG and is supposedly “hallucination-proofed”, and llmware-bling-stable-lm-3b-4e1t-v0-gguf, which might be a better IRC-bot conversational model.
Of all of these, the one I use most frequently is PuddleJumper-13B-v2.
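For anyone wondering what the “RAG inference” step above amounts to, here’s a minimal sketch assuming llama-cpp-python and that retrieval has already produced a few relevant text chunks (the retriever itself, embeddings and vector store, is out of scope, and the chunk text is placeholder):

```python
# Minimal sketch of a RAG answer step: stuff pre-retrieved passages into the
# prompt and ask the model to answer only from that context.
from llama_cpp import Llama

llm = Llama(model_path="marx-3b-v3.Q4_K_M.gguf", n_ctx=2048)

retrieved_chunks = [
    "Chunk 1: ...",  # placeholder text standing in for retrieved passages
    "Chunk 2: ...",
]
question = "What does the source say about X?"

context = "\n\n".join(retrieved_chunks)
prompt = (
    "Use only the context below to answer the question.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\nAnswer:"
)

out = llm(prompt, max_tokens=256, temperature=0.1)
print(out["choices"][0]["text"].strip())
```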
Goliath-120B (specifically the 4.85 BPW quant) is the only model I use now; I don’t think I can go back to using a 70B model after trying this.
Mistral-7B-Instruct (Q4_K quant) and OpenHermes-2.5-Mistral-7B (Q4_K quant). Still testing the waters, but starting with these two first.
Because a model can be divine or crap depending on its settings, I think it’s important to specify what I use:
DeepSeek 33B Q8 GGUF with the Min-p setting (I love it very much); a rough sketch of this kind of setup is below.
Source of my Min-p settings: “Your settings are (probably) hurting your model - Why sampler settings matter” on r/LocalLLaMA.
70B Storytelling, Q5_K_M quant.
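A rough sketch of that kind of Min-p setup, assuming a llama-cpp-python build recent enough to expose the min_p parameter (the values are illustrative, not the exact ones from the linked post, and the model filename is a placeholder):

```python
from llama_cpp import Llama

# Placeholder filename; point this at your actual DeepSeek 33B Q8 GGUF.
llm = Llama(model_path="deepseek-33b.Q8_0.gguf", n_ctx=4096)

out = llm(
    "Write a short scene set in a rain-soaked harbor town.",
    max_tokens=300,
    temperature=1.2,  # Min-p tolerates higher temperatures than top-p sampling
    min_p=0.05,       # discard tokens below 5% of the most likely token's probability
    top_p=1.0,        # effectively disable top-p
    top_k=0,          # effectively disable top-k
)
print(out["choices"][0]["text"])
```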
I’m late to the party on this one.
I’ve been loving the 2.4BPW EXL2 quants from Lone Striker recently, specifically using Euryale 1.3 70B and LZLV 70B.
Even at the smaller quant, they’re very capable, and leagues ahead of smaller models in terms of comprehension and reasoning. Min-P sampling parameters have been a big step forward, as well.
The only downside I can see is the limited context length on a single 24 GB VRAM card. Perhaps further testing of Nous-Capybara 34B at 4.65 BPW on EXL2 is in order.
What would have happened if ChatGPT had been invented in the 17th century? MonadGPT is a possible answer.
TheBloke/MonadGPT-GGUF