Is there any way to speed up the MythoMax-L2-13B on a 6GB GPU?

OverallBit9 · 1 year ago

Is there any way to speed up the MythoMax-L2-13B on a 6GB GPU?

aseichter2007 · 1 year ago

So, you don’t have enough RAM to fit that model. It’s actually overrunning your RAM entirely and spilling into the wrong kind of “VRAM”: virtual RAM, aka paged memory on disk.

I don’t know what you’re trying to do, but the best answer is OpenHermes 2.5 Mistral 7B at Q3 with 4k context, or something similar. Rocket 3B at Q6 might be even faster.

Hermes is king. I understand why you want that model, but 13B at Q8 is huge: roughly 17GB of memory at 8k context.

It will speed up if you at least get it off the hard drive. Try a Q3_K_L if you’re determined to run MythoMax.
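For anyone who wants to sanity-check the memory figure, here is a rough back-of-the-envelope sketch. It assumes the Llama-2-13B shape that MythoMax is built on (40 layers, hidden size 5120), about 8.5 bits per weight for Q8_0, and an fp16 KV cache. The function names are made up for illustration, and real runtimes add their own overhead, so treat the result as a ballpark only.

```python
# Rough memory estimate for a quantized 13B model (illustrative, not exact).

def weight_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers, hidden_size, n_ctx, bytes_per_value=2):
    """Keys and values: 2 tensors per layer, each n_ctx x hidden_size."""
    return 2 * n_layers * n_ctx * hidden_size * bytes_per_value

GB = 1e9  # decimal gigabytes

# Assumed values: 13e9 parameters, Q8_0 at ~8.5 bits per weight,
# Llama-2-13B shape (40 layers, hidden size 5120), 8k context, fp16 cache.
weights = weight_bytes(13e9, 8.5)
kv = kv_cache_bytes(n_layers=40, hidden_size=5120, n_ctx=8192)
total_gb = (weights + kv) / GB

print(f"weights:  {weights / GB:.1f} GB")
print(f"KV cache: {kv / GB:.1f} GB")
print(f"total:    {total_gb:.1f} GB")
```

Under these assumptions the total lands around 20GB, the same order of magnitude as the 17GB-ish figure and far beyond what a 6GB card can hold. Dropping `bits_per_weight` or `n_ctx` shows how much a smaller quant or a shorter context saves.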

Civil_Ranger4687 · 1 year ago

Never use the Q8_0 versions of GGUFs unless most or all of the model can comfortably fit into your VRAM. The Q6_K version is much smaller and almost the same quality.

For your setup, I would use mythomax-l2-13b.Q4_K_M.gguf.
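To get a feel for how much of a given GGUF can sit in VRAM, a minimal sketch like the one below can help. It assumes the weights are spread evenly across the 40 layers of a Llama-2-13B model, that the Q4_K_M file is about 7.9GB (check the actual file size), and that roughly 1.5GB should be held back for the KV cache and scratch buffers. All three numbers are assumptions, and `layers_that_fit` is a hypothetical helper, not part of any library.

```python
# Estimate how many transformer layers of a GGUF fit in a VRAM budget.

def layers_that_fit(file_size_gb, n_layers, vram_gb, reserve_gb=1.5):
    """Return a rough count of layers that fit on the GPU.

    Assumes every layer is the same size (ignores embeddings and the
    output head) and keeps reserve_gb free for cache and buffers.
    """
    per_layer_gb = file_size_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))

# Assumed inputs: ~7.9 GB Q4_K_M file, 40 layers, 6 GB card.
n = layers_that_fit(file_size_gb=7.9, n_layers=40, vram_gb=6.0)
print(f"layers on GPU: {n} of 40")
```

The result is a starting point for the GPU layer count in llama.cpp-based loaders (the `-ngl` / `n_gpu_layers` setting). If generation still crawls or the loader runs out of memory, lower the number a few layers at a time.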

OverallBit9 · 1 year ago

In my tests Q4 gives me about the same tokens per second as Q5, so I decided to use Q5. This is my first time testing text generation locally with models. Thank you very much for explaining; I’m getting used to it now and understanding what the settings do.

Civil_Ranger4687 · 1 year ago

Yeah, there’s so much to learn. I’m still figuring a lot out too.

Good tip for settings: Play around mostly with temperature, top-p, and min-p.
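If it helps to see what those three settings actually do, here is a toy sketch on a four-token vocabulary. It is plain Python, not any backend’s real sampler. The order used here (temperature, then min-p, then top-p) is an assumption, since different backends apply samplers in different orders, and the function names are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p times the top token's."""
    threshold = min_p * max(probs)
    return [p if p >= threshold else 0.0 for p in probs]

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [0.0] * len(probs)
    cumulative = 0.0
    for i in order:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

def renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]

def filtered_distribution(logits, temperature=1.0, top_p=1.0, min_p=0.0):
    """Apply temperature, then min-p, then top-p (assumed order)."""
    probs = softmax(logits, temperature)
    probs = renormalize(min_p_filter(probs, min_p))
    return renormalize(top_p_filter(probs, top_p))

toy_logits = [2.0, 1.0, 0.0, -3.0]  # made-up scores for four tokens
print(filtered_distribution(toy_logits, temperature=1.0, top_p=0.9, min_p=0.1))
print(filtered_distribution(toy_logits, temperature=2.0, top_p=0.9, min_p=0.1))
```

Raising the temperature spreads probability toward less likely tokens, while min-p and top-p cut off the unlikely tail, which is why these three interact so strongly when you tune them.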