Hi! I have a similar setup: 5950X, 64GB RAM, and 2x 3090s. How did you manage to load an EXL2 120B model?
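(For reference, here's a minimal sketch of what I understand a two-GPU split to look like with the ExLlamaV2 Python API; the model path and the GB-per-GPU split values are placeholders I made up, not a tested recipe for a 120B model.)

```python
# Minimal sketch of loading an EXL2 model split across two GPUs with the
# ExLlamaV2 Python API. Path and split values are placeholders.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer

config = ExLlamaV2Config()
config.model_dir = "models/my-120b-exl2"  # hypothetical path
config.prepare()

model = ExLlamaV2(config)
# Reserve roughly 22 GB on each 3090; the loader fills GPU 0 first,
# then spills the remaining layers onto GPU 1.
model.load(gpu_split=[22, 22])

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)  # the KV cache also lives in VRAM
```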
Hi! It’s the first time I’m seeing SPR; is there any resource where I can learn more about it? I’ve seen privateGPT. I believe it’s a front end that lets you upload files, and I guess it builds a database using something like ChromaDB, so it learns from what you feed it and takes it into consideration when giving answers. Is that right?
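(If I understand it right, the retrieval part works roughly like the sketch below, using ChromaDB with made-up documents and a made-up question. The store doesn't "learn" in the training sense; it just retrieves the most similar chunks and pastes them into the prompt.)

```python
# Minimal retrieval-augmented sketch with ChromaDB. Documents and the
# question are made-up examples; privateGPT would chunk your files first.
import chromadb

client = chromadb.Client()
collection = client.create_collection("my_docs")

# Index some uploaded text
collection.add(
    documents=["The warranty lasts 24 months.", "Returns are accepted for 30 days."],
    ids=["doc1", "doc2"],
)

# At question time: fetch the closest chunk...
question = "How long is the warranty?"
results = collection.query(query_texts=[question], n_results=1)
context = results["documents"][0][0]

# ...and prepend it to the prompt you send to the model.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```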
Interesting! For some reason I had more success with GGUF models, as those work everywhere using koboldcpp and ooba’s. I didn’t know that ExLlamaV2 was better for context; I’ll try it. That backend is for the EXL2 format, right? I had the impression it was mainly better for speed; I didn’t know its context takes up less VRAM.
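(For comparison, this is roughly how GGUF models load in Python via llama-cpp-python, which wraps the same llama.cpp engine that koboldcpp and ooba's GGUF loader build on; the path and numbers are placeholders, not tuned values.)

```python
# Rough sketch of loading a GGUF model with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="models/my-model.Q4_K_M.gguf",  # hypothetical file
    n_ctx=4096,        # context window; bigger = more VRAM for the KV cache
    n_gpu_layers=35,   # how many transformer layers to offload to the GPU
)

out = llm("Q: What is GGUF?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```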
Thanks a lot! I wasn’t sure how context affected VRAM usage. So each model has a maximum context size, and using more context takes more VRAM. Thanks!
Does the 200K mean that it has up to 200K context size? Is the context limited by the model, or can you set it to whatever you want as long as you have enough VRAM? Also, if a GGUF model takes 20GB of VRAM, for example, is that with the “default” context size? Can it be less if you decrease the context, or more if you increase it?
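(Trying to answer part of my own question with back-of-the-envelope math: the KV cache grows linearly with context length, on top of the fixed cost of the weights. A rough sketch below; the numbers loosely match a 70B Llama-2-style model with grouped-query attention and are illustrative, not exact.)

```python
# Back-of-the-envelope estimate of KV-cache VRAM vs. context length.
# Formula: 2 (K and V) * layers * kv_heads * head_dim * bytes * tokens.
def kv_cache_gb(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_tokens / 1024**3

for ctx in (2048, 4096, 8192, 32768):
    gb = kv_cache_gb(n_layers=80, n_kv_heads=8, head_dim=128, n_tokens=ctx)
    print(f"{ctx:6d} tokens -> ~{gb:.2f} GB of KV cache on top of the weights")
```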
Could someone explain what the “Vicuna format” is?
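(From what I've gathered since asking, it's just the prompt template the model was fine-tuned on; here's a sketch of what I believe the Vicuna v1.1 template looks like.)

```python
# Sketch of the Vicuna-style prompt template (v1.1, as I understand it):
# a system line, then USER:/ASSISTANT: turns. Models fine-tuned on this
# format tend to answer best when prompted the same way.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def vicuna_prompt(user_message: str) -> str:
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"

print(vicuna_prompt("What is the Vicuna format?"))
```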
What motherboard do you have that can run 3 GPUs?
Update: I just saw that I had the GPU layers at 0, so it was running entirely on the CPU then?
The slider goes from 0 to 128; how do I know what to pick?
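(The rule of thumb I've seen: offload as many layers as fit in VRAM after leaving headroom for the context and overhead, then nudge down if you run out of memory. A rough sketch of that estimate; all the numbers are placeholders for your own model and card.)

```python
# Rough heuristic for picking the GPU-layers value: estimate how many
# layers fit in the VRAM left over after reserving headroom for the KV
# cache and other overhead. Placeholder numbers; adjust for your setup.
import math

model_file_gb = 40.0   # size of the GGUF file on disk
total_layers = 80      # layer count reported when the model loads
vram_gb = 24.0         # one 3090
headroom_gb = 3.0      # context/KV cache, CUDA buffers, desktop, etc.

gb_per_layer = model_file_gb / total_layers
fit = min(total_layers, math.floor((vram_gb - headroom_gb) / gb_per_layer))
print(f"Try ~{fit} GPU layers, then reduce if you hit out-of-memory")
```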
I’m still looking for a good text-to-speech or speech-to-speech tool that lets you use your own recordings. Any ideas?
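(The closest thing I've found so far is voice cloning with Coqui's XTTS; a minimal sketch below, assuming you have a short clean recording of your voice. The file names are placeholders.)

```python
# Minimal voice-cloning sketch with Coqui TTS (XTTS v2): it conditions
# the generated speech on a short reference recording of your own voice.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="This should sound roughly like the reference speaker.",
    speaker_wav="my_voice_sample.wav",  # your own recording
    language="en",
    file_path="output.wav",
)
```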