Is this normal behavior?
I’m still learning, but I noticed that if I load an unquantized model like https://huggingface.co/teknium/OpenHermes-2-Mistral-7B it takes nearly all of the available VRAM (I have a 3080 with 10 GB).
But when I load a quantized model like https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF it uses almost no VRAM, maybe 1 GB?
Is this normal behavior?
Yes, this is normal. A GGUF model runs on the CPU by default, so it uses very little VRAM unless you offload layers to the GPU. For CPU-only loading the memory usage also isn't visible in most tools, because the model is mmap-loaded (memory-mapped from disk rather than copied into RAM), which saves time during startup. To view the real usage, load with --no-mmap.
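If you're loading through llama-cpp-python rather than the CLI, the equivalent knobs look roughly like this. A minimal sketch: the local filename and the layer count are assumptions, so adjust them for your download and what fits in the 10 GB card.

```python
from llama_cpp import Llama

# Hypothetical local filename for the TheBloke GGUF download.
MODEL = "./openhermes-2.5-mistral-7b.Q4_K_M.gguf"

llm = Llama(
    model_path=MODEL,
    n_gpu_layers=32,  # 0 (the default) keeps everything on the CPU, which is
                      # why the GGUF barely touches VRAM; raise it to offload.
    use_mmap=False,   # equivalent of --no-mmap: copy the weights into RAM so
                      # the process's memory usage reflects the real footprint.
)

out = llm("Q: Why is the sky blue? A:", max_tokens=32)
print(out["choices"][0]["text"])
```

With `use_mmap=True` (the default) the weights are paged in from the file cache on demand, so tools like `nvidia-smi` and `top` can make the model look nearly free even though the data is still being read from disk.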