bullerwins to LocalLLaMA@poweruser.forum (English) · 2 years ago

Do GGUF not take all the VRAM needed when loaded?

Is this normal behavior?

I’m still learning, but I noticed that if I load a regular (unquantized) LLM like https://huggingface.co/teknium/OpenHermes-2-Mistral-7B, it takes all the VRAM available (I have a 3080 10GB).

But when I load a quantized model like https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF, it takes almost none of the VRAM, maybe 1GB?

  • Aaaaaaaaaeeeee · 2 years ago

    For CPU-only inference the usage isn’t visible because of mmap loading, which saves time during startup. To see the actual memory use, run with --no-mmap.

  • bullerwins (OP) · 2 years ago

    Update: I just saw that I had the GPU layers set to 0, so it was running entirely on the CPU. The slider goes from 0 to 128; how do I know what to pick?

    https://preview.redd.it/snrkzjg43v1c1.png?width=1442&format=png&auto=webp&s=b356f72d5deaa5a49e19fbf3e91d0c22e2bc333b
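The mmap effect described in the first reply is easy to demonstrate in plain Python: mapping a file only reserves address space, and pages are read from disk lazily as they are touched, so memory monitors show almost nothing right after "loading". A minimal sketch (the 64 MiB zero-filled file is a stand-in for a GGUF model, not a real one):

```python
import mmap
import os
import tempfile

# Create a stand-in "model file": 64 MiB of zeros.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.truncate(64 * 1024 * 1024)

with open(path, "rb") as f:
    # mmap only maps the file into the address space; no bytes are
    # read from disk yet, so resident memory barely changes.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_byte = mm[0]  # touching a page faults it in on demand
    mm.close()

print(first_byte)  # prints 0
```

This is why a GGUF loaded for CPU inference can appear to use almost no memory until inference actually walks the weights.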
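As for picking the GPU-layers slider: higher is better until VRAM runs out, and any layers that don't fit stay on the CPU. A hypothetical back-of-the-envelope estimate (the file size, layer count, and overhead figure below are assumptions for illustration, not measured values):

```python
def max_gpu_layers(model_file_gb, n_layers, vram_gb, overhead_gb=1.5):
    """Estimate how many transformer layers fit in VRAM.

    Rough heuristic: weights are spread evenly across layers, and some
    VRAM is reserved for the KV cache and runtime context (overhead_gb).
    Returns at most n_layers, meaning the whole model fits.
    """
    per_layer_gb = model_file_gb / n_layers
    usable_gb = max(vram_gb - overhead_gb, 0)
    return min(int(usable_gb / per_layer_gb), n_layers)

# A 7B Q4_K_M GGUF is roughly 4.4 GB and Mistral-7B has 32 layers;
# on a 10 GB RTX 3080 every layer should fit with room to spare.
print(max_gpu_layers(4.4, 32, 10))  # prints 32
```

In practice, setting the slider above the model's real layer count is harmless (extra values are ignored), so trying the maximum and backing off if you hit out-of-memory errors is a common approach.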
