LocalLLaMA@poweruser.forumEnglish · 2 years ago

M1/M2/M3: increase VRAM allocation with `sudo sysctl iogpu.wired_limit_mb=12345` (i.e. amount in mb to allocate)

1

M1/M2/M3: increase VRAM allocation with `sudo sysctl iogpu.wired_limit_mb=12345` (i.e. amount in mb to allocate)

LocalLLaMA@poweruser.forumEnglish · 2 years ago

If you’re using Metal to run your llms, you may have noticed the amount of VRAM available is around 60%-70% of the total RAM - despite Apple’s unique architecture for sharing the same high-speed RAM between CPU and GPU.

It turns out this VRAM allocation can be controlled at runtime using sudo sysctl iogpu.wired_limit_mb=12345

See here: https://github.com/ggerganov/llama.cpp/discussions/2182#discussioncomment-7698315

Previously, it was believed this could only be done with a kernel patch - and that required disabling a macos security feature … And tbh that wasn’t that great.

Will this make your system less stable? Probably. The OS will need some RAM - and if you allocate 100% to VRAM, I predict you’ll encounter a hard lockup, spinning Beachball, or just a system reset. So be careful to not get carried away. Even so, many will be able to get a few more gigs this way, enabling a slightly larger quant, longer context, or maybe even the next level up in parameter size. Enjoy!

Chat

fallingdowndizzyvrB
link
fedilink
English
arrow-up
1·
2 years ago

So I guess the default is: 49152

It is. To be more clear, llama.cpp tells you want the recommendedMaxWorkingSetSize is. Which should match that number.