Is this accurate?
Is this able to use CPU (similar to llama.cpp)?
I’m the author of this article, thank you for posting it! If you don’t want to use Medium, here’s the link to the article on my blog: https://mlabonne.github.io/blog/posts/ExLlamaV2_The_Fastest_Library_to_Run%C2%A0LLMs.html
Can you offload layers with this, like you can with GGUF?
I don’t have much VRAM/RAM, so even when running a 7B I have to partially offload layers.
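For reference, this is roughly how I do the partial offload today with llama-cpp-python (just a sketch; the model path and layer count are placeholders you'd tune to your own hardware):

```python
# Partial GPU offload with llama-cpp-python (assumes a GPU-enabled build).
# n_gpu_layers controls how many transformer layers go to VRAM;
# the remaining layers stay in system RAM and run on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=20,  # placeholder: raise/lower until it fits your VRAM
)

out = llm("Q: Why offload layers? A:", max_tokens=64)
print(out["choices"][0]["text"])
```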
It’s not just great. It’s a piece of art.
I wish there were support for Metal with ExLlamaV2. :(
Hey, he finally gets some recognition.
In my experience, it’s the fastest and llama.cpp is the slowest.