Is this accurate?
Is this able to use CPU (similar to llama.cpp)?
I’m the author of this article, thank you for posting it! If you don’t want to use Medium, here’s the link to the article on my blog: https://mlabonne.github.io/blog/posts/ExLlamaV2_The_Fastest_Library_to_Run%C2%A0LLMs.html
Can you offload layers with this, like you can with GGUF?
I don’t have much VRAM/RAM, so even when running a 7B I have to partially offload layers.
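For reference, this is roughly how I do the partial offload today with llama-cpp-python (just a sketch; the model path and layer count are placeholders you'd tune to your own hardware):

```python
# Partial GPU offload with llama-cpp-python (assumes a GPU-enabled build).
# n_gpu_layers controls how many transformer layers go to VRAM;
# the remaining layers stay in system RAM and run on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=20,  # placeholder: raise/lower until it fits your VRAM
)

out = llm("Q: Why offload layers? A:", max_tokens=64)
print(out["choices"][0]["text"])
```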
It’s not just great. It’s a piece of art.
I wish there were support for Metal with ExLlamaV2. :(
Hey, he finally gets some recognition.
In my experience, it’s the fastest and llama.cpp is the slowest.