ExLlamaV2: The Fastest Library to Run LLMs

alchemist1e9 · 1 year ago

tgredditfc · 1 year ago

Thanks for sharing! I have been struggling with the llama.cpp loader and GGUF (using oobabooga with the same LLM model). No matter how I set the parameters or how many layers I offload to the GPUs, llama.cpp is way slower than ExLlama (v1 & v2). It's not just a bit slower, it's an order of magnitude slower. I really don't know why.