ExLlamaV2 vs. llama.cpp (All layers offloaded to GPU)

WinterUsed1120B to LocalLLaMA@poweruser.forum · English · 1 year ago

Will there be a significant difference in speed and output quality between a Llama 2 GPTQ model running on ExLlamaV2 and a Llama 2 GGUF model running on llama.cpp with all layers offloaded to the GPU?
