Will there be a significant difference in speed and quality between Llama 2 GPTQ running on ExLlamaV2 and Llama 2 GGUF running on llama.cpp with all layers offloaded to the GPU?