Is this accurate?

  • tgredditfcB
    10 months ago

    Thanks for sharing! I have been struggling with the llama.cpp loader and GGUF (using oobabooga with the same LLM). No matter how I set the parameters or how many layers I offload to the GPUs, llama.cpp is far slower than ExLlama (v1 and v2): not just a bit slower, but roughly an order of magnitude slower. I really don’t know why.