I’m using Llama models for local inference with LangChain, but I get a lot of hallucinations with GGML models. I’ve tried both the LLM and chat variants (7B and 13B), since I have 16GB of RAM.
So now I’m exploring new models and want to find a good one — should I try the GGUF format?
I’d appreciate suggestions from anyone running local models with LangChain at production level.

  • tortistic_turtleB
    1 year ago

    GGUF won’t change the level of hallucination, but you are right that most newer language models are quantized to GGUF, so it makes sense to use one.
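    For what it’s worth, switching to GGUF barely changes your LangChain code. Here’s a minimal sketch, assuming `llama-cpp-python` and `langchain-community` are installed — the model path is a placeholder, not a real file:

    ```python
    # Minimal sketch: loading a GGUF model through LangChain's LlamaCpp wrapper.
    # Assumes: pip install llama-cpp-python langchain-community
    # The model path is a placeholder -- point it at whatever GGUF file you download.
    from langchain_community.llms import LlamaCpp

    llm = LlamaCpp(
        model_path="./models/your-model.Q4_K_M.gguf",  # placeholder path
        n_ctx=4096,       # context window size
        temperature=0.1,  # lower temperature tends to cut down on rambling
    )

    print(llm.invoke("Answer briefly: what is the GGUF file format?"))
    ```

    Lowering the temperature and grounding the prompt with retrieved context (RAG) will usually do more against hallucination than the file format itself.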