First time testing a local text model, so I don’t know much yet. I’ve seen people with 8GB cards complaining that text generation is very slow, so I don’t have high hopes, but still… I think I need to change some configuration: when generating text, my SSD is at 100% usage reading 1–2 GB/s, while my GPU doesn’t reach 15% usage.
Using RTX 2060 6GB, 16GB RAM.
This is the model I am testing (mythomax-l2-13b.Q8_0.gguf): https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF/tree/main
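
The SSD-at-100% symptom usually means the model file does not fit in RAM plus VRAM, so llama.cpp memory-maps it and keeps re-reading weights from disk while the GPU sits mostly idle. Below is a minimal sketch of offloading some layers to the GPU, assuming the llama-cpp-python bindings; the n_gpu_layers count is an illustrative guess for 6 GB of VRAM, not a tuned value.

```python
# Sketch: load the GGUF with part of the layers offloaded to the GPU.
# Requires a CUDA-enabled build of llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="mythomax-l2-13b.Q8_0.gguf",
    n_gpu_layers=15,  # illustrative for 6 GB VRAM; raise until VRAM is nearly full
    n_ctx=2048,       # context window
)

out = llm("Hello,", max_tokens=32)
print(out["choices"][0]["text"])
```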

  • Civil_Ranger4687B · 10 months ago

    Never use the Q8_0 versions of GGUFs unless most or all of the model can comfortably fit into your VRAM. The Q6_K version is much smaller and almost the same quality.

    For your setup, I would use mythomax-l2-13b.Q4_K_M.gguf.
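
    For scale, the files in TheBloke's repo are roughly 7.9 GB for Q4_K_M versus 13.8 GB for Q8_0, so the smaller quant actually fits in 16 GB of RAM with room to offload layers to a 6 GB card. A sketch of fetching it with huggingface_hub (an assumed workflow, not from the thread):

    ```python
    # Sketch: download the recommended quant from the Hugging Face Hub.
    # pip install huggingface_hub
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="TheBloke/MythoMax-L2-13B-GGUF",
        filename="mythomax-l2-13b.Q4_K_M.gguf",
    )
    print(path)  # local cache path to pass to Llama(model_path=...)
    ```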

    • OverallBit9OPB · 10 months ago

      In my tests Q4 was giving me about the same tokens per second as Q5, so I decided to use Q5. This is my first time testing text generation locally with models; thank you very much for explaining, I am getting used to it now and understanding what the settings do.
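
      A rough way to back up that kind of comparison is to time a fixed completion with each quant. A sketch, again assuming llama-cpp-python; the Q5_K_M filename is a guess at which Q5 variant was used:

      ```python
      # Sketch: rough tokens-per-second comparison between two quants.
      import time
      from llama_cpp import Llama

      def tokens_per_second(model_path: str) -> float:
          llm = Llama(model_path=model_path, n_gpu_layers=15, verbose=False)
          start = time.perf_counter()
          out = llm("The quick brown fox", max_tokens=128)
          elapsed = time.perf_counter() - start
          return out["usage"]["completion_tokens"] / elapsed

      for path in ("mythomax-l2-13b.Q4_K_M.gguf", "mythomax-l2-13b.Q5_K_M.gguf"):
          print(path, f"{tokens_per_second(path):.1f} tok/s")
      ```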

      • Civil_Ranger4687B · 10 months ago

        Yeah, there’s so much to learn; I’m still figuring a lot out too.

        Good tip for settings: Play around mostly with temperature, top-p, and min-p.
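
        For concreteness, here is how those three samplers are exposed by llama-cpp-python; the values are illustrative starting points, and min_p requires a reasonably recent build (it was added to llama.cpp in late 2023):

        ```python
        # Sketch: the three sampling knobs mentioned above.
        from llama_cpp import Llama

        llm = Llama(model_path="mythomax-l2-13b.Q5_K_M.gguf", n_gpu_layers=15, verbose=False)

        out = llm(
            "Write one sentence about dragons.",
            max_tokens=64,
            temperature=0.8,  # higher = more varied token choices
            top_p=0.95,       # sample from the smallest token set covering 95% probability
            min_p=0.05,       # drop tokens below 5% of the top token's probability
        )
        print(out["choices"][0]["text"])
        ```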