I’m using ollama and I have an RTX 3060 Ti, running only 7B models.

I tested Mistral 7B, Mistral-OpenOrca, and Zephyr, and they all had the same problem: after some amount of chatting they start repeating themselves or producing random output.

What could it be? Temperature? VRAM? ollama?
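
If it’s a sampling or context issue, I guess these are the knobs to tweak. Here’s a rough sketch of passing them through Ollama’s REST API (the values are just placeholders to experiment with, not known-good settings):

```python
# Rough sketch: nudging Ollama's sampling/context options via its REST API.
# Assumes Ollama is running locally on the default port (11434); the option
# values below are placeholders to experiment with, not verified fixes.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "mistral:7b",
        "messages": [{"role": "user", "content": "Can you summarise our chat so far?"}],
        "stream": False,
        "options": {
            "temperature": 0.7,     # lower = less random wandering
            "repeat_penalty": 1.2,  # penalise recently repeated tokens
            "repeat_last_n": 256,   # how far back the repeat penalty looks
            "num_ctx": 4096,        # context window; long chats overflow the default
        },
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```

The same options can also be set as PARAMETER lines in a Modelfile if that’s easier than passing them per request.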

  • ntn8888B · 10 months ago

    I’ve noticed this extensively when running locally on my 8 GB RX 580, and the issue is pretty bad… I’ve run exactly the models you mentioned.

    But when I run on a (big) cloud GPU on vast.ai (e.g. an RTX 3090 or an A6000), the problem vanishes…

    vast.ai is pretty cheap ($10 deposit), so you can experiment there and see.
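
    If it is a VRAM thing, one quick sanity check is to watch GPU memory while the chat grows and see whether the repetition starts once it fills up. A rough Python sketch using NVML (NVIDIA only, so this fits your 3060 Ti rather than my RX 580; assumes `pip install nvidia-ml-py`):

    ```python
    # Rough sketch (not from this thread): poll NVIDIA VRAM usage while a long
    # chat runs, to see whether memory pressure lines up with the point where
    # the model starts repeating itself.
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU, e.g. the 3060 Ti

    try:
        while True:
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            print(f"VRAM used: {mem.used / 1024**2:.0f} MiB / {mem.total / 1024**2:.0f} MiB")
            time.sleep(5)
    except KeyboardInterrupt:
        pynvml.nvmlShutdown()
    ```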