UPDATE: I forgot to mention that I used q8 of all models.

So I’ve had a lot of interest in the non-code 34b Finetunes, whether it’s CodeLlama base or Yi base. From the old Samantha-34b and Synthia-34b to the new Dolphin-Yi and Nous-Capybara 34b models, I’ve been excited for each one because it fills a gap that needs filling.

My problem is that I can’t seem to wrangle these fine-tunes into working right for me. I use Oobabooga (text-gen-ui), and always try to choose the correct instruction template either specified on the card or on TheBloke’s page, but the models never seem to be happy with the result, and either get confused very easily or output odd gibberish from time to time.

For both Yi models, I am using the newest ggufs that TheBloke put out… yesterday? Give or take. But I’ve tried the past 2-3 different ggufs for the same model he’s updated with when they came out.

The best luck I’ve had with the new Yi models was doing just plain chat mode with my AI Assistant’s character prompt as the only thing being sent in, but even then both Yi fine-tunes that I tried eventually broke down after a few thousand context.

For example, after a bit of chattering with the models I tried a very simple little test on both: “Please write me two paragraphs. The content of the paragraphs is irrelevant, just please write two separate paragraphs about anything at all.” I did that because previous versions of these two struggled to make a new line, so I just wanted to see what would happen. This absolutely confused the models, and the results were wild.

Has anyone had luck getting them to work? They appear to have so much potential, especially Nous Capybara which went toe to toe with GPT-4 in this benchmark, but I’m failing miserably at unlocking its full potential lol. If you have gotten it to work, could you please specify what settings/instructions you’re using?

  • a_beautiful_rhindB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    Using exl2 I have not had too many issues with the models. I did not d/l the GGUF because I read all the issues concerning BOS tokens.

    They are similar to 70b but with poorer reasoning. With proper template they follow most instructions.

    I’ve just been plodding along with them using dynamic temperature and min_P