Hi, I have searched for a long time on this subreddit, in Ooba’s documentation, Mistral’s documentation and everything, but I just can’t find what I am looking for.

I see everyone claiming Mistral can handle up to a 32k context size. While it technically won’t refuse to generate above roughly 8k, the output is just not good. I have it loaded in Oobabooga’s text-generation-webui and am using the API through SillyTavern. I loaded the plain Mistral 7B just to check, but with my current 12k-token story, all it generates is gibberish if I give it the full context. I saw the same thing with other fine-tunes of Mistral.

What am I doing wrong? I am using the GPTQ version on my RX 7900 XTX. Is the 32k figure just the point at which it won’t crash, or am I doing something wrong that keeps me from getting coherent output above 8k? I did experiment with the alpha value, and while that does eliminate the gibberish, I get the impression the quality suffers somehow.

  • SomeOddCodeGuyB · 1 year ago

    I don’t believe messing with alpha values is a good idea, but I’ve never done it on this model. My Mistral 7B instance in chat mode had no trouble with a conversation extending past 9k tokens.

    This is the part that threw me off, and why I’m interested in the answers to this post.

    Normally, on a Llama 2 model for instance, I’d use alpha to extend the context past the regular cap. For example, on XWin 70b with a max seq length of 4096, I run it at 1.75 alpha and a rope base around 17000 to push the context to 6144.
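    The alpha-and-rope-base pairing above isn’t arbitrary: in exllama-style loaders the alpha value is typically converted to an effective RoPE base via the NTK-aware scaling formula. A minimal sketch, assuming the usual Llama-family head dimension of 128 and default base of 10000 (verify against your loader’s source):

```python
# NTK-aware "alpha" scaling: rather than compressing position indices,
# it raises the RoPE frequency base so the rotary embeddings stretch
# to cover a longer context.
def rope_base_from_alpha(alpha: float,
                         base: float = 10000.0,
                         head_dim: int = 128) -> float:
    """Effective RoPE base for a given alpha value."""
    return base * alpha ** (head_dim / (head_dim - 2))

# alpha = 1.75 lands right around the ~17000 base mentioned above
print(round(rope_base_from_alpha(1.75)))  # -> 17656
```

    This is why setting either alpha or rope base (but not both) is usually enough: one is just a reparameterization of the other.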

    Codellama is a little different. I don’t need to touch the alpha for it to use 100,000 tokens, but the rope base has to be 1,000,000. So it’s 1 alpha, 1,000,000 rope base, 1 compress == 100,000 tokens.

    But then there’s Mistral. Mistral loads up and is like “I can do 32,000 tokens!” with 1 alpha, 0 rope base, 1 compress. Yet the readme files on the models keep showing “4096” tokens. So I’ve been staring at it, scratching my head, unsure whether it can do 32k or 4k, whether it needs rope scaling, etc.
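    One way to resolve the 32k-vs-4096 confusion is to check the model’s config.json rather than the readme. For Mistral-7B-v0.1, the relevant fields look roughly like the excerpt below (values from memory, not fetched live — worth verifying against the actual file on Hugging Face):

```python
import json

# Illustrative excerpt of Mistral-7B-v0.1's config.json.
config = json.loads("""
{
  "max_position_embeddings": 32768,
  "sliding_window": 4096,
  "rope_theta": 10000.0
}
""")

# The positional range is 32k, but attention in v0.1 uses a
# 4096-token sliding window -- plausibly where both the "32k"
# and "4096" numbers come from.
print(config["max_position_embeddings"], config["sliding_window"])
```

    If that reading is right, both numbers are “true”: 32k is the trained positional range, while 4096 is the per-layer attention window.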

    I just keep loading it in 4096 until I have a chance to look it up lol