Hi, I have searched for a long time on this subreddit, in Ooba’s documentation, Mistral’s documentation and everything, but I just can’t find what I am looking for.
I see everyone claiming Mistral can handle up to 32k context size, however while it technically won’t refuse to generate anything above like 8k, the output is just not good. I have it loaded in Oobabooga’s text-generation-webui and am using the API through SillyTavern. I loaded the normal Mistral 7B just to check, but with my current 12k story, all it can generate is gibberish if I give it the full context. However, I also checked using other fine-tunes of Mistral.
What am I doing wrong? I am using the GPTQ version on my RX 7900 XTX. Is it just advertising that it won’t crash until 32k or something, or am I doing something wrong for not getting coherent output above 8k? I did mess with the alpha values, and while doing so does eliminate the gibberish, I do get the idea that the quality does suffer somehow.
I think it really depends on the finetune, for example, Mistral-Instruct is able to summarize or extract information from a 32K context, for writing, you will have to find a finetuned model for that task