https://huggingface.co/TheBloke/MistralLite-7B-GGUF

This is supposed to be a 32k-context finetune of Mistral. I’ve tried the recommended Q5 quant in both GPT4All and LM Studio, and it works for normal short prompts but hangs and produces no output once I push the context length to 8k+ for data cleaning. I tried it CPU-only (the machine has 32GB of RAM, so that should be plenty) and hybrid, with the same bad outcome. Curious if there are some undocumented RoPE settings that need to be adjusted.
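In case someone wants to reproduce this outside the GUIs, here’s a minimal sketch of what I believe the equivalent setup looks like with llama-cpp-python (both tools sit on the same llama.cpp backend, so the knobs should match). The file path is a placeholder, and the rope_freq_base is just the rope_theta = 1000000 listed on the MistralLite model card, which I’m assuming may need to be set explicitly:

```python
# Minimal sketch with llama-cpp-python. Path and generation params are
# placeholders; rope_freq_base is the rope_theta = 1000000 from the
# MistralLite model card (stock Mistral uses 10000), set explicitly in
# case the GGUF metadata isn't being applied.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistrallite.Q5_K_M.gguf",  # placeholder local path
    n_ctx=32768,               # request the full advertised context window
    rope_freq_base=1000000.0,  # MistralLite's rope_theta per its model card
    n_threads=8,               # CPU-only run; tune to your core count
    n_gpu_layers=0,            # 0 = pure CPU, matching my test above
)

# Prompt template per the MistralLite model card
prompt = "<|prompter|>Clean up the following records: ...</s><|assistant|>"
out = llm(prompt, max_tokens=512)
print(out["choices"][0]["text"])
```

If the GGUF metadata already carries that rope_theta, forcing it should be a no-op, so this at least rules RoPE settings in or out.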

Anyone get this to work with long prompts? Otherwise, what do y’all recommend for 32k+ context with good performance on data augmentation/cleaning, with <20B params for speed?

  • Chromix_B · 11 months ago

    You wrote that it works for short prompts. Did you also try slightly longer prompts (up to 4k tokens)? That still doesn’t hit the sliding window, but it already produced little useful output for me and some others.
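    If it helps, here’s a quick way to check where a test prompt lands relative to that boundary, sketched against the same llama-cpp-python setup above (the 4096 figure is the base Mistral sliding window, which is my assumption for where behavior would change):

    ```python
    # Count tokens before sending, so you know whether a test prompt sits
    # under or over the assumed 4096-token sliding window of base Mistral.
    n = len(llm.tokenize(prompt.encode("utf-8")))
    print(f"{n} tokens -> {'over' if n > 4096 else 'under'} the 4k window")
    ```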