Are there any tricks to speed up 13B models on a 3090?

I'm currently using the regular Hugging Face model, quantized to 8-bit by a GPTQ-capable fork of KoboldAI.

It’s pretty slow, especially when the context limit changes, and nowhere near real time.

  • DustGrouchy1792OPB

    Can I get KoboldCpp working with SillyTavern without too much of a headache?