Question about the possibility of running large models on a 3070 Ti with 32 GB of RAM: what’s the best way to run them, if that’s possible without quality loss?

Speed isn’t an issue; I just want to be able to run such models in the background.

  • reallmconnoisseurB · 1 year ago

    From my understanding, if you want to run models without quality loss, then quantized models are not exactly what you are looking for, at least not below a certain threshold. With your setup you should be able to run 7B models in 8-bit; see the loading sketch below.
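
    For illustration, here is a minimal 8-bit loading sketch with Hugging Face transformers and bitsandbytes. The model ID and prompt are placeholders, not something specific to your setup:

    ```python
    # Minimal 8-bit loading sketch (transformers + bitsandbytes, CUDA GPU assumed).
    # A 7B model's weights at 8-bit are roughly 7 GB, which just about fits
    # the 3070 Ti's 8 GB of VRAM (overhead eats into the remaining headroom).
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any 7B causal LM works

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",  # let accelerate place layers on GPU/CPU as needed
    )

    inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```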

    For everything beyond that you’ll need more aggressive quantization (e.g., 4-bit), which also introduces more quality loss. The rough arithmetic after this paragraph shows why.
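
    As a back-of-the-envelope check (my own simplification: weights only, ignoring KV cache and other overhead), you can estimate memory as parameters × bits ÷ 8:

    ```python
    # Rough weight-memory estimate: params * bits / 8 bytes.
    # Simplification: ignores KV cache, activations, and framework overhead.
    def approx_weight_gb(params_billion: float, bits: int) -> float:
        return params_billion * bits / 8  # 1e9 params * bits/8 bytes ≈ that many GB

    for params, bits in [(7, 16), (7, 8), (7, 4), (13, 8), (13, 4)]:
        print(f"{params}B @ {bits}-bit ≈ {approx_weight_gb(params, bits):.1f} GB")

    # 7B @ 8-bit ≈ 7 GB just fits 8 GB of VRAM; 13B @ 8-bit ≈ 13 GB does not,
    # while 13B @ 4-bit ≈ 6.5 GB does, at the cost of more quantization loss.
    ```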

    There was this post a while back which laid out the hardware requirements for 8-bit and 4-bit, for both GPU and CPU setups. Of course you can push quantization even further and run even larger models, but it’ll introduce more loss as well.
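
    If you want to lean on the 32 GB of system RAM as well, one common route (my suggestion, not from that post) is a GGUF model via llama-cpp-python, offloading as many layers to the GPU as fit. The model path and layer count here are placeholders:

    ```python
    # CPU+GPU split sketch with llama-cpp-python (pip install llama-cpp-python).
    # n_gpu_layers controls how many transformer layers go to the 3070 Ti;
    # the rest stay in system RAM and run on the CPU (slower, but it fits).
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/model-q4_k_m.gguf",  # placeholder path
        n_gpu_layers=30,  # tune down until it fits in 8 GB of VRAM
        n_ctx=4096,       # context window; larger contexts cost more memory
    )

    out = llm("Q: What is quantization? A:", max_tokens=64)
    print(out["choices"][0]["text"])
    ```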