Quantizing 70b models to 4-bit, how much does performance degrade?

ae_datavizB to LocalLLaMA@poweruser.forum · English · 3 years ago

The title, pretty much.

I’m wondering whether a 70b model quantized to 4-bit would perform better than a 7b, 13b, or 34b model at fp16. It would be great to get some insights from the community.
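For context, here is a rough sketch of the weight-memory arithmetic behind the comparison. It assumes memory is simply parameter count times bits per weight, and it ignores the KV cache, activations, and the extra bits that real 4-bit formats spend on scales. The function name `weight_memory_gb` is just something I made up for illustration. It says nothing about output quality, which is the actual question.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes).

    Assumption: every weight takes exactly `bits_per_weight` bits,
    with no overhead for quantization scales or zero points.
    """
    # billions of params * bits per weight / 8 bits per byte = billions of bytes
    return params_billions * bits_per_weight / 8


# The model sizes and precisions mentioned in the post.
configs = [
    ("70b @ 4-bit", 70, 4),
    ("34b @ fp16", 34, 16),
    ("13b @ fp16", 13, 16),
    ("7b @ fp16", 7, 16),
]

for name, params_b, bits in configs:
    print(f"{name}: ~{weight_memory_gb(params_b, bits):.1f} GB")
```

Under these assumptions, the 70b model at 4-bit needs about 35 GB for weights, which sits between 13b at fp16 (about 26 GB) and 34b at fp16 (about 68 GB).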
