How much does Quantization actually impact models? - KL Divergence Tests

kindacognizant · 3 years ago


kpodkanowicz · 3 years ago

You are on fire. This is yet another great post of yours. By the way, I changed the perplexity scripts to measure only the responses that follow the instruction, using, for example, the Evol dataset, with the prompt preset configured to match each model. I got completely different results than with normal perplexity. Interestingly, when running coding instructions on a normal model and, for instance, roleplay instructions on a coding model, not only is the perplexity around 1 vs. 3, but the models also degrade differently.
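A minimal sketch of response-only perplexity, assuming you already have the per-token log-probabilities from the model and know which tokens belong to the response rather than the instruction. The function name and its inputs are hypothetical illustrations, not the commenter's actual script.

```python
import math
from typing import Sequence


def response_only_perplexity(token_logprobs: Sequence[float],
                             is_response: Sequence[bool]) -> float:
    """Perplexity over response tokens only; instruction/prompt tokens are ignored.

    token_logprobs[i]: natural-log probability the model assigned to token i.
    is_response[i]: True if token i is part of the response after the instruction.
    (Both inputs are assumed to come from your own evaluation loop.)
    """
    if len(token_logprobs) != len(is_response):
        raise ValueError("token_logprobs and is_response must have the same length")
    selected = [lp for lp, keep in zip(token_logprobs, is_response) if keep]
    if not selected:
        raise ValueError("no response tokens to score")
    # Perplexity = exp(mean negative log-likelihood) over the selected tokens only.
    mean_nll = -sum(selected) / len(selected)
    return math.exp(mean_nll)


# Two unlikely "instruction" tokens followed by two response tokens at p = 0.5 each.
logprobs = [math.log(0.01), math.log(0.02), math.log(0.5), math.log(0.5)]
mask = [False, False, True, True]
print(response_only_perplexity(logprobs, mask))  # ≈ 2.0
```

Scoring the full sequence with the same numbers would give a much higher value, because the hard-to-predict instruction tokens dominate the average; masking them out is what makes the response-only figure behave so differently from normal perplexity.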

CardAnarchist · 3 years ago

Hi there, you seem like the man to ask about a topic somewhat related to the OP:

I’ve recently found out that models output different results depending on the number of layers loaded onto the GPU. I’ve been told that more layers loaded = better output.

How does the loss associated with layers not on the GPU compare to the loss between, say, different quants?
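One way to put both kinds of difference on the same scale, in the spirit of the KL divergence tests in the thread title, is to compare the next-token probability distributions from a reference setup and a modified one (a different quant, or a different GPU layer split) at the same token positions. This is a minimal sketch, assuming you can dump the raw logits for each position from both runs; the function names are hypothetical and not part of any particular tool.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Work in log space (log-sum-exp with the max subtracted) so that very
    # negative logits never underflow to a probability of exactly zero.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(p_logits: Sequence[float], q_logits: Sequence[float]) -> float:
    """KL(P || Q) in nats; P is the reference run, Q is the run being tested."""
    if len(p_logits) != len(q_logits):
        raise ValueError("logit vectors must cover the same vocabulary")
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    # sum_i p_i * (log p_i - log q_i)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kl(reference: Sequence[Sequence[float]],
            test: Sequence[Sequence[float]]) -> float:
    """Average per-position KL divergence across a sequence of logit vectors."""
    if len(reference) != len(test) or not reference:
        raise ValueError("need the same, non-zero number of positions in both runs")
    return sum(kl_divergence(r, t) for r, t in zip(reference, test)) / len(reference)


# Made-up 3-token-vocabulary logits for two positions, purely for illustration.
ref = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
same = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
shifted = [[1.8, 1.1, 0.2], [0.5, 0.7, 2.6]]
print(mean_kl(ref, same))     # 0.0
print(mean_kl(ref, shifted))  # a small positive number
```

A mean KL of zero means the two setups predict identical distributions; the larger the value, the further the tested setup drifts from the reference, which gives a number you can compare across quants and layer splits alike.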

kindacognizant · 3 years ago

That doesn’t seem correct in the slightest.

CardAnarchist · 3 years ago

I thought it odd myself, so much so that I thought SillyTavern was bugged, but that wasn’t the case.

It’s pretty easy to test yourself. Just use Koboldcpp to load, say, 31 layers and generate some output with seed 1, then restart Koboldcpp with 30 layers and generate again with the same seed.

Example: a 7B model with 31 layers vs. 30 layers loaded, on the same seed.

It seems every seed behaves this way as long as the layer counts are close enough: the output starts out exactly the same before branching off.
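To pin down exactly where two same-seed runs branch off, you can compare the two outputs token by token (or word by word) and find the first position where they differ. A minimal sketch; how you capture the outputs from Koboldcpp is left out, the function name is hypothetical, and the example strings are made up, not real model output.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_divergence(a: Sequence[T], b: Sequence[T]) -> Optional[int]:
    """Return the index of the first position where a and b differ.

    If one sequence is a strict prefix of the other, the divergence point is the
    length of the shorter one. Returns None if the sequences are identical.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical outputs from a 31-layer run and a 30-layer run on the same seed.
run_31_layers = "The knight drew his sword and stepped into the hall"
run_30_layers = "The knight drew his sword and turned toward the gate"
idx = first_divergence(run_31_layers.split(), run_30_layers.split())
print(idx)  # 6
```

Comparing token IDs rather than whitespace-split words would be more precise, since a single differing token can change only part of a word; the function works on any sequence type, so the same call applies.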

It’s worth mentioning that the person who told me the quality was “better” with more layers loaded was only going by what he recalled.