Goliath-120B - quants and future plans

AlpinDale · 2 years ago

FPham · 2 years ago

I suspect it behaves roughly as if you had (fictitious) Xwin and Euryale adapters and applied them with catsum, which sums the ranks (so two rank-256 adapters would become rank 512!) but improves the response only a tiny bit.
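To make the rank arithmetic concrete, here is a minimal sketch, assuming plain LoRA-style adapters where each one contributes an update of the form B·A. The tiny integer matrices and the helper names (`matmul`, `hcat`, `vcat`) are made up for illustration and are not taken from any library. It shows that concatenating two adapters gives the same update as summing them, with an inner dimension equal to the sum of the two ranks.

```python
# Toy illustration (hypothetical helpers, tiny integer matrices): concatenating
# two LoRA-style adapters yields one adapter whose nominal rank is r1 + r2.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def add(x, y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def hcat(x, y):
    """Concatenate along columns: [x | y]."""
    return [rx + ry for rx, ry in zip(x, y)]

def vcat(x, y):
    """Concatenate along rows: x stacked on top of y."""
    return x + y

# Two rank-1 adapters on a 3x3 weight (stand-ins for two rank-256 adapters).
B1, A1 = [[1], [0], [2]], [[1, 2, 0]]   # 3x1 and 1x3
B2, A2 = [[0], [1], [1]], [[0, 1, 3]]   # 3x1 and 1x3

# Applying both adapters separately and summing their updates ...
delta_sum = add(matmul(B1, A1), matmul(B2, A2))

# ... equals one adapter built by concatenation, with inner dimension 1 + 1 = 2.
B_cat = hcat(B1, B2)    # 3x2
A_cat = vcat(A1, A2)    # 2x3
delta_cat = matmul(B_cat, A_cat)
cat_rank = len(A_cat)   # nominal rank of the concatenated adapter
```

Note that the summed number is only the nominal rank. The true rank of the combined update can be lower when the two adapters point in similar directions, which is the same overlap argument applied to the merged model.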

But in this case we are summing the “virtual” rank of two 70B models. The model could be a smidgen smarter, but not by much, because huge chunks of the weights overlap. We are probably wasting 80B parameters :) that do not contribute.

A proper test has to compare the Sum against both Xwin and Euryale to see the actual result. I’ve seen this many times with fine-tuning: I attributed a good response to the fine-tune, but when I A/B tested it, it was mostly due to the prior model, and the fine-tune really added only a tiny bit.
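A rough sketch of that comparison, assuming you already have per-prompt scores for the merge and for each parent on the same prompt set. The function name `gain_over_best_parent` is hypothetical and the numbers are placeholders, not real benchmark results.

```python
def gain_over_best_parent(merged_scores, parent_scores):
    """Return (best parent name, mean merged score minus that parent's mean score)."""
    def mean(xs):
        return sum(xs) / len(xs)

    # Pick the parent with the highest mean score on the shared prompts.
    best_name, best_scores = max(parent_scores.items(), key=lambda kv: mean(kv[1]))
    return best_name, mean(merged_scores) - mean(best_scores)

# Placeholder scores only (NOT measured results), one entry per shared prompt.
merged = [7, 8, 6, 9]
parents = {"Xwin": [7, 7, 6, 8], "Euryale": [6, 8, 5, 8]}

best_parent, gain = gain_over_best_parent(merged, parents)
print(f"gain over {best_parent}: {gain:+.2f}")
```

The point of subtracting the best parent rather than some weaker baseline is that whatever remains is the only part you can credit to the merge itself.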

I’m honestly more interested in going the opposite way: making models smaller while maybe losing only a smidgen of knowledge.