Goliath-120B - quants and future plans

AlpinDale · 1 year ago

It makes sense that the benchmark results would be surprisingly low for Goliath. After playing around with it for a few days, I’ve noticed two glaring issues:

- it tends to make slight spelling mistakes
- it hallucinates words

These happen rarely, but often enough to throw off benchmarks. I’m very confident this can be solved by a quick full finetune of around 100 steps, which would align the layers to work better together.
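To see why errors that "happen rarely" can still matter, here is a back-of-the-envelope sketch. It assumes (hypothetically) that each generated token is independently corrupted with a small probability, and that an answer counts only if it is entirely error-free. The 0.5% error rate is an illustrative number, not a measurement of Goliath.

```python
def p_error_free(per_token_error: float, num_tokens: int) -> float:
    """Probability that none of num_tokens tokens is corrupted,
    assuming each token fails independently with per_token_error."""
    return (1.0 - per_token_error) ** num_tokens

# Hypothetical 0.5% per-token error rate, for answers of increasing length.
for n in (10, 50, 200):
    print(f"{n:>4} tokens: {p_error_free(0.005, n):.3f}")
```

Under these assumptions, a 0.5% per-token rate leaves only about 78% of 50-token answers fully clean, which is how a rare per-token glitch can turn into a visible benchmark penalty.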

noeda · 1 year ago

Not sure if you misread, but it’s actually high, i.e. it scores better than Xwin and Euryale, the models it’s made from (in this particular quick test).

It beat all the 70B models I tested there, although the gap is not very large.

AlpinDale · 1 year ago

Yes, but it should perform much better than that. Turboderp ran MMLU on a 3.25bpw quant, and it performed worse than other 70B models. I assume quantization further degrades its spelling consistency.
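For context on how aggressive 3.25bpw (bits per weight) is, here is a rough sketch of weight storage compared with 16-bit weights. It assumes a nominal 120e9 parameters taken from the model name (the real count may differ) and ignores quantization metadata and other overhead, so treat the numbers as approximations only.

```python
def weight_storage_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).
    Ignores quantization scales, metadata, and runtime overhead."""
    return num_params * bits_per_weight / 8 / 1e9

# Assumption: nominal parameter counts taken from the model names.
PARAMS_120B = 120e9
PARAMS_70B = 70e9

for label, params in (("120B", PARAMS_120B), ("70B", PARAMS_70B)):
    fp16 = weight_storage_gb(params, 16)
    quant = weight_storage_gb(params, 3.25)
    print(f"{label}: fp16 ~{fp16:.1f} GB, 3.25bpw ~{quant:.1f} GB "
          f"({16 / 3.25:.1f}x smaller)")
```

Squeezing each weight into roughly a fifth of its 16-bit footprint is a plausible source of extra degradation, though this arithmetic alone does not show how much of the MMLU drop comes from quantization.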