Goliath-120B - quants and future plans

AlpinDale · 1 year ago

Goliath-120B - quants and future plans

BalorNG · 1 year ago

A quick question: does it imply that it has 160 layers as a result? Afaik, Falcon has 80 layers (like Llama), and original GPT3 had 96. “Stack more layers” ©

Sooo… is “stacking 1000 phi 1.3b together” is a recipe for AGI? :)

AlpinDale · 1 year ago

The stacking wasn’t as simple as just taking one model and putting it on top of another. I took multiple layer ranges from each model (except first and last few, which are xwin only) and then stacked those slices on top of each other. In the end, the model has 136 layers because that’s how many I specified in the ranges. Otherwise we’d have a ~135B model (can’t stack input and output layers, they need to be unique and non-repeating).