Dear Model Mergers, Have You Solved Merger of Different Model Families?

BayesMind · 1 year ago

Reading the README, it sounds like they're running some attention heads that either already had the same dimensions in both models or were bridged with a linear projection layer. They say they trained on 10M tokens to “settle in the transplant”, which doesn't sound like enough to me, and they agree the model isn't useful until it gets further training.
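To make the projection idea concrete, here's a rough sketch of what I'm picturing. This is my guess, not what the README's authors actually did: the function names, the dimensions (a host width of 8, a donor width of 4), and the random weights are all made up for illustration. A hidden state in the host model's width gets projected down to the donor's width, goes through a donor attention head, and gets projected back up.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Stand-in for real weights (assumption: illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def donor_attention(x, w_q, w_k, w_v):
    """Single-head self-attention operating in the donor model's width."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def transplanted_head(x_host, proj_in, head_weights, proj_out):
    """Hypothetical wrapper: host width -> donor width -> donor head -> host width."""
    x_donor = matmul(x_host, proj_in)
    y_donor = donor_attention(x_donor, *head_weights)
    return matmul(y_donor, proj_out)


rng = random.Random(0)
D_HOST, D_DONOR, SEQ_LEN = 8, 4, 3  # made-up sizes

proj_in = random_matrix(D_HOST, D_DONOR, rng)    # would need training
proj_out = random_matrix(D_DONOR, D_HOST, rng)   # would need training
head_weights = tuple(random_matrix(D_DONOR, D_DONOR, rng) for _ in range(3))

x_host = random_matrix(SEQ_LEN, D_HOST, rng, scale=1.0)
y_host = transplanted_head(x_host, proj_in, head_weights, proj_out)
print(len(y_host), len(y_host[0]))  # prints: 3 8
```

If the two models already shared a hidden size, the two projection matrices would be unnecessary. Either way, the new pieces start out untrained, which is presumably why some "settling in" training is needed at all.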