Dear Model Mergers, Have You Solved Merger of Different Model Families?

BayesMind · 2 years ago

This doesn’t seem cost-effective for what you’d get.

I agree, which is why I'm bearish on model merges, unless you're mixing model families (e.g., Mistral + Llama).

These franken-merges just interweave finetunes of the same base model. It'd make more sense to me if they collapsed all the params into a same-sized model via element-wise interpolation. So merging weights makes sense, but running both sets of params side by side like these X-120B models do? I can't see any payoff in that beyond what collapsing the weights already gives you.
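To make "collapsing via element-wise interpolation" concrete, here is a minimal sketch. It assumes both finetunes share the same base architecture, so every parameter has the same name and shape. The function name `interpolate_weights`, the `alpha` mixing weight, and the toy `layer0.*` parameters are all made up for illustration; real checkpoints would be loaded from disk, and this is not how any particular merge tool does it.

```python
import numpy as np

def interpolate_weights(state_a, state_b, alpha=0.5):
    """Element-wise linear interpolation of two same-shaped checkpoints.

    Each merged tensor is alpha * a + (1 - alpha) * b, so the result has
    exactly as many parameters as either input model.
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("parameter names differ; not the same architecture")
    merged = {}
    for name, a in state_a.items():
        b = state_b[name]
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch for {name}: {a.shape} vs {b.shape}")
        merged[name] = alpha * a + (1.0 - alpha) * b
    return merged

# Toy stand-ins for two finetunes of the same base model (hypothetical values).
finetune_a = {
    "layer0.weight": np.array([[1.0, 2.0], [3.0, 4.0]]),
    "layer0.bias": np.array([0.0, 1.0]),
}
finetune_b = {
    "layer0.weight": np.array([[3.0, 2.0], [1.0, 0.0]]),
    "layer0.bias": np.array([2.0, 1.0]),
}

merged = interpolate_weights(finetune_a, finetune_b, alpha=0.5)
```

The merged model here is the same size as each input, which is the whole point: you pay for one model's worth of params at inference time instead of two.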