Dear Model Mergers, Have You Solved Merger of Different Model Families?

BayesMind · 1 year ago

Dear Model Mergers, Have You Solved Merger of Different Model Families?

a_beautiful_rhind · 1 year ago

Wonder how L1 65b would do with L2 70b.

BayesMind · 1 year ago

Not for the kind of merging I’ve seen. But I remember a paper back in the day that suggested you could find high-dimensional axes within different models, and if you rotated the weights to align, you could merge different models to your advantage, and maintain knowledge from both seed models. This included models that were trained from different initializations.

I think that the only reason this franken-merging works is because people are mostly just merging finetunes of the same base, so these high-d vectors are already aligned enough that the mergers work.

mcmoose1900 · 1 year ago

Git rebasin claims to do this.

But its untested on large models. There is a branch for it in mergekit, as well as a stable diffusion implementation (which works fantastically as a regular merger).

BayesMind · 1 year ago

rebasin! I was trying to recall this, thank you. Can it mix model families, do you know? I thought it was just for identical architectures.