Hi everyone, I’d like to share something that I’ve been working on for the past few days: https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.0
This model is the result of interleaving layers from three different models: Euryale-1.3-L2-70B, Nous-Hermes-Llama2-70b, and SynthIA-70B-v1.5, resulting in a model that is larger than any of the three used for the merge. I have branches on the repo for exl2 quants at 3.0 and 4.85 bpw, which will allow the model to run in 48GB or 80GB of VRAM, respectively.
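For anyone curious what "interleaving layers" means in practice, here's a minimal sketch of how a passthrough frankenmerge stacks layer slices from several donor models. The slice ranges below are made up for illustration; they are not the actual Venus-120b recipe.

```python
# Hypothetical sketch of a passthrough "frankenmerge": stack overlapping
# layer slices from donor models into one deeper model. Slice ranges here
# are illustrative only, not the real Venus-120b recipe.

def build_layer_plan(slices):
    """Each slice is (model_name, start_layer, end_layer_exclusive).
    Returns a flat list of (model, layer_idx) pairs in stacking order."""
    plan = []
    for model, start, end in slices:
        plan.extend((model, i) for i in range(start, end))
    return plan

# Three 80-layer Llama-2-70B donors, interleaved in overlapping chunks:
slices = [
    ("Euryale-1.3-L2-70B",      0, 20),
    ("Nous-Hermes-Llama2-70b", 10, 30),
    ("SynthIA-70B-v1.5",       20, 40),
    # ...further overlapping chunks continuing up the stack...
]
plan = build_layer_plan(slices)
print(len(plan))  # 60 layers so far; a full 120b-class stack lands around 140
```

In a real merge the tool (e.g. mergekit) copies the actual weight tensors for each slice; this only shows how the layer ordering is assembled.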
I love using LLMs for RPs and ERPs and so my goal was to create something similar to Goliath, which is honestly the best roleplay model I’ve ever used. I’ve done some initial testing with it and so far the results seem encouraging. I’d love to get some feedback on this from the community! Going forward, my plan is to do more experiments with merging models together, possibly going even larger than 120b parameters to see where the gains stop.
I will set this to run overnight on Hellaswag 0-shot like I did here on Goliath when it was new: https://old.reddit.com/r/LocalLLaMA/comments/17rsmox/goliath120b_quants_and_future_plans/k8mjanh/
Thanks for the model! I started investigating some approaches to combine models and see if the result can be better than its individual parts. Just today I finished code to use a genetic algorithm to pick out parts and frankenstein 7B models together (trying to prove that there is merit to this approach using smaller models…but we’ll see).
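In case it helps anyone, the rough shape of that GA is something like this. The fitness function is a stand-in (in practice it would build the merged model and score it on a benchmark like Hellaswag), and the donor names are placeholders:

```python
# Minimal sketch of a genetic algorithm over layer-merge recipes.
# fitness() is a placeholder -- real code would assemble the merged
# model and score it on an actual benchmark.
import random

N_LAYERS = 32                     # layers in a 7B Llama-style model
MODELS = ["model_a", "model_b"]   # hypothetical donor names

def random_genome():
    # one donor choice per layer position
    return [random.choice(MODELS) for _ in range(N_LAYERS)]

def fitness(genome):
    # stand-in score: rewards using both donors; replace with a benchmark
    return len(set(genome))

def crossover(a, b):
    cut = random.randrange(1, N_LAYERS)
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.05):
    return [random.choice(MODELS) if random.random() < rate else g
            for g in genome]

def evolve(pop_size=20, generations=10):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

The expensive part is obviously the fitness evaluation, since every candidate recipe means materializing and benchmarking a merged model.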
I’ll report back on the Hellaswag results on this model.
Thanks! I’m eager to see the results :)
Any tips/attempts on frankensteining 2 yi-34b models together to make a ~51B model?
We need 2 or 3 yi stacked together and then face them off vs 70b.
Exactly what I was thinking. I just fail miserably each time I merge the layers.
possibly going even larger than 120b parameters
I didn’t know that was possible, have people made a 1T model yet?
Sadly doesn’t work on 48gb like the other 120b. It can only fit sub 2048 context otherwise it goes OOM.
Crap, what’s your setup? I tested it with a single 48GB card but if you’re using 2x 24 then it might not work. I’ll have to make a 2.8 bpw quant (or get someone else to do it) so that it’ll work with card splitting.
I have 2x3090 for exl2. I have tess and goliath and both fit with ~3400 context so somehow your quant is slightly bigger.
Venus-120b is actually a bit bigger than Goliath-120b. Venus has 140 layers while Goliath has 136 layers, so that would explain it.
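A quick back-of-envelope check shows why those extra layers matter at 48GB: exl2 weights take roughly params × bpw / 8 bytes before KV cache and overhead. The parameter counts below are approximations, not exact figures for either model:

```python
# Rough VRAM estimate for exl2 quants: weights alone take about
# params * bpw / 8 bytes, before KV cache and runtime overhead.
def weight_gb(params_b, bpw):
    return params_b * 1e9 * bpw / 8 / 1e9  # decimal GB

# Approximate parameter counts (136 vs 140 layers of ~70B-class blocks):
goliath = weight_gb(118, 3.0)  # ~44 GB of weights
venus   = weight_gb(122, 3.0)  # ~46 GB -- a couple GB less cache headroom
```

With both sitting that close to 48GB, a few extra layers is enough to shrink the context you can fit before OOM.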
Makes sense… it’s doing pretty well. Like the replies. Set the limit to 3400 in tabby, no oom yet but using 98%/98%. I assume this means I can bump up the other models past 3400 too if I’m using tabby and autosplit.
That’s great work!
Just a question… Has anyone tried to fine-tune one of these “Frankenstein” models? Some time ago (when the first “Frankenstein” came out, it was a ~20B model) I read here on reddit that lots of users agreed that a fine-tune on those merged models would have “better” results, since it would help to “smooth” and adapt the merged layers. Probably I lack the technical knowledge needed to understand, so I’m asking…
Tess-XL-1.0… so far I didn’t like the results.
Is that a LORA or a full fine tune?
Hell yea! No Xwin. I hate that model. I’m down for the 3 bit. I didn’t like Tess-XL so far, so hopefully you made a David here.
I used this dataset for the quants: https://huggingface.co/datasets/jasonkstevens/pippa-llama2-chat/tree/refs%2Fconvert%2Fparquet/default/train
I still have this feeling in my gut that closedai have been doing this for a while. It seems like a free lunch.
I don’t think so, this is something you do when you’re GPU poor, closedai would just not undertrain their models in the first place.