Hi everyone, I’d like to share something that I’ve been working on for the past few days: https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.0

This model is the result of interleaving layers from three different models: Euryale-1.3-L2-70B, Nous-Hermes-Llama2-70b, and SynthIA-70B-v1.5, resulting in a model that is larger than any of the three used for the merge. I have branches on the repo for exl2 quants at 3.0 and 4.85 bpw, which will allow the model to run in 48GB or 80GB of VRAM, respectively.
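For anyone unfamiliar with the idea, here is a rough sketch of what "interleaving layers" can look like as a slice plan. This is not the actual Venus recipe: the model labels, the window width, and the step are all made-up values for illustration, and the only thing taken from the real models is that a Llama-2-70B has 80 decoder layers. Each slice takes a window of layers from the next source model, and overlapping windows are what make the stacked result deeper than any single source.

```python
from itertools import cycle

def interleave_slices(sources, num_layers, width, step):
    """Build a list of (source, start, end) layer slices.

    Slices rotate through `sources`; each covers layers [start, end).
    Windows overlap whenever step < width, so the stacked model ends up
    with more layers than any one source. Hypothetical helper, not the
    recipe used for any released merge.
    """
    if not (0 < step <= width <= num_layers):
        raise ValueError("need 0 < step <= width <= num_layers")
    plan = []
    source = cycle(sources)
    start = 0
    while True:
        end = min(start + width, num_layers)
        plan.append((next(source), start, end))
        if end == num_layers:
            break
        start += step
    return plan

# Placeholder labels for three 80-layer source models (assumed names).
plan = interleave_slices(["model_a", "model_b", "model_c"],
                         num_layers=80, width=16, step=8)
total_layers = sum(end - start for _, start, end in plan)

for name, start, end in plan:
    print(f"{name}: layers [{start}, {end})")
print("stacked layers:", total_layers)
```

A plan like this is only the bookkeeping. The merge itself still has to copy the corresponding layer weights from each source into the new stack in that order.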

I love using LLMs for RPs and ERPs and so my goal was to create something similar to Goliath, which is honestly the best roleplay model I’ve ever used. I’ve done some initial testing with it and so far the results seem encouraging. I’d love to get some feedback on this from the community! Going forward, my plan is to do more experiments with merging models together, possibly even going even larger than 120b parameters to see where the gains stop.

  • noedaB
    1 year ago

    I will set this to run overnight on Hellaswag 0-shot like I did here on Goliath when it was new: https://old.reddit.com/r/LocalLLaMA/comments/17rsmox/goliath120b_quants_and_future_plans/k8mjanh/

    Thanks for the model! I started investigating some approaches to combine models and see if the result can be better than its individual parts. Just today I finished code that uses a genetic algorithm to pick out parts and frankenstein 7B models together (trying to prove that there is merit to this approach using smaller models… but we’ll see).

    I’ll report back on the Hellaswag results on this model.

  • xadiantB
    1 year ago

    Any tips/attempts on frankensteining 2 yi-34b models together to make a ~51B model?

      • xadiantB
        1 year ago

        Exactly what I was thinking. I just fail miserably each time I merge the layers.

  • AaaaaaaaaeeeeeB
    1 year ago

    “possibly even going even larger than 120b parameters”

    I didn’t know that was possible, have people made a 1T model yet?

  • a_beautiful_rhindB
    1 year ago

    Sadly doesn’t work on 48gb like the other 120b. It can only fit sub 2048 context otherwise it goes OOM.

    • nsfw_throwitaway69OPB
      1 year ago

      Crap, what’s your setup? I tested it with a single 48GB card but if you’re using 2x 24 then it might not work. I’ll have to make a 2.8 bpw quant (or get someone else to do it) so that it’ll work with card splitting.

      • a_beautiful_rhindB
        1 year ago

        I have 2x3090 for exl2. I have tess and goliath and both fit with ~3400 context so somehow your quant is slightly bigger.

        • nsfw_throwitaway69OPB
          1 year ago

          Venus-120b is actually a bit bigger than Goliath-120b. Venus has 140 layers and Goliath has 136 layers, so that would explain it.
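To put rough numbers on that, here is a back-of-the-envelope estimate of the weight footprint at a given bpw. It assumes both merges keep the stock Llama-2-70B layer shapes (hidden size 8192, MLP size 28672, 64 attention heads, 8 KV heads, 32000-token vocab) and that every weight is stored at exactly the average bpw. Real exl2 quants mix precisions per tensor, so treat the output as an estimate and not a measurement. The function names are made up for this sketch.

```python
def llama2_70b_style_params(num_layers, hidden=8192, intermediate=28672,
                            n_heads=64, n_kv_heads=8, vocab=32000):
    """Parameter count for a Llama-2-70B-shaped stack of `num_layers` layers."""
    head_dim = hidden // n_heads
    kv_dim = head_dim * n_kv_heads              # grouped-query attention: 8 KV heads
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q/o plus k/v projections
    mlp = 3 * hidden * intermediate             # gate, up, down projections
    norms = 2 * hidden                          # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embeddings = 2 * vocab * hidden             # input embedding + output head
    final_norm = hidden
    return num_layers * per_layer + embeddings + final_norm

def weight_gib(num_layers, bpw):
    """Weight size in GiB, assuming a uniform `bpw` bits per weight (simplification)."""
    return llama2_70b_style_params(num_layers) * bpw / 8 / 2**30

# Layer counts as stated in this thread: Venus 140, Goliath 136.
for name, layers in [("Venus", 140), ("Goliath", 136)]:
    params_b = llama2_70b_style_params(layers) / 1e9
    print(f"{name}: {params_b:.1f}B params, "
          f"{weight_gib(layers, 3.0):.2f} GiB of weights at 3.0 bpw")

extra = weight_gib(140, 3.0) - weight_gib(136, 3.0)
print(f"extra weight memory for 4 more layers: {extra:.2f} GiB")
```

Under those assumptions the four extra layers cost a bit over 1 GiB of weights at 3.0 bpw, which comes straight out of the memory that would otherwise hold context.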

          • a_beautiful_rhindB
            1 year ago

            Makes sense… it’s doing pretty well. Like the replies. Set the limit to 3400 in tabby, no oom yet but using 98%/98%. I assume this means I can bump up the other models past 3400 too if I’m using tabby and autosplit.
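For a sense of what the context itself costs, here is a small estimate of KV cache size. It assumes Llama-2-70B attention shapes (8 KV heads with a head dimension of 128), an FP16 cache, and the layer counts mentioned in this thread. It ignores activations and any loader overhead, and the function name is made up for this sketch.

```python
def kv_cache_bytes(num_layers, context_len, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for `context_len` tokens.

    Assumes Llama-2-70B attention shapes and an FP16 cache
    (2 bytes per value). The leading 2 covers keys plus values.
    """
    per_token = 2 * num_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_len

# Layer counts as stated in this thread: Venus 140, Goliath 136.
for name, layers in [("Venus", 140), ("Goliath", 136)]:
    gib = kv_cache_bytes(layers, 3400) / 2**30
    print(f"{name}: {gib:.2f} GiB of KV cache at 3400 tokens")
```

With these assumptions the cache grows linearly with both layer count and context length, so a deeper merge pays twice: more weight memory and more cache per token.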

  • Distinct-Target7503B
    1 year ago

    That’s great work!

    Just a question… Has anyone tried to fine-tune one of those “Frankenstein” models? Some time ago (when the first “Frankenstein” came out, it was a ~20B model) I read here on reddit that lots of users agreed that a fine-tune on those merged models would give “better” results, since it would help to “smooth” and adapt the merged layers. I probably lack the technical knowledge needed to understand, so I’m asking…

  • a_beautiful_rhindB
    1 year ago

    Hell yea! No Xwin. I hate that model. I’m down for the 3 bit. I didn’t like tess-XL so far so hopefully you made a david here.

  • ambient_temp_xenoB
    1 year ago

    I still have this feeling in my gut that closedai have been doing this for a while. It seems like a free lunch.

    • CharuruB
      1 year ago

      I don’t think so, this is something you do when you’re GPU poor, closedai would just not undertrain their models in the first place.