Hi everyone, I’d like to share something that I’ve been working on for the past few days: https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.0

This model is the result of interleaving layers from three different models: Euryale-1.3-L2-70B, Nous-Hermes-Llama2-70b, and SynthIA-70B-v1.5, resulting in a model that is larger than any of the three used for the merge. I have branches on the repo for exl2 quants at 3.0 and 4.85 bpw, which will allow the model to run in 48GB or 80GB of VRAM, respectively.
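
As a rough sanity check on those VRAM targets, the weight footprint of a quant can be estimated as parameters × bits per weight ÷ 8. The sketch below assumes a nominal 120 billion parameters (taken from the model name, not measured) and counts weights only; the KV cache, activations, and framework overhead come on top. The function name is made up for illustration.

```python
def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count implied by the "120b" in the model name.
NOMINAL_PARAMS = 120e9

for bpw in (3.0, 4.85):
    size = weight_size_gb(NOMINAL_PARAMS, bpw)
    print(f"{bpw} bpw: roughly {size:.1f} GB of weights")
```

Under these assumptions the 3.0 bpw quant leaves only a few GB of a 48GB budget for context, which is why small differences in model size matter so much at that tier.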

I love using LLMs for RPs and ERPs, so my goal was to create something similar to Goliath, which is honestly the best roleplay model I’ve ever used. I’ve done some initial testing with it and so far the results seem encouraging. I’d love to get some feedback on this from the community! Going forward, my plan is to do more experiments with merging models together, possibly going even larger than 120b parameters to see where the gains stop.

  • a_beautiful_rhind · 1 year ago

    Sadly, it doesn’t work on 48GB like the other 120b models. It can only fit under 2048 context; otherwise it goes OOM.

    • nsfw_throwitaway69 (OP) · 1 year ago

      Crap, what’s your setup? I tested it with a single 48GB card, but if you’re using 2x 24GB then it might not work. I’ll have to make a 2.8 bpw quant (or get someone else to do it) so that it’ll work with card splitting.

      • a_beautiful_rhind · 1 year ago

        I have 2x 3090 for exl2. I have Tess and Goliath, and both fit with ~3400 context, so somehow your quant is slightly bigger.

        • nsfw_throwitaway69 (OP) · 1 year ago

          Venus-120b is actually a bit bigger than Goliath-120b. Venus has 140 layers and Goliath has 136 layers, so that would explain it.

          • a_beautiful_rhind · 1 year ago

            Makes sense… it’s doing pretty well. I like the replies. I set the limit to 3400 in tabby: no OOM yet, but both cards are at 98%. I assume this means I can bump the other models past 3400 too if I’m using tabby and autosplit.
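
To put the layer counts from the thread in perspective, here is a minimal sketch of how 140 versus 136 layers affects both weight size and KV cache. It assumes every layer has the Llama-2-70B shape (8 key/value heads, head dimension 128), an FP16 cache, and that weight size scales linearly with layer count (embeddings ignored). The helper name is hypothetical, and none of these numbers were measured on the actual quants.

```python
# Assumed Llama-2-70B attention shape (grouped-query attention).
KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # assumption: FP16 cache, no cache quantization


def kv_cache_bytes(num_layers: int, context_tokens: int) -> int:
    """Bytes needed to store keys and values for all layers and tokens."""
    # Factor of 2 covers one key tensor and one value tensor per layer.
    per_token_per_layer = 2 * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token_per_layer * num_layers * context_tokens


VENUS_LAYERS = 140    # layer count quoted in the thread
GOLIATH_LAYERS = 136  # layer count quoted in the thread

# Relative size of the per-layer weights, assuming identical layer shapes.
layer_ratio = VENUS_LAYERS / GOLIATH_LAYERS
print(f"Venus has about {(layer_ratio - 1) * 100:.1f}% more layer weights")

for name, layers in (("Venus", VENUS_LAYERS), ("Goliath", GOLIATH_LAYERS)):
    cache_gb = kv_cache_bytes(layers, 3400) / 1e9
    print(f"{name}: about {cache_gb:.2f} GB of KV cache at 3400 tokens")
```

Under these assumptions the extra four layers add roughly 3% to the weights, which at 3.0 bpw is on the order of a gigabyte, enough to matter when both cards are already near full.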