Hi everyone, I’d like to share something that I’ve been working on for the past few days: https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.0

This model is the result of interleaving layers from three different models: Euryale-1.3-L2-70B, Nous-Hermes-Llama2-70b, and SynthIA-70B-v1.5, resulting in a model that it larger than any of the three used for the merge. I have branches on the repo for exl2 quants at 3.0 and 4.85 bpw, which will allow the model to run in 48GB or 80GB of vram, respectively.

I love using LLMs for RPs and ERPs and so my goal was to create something similar to Goliath, which is honestly the best roleplay model I’ve ever used. I’ve done some initial testing with it and so far the results seem encouraging. I’d love to get some feedback on this from the community! Going forward, my plan is to do more experiments with merging models together, possibly even going even larger than 120b parameters to see where the gains stop.

  • a_beautiful_rhindB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    I have 2x3090 for exl2. I have tess and goliath and both fit with ~3400 context so somehow your quant is slightly bigger.

    • nsfw_throwitaway69OPB
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 year ago

      Venus-120b is actually a bit bigger than Goliath-120b. Venus has 140 layers and Goliath has 136 layers, so that would explain it.

      • a_beautiful_rhindB
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 year ago

        Makes sense… it’s doing pretty well. Like the replies. Set the limit to 3400 in tabby, no oom yet but using 98%/98%. I assume this means I can bump up the other models past 3400 too if I’m using tabby and autosplit.