There has been a lot of movement around and below the 13b parameter bracket in the last few months, but it’s wild to think the best 70b models are still Llama 2 based. Why is that?

We have 13b models like bartowski/Orca-2-13b-exl2 at 8-bit approaching or even surpassing the best 70b models now.

  • candre23B · 10 months ago

    It’s adorable that you think any 13b model is anywhere close to a 70b llama2 model.

  • obvithrowaway34434B · 10 months ago

    Mistral has already shown that it’s mostly about the data rather than the model. So why waste loads of money and time on training something that no average consumer can run locally?

  • a_beautiful_rhindB · 10 months ago

    What do you mean? Someone just posted 100b, 200b, and 600b models, and several 120b models have been released in the past couple of weeks.

    • SlimxshadyxB · 10 months ago

      Those models can’t be accessed; they say it’s “too dangerous to be released.”

  • WaterPeckerB · 10 months ago

    Who pays for all this training on all these models we see knocking about? I don’t mean the ones released by the big companies. Who has the resources to train a 70b model? Like one of the guys below said, 1.7 million GPU hours, for example, is pretty friggin expensive, no?
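
    A quick back-of-envelope sketch of that figure (the $2/GPU-hour rate is an assumption for illustration; real cloud prices vary a lot by provider and hardware):

    ```python
    # Rough training-cost estimate from the ~1.7M GPU-hour figure
    # mentioned above. The hourly rate is an assumed placeholder.
    gpu_hours = 1_700_000
    usd_per_gpu_hour = 2.0  # hypothetical cloud rate, not a quote

    cost = gpu_hours * usd_per_gpu_hour
    print(f"~${cost:,.0f}")  # ~$3,400,000 at these assumptions
    ```

    So yes, on the order of millions of dollars at typical cloud rates, which is why 70b-scale pretraining stays in the hands of well-funded labs.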