Look at this: apart from Llama1, all the other “base” models will likely answer “language” after “As an AI”. That means Meta, Mistral AI, and 01-ai (the company behind Yi) likely trained their “base” models on GPT instruct datasets to inflate benchmark scores and make the “base” models look like they had more potential than they do. We got duped hard on that one.

https://preview.redd.it/vqtjkw1vdyzb1.png?width=653&format=png&auto=webp&s=91652053bcbc8a7b50bced9bbf8638fa417387bb
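The probe in the screenshot above boils down to: prompt a base model with “As an AI” and see whether “ language” is its top next-token prediction, a telltale ChatGPT-ism. Here is a minimal sketch of that check; the helper names (`top_token`, `looks_contaminated`) and the probability numbers are invented for illustration, not measured from any real model — in practice you would fill the distributions from a model’s next-token logits.

```python
def top_token(next_token_probs: dict) -> str:
    """Return the most likely next token from a token -> probability mapping."""
    return max(next_token_probs, key=next_token_probs.get)

def looks_contaminated(next_token_probs: dict) -> bool:
    """Flag the 'As an AI' -> ' language' signature of GPT instruct data."""
    return top_token(next_token_probs).strip() == "language"

# Hypothetical next-token distributions after the prompt "As an AI"
# (illustrative numbers only):
base_clean   = {" system": 0.21, " model": 0.18, " language": 0.11}
base_suspect = {" language": 0.62, " assistant": 0.09, " model": 0.05}

print(looks_contaminated(base_clean))    # False
print(looks_contaminated(base_suspect))  # True
```

A single token is of course a weak signal on its own; the thread’s argument rests on many base models all sharing this same completion.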

  • mcmoose1900B
    10 months ago

    The problem is trusting these common benchmarks in the first place… And VCs making investing decisions based on them.

    It’s insane. It’s like a years-old, published SAT test being the only factor in getting a job or an investment, and no one bothered to check whether you’re just blatantly cheating instead of cleverly cheating.

    • Wonderful_Ad_5134OPB
      10 months ago

      I know, right? Getting that much investment for something you can so easily cheat on makes me sick.

  • FPhamB
    10 months ago

    Shouldn’t the proof be in the pudding?

    If Mistral 7B is better than most other 7b models, then they did something right, no?

    I understand that the base model can then inherit some biases, but it’s on them that they didn’t clean those “As an AI…” answer strings from their dataset. So despite this, it performs better.

  • a_beautiful_rhindB
    10 months ago

    GPT slop gonna GPT slop.

    I hate that phrase so much too. Even if they used anything else. Some think they’re being clever and change it to “as an AI”.

  • Wonderful_Ad_5134OPB
    10 months ago

    Llama2 was pre-trained on older data (from before ChatGPT poisoning of web data became significant):

    https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md

    “Data Freshness The pretraining data has a cutoff of September 2022, but some tuning data is more recent, up to July 2023.”

    “Model Dates Llama 2 was trained between January 2023 and July 2023.”

    StableLM 3B was trained on more recent data (cutoff of March 2023), yet it doesn’t show this amount of ChatGPT poisoning:

    https://huggingface.co/stabilityai/stablelm-base-alpha-3b-v2

    https://preview.redd.it/gl46fo50n10c1.png?width=518&format=png&auto=webp&s=c7cae52b292dcba45dee735a4ca7efac5630a4ae