Look at this, apart Llama1, all the other “base” models will likely answer “language” after “As an AI”. That means Meta, Mistral AI and 01-ai (the company that made Yi) likely trained the “base” models with GPT instruct datasets to inflate the benchmark scores and make it look like the “base” models had a lot of potential, we got duped hard on that one.

https://preview.redd.it/vqtjkw1vdyzb1.png?width=653&format=png&auto=webp&s=91652053bcbc8a7b50bced9bbf8638fa417387bb

  • FPhamB
    link
    fedilink
    English
    arrow-up
    1
    ·
    10 months ago

    Shouldn’t be the proof in the pudding?

    If Mistral 7B is better than most other 7b models, then they did something right, no?

    I understand that the base model then can inherit some biases - but it’s onto them that they didn’t cleaned those “As and AI…” answers strings from their dataset. So despite this, it performs better.