• vatsadevB · 10 months ago

    Well, the model was trained on RefinedWeb, which is 3.5T tokens, so a little below Chinchilla-optimal for 180B. Also, the models in the Falcon series seem increasingly undertrained as they scale:

    • The 1B model was good, and still holds up after several newer generations
    • The 7B was capable pre Llama 2
    • The 40B and 180B were never as strong
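
    A quick back-of-the-envelope check of the Chinchilla point above, assuming the common ~20 tokens-per-parameter rule of thumb (the exact scaling-law fit varies):

    ```python
    # Rough Chinchilla check: ~20 training tokens per model parameter
    # (rule-of-thumb approximation, not the exact fitted scaling law)
    def chinchilla_optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
        """Approximate compute-optimal token count for a given parameter count."""
        return params * tokens_per_param

    optimal = chinchilla_optimal_tokens(180e9)  # 180B parameters
    trained = 3.5e12                            # RefinedWeb's reported 3.5T tokens
    print(f"optimal ≈ {optimal/1e12:.1f}T, trained = {trained/1e12:.1f}T")
    ```

    For 180B parameters that gives ~3.6T tokens, so 3.5T is indeed just shy of optimal.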