So Mistral-7B is a pretty impressive 7B-parameter model … but why is it so capable? Do we have any insights into its dataset? Was it trained far past the compute-optimal point suggested by scaling laws? Any attempts at open reproductions, or merges to scale up the parameter count?

  • Monkey_1505B · 10 months ago

    Knowledge is a strange goal for any model when we have the internet, IMO. Just connect your model to a web search (see the sketch below).
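
    For what it's worth, "connect it to a web search" amounts to retrieval-augmented prompting: fetch a snippet, stuff it into the prompt, and let the model do the reasoning while the web supplies the facts. A minimal sketch in Python, using DuckDuckGo's Instant Answer API for retrieval; `ask_llm` is a hypothetical placeholder for whatever Mistral-7B runtime you run (llama.cpp, vLLM, a hosted API, etc.):

    ```python
    # Sketch of web-search-augmented prompting. The search call is real
    # (DuckDuckGo Instant Answer API); the model call is a stub you'd
    # replace with your own Mistral-7B backend.
    import requests


    def web_snippet(query: str) -> str:
        """Return a short abstract for `query` from DuckDuckGo's Instant Answer API."""
        resp = requests.get(
            "https://api.duckduckgo.com/",
            params={"q": query, "format": "json", "no_html": 1},
            timeout=10,
        )
        resp.raise_for_status()
        data = resp.json()
        # AbstractText is often empty for niche queries; fall back gracefully.
        return data.get("AbstractText") or "(no snippet found)"


    def ask_llm(prompt: str) -> str:
        """Placeholder: swap in your own Mistral-7B call here."""
        raise NotImplementedError


    def answer_with_search(question: str) -> str:
        """Prepend a retrieved snippet so the model answers from fresh context."""
        snippet = web_snippet(question)
        prompt = (
            "Use the following web snippet to answer the question.\n"
            f"Snippet: {snippet}\n"
            f"Question: {question}\n"
            "Answer:"
        )
        return ask_llm(prompt)
    ```

    The point of the design is that a small model doesn't need to memorize facts it can look up at inference time; the snippet carries the knowledge, and the 7B model only has to read and synthesize it.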