Why is Mistral-7b so capable? Any ideas re: dataset?

Fun_Tangerine_1086 to LocalLLaMA@poweruser.forum · 3 years ago

So Mistral-7b is a pretty impressive 7B-parameter model, but why is it so capable? Do we have any insight into its dataset? Was it trained on far more tokens than the compute-optimal (Chinchilla) scaling point would suggest? Have there been any attempts at open reproductions, or merges to scale up the parameter count?
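For a rough sense of what "trained beyond the scaling limit" could mean, here is a back-of-the-envelope sketch. It assumes the common Chinchilla rule of thumb of about 20 training tokens per parameter for compute-optimal training, plus the standard approximation of about 6·N·D FLOPs for training a model with N parameters on D tokens. The 1T-token budget below is a made-up placeholder, not Mistral's actual figure, which isn't stated here.

```python
# Back-of-the-envelope Chinchilla arithmetic (rule-of-thumb assumptions only).
TOKENS_PER_PARAM = 20  # approximate compute-optimal ratio, not an exact law


def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens for n_params."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Common ~6*N*D estimate of total training FLOPs."""
    return 6 * n_params * n_tokens


def overtraining_factor(n_params: float, n_tokens: float) -> float:
    """How many times past the compute-optimal token count a run goes."""
    return n_tokens / chinchilla_optimal_tokens(n_params)


if __name__ == "__main__":
    n = 7e9  # ~7B parameters
    print(f"Chinchilla-optimal tokens: {chinchilla_optimal_tokens(n):.2e}")
    # HYPOTHETICAL token budget for illustration only.
    d = 1e12
    print(f"Overtraining factor at {d:.0e} tokens: {overtraining_factor(n, d):.1f}x")
    print(f"Approx. training FLOPs: {training_flops(n, d):.2e}")
```

Under these assumptions, a 7B model is "compute-optimal" at roughly 140B tokens. Any run with a much larger token budget trades extra training compute for a smaller, cheaper-to-serve model, which is one plausible reading of the question.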
