If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics…), and an interface that decides, depending on the context, which model to use, could this outperform bigger models while being faster?
Yes, this is essentially the idea behind Mixture of Experts (MoE),
and we already have examples of this:
coding - deepseek-coder-7B is better at coding than many 70B models
answering from context - Llama-2-7B beats Llama-2-13B on the OpenBookQA test
https://preview.redd.it/1gexvwd83i2c1.png?width=1000&format=png&auto=webp&s=cda1ee16000c2e89410091c172bf4756bc8a427b
etc.
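
A rough sketch of what such a routing interface could look like, just to make the idea concrete. The model names and the `EXPERTS` stubs are hypothetical placeholders, and the keyword matching stands in for whatever cheap classifier you'd actually use to pick the specialist; it's not a real MoE gating network, just the "interface decides which model to call" part of the question.

```python
# Minimal sketch of a topic router over specialized 7B models.
# The expert callables below are stubs; in practice you'd swap in
# real inference calls (llama.cpp, vLLM, an API, etc.).

from typing import Callable, Dict

# Map each topic to a callable that runs the matching specialist model.
EXPERTS: Dict[str, Callable[[str], str]] = {
    "coding":   lambda prompt: f"[deepseek-coder-7b] {prompt}",
    "math":     lambda prompt: f"[math-7b] {prompt}",
    "roleplay": lambda prompt: f"[roleplay-7b] {prompt}",
    "general":  lambda prompt: f"[general-7b] {prompt}",
}

# Crude keyword router; a real setup would use a small classifier or
# embedding similarity so routing stays far cheaper than a 70B forward pass.
KEYWORDS = {
    "coding":   ("python", "code", "function", "bug", "compile"),
    "math":     ("integral", "equation", "prove", "solve"),
    "roleplay": ("pretend", "character", "act as"),
}

def route(prompt: str) -> str:
    """Pick which specialist should handle the prompt."""
    text = prompt.lower()
    for topic, words in KEYWORDS.items():
        if any(w in text for w in words):
            return topic
    return "general"

def answer(prompt: str) -> str:
    """Dispatch the prompt to the routed specialist model."""
    return EXPERTS[route(prompt)](prompt)

if __name__ == "__main__":
    print(answer("Write a Python function that reverses a string"))
    # handled by the coding specialist
```

The point is that only one 7B model runs per request, so latency stays close to a single small model while each topic gets a specialist.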