If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics…), and an interface that decides, depending on the context, which model to use, could this outperform bigger models while being faster?
Yes, this is essentially the idea behind Mixture of Experts (MoE),
and we already have examples of this:
coding - deepseek-coder-7B is better at coding than many 70B models
answering from context - Llama-2-7B beats Llama-2-13B on the OpenBookQA test
https://preview.redd.it/1gexvwd83i2c1.png?width=1000&format=png&auto=webp&s=cda1ee16000c2e89410091c172bf4756bc8a427b
etc.
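
A rough sketch of what such a routing interface could look like, just to make the idea concrete. The model names and the `EXPERTS` stubs are hypothetical placeholders, and the keyword matching stands in for whatever cheap classifier you'd actually use to pick the specialist; it's not a real MoE gating network, just the "interface decides which model to call" part of the question.

```python
# Minimal sketch of a topic router over specialized 7B models.
# The expert callables below are stubs; in practice you'd swap in
# real inference calls (llama.cpp, vLLM, an API, etc.).

from typing import Callable, Dict

# Map each topic to a callable that runs the matching specialist model.
EXPERTS: Dict[str, Callable[[str], str]] = {
    "coding":   lambda prompt: f"[deepseek-coder-7b] {prompt}",
    "math":     lambda prompt: f"[math-7b] {prompt}",
    "roleplay": lambda prompt: f"[roleplay-7b] {prompt}",
    "general":  lambda prompt: f"[general-7b] {prompt}",
}

# Crude keyword router; a real setup would use a small classifier or
# embedding similarity so routing stays far cheaper than a 70B forward pass.
KEYWORDS = {
    "coding":   ("python", "code", "function", "bug", "compile"),
    "math":     ("integral", "equation", "prove", "solve"),
    "roleplay": ("pretend", "character", "act as"),
}

def route(prompt: str) -> str:
    """Pick which specialist should handle the prompt."""
    text = prompt.lower()
    for topic, words in KEYWORDS.items():
        if any(w in text for w in words):
            return topic
    return "general"

def answer(prompt: str) -> str:
    """Dispatch the prompt to the routed specialist model."""
    return EXPERTS[route(prompt)](prompt)

if __name__ == "__main__":
    print(answer("Write a Python function that reverses a string"))
    # handled by the coding specialist
```

The point is that only one 7B model runs per request, so latency stays close to a single small model while each topic gets a specialist.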