If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics…), and an interface that decides based on the context which model to use, could this outperform bigger models while being faster?
yes, this is done by Mixture of Experts (MoE),
and we already have examples of this:
coding - deepseek-coder-7B is better at coding than many 70B models
answering from context - llama2-7B is better than llama2-13B on the openbookqa test
etc.
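The routing interface the question describes could be as simple as picking a specialist model per prompt. Here is a minimal keyword-based sketch; the model names and keyword rules are hypothetical placeholders, and a real router would more likely use a small classifier or an LLM to decide:

```python
# Minimal sketch of a context-based router that picks a specialist model
# per prompt. Model names and keyword rules are hypothetical placeholders.

ROUTES = {
    "coding":   ("deepseek-coder-7b", {"code", "python", "function", "bug", "compile"}),
    "math":     ("math-specialist-7b", {"solve", "equation", "integral", "proof"}),
    "roleplay": ("roleplay-7b", {"character", "pretend", "roleplay"}),
}
DEFAULT_MODEL = "general-7b"  # fallback when no topic matches

def route(prompt: str) -> str:
    """Return the name of the specialist model best matching the prompt."""
    words = set(prompt.lower().split())
    best_model, best_hits = DEFAULT_MODEL, 0
    for _topic, (model, keywords) in ROUTES.items():
        hits = len(words & keywords)  # count keyword overlaps
        if hits > best_hits:
            best_model, best_hits = model, hits
    return best_model
```

In practice the dispatch step would then load or call the chosen model; the keyword matching above is only a stand-in for a learned routing decision.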
I really like the idea. I think multiple 13B models would be awesome! A setup managed by a highly configurable routing model that is completely uncensored is something I want to build. I want to crowdfund a host for this, DM me if you are interested!

