If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics…), and an interface that decides, based on the context, which model to use, could this outperform bigger models while being faster?
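Roughly what I have in mind, as a minimal sketch (the model names and keyword rules are just placeholders; a real router would probably be a small classifier or an LLM call instead of keyword matching):

```python
# Sketch of a topic router sitting in front of several specialist models.
# Model ids and keywords below are hypothetical placeholders.

SPECIALISTS = {
    "coding":   "my-org/coder-7b",      # hypothetical coding fine-tune
    "math":     "my-org/math-7b",       # hypothetical math fine-tune
    "roleplay": "my-org/roleplay-7b",   # hypothetical roleplay fine-tune
    "general":  "my-org/general-7b",    # fallback generalist
}

KEYWORDS = {
    "coding":   ("def ", "function", "compile", "python", "bug"),
    "math":     ("integral", "equation", "solve", "proof"),
    "roleplay": ("pretend you are", "in character", "roleplay"),
}

def route(prompt: str) -> str:
    """Return the id of the specialist model to send this prompt to.

    Keyword matching stands in for whatever the routing interface
    actually uses to classify the context.
    """
    text = prompt.lower()
    for topic, words in KEYWORDS.items():
        if any(word in text for word in words):
            return SPECIALISTS[topic]
    return SPECIALISTS["general"]

if __name__ == "__main__":
    print(route("Can you solve this equation for x?"))      # -> my-org/math-7b
    print(route("Write a python function to sort a list"))  # -> my-org/coder-7b
    print(route("Tell me about the French Revolution"))     # -> my-org/general-7b
```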
Yes, this is basically the Mixture of Experts (MoE) idea, and we already have examples of specialists beating bigger generalists:
coding - deepseek-coder-7B is better at coding than many 70B models
answering from the context - llama-2-7B is better than llama-2-13B on the OpenBookQA test
etc.
I really like the idea; I think multiple 13B models would be awesome! A setup managed by a carefully configured, completely uncensored routing model is something I want to build. I want to crowdfund a host for this, so DM me if you are interested!