If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics…), and an interface that decides, based on the context, which model to use, could this outperform bigger models while being faster?
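Roughly what I have in mind, as a minimal sketch (the model names and keyword rules are just placeholders; a real router would probably be a small classifier or an LLM call instead of keyword matching):

```python
# Sketch of a topic router sitting in front of several specialist models.
# Model ids and keywords below are hypothetical placeholders.

SPECIALISTS = {
    "coding":   "my-org/coder-7b",      # hypothetical coding fine-tune
    "math":     "my-org/math-7b",       # hypothetical math fine-tune
    "roleplay": "my-org/roleplay-7b",   # hypothetical roleplay fine-tune
    "general":  "my-org/general-7b",    # fallback generalist
}

KEYWORDS = {
    "coding":   ("def ", "function", "compile", "python", "bug"),
    "math":     ("integral", "equation", "solve", "proof"),
    "roleplay": ("pretend you are", "in character", "roleplay"),
}

def route(prompt: str) -> str:
    """Return the id of the specialist model to send this prompt to.

    Keyword matching stands in for whatever the routing interface
    actually uses to classify the context.
    """
    text = prompt.lower()
    for topic, words in KEYWORDS.items():
        if any(word in text for word in words):
            return SPECIALISTS[topic]
    return SPECIALISTS["general"]

if __name__ == "__main__":
    print(route("Can you solve this equation for x?"))      # -> my-org/math-7b
    print(route("Write a python function to sort a list"))  # -> my-org/coder-7b
    print(route("Tell me about the French Revolution"))     # -> my-org/general-7b
```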
Yes, this is basically the Mixture of Experts (MoE) idea, and we already have examples of specialists beating bigger generalists:
coding - deepseek-coder-7B is better at coding than many 70B models
answering from the context - llama-2-7B is better than llama-2-13B on the OpenBookQA test
etc.
I really like the idea; I think multiple 13B models would be awesome! A setup managed by a carefully configured, completely uncensored routing model is something I want to build. I want to crowdfund a host for this, so DM me if you are interested!