On the Hugging Face leaderboard, I was a bit surprised by the performance of Falcon 180B.
Does anyone have an explanation for why it scores the way it does?
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
Well, the model is trained on RefinedWeb, which is 3.5T tokens, so a little below Chinchilla optimal for 180B. Also, the models in the Falcon series feel more and more undertrained as they scale up (see the back-of-envelope sketch after this list):
- The 1B model was good, and is still good several model generations later
- the 7B was competitive pre-Llama 2
- the 40B and 180B were never as strong relative to their size
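To make the "more and more undertrained" point concrete, here's a quick back-of-envelope script using the ~20 tokens/parameter Chinchilla rule of thumb. The token counts are the publicly reported pretraining budgets for each Falcon model, so treat them as approximate:

```python
# Tokens-per-parameter for the Falcon series vs. the Chinchilla heuristic.
# Rule of thumb from Hoffmann et al. (2022): ~20 training tokens per parameter.
# Token counts are the figures reported in the model cards (approximate).

CHINCHILLA_RATIO = 20  # tokens per parameter

falcon_models = {
    # name: (parameters, training tokens)
    "falcon-rw-1b": (1.3e9, 350e9),
    "falcon-7b":    (7e9,   1.5e12),
    "falcon-40b":   (40e9,  1e12),
    "falcon-180b":  (180e9, 3.5e12),
}

for name, (params, tokens) in falcon_models.items():
    ratio = tokens / params
    optimal_budget = params * CHINCHILLA_RATIO
    print(f"{name}: {ratio:.0f} tokens/param "
          f"(Chinchilla-optimal budget ~= {optimal_budget / 1e12:.1f}T tokens)")
```

Running this shows the ratio dropping from a few hundred tokens/param for the 1B down to roughly 19 for the 180B, i.e. the bigger Falcons got proportionally fewer tokens, with the 180B sitting just under the Chinchilla-optimal budget.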