Open LLM Leaderboard vs Reality: How do you evaluate "good"?

BlueMetaMind · 2 years ago

Or, if you are just playing around, you write or search for a post on Reddit (or one of the various LLM-related Discords) asking for the best model for your task :D

I made this post as an attempt to collect best practices and ideas.

Use GPT-4 to evaluate the output of Llama.

That’s probably always a good option, but I try to avoid using OpenAI altogether.
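
For anyone who wants to try the "one model grades another" idea without tying it to a specific provider, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the prompt wording, the 1 to 10 scale, and the names `JUDGE_TEMPLATE`, `parse_score`, `judge_outputs`, and `mean_score` are hypothetical. The judge is just a callable that takes a prompt string and returns text, so a local model could be plugged in the same way as a hosted one.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical grading prompt; the wording and the 1-10 scale are assumptions.
JUDGE_TEMPLATE = (
    "You are grading an answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with a single integer score from 1 (bad) to 10 (excellent)."
)


def parse_score(reply: str, low: int = 1, high: int = 10) -> Optional[int]:
    """Return the first integer in the reply that lies in [low, high], else None."""
    for token in re.findall(r"\d+", reply):
        value = int(token)
        if low <= value <= high:
            return value
    return None


def judge_outputs(
    pairs: List[Tuple[str, str]],
    judge: Callable[[str], str],
) -> List[Optional[int]]:
    """Ask the judge to score each (question, answer) pair.

    `judge` is any function mapping a prompt to the judge model's text reply.
    """
    scores = []
    for question, answer in pairs:
        prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
        scores.append(parse_score(judge(prompt)))
    return scores


def mean_score(scores: List[Optional[int]]) -> Optional[float]:
    """Average the scores, skipping replies that could not be parsed."""
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid) if valid else None


if __name__ == "__main__":
    # Stand-in judge for demonstration only; replace with a call to a real model.
    def stub_judge(prompt: str) -> str:
        return "Score: 7"

    results = judge_outputs([("What is 2 + 2?", "4")], stub_judge)
    print(results, mean_score(results))
```

Unparseable replies are kept as `None` instead of being silently dropped, so you can see how often the judge fails to follow the format. Scores from a judge model are only a rough proxy, so it is worth spot-checking some of them by hand.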