Is the Open LLM Leaderboard a reliable source? yi:34B is at the top, but I get better results with the neural-chat:7B model

grigio · 3 years ago

Is the Open LLM Leaderboard a reliable source? yi:34B is at the top, but I get better results with the neural-chat:7B model

meetrais · 3 years ago

Same experience here. I got excellent results from quantized versions of Intel-Neural-7B and Mistral-7B, but bad results with a quantized version of Yi-34B.

FPham · 3 years ago

My private finetunes are about text rewriting: take an input paragraph and rewrite it in a certain style.

No 7b finetuned model can grasp the idea of the submitted text in its entirety; I tried maybe 100 different runs. It makes the common mistake of “someone” who just scans the text quickly while also watching YouTube on a phone, failing to comprehend who is who or what the paragraph is about.

13b with the same finetuning does much better - it comprehends the relations. For example, if two people are speaking, it can keep track of who is who, even when the text does not mention it.

33b gets even further - it sometimes surprises me with the way it understands the text. And so the rewritten text is a mirror image of the input, just in a different style.

7b models are impressive if you want a small local LLM to answer questions, but that’s probably the limit. If you want an assistant that can also do other things, a 7b falls short, because your instructions are not necessarily fully understood.