PookaMacPhellimen to LocalLLaMA@poweruser.forum · 2 years ago
Qwen-72B released (huggingface.co) · 41 comments
PookaMacPhellimen (OP) · 2 years ago
https://preview.redd.it/sdofti9odg3c1.jpeg?width=1792&format=pjpg&auto=webp&s=d6f56d56c3596924ea61e1e5429018c0222907d2
Amazing capabilities on some benchmarks, if true.

a_slay_nub · 2 years ago
Bit disappointed by the coding performance, but it is a general-purpose model. It's insane how good GPT-3.5 is for how fast it is.

ambient_temp_xeno · 2 years ago
Apparently the chat version scores about 64 on HumanEval.

Secret_Joke_2262 · 2 years ago
What do these tests mean for an LLM? There are many values, and I see that in most cases Qwen is better than GPT-4. In others it is worse or much worse.