I use in both cases q4_K_M
Same experience here. I got excellent results from quantized models of Intel-Neural-7B and Mistral-7B but bad results with quantized model of Yi-34B.
My private finetunes are about text rewriting - input text paragraph - rewrite it in a certain style.
No 7b finetuned model can grasp the idea of submitted text in entirety, tried maybe 100 different runs. It would make a common mistake of “someone” who just scan the text quickly while also watching youtube on a phone, failing to comprehend who is who or what the paragraph is about.
13b with the same finetuning does much better - it would comprehend the relations. For example if two people are speaking, it can keep track who is who, even without mentioning it in the text.
33b - gets even further - sometimes surprise with the way it understand the text. And so the rewritten text is a mirror image of the input, just with different style
7b are impressive if you want a small local LLM to give you answers on questions, but that’s probably the limit. If you want an assistant that can also do other things, then it falls short, because your instructions are not necessary understood fully.