I have tried to set up three different versions of it: TheBloke's GPTQ and AWQ quants and the original deepseek-coder-6.7b-instruct.
I have tried the 34B as well.
My specs are 64 GB RAM, a 3090 Ti, and an i7-12700K.
With AWQ I just get a bugged response (an endless stream of `"` characters) until max tokens.
GPTQ works much better, but all versions seem to add an unnecessary `*` at the end of some lines, and it gives worse results than the website (deepseek.com). Say I ask for a Snake game in Pygame: it usually produces an unusable version, and after 5-6 tries I'll get a somewhat working version, but I'll still need to ask for a lot of changes.
On the official website, meanwhile, I'll get working code on the first try, without any problems.
I am using the Alpaca template, adjusted to match the DeepSeek format (oobabooga webui).
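For reference, the prompt format I'm trying to match looks roughly like this. A minimal sketch; the exact system line is my assumption based on the model card and may differ slightly:

```python
# Sketch of the DeepSeek-Coder instruct prompt format.
# The SYSTEM text below is an assumption (paraphrased from the model card),
# not something I've verified byte-for-byte.
SYSTEM = (
    "You are an AI programming assistant, utilizing the DeepSeek Coder model, "
    "developed by DeepSeek Company."
)

def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca-style markers DeepSeek uses."""
    return f"{SYSTEM}\n### Instruction:\n{instruction}\n### Response:\n"

print(build_prompt("write Snake game in python"))
```

If the markers don't match what the model was fine-tuned on (e.g. a missing colon or newline), output quality drops noticeably, which is one thing worth ruling out.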
What could be causing this? Is the website version different from the Hugging Face model?
lol, I will stop wasting my time now; I spent roughly 3 hours today trying to get it to work :D, mostly with GGUF.
I found the fix for this issue (tested by me only; thanks to u/FullOf_Bad_Ideas for the suggestion):
Reduce the repetition penalty to 1. The code will be much better and closely resemble what the website generates (tested multiple times with Pong and Snake).
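For context on why this helps: the CTRL-style repetition penalty used by llama.cpp and oobabooga shrinks the scores of tokens that already appeared before sampling, so any value above 1 actively steers the model away from repeating itself. Code is full of legitimately repeated identifiers, keywords, and indentation, so the penalty punishes exactly what good code needs. A value of 1 disables it. A minimal sketch (the function name is mine, not from any library):

```python
def apply_repetition_penalty(logits, seen_tokens, penalty):
    """CTRL-style repetition penalty: shrink scores of already-seen tokens.

    penalty == 1.0 leaves the logits untouched, which is why setting it
    to 1 fixes code generation: repeated keywords and identifiers are
    no longer punished.
    """
    out = list(logits)
    for t in set(seen_tokens):
        if out[t] > 0:
            out[t] /= penalty   # positive score: divide to shrink it
        else:
            out[t] *= penalty   # negative score: multiply to push it lower
    return out

# token 0 was already generated; with penalty 1.2 its logit drops
print(apply_repetition_penalty([2.0, 1.0], seen_tokens=[0], penalty=1.2))
# with penalty 1.0 nothing changes
print(apply_repetition_penalty([2.0, 1.0], seen_tokens=[0], penalty=1.0))
```

In llama.cpp this corresponds to the `--repeat-penalty` flag, and in the oobabooga UI to the repetition penalty slider.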
Works for me with the latest llama.cpp on Windows (CPU only, AVX).
Command:
`main -m …/models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf -p "### Instruction:\nwrite Snake game in python\n### Response:" -n 2048 -e`
Result: