I updated to the latest commit because ooba said it uses the latest llama.cpp, which improved performance. What I suspect happened is that it now uses more FP16, which the Tesla P40 handles poorly (its native FP16 throughput is far below its FP32 throughput): the tokens/s on my P40 got halved, along with the power consumption and memory-controller load.
You can fix this by doing:
git reset --hard 564d0cde8289a9c9602b4d6a2e970659492ad135
to go back to the last verified commit that didn't kill performance on the Tesla P40. I'm not sure how to fix this for future updates, so maybe u/Oobabooga can chime in.
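In the meantime, one possible workaround (a sketch, not a verified workflow; the branch name p40-stable and the origin/main ref are assumptions about a standard clone) is to keep the known-good commit on its own branch, so future pulls can't silently move you off it:

# Pin the known-good commit on its own branch (branch name is just an example)
git branch p40-stable 564d0cde8289a9c9602b4d6a2e970659492ad135
git checkout p40-stable

# Later, fetch upstream and list what's new without applying anything
git fetch origin
git log --oneline p40-stable..origin/main

# Once a newer commit is confirmed fine on the P40, fast-forward to it
git merge <commit-hash>

That way you can test each update against the pinned baseline before moving off it.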