I’m trying to run zephyr-7b, on my local machine with an RX580 8G using Text generation web UI. It works for the most part but sometimes gets into giving unrelated responses. After which I have to restart the app! Sometimes it even prints out right out gibberish…

I’m running zephyr-7b-beta.Q4\_K\_M.gguf\. With the following options:

n-gpu-layers: > 35
n_ctx: 8000

And parameters:

max_new_tokens: 2000
top_p: 0.95
top_k: 40
Instruction Template: ChatML

But if I run the above exact setup on a cloud GPU (vast.ai) it runs perfect… What am I doing wrong?