Can I run Llama 2 13B locally on my GTX 1070? I read somewhere the minimum suggested VRAM is 10 GB, but since the 1070 has 8 GB, would it just run a little slower? Or could I use some quantization, with bitsandbytes for example, to make it fit and run more smoothly?
Edit: also how much storage will the model take up?
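In case it helps, here's a rough sketch of what 4-bit loading with bitsandbytes through transformers could look like. The model ID (meta-llama/Llama-2-13b-chat-hf) and the sample prompt are just placeholders; the official repo is gated, so you'd need to request access on Hugging Face first, and you'd also need accelerate installed for device_map="auto":

```python
# Rough sketch: load Llama 2 13B in 4-bit so the weights fit in ~8 GB of VRAM.
# Assumes transformers, accelerate, and bitsandbytes are installed and that you
# have access to the gated meta-llama repo (any other Llama 2 13B checkpoint works too).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-13b-chat-hf"  # placeholder checkpoint
# Note: the fp16 checkpoint itself is roughly 26 GB on disk (13B params * 2 bytes).

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights, roughly 7 GB for 13B params
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # do the matmuls in fp16 on the GPU
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # spill layers to CPU RAM if VRAM runs out
)

prompt = "Explain why the sky is blue in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```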
I run 7B models on my 1070. ollama run llama2 produces between 20 and 30 tokens per second on Ubuntu.