Why is a single A100 so slow?

Radiant-Practice-270 · 1 year ago

nuvalab · 1 year ago

That sounds like CPU speed. What do you see from `watch -d -n 0.1 nvidia-smi` while you’re running inference?
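A minimal sketch of the same kind of check from Python, assuming `nvidia-smi` is on the PATH. The helper names `parse_sample` and `sample_gpus` are made up for illustration; the `--query-gpu` and `--format` flags are standard `nvidia-smi` options.

```python
import subprocess

# Ask nvidia-smi for GPU utilization (%) and used memory (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_sample(line):
    """Parse one CSV line such as '97, 40213' into (util_percent, mem_mib)."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def sample_gpus():
    """Return one (util_percent, mem_mib) tuple per visible GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_sample(line) for line in out.splitlines() if line.strip()]
```

Calling `sample_gpus()` in a loop while a generation is running should show whether the GPU is busy at all. Utilization near 0% with little memory used would suggest the model is actually running on the CPU.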

a_beautiful_rhind · 1 year ago

Something is wrong with your environment. Even P40s give more than that.

The other possibility is that you aren’t generating enough tokens to get a meaningful t/s figure. What was the total inference time?
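To illustrate why a short generation can give a misleading t/s figure, here is a rough sketch. It assumes a simple model in which total time is a fixed overhead (prompt processing, warm-up) plus generation at a steady rate; the function names and the numbers are hypothetical, not measurements from this thread.

```python
def tokens_per_second(n_tokens, total_seconds):
    """Naive throughput: generated tokens divided by wall-clock time."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return n_tokens / total_seconds


def measured_rate(n_tokens, true_rate, fixed_overhead_s):
    """Throughput you would report if a fixed overhead is included in the timing.

    true_rate is the steady-state generation speed in tokens/s (an assumption),
    fixed_overhead_s is the time spent before and around generation.
    """
    total = fixed_overhead_s + n_tokens / true_rate
    return tokens_per_second(n_tokens, total)


# With a 2 s overhead and a steady 20 t/s, a 10-token reply looks far slower
# than a 1000-token reply, even though the hardware is doing the same thing.
short_run = measured_rate(10, true_rate=20.0, fixed_overhead_s=2.0)
long_run = measured_rate(1000, true_rate=20.0, fixed_overhead_s=2.0)
```

Under these assumed numbers the short run comes out at 4 t/s and the long run at roughly 19 t/s, which is why the total inference time and token count matter when comparing speeds.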

easyllaama · 1 year ago

Try using GGUF. This format works well on a single GPU, especially since you have 80GB of VRAM. I think you can run a 70GB GGUF with all layers on the GPU.
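A rough sketch of that setup, assuming the `llama-cpp-python` bindings built with CUDA support. `fits_in_vram`, `load_fully_offloaded`, and the overhead figure are hypothetical and only for illustration; `n_gpu_layers=-1` is the binding's way of asking for every layer to be offloaded.

```python
def fits_in_vram(model_gb, vram_gb, overhead_gb=6.0):
    """Very rough check: model weights plus an assumed allowance for the
    KV cache and scratch buffers must fit in the card's memory.

    overhead_gb is a guess; the real figure depends on context length
    and batch size.
    """
    return model_gb + overhead_gb <= vram_gb


def load_fully_offloaded(model_path, n_ctx=4096):
    """Load a GGUF model with all layers on the GPU."""
    # Imported lazily so fits_in_vram can be used without llama-cpp-python installed.
    from llama_cpp import Llama

    return Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=n_ctx)
```

With these assumptions, `fits_in_vram(70, 80)` is true, so a 70GB file on an 80GB card leaves some room for the cache, though a very long context could still push it over.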