How fast is a 3090 for CodeLlama 70B at 4/8-bit?

Snoo-83094 · 2 years ago

How fast is a 3090 for CodeLlama 70B at 4/8-bit?

Herr_Drosselmeyer · 2 years ago

With a 3090 and sufficient system RAM, you can run 70B models, but they’ll be slow: about 1.5 tokens/second, plus quite a bit of time for prompt ingestion. It’s doable but not fun.

a_beautiful_rhind · 2 years ago

One 3090 is not enough.

flossraptor · 2 years ago

With a dedicated 3090 (and another card for the OS), a 34B model at 5 bpw just fits and runs very fast, around 10-20 t/s. The quality is good for my application, but I’m not coding.
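
For anyone wanting to sanity-check the sizes mentioned in this thread, here is a rough back-of-the-envelope sketch. It assumes the weights take roughly parameters × bits-per-weight / 8 bytes and that a 3090 has about 24 GB of VRAM. The helper name `weight_gb` is made up for illustration, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * bits_per_weight / 8 bytes, divided by 1e9
    return params_billion * bits_per_weight / 8


VRAM_3090_GB = 24  # assumed VRAM of a single RTX 3090

configs = [
    ("70B at 4-bit", 70, 4),
    ("70B at 8-bit", 70, 8),
    ("34B at 5 bpw", 34, 5),
]

for name, params, bpw in configs:
    size = weight_gb(params, bpw)
    verdict = "fits" if size < VRAM_3090_GB else "does not fit"
    print(f"{name}: ~{size:.2f} GB of weights, {verdict} in {VRAM_3090_GB} GB")
```

Under these assumptions, the 70B weights exceed a single card at both 4-bit and 8-bit, so part of the model has to sit in system RAM, which is consistent with the slow speeds reported above. The 34B model at 5 bpw leaves only a few GB of headroom, which matches the "just fits" description.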