40x or more speedup by selecting important neurons

koehr · 2 years ago


paryska99 · 2 years ago

Future is going to be interesting. With this kind of CPU speedup we can run blazing fast LLMs on a toaster if it has enough RAM.

MoffKalast · 2 years ago

It would be interesting to see whether this can help speed up CPU inference with regular RAM. After all, 128 GB of DDR5 only costs about $300, which is peanuts compared to getting anywhere close to that much VRAM.

If it scales linearly then one could run a 100B model at the speed of a 3B one right now.
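
As a back-of-the-envelope check of that claim, here is a minimal sketch. It assumes the 40x figure from the title carries over unchanged to larger models and that the speedup comes purely from touching 1/40 of the weights per token; `effective_params` is a made-up helper for illustration, not anything from the linked work.

```python
def effective_params(total_params: float, speedup: float) -> float:
    """Size of a dense model with the same per-token compute.

    Assumption: per-token cost scales linearly with the number of
    parameters actually used, so an N-times speedup behaves like a
    model with total_params / N parameters.
    """
    return total_params / speedup


# 100B-parameter model with the hypothetical 40x speedup
dense_equivalent = effective_params(100e9, 40)
print(f"{dense_equivalent / 1e9:.1f}B")  # prints "2.5B"
```

So under those assumptions a 100B model would cost roughly as much per token as a 2.5B dense model, which is in the same ballpark as the 3B figure. The full set of weights still has to fit in RAM.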

sahil1572 · 2 years ago

Remind Me! 15 Day “ 40x BERT ”

Acceptable_Can5509 · 2 years ago

Basically GPT-4 Turbo