https://arxiv.org/abs/2311.10770
“UltraFastBERT”, apparently a BERT variant that uses only 0.3% of its neurons during inference, performs on par with comparable BERT models.
I hope that’s going to be available for all kinds of models in the near future!
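Going by the abstract, the way it gets there is by swapping the dense feedforward layers for fast feedforward networks: each token is routed down a balanced binary tree, so only a logarithmic number of neurons ever fire. Here's a rough single-token sketch of that conditional routing in NumPy; the names, shapes, and ReLU are my own simplification, not the authors' code:

```python
import numpy as np

def fast_feedforward(x, node_w, node_b, leaf_w_in, leaf_w_out, depth):
    """Toy fast-feedforward inference for a single token vector x.

    Instead of multiplying x by the full intermediate weight matrix,
    walk a balanced binary tree of `depth` decision nodes and only
    evaluate the one leaf "neuron block" the token lands on.
    (Illustrative shapes only; not the UltraFastBERT implementation.)
    """
    node = 0  # root; internal nodes stored in heap order
    for _ in range(depth):
        # Each internal node is a tiny linear classifier deciding left vs. right.
        go_right = (x @ node_w[node] + node_b[node]) > 0
        node = 2 * node + (2 if go_right else 1)
    leaf = node - (2 ** depth - 1)  # index of the selected leaf
    # Only this leaf's weights are touched; the other leaves are never
    # loaded or multiplied.
    hidden = np.maximum(0.0, x @ leaf_w_in[leaf])  # GELU in the real model, ReLU here
    return hidden @ leaf_w_out[leaf]
```

With a depth around 12 and a few thousand leaves, each token touches a dozen tiny decision nodes plus one leaf block out of thousands, which is roughly where a figure like 0.3% comes from.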
The future is going to be interesting. With this kind of CPU speedup, we could run blazing-fast LLMs on a toaster, as long as it has enough RAM.
Would be interesting to see if this can help speed up CPU inference with regular RAM; after all, 128 GB of DDR5 only costs about $300, which is peanuts compared to getting anywhere near that much VRAM.
If it scales linearly, one could run a 100B model at the speed of a 3B one right now.
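Quick back-of-envelope on that (my own numbers, not from the paper): what compute per token a hypothetical 100B dense model would need if the whole model got the treatment, versus only the feedforward blocks (roughly two-thirds of a standard transformer's non-embedding parameters) while attention stays dense:

```python
# Back-of-envelope for a hypothetical 100B-parameter dense transformer (my assumptions).
total = 100e9
active = 0.003            # ~0.3% of FFN neurons used per token, per the paper
ffn_share = 2 / 3         # FFN blocks hold roughly 2/3 of non-embedding params

optimistic = total * active                                    # whole model made conditional
conservative = total * (ffn_share * active + (1 - ffn_share))  # only FFN blocks conditional

print(f"optimistic:   ~{optimistic / 1e9:.1f}B-equivalent compute per token")
print(f"conservative: ~{conservative / 1e9:.1f}B-equivalent (attention untouched)")
# -> ~0.3B vs ~33.5B: the "100B at 3B speed" guess lands somewhere in between,
#    depending on how much of the model the trick actually covers.
```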
Remind Me! 15 Day "40x BERT"
Basically GPT-4 Turbo.