https://arxiv.org/abs/2311.10770
“UltraFastBERT”, apparently a variant of BERT that uses only 0.3% of its neurons during inference, performs on par with similar BERT models.
I hope that’s going to be available for all kinds of models in the near future!
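
If you're curious how that 0.3% works out, here's a rough NumPy sketch of the "fast feedforward" idea as I read the paper: the hidden neurons are arranged as a binary tree, and the sign of each node's activation decides which child to evaluate next, so only one root-to-leaf path is ever computed. Weights are random here just to show the mechanism, and the shapes are illustrative, not the trained model:

```python
# Minimal sketch (my reading of the paper, not its actual code):
# a depth-11 tree has 2**12 - 1 = 4095 neurons, but a forward pass
# touches only the 12 on one root-to-leaf path (~0.3%).
import numpy as np

rng = np.random.default_rng(0)
d_model, depth = 768, 11
n_nodes = 2 ** (depth + 1) - 1          # 4095 neurons in the whole tree

W_in = rng.standard_normal((n_nodes, d_model)) * 0.02   # one neuron per node
W_out = rng.standard_normal((n_nodes, d_model)) * 0.02

def fff_forward(x):
    """Evaluate only the neurons on one root-to-leaf path."""
    y = np.zeros(d_model)
    node = 0
    for _ in range(depth + 1):
        act = W_in[node] @ x                # one dot product, not a matmul
        y += max(act, 0.0) * W_out[node]    # ReLU here; the paper uses GeLU
        node = 2 * node + 1 + (act > 0)     # sign of act picks the child
    return y

out = fff_forward(rng.standard_normal(d_model))
print(out.shape)    # (768,) -- computed with 12 of 4095 neurons
```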
If it works that way, it will only be short-term. The only reason it doesn't run well on a GPU is the lack of support for conditional matrix ops, so the GPU makers will just add them. Then they'll be back on top with the same margins again.
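
To be concrete about what "conditional matrix ops" means here: every sample in a batch takes its own path down the tree, so the dense W @ X that GPUs are built around turns into a per-sample gather plus independent dot products with data-dependent branching. A rough PyTorch sketch (shapes illustrative, nothing like the paper's actual kernels):

```python
# Sketch of "conditional matrix multiplication": each sample selects a
# *different* weight row at every tree level, so there is no single
# dense matmul to hand to the GPU -- just gathers and branchy indexing.
import torch

B, d_model, depth = 32, 768, 11
n_nodes = 2 ** (depth + 1) - 1
W_in = torch.randn(n_nodes, d_model) * 0.02

x = torch.randn(B, d_model)
node = torch.zeros(B, dtype=torch.long)   # each sample sits at its own node

for _ in range(depth + 1):                # one step per tree level
    w = W_in[node]                        # gather: (B, d_model), rows differ per sample
    act = (w * x).sum(dim=-1)             # B independent dot products, not a matmul
    node = 2 * node + 1 + (act > 0).long()  # data-dependent branch per sample
```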
Also, they say the speedup decreases with more layers, so the bigger the model, the less the benefit. A 512B model is much bigger than a 7B model, so the speedup will be much less, possibly none.
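
Back-of-envelope on that (my own Amdahl's-law arithmetic, not a number from the paper): only the feedforward layers get faster, so the overall gain is capped by the share of inference time spent in them. Plugging the paper's reported 78x FFN-level CPU speedup into some hypothetical FFN time shares:

```python
# Amdahl's law: f = fraction of inference time in FFN layers,
# s = speedup of the FFN layers alone. The rest of the model is untouched.
def overall_speedup(f, s):
    return 1.0 / ((1.0 - f) + f / s)

for f in (0.9, 0.6, 0.4):                     # hypothetical FFN time shares
    print(f, round(overall_speedup(f, s=78), 2))
# 0.9 -> ~8.97x, 0.6 -> ~2.45x, 0.4 -> ~1.65x
```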