https://arxiv.org/abs/2311.10770

“UltraFastBERT”, apparently a variant of BERT that uses only 0.3% of its neurons during inference, performs on par with similar BERT models.

I hope that’s going to be available for all kinds of models in the near future!
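
For anyone wondering where the 0.3% comes from: as far as I understand the paper, the feedforward neurons are arranged as a binary tree and each input only walks one root-to-leaf path, so a tree of depth 11 touches 12 of its 4095 neurons. Here is a rough pure-Python sketch of that idea. It is not the authors' code; the function names, random weights, tiny `d_model` and the GELU choice are my own assumptions for illustration.

```python
import math
import random


def gelu(v):
    """Exact GELU activation (assumed here; any activation would do for the sketch)."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def make_layer(depth, d_model, seed=0):
    """Random weights for a complete binary tree of neurons stored in heap order."""
    rng = random.Random(seed)
    n_neurons = 2 ** (depth + 1) - 1
    w_in = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_neurons)]
    w_out = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_neurons)]
    return w_in, w_out


def fff_forward(x, w_in, w_out, depth):
    """Evaluate the layer for one input, visiting only depth + 1 neurons."""
    y = [0.0] * len(w_out[0])
    node = 0
    visited = 0
    for _ in range(depth + 1):
        # One dot product decides both the contribution and the branch taken.
        logit = sum(xi * wi for xi, wi in zip(x, w_in[node]))
        act = gelu(logit)
        for j, wj in enumerate(w_out[node]):
            y[j] += act * wj
        visited += 1
        # Children of node n sit at 2n + 1 (left) and 2n + 2 (right).
        node = 2 * node + 1 + (1 if logit > 0 else 0)
    return y, visited


depth = 11
w_in, w_out = make_layer(depth, d_model=8)
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(8)]
y, visited = fff_forward(x, w_in, w_out, depth)
print(visited, "of", len(w_in), "neurons used")
```

The skipped neurons are never multiplied at all, which is why the saving shows up at inference time rather than just as zeros in a dense matrix product.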

  • paryska99B
    10 months ago

    The future is going to be interesting. With this kind of CPU speedup, we could run blazing-fast LLMs on a toaster, as long as it has enough RAM.

  • MoffKalastB
    10 months ago

    It would be interesting to see whether this can help speed up CPU inference with regular RAM. After all, 128 GB of DDR5 only costs around $300, which is peanuts compared to trying to get anywhere close to that much VRAM.

    If it scales linearly, then one could run a 100B model at the speed of a 3B one right now.
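
A quick back-of-envelope check on the 100B estimate, as a sketch under stated assumptions rather than a prediction. It compares two cases: every weight being sparsified at the quoted rate, and only the feedforward layers being sparsified while attention stays dense. The 2/3 feedforward share is an assumption (a standard transformer block with `d_ff = 4 * d_model`, ignoring embeddings), and `active_params` is a made-up helper name.

```python
def active_params(total_params, ff_share, ff_active_fraction):
    """Weights touched per token when only the feedforward share is sparsified."""
    dense_part = total_params * (1.0 - ff_share)
    sparse_part = total_params * ff_share * ff_active_fraction
    return dense_part + sparse_part


TOTAL = 100e9          # the hypothetical 100B model from the comment above
FRACTION = 12 / 4095   # roughly 0.3%, the figure quoted for UltraFastBERT

# Optimistic case: pretend every weight in the model is sparsified.
everything = active_params(TOTAL, ff_share=1.0, ff_active_fraction=FRACTION)

# Assumed case: feedforward layers hold 2/3 of the weights, attention stays dense.
ff_only = active_params(TOTAL, ff_share=2 / 3, ff_active_fraction=FRACTION)

print(f"all weights sparsified: {everything / 1e9:.2f}B active")
print(f"feedforward only:       {ff_only / 1e9:.2f}B active")
```

Touched weights are only a proxy for speed, so the real number would depend on how well the CPU implementation can exploit the conditional execution.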