[R] Exponentially Faster Language Modelling

lexected · 2 years ago

we_are_mammals · 2 years ago

"78x speedup over the optimized baseline feedforward implementation"

So they are 78x faster than MKL using the same number of cores?

we_are_mammals · 2 years ago

I think DistilBERT needs to be in Table 2, since it’s their most direct competitor: it trades off accuracy for speed, and requires extra training effort, like their approach.

Still, if they are about 20x faster than DistilBERT using cuBLAS, that’s pretty amazing.

we_are_mammals · 2 years ago

"has 4095 neurons but selectively uses only 12 (0.03%) for inference"

There's an extra 0 in there: 12/4095 ≈ 0.3%, not 0.03%.
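Quick sanity check on the percentage, using only the two numbers from the quoted sentence. The variable names are mine. The last line is just an observation: 4095 = 2^12 - 1, which would fit a balanced binary tree with 12 levels where one neuron per level is used, but that reading is my assumption and isn't stated in this thread.

```python
# Numbers taken from the quoted sentence; names are illustrative only.
total_neurons = 4095
used_neurons = 12

# Share of neurons used at inference, as a percentage.
percent_used = 100 * used_neurons / total_neurons
print(f"{percent_used:.2f}%")  # prints 0.29%

# Observation (assumption about the architecture, not from the thread):
# 4095 equals 2**12 - 1, the node count of a balanced binary tree with 12 levels.
print(2**12 - 1 == total_neurons)  # prints True
```

So the figure rounds to about 0.3%, an order of magnitude larger than the quoted 0.03%.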