Has anyone already read this new article on ArXiv? https://arxiv.org/abs/2311.10770

Looks very promising, potential inference acceleration of PyTorch x30, and when implemented on native CUDA x117, and also an estimate of the maximum acceleration x341 times.

As far as I understand, this is achieved by replacing traditional forward propagation layers with so-called fast forward propagation layers.

Is there anyone here with real experience of contributing to the development of PyTorch, llama.cpp or releasing open models, what do you say to this?