I heard about this on Twitter from some people in the field, in relation to OpenAI’s new breakthrough.
Is there a summary paper, like the ‘Attention Is All You Need’ paper, that goes over this?
Also, how specifically does this relate to, or build on, large language models?
Cheers