
My kingdom for renaming this paper to something like "Tensor Product Attention is a Memory-Efficient Approach for Long-Sequence Language Modeling"


If you don’t like the title, wait till you see the acronym: "… we introduce the Tensor ProducT ATTenTion Transformer (T6), a new model architecture…"


There is a famous transformer model named T5 from Google, and also S4, S5, and S6 (Mamba) in the LLM space, so this kind of naming is not unusual.


Yes, but T5 is at least a normal acronym: Text-To-Text Transfer Transformer (albeit a bit forced)


That it's not unusual tells us that too many researchers in the field are chasing citations and fame at the expense of doing quality work.


Mm. That, or they all share a sense of humour/in-jokes: I'm sure I'm not the only one here who immediately thought of "GOTO is all you need" and "Attention considered harmful".


Right. But then, what they did to the title to make it collapse down to T6 is even worse than what I did to my nickname back in high school to squeeze in a long-forgotten in-joke about our city's municipal sanitation department (MPO).


Ironically, both are true!


"... is all you need" isn't unusual either, and yet GGP isn't happy about it (and I understand why)


I propose T-POT (Tensor Product attentiOn Transformer)


TPOT already exists in the ML field; it was a somewhat popular AutoML package a few years ago, if I remember correctly, and it still seems to be around: https://github.com/EpistasisLab/tpot2



