
Audio is one-dimensional (a sequence over time), so the usual RoPE position encoding should handle it the same way it does for text tokens. You only need extra position encoding for higher-dimensional inputs like images.
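
To make that concrete, here is a rough sketch of plain 1D RoPE in pure Python. It is not taken from any particular model: the function names `rope` and `dot`, the base of 10000, and the toy vectors are all assumptions for illustration. Nothing in it cares whether the position index counts text tokens or audio frames; the attention score depends only on the relative offset between the two positions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply 1D rotary position encoding to `vec` at integer position `pos`.

    Hypothetical helper: rotates each consecutive pair of dimensions by an
    angle proportional to `pos`, with a lower frequency for later pairs.
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # rotation angle for this pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors (made-up values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope(q, 10), rope(k, 7))
score_b = dot(rope(q, 103), rope(k, 100))
print(abs(score_a - score_b) < 1e-9)
```

For images you would typically need something extra, for example splitting the dimensions between a row index and a column index, because a single scalar position no longer describes where a patch sits.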