Hacker News
“sync,corrected by elderman” issue in ML translation datasets spread on internet (duckduckgo.com)
3 points by mvolfik on March 17, 2023 | 2 comments


I can't find the true origin of this, but (unless I'm missing some old internet joke) it seems that some language models were trained on corrupted data that frequently includes a string like "== sync, corrected by elderman ==". Searching for this phrase now yields a ton of seemingly random results, mostly in places where you would expect automatically translated spam. Some interesting mentions I found:

- it historically appeared in autotranslated game chats in the game Arena of Valor: https://www.reddit.com/r/arenaofvalor/comments/btykru/commen...
- a mention in the GitHub repo of a translation model: https://github.com/Helsinki-NLP/Opus-MT/issues/62

I'm curious to see if anyone else has had interesting encounters with this.
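
For anyone who wants to check their own parallel corpus for this string, here is a minimal sketch in Python. It assumes the data is already loaded as (source, target) sentence pairs; the function name, the regex, and the sample pairs are all made up for illustration and are not taken from Opus-MT or any other project.

```python
import re

# Matches the artifact with flexible spacing and case, e.g.
# "== sync, corrected by elderman ==" or "sync,corrected by elderman".
# The exact pattern is an assumption based on the variants quoted above.
ARTIFACT = re.compile(r"sync\s*,\s*corrected\s+by\s+elderman", re.IGNORECASE)


def split_clean_and_flagged(pairs):
    """Split (source, target) pairs into clean pairs and pairs containing the artifact."""
    clean, flagged = [], []
    for src, tgt in pairs:
        # Flag the pair if either side contains the string.
        if ARTIFACT.search(src) or ARTIFACT.search(tgt):
            flagged.append((src, tgt))
        else:
            clean.append((src, tgt))
    return clean, flagged


# Hypothetical sample pairs, for illustration only.
sample = [
    ("Hello, how are you?", "Bonjour, comment allez-vous ?"),
    ("== sync, corrected by elderman ==", "== sync, corrigé par elderman =="),
    ("Thanks", "Sync,corrected by Elderman"),
]

clean, flagged = split_clean_and_flagged(sample)
print(len(clean), "clean pairs,", len(flagged), "flagged pairs")
```

This only catches the literal phrase, so it would miss translated or otherwise mangled variants; it is meant as a quick sanity check, not a real cleaning pipeline.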


I think that might've come from the rtfm.mit.edu FAQ archives. Several documents there had multiple language versions and were great bait for anything needing translated text inputs.



