
Statistical machine translation systems such as Google Translate are very sensitive to the availability of bilingual corpora. In this case GT seems to have learned that -子 means either a dumpling (餃子) or a female name ending in -ko, but it hasn't seen enough data to determine whether the preceding 淳 is pronounced Atsu- or Jun- in the given context, so it is guessing. Combined with user-contributed corpora, this can be rather disastrous: several machine translators have translated the Japanese "初音ミク" [1] into the Korean "이명박" [2] ;-)

[1] https://en.wikipedia.org/wiki/Hatsune_Miku

[2] https://en.wikipedia.org/wiki/Lee_Myung-bak
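To make the "it is guessing" part concrete, here is a toy sketch of relative-frequency (maximum-likelihood) estimation of P(target | source) from aligned counts. The phrase table, its counts, and the function names are all hypothetical and made up for illustration; this is not how GT actually works internally, and real phrase-based systems also use longer phrases and a language model to bring in context.

```python
from collections import Counter

# Hypothetical toy "phrase table": how often each source fragment was aligned
# to each target rendering in a small bilingual corpus. Counts are invented.
ALIGNED_COUNTS = {
    "子": Counter({"dumpling": 5, "ko": 4}),
    "淳": Counter({"Atsu": 1, "Jun": 1}),  # too little data to disambiguate
}


def translation_probs(fragment):
    """Relative-frequency (maximum-likelihood) estimate of P(target | source)."""
    counts = ALIGNED_COUNTS.get(fragment, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # fragment never seen in the corpus
    return {target: n / total for target, n in counts.items()}


def best_guess(fragment):
    """Pick the most probable rendering; on a tie, max() keeps the first one seen."""
    probs = translation_probs(fragment)
    return max(probs, key=probs.get) if probs else None
```

With these counts, `translation_probs("淳")` is a 50/50 split between Atsu and Jun, so any choice is effectively a coin flip, while 子 leans slightly toward "dumpling" (5/9) over the name ending (4/9) regardless of context.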


