
> There were a couple of reviews where I found myself laughing because I implicitly understood the review but the Croatian words were completely made up (Googling the word you won't get any results).

Is it possible some of those were actually Hungarian words being copied verbatim into the Croatian text? Or maybe close variants of the Hungarian words (such as in a different grammatical case)?

I've seen before that when Google Translate doesn't recognise a word, it just copies it verbatim from the source to the target. I don't know whether the following actually happens, but it seems possible to me that it sometimes even "normalises" the word when doing so. For example, if it doesn't recognise the word but does recognise it as being in the genitive case, it might convert it to the nominative and stick the word "of" in front when translating into English, assuming the source language's grammar is regular enough to let it convert an unrecognised word to a different case. I've definitely seen it transliterate unknown source words before (when translating from languages with non-Latin scripts).

Doing something like Hungarian to Croatian is likely worse, because it is probably being chained through English instead of being translated directly, which roughly doubles the chance of odd things like this happening. Since both Hungarian and Croatian have grammatical case, if it is doing the kind of "case-based normalisation of unknown words" I was talking about, it might do it twice (once for Hungarian->English, then again for English->Croatian), creating even weirder results.
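
To make the chaining idea concrete, here is a toy Python sketch. It is not how Google Translate works (real systems are neural, and I don't know their internals); the two tiny lexicons, the function names, and the word-by-word lookup are all assumptions made up for illustration. It only shows how a "copy unknown words through unchanged" fallback gets two chances to fire when translation is pivoted through English.

```python
# Toy lexicons: hypothetical, illustrative entries only, not data from any real system.
HU_TO_EN = {"alma": "apple", "finom": "tasty", "kert": "garden"}
EN_TO_HR = {"apple": "jabuka", "tasty": "ukusno"}  # "garden" deliberately missing


def translate(words, lexicon):
    """Translate word by word, copying any unknown word through unchanged."""
    return [lexicon.get(word, word) for word in words]


def pivot_translate(words):
    """Hungarian -> English -> Croatian, with the fallback applied at each hop."""
    english = translate(words, HU_TO_EN)
    return translate(english, EN_TO_HR)


if __name__ == "__main__":
    # "rétes" is unknown at the first hop, so the Hungarian word survives to the end.
    # "kert" is known at the first hop but not the second, so the English word leaks through.
    print(pivot_translate(["finom", "alma", "rétes", "kert"]))
```

In this sketch the output mixes Croatian, an untouched Hungarian word, and a stray English word, which is the sort of odd result I would expect two hops to produce more often than one.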



Hungarian and Croatian are very different. There are only a few things that might match (like vegetable/fruit names), but the verbs and nouns have little in common.

The words that surprised me had proper declension (taking gender into account), and sometimes there were very unusual verbs that felt right in Croatian, but if you search for any form of them, it just doesn't exist anywhere.


If you can remember any of the Hungarian links where you observed this behaviour, and can cite any of the specific words from the machine translation you are talking about, I would appreciate it. Not that I know any Hungarian or Croatian, but this topic has piqued my interest.



