While this is great, I'd like to give a shout-out to DeepL Translator [1]. I'm not affiliated with them, but I like to recommend them to people who want to step out of the Google ecosystem. I have been using DeepL for about a year now, mostly for NL<->EN and DE<->EN. So far, I never felt that the translation is off, or terrible, and if not better in some cases, it's as good as Google Translator.
> So far, I never felt that the translation is off, or terrible, and if not better in some cases, it's as good as Google Translator.
In my experience Deepl is consistently, without fail, considerably better than Google Translate. I basically use Google now only for more exotic language pairs, or full-page translations.
I've found that sometimes what Google does for these is translate from Language A > English, and then English > Language B, which leads to bizarre results.
I have noticed that too. When translating libre from French to Azerbaijani, it gave the Azerbaijani word for free as in free of charge, rather than free as in freedom. That is an ambiguity mostly limited to English (maybe some other languages too), but definitely not French or Azerbaijani.
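A toy illustration of how a pivot through English can lose the distinction described above. The lexicons are hand-written for this example (hypothetical data, not how Google Translate or any real system stores translations); the point is only that once both French words map to the single English word "free", the second step cannot recover which sense was meant.

```python
# Toy word-by-word lexicons (hypothetical data, for illustration only).
# French distinguishes the two senses of English "free":
FR_TO_EN = {
    "libre": "free",    # free as in freedom
    "gratuit": "free",  # free as in free of charge
}

# The English->Azerbaijani step only ever sees "free", so it must pick one sense.
EN_TO_AZ = {
    "free": "pulsuz",   # "free of charge"; "azad" would be "free" as in freedom
}

# A direct French->Azerbaijani lexicon can keep the two senses apart.
FR_TO_AZ = {
    "libre": "azad",
    "gratuit": "pulsuz",
}


def pivot_translate(word: str) -> str:
    """French -> English -> Azerbaijani, one word at a time."""
    return EN_TO_AZ[FR_TO_EN[word]]


def direct_translate(word: str) -> str:
    """French -> Azerbaijani without an intermediate language."""
    return FR_TO_AZ[word]


if __name__ == "__main__":
    for w in ("libre", "gratuit"):
        print(f"{w}: pivot={pivot_translate(w)} direct={direct_translate(w)}")
```

Through the pivot, both "libre" and "gratuit" come out as "pulsuz"; only the direct lexicon yields "azad" for "libre".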
From reports, Google Translate has effectively created its own internal (AI GD ML) metalanguage, which lets it interpolate between language pairs it hasn't been specifically trained on. E.g., having been trained on Japanese <-> English and Korean <-> English, Google Translate can manage Japanese <-> Korean without being specifically trained to do so.
So yes, there's an intermediary language. But it's not English.
The intermediary language is not English, but due to the way the training set is constructed (pairs of texts in various languages, the vast majority of which, as I understand it, have English on one side), it can sometimes be very hard to tell apart from English.
For example, translating "рубанок" ("plane", in the sense of the carpenter's tool) from Russian to Polish used to produce "samolot" ("airplane") in Google Translate until sometime earlier this year, because in the intermediate representation "plane" was ambiguous just like it is in English. It looks like that particular bit is fixed now, which is at least progress! Maybe they've been adding more non-English text pairs...
That's almost certainly accurate. The metalanguage / interlingua isn't English, but being based on A <-> English and B <-> English training, it is all but certainly influenced by English grammar, words, and idioms, in ways that direct A <-> B training would not be.
It seems to be 'fixed' now, but I was once trying to translate from Hindi to Nepali (which are actually closely related languages; think Italian and Spanish) with a simple sentence along the lines of 'Ram came', where 'Ram' is a common Hindu name (effectively 'John'), written in Devanagari with a long vowel: राम (rām).
I gave the Hindi input in Devanagari (राम आ गया), but the Nepali translation still ended up being the equivalent of 'the sheep came' (भेडा आयो), so somewhere along the line it seemed to be treating the name राम (rām) as equivalent to the English string 'ram' and translating accordingly.
So if the intermediate language isn't English, it certainly has some English-like properties....
Supposedly, but it behaves suspiciously like English in practice, perhaps because of the input data (lots of texts originally written in English, then translated into many languages and fed in).
Since we're complaining about Google Translate, I'd like to mention how ridiculous their "verified translation" system is. It works by throwing automatic translations at people who, for the most part, have never studied English, and expecting them to tell whether each one is right or wrong; what actually happens is that most just confirm whatever they get as correct. As a result, at least for Portuguese, many of the verified translations, if not most, are just plain wrong.
Considering Translate is such an important product, I can't fathom why they just don't hire a single linguist (or just anyone who isn't completely clueless, really) per language to register decent translations, or at least import them from a real dictionary...
Well, the more general problem with asking people visiting Google Translate to verify whether a translation from A->B is correct is that people visiting Google Translate generally don't know how to translate A->B, or aren't very sure.
How many people on Google Translate actually are able to reasonably verify translations? Relatively few - and those qualified few who might poke at Translate out of curiosity are just as likely not to feel inclined to offer free labour to Google.
> Considering Translate is such an important product, I can't fathom why they just don't hire a single linguist (or just anyone who isn't completely clueless, really) per language to register decent translations, or at least import them from a real dictionary...
Perhaps professional translators rather than linguists. I imagine they have some linguists on the project, but they're likely to be more NLP-type linguists.
The difficulty is that they're interested not just in word-level meaning/translation-accuracy, but also phrase- and clause-level accuracy, and those are really large (i.e. theoretically infinite) spaces.
I've heard that the Spanish<->English translations aren't too bad.
Sometimes I have seen English words injected verbatim/untranslated in the middle of a phrase, when asked to translate between two non-English languages.
This is pretty standard, even in human (non-automated) translation.
One example is technical/service documentation at a heavy machinery company I worked for. The tech writers were based in Germany, but O&M manuals were required in Japanese for sale in Japan. Those docs were translated using English as a "pivot language". This was usually dictated by pricing (fewer German+Japanese translators = much higher cost).
But one advantage of automated translations is that there shouldn't have to be a pivot language.
And in any case, for German->Japanese, going through English probably has a lower cost. But for Hindi->Nepali, you'll lose a lot of information, as Hindi and Nepali are closely related and similar not only in their vocabulary correspondences but also in their grammatical structures, all of which is effectively 'thrown away' if there's an English (or close-enough-to-English-to-effectively-be-English) intermediate translation language. (Not to mention the inefficiency of the equivalent of sending a package from Delhi to Kathmandu via London...)
I think in the case of automated translations it's a function of training data and confidence rather than cost. If you don't have a corpus of translation data for the source/target language combination to draw from, you're essentially forced into a pivot model.
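A minimal sketch of that "forced into a pivot model" decision, assuming a system that keeps a set of language pairs for which it has a direct model (the function name and the pair set are hypothetical, not any real product's coverage): use the direct pair when it exists, otherwise route through English if both legs exist.

```python
from typing import FrozenSet, List, Tuple

Pair = Tuple[str, str]


def translation_route(src: str, tgt: str, direct_pairs: FrozenSet[Pair],
                      pivot: str = "en") -> List[str]:
    """Return the sequence of languages to translate through."""
    if (src, tgt) in direct_pairs:
        return [src, tgt]  # enough parallel data for a direct model
    if (src, pivot) in direct_pairs and (pivot, tgt) in direct_pairs:
        return [src, pivot, tgt]  # no direct corpus: forced into a pivot model
    raise ValueError(f"no route from {src!r} to {tgt!r}")


# Hypothetical coverage: only English-centric pairs have direct corpora.
AVAILABLE = frozenset({("de", "en"), ("en", "ja"), ("hi", "en"), ("en", "ne")})
```

With this coverage, `translation_route("hi", "ne", AVAILABLE)` goes through English; adding `("hi", "ne")` to the set makes it return the direct route instead.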
For Spanish, at least, it definitely isn't better than Google.
And recently, DeepL didn't know any of the words I had to look up while reading Gabriel García Márquez. After enough failures I gave up and returned to Google Translate for the remainder of the book.
Google may have a wider range of words, but DeepL is definitely much better in grammar and idioms, which Google tends to translate literally.
Moreover, Google often provides a single target word, whereas DeepL lets you select from a range of synonyms by clicking on a word, and will adjust the sentence to use the new word. When Google gets the context wrong and provides the wrong meaning, DeepL's ability to translate with a different meaning is invaluable.
Same here. Admittedly it supports just a few language pairs, but the translation quality is consistently and considerably better than Google Translate and other major offerings.
During a C1 German course I took, I tried writing an essay in English, translating it with DeepL, and submitting it to the teacher. The only manual step was choosing the correct alternative from the list of words DeepL gives you.
The teacher said that it was amazing and that many native students she had couldn't write that well.
This seems really quite variable. The second sentence I tried DE->EN ended up with an awkward and confusing literal translation of a phrase that Google Translate handled well.
I find the deepl translations in the available languages very good, much better than google translate. Unfortunately, the selection of languages is (still) very limited.
It looks useful but the lack of non-European languages makes me slightly suspicious, I wonder if their approach generalises to Arabic, Chinese, Japanese, etc.
It depends on which part of their approach you focus on. I'd expect their machine translation model to be sufficiently general to support basically any language given enough training data. Yes, the languages you listed have some edge cases, but so do languages they already support fine. For example, the lack of spaces separating words in Chinese and Japanese can be handled by the same word-piece segmentation they need for German compound words.
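A minimal sketch of greedy longest-match-first segmentation, the idea behind WordPiece-style tokenizers, showing that the same loop splits a German compound and an unspaced Chinese sentence. The vocabulary is a hand-picked toy set (an assumption; real vocabularies are learned from data and far larger), and unknown characters are emitted one at a time as a marker, a simplification of what real tokenizers do. Nothing here reflects DeepL's actual tokenizer.

```python
from typing import List, Set

UNKNOWN = "[UNK]"


def segment(text: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match-first segmentation over a fixed vocabulary."""
    max_len = max(len(piece) for piece in vocab)
    pieces: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: emit an unknown marker, skip one char.
            pieces.append(UNKNOWN)
            i += 1
    return pieces


# Toy vocabulary (hypothetical): German compound parts and Chinese words.
VOCAB = {"donau", "dampf", "schiff", "fahrt", "我", "喜欢", "机器", "翻译"}
```

For example, `segment("donaudampfschifffahrt", VOCAB)` gives `["donau", "dampf", "schiff", "fahrt"]`, and `segment("我喜欢机器翻译", VOCAB)` ("I like machine translation") gives `["我", "喜欢", "机器", "翻译"]`.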
The bigger problem is likely to be lack of training data. Unless they have a pile of cash to pay professional translators to produce a parallel corpus, the alternative is to scrape translations from the internet. Basically, crawl the same site multiple times with different Accept-Language headers and try to align the results. Crucially, this depends on an existing ecosystem of bilingual websites with high-quality human translations.
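A minimal sketch of the crawling idea above, using only the standard library: build one request per language for the same URL with a different Accept-Language header, then pair the extracted paragraphs. The function names are hypothetical, the requests are only constructed (sending them would be `urllib.request.urlopen(req)`), and the position-based alignment is deliberately naive; real sentence aligners (e.g. length-based Gale-Church alignment) cope with inserted or missing paragraphs.

```python
import urllib.request
from typing import Dict, List, Sequence, Tuple


def build_requests(url: str, languages: Sequence[str]) -> Dict[str, urllib.request.Request]:
    """One request per language: same URL, different Accept-Language header (not sent here)."""
    return {
        lang: urllib.request.Request(url, headers={"Accept-Language": lang})
        for lang in languages
    }


def align_by_position(src_paras: List[str], tgt_paras: List[str]) -> List[Tuple[str, str]]:
    """Naive alignment: pair paragraphs by index, but only if both versions have the same count."""
    if len(src_paras) != len(tgt_paras):
        return []  # structure diverged; a real aligner would try harder
    return list(zip(src_paras, tgt_paras))
```

This only works if the site actually serves different language versions under the same URL, which is the "existing ecosystem of bilingual websites" the comment depends on.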
According to DeepL's website they're a spin-off of Linguee, who provide a search service for exactly that kind of parallel data. So before DeepL starts supporting any given language pair, you should expect it to appear in Linguee first. https://en.wikipedia.org/wiki/Linguee
Edit: It took me a while to figure out how to select a different language on https://linguee.com (Their UI seems broken using mobile Firefox Preview.) Appending /english-chinese and /english-japanese to the URL shows that they already support those two, and the alignment of translations appears reasonable to me. No /english-arabic, though.
The European Union, the Swiss Confederation, Belgium, Canada and other multilingual states, the European Patent Office and many international organizations provide a huge corpus of professionally translated documents and reports for major European languages. Not so much for Japanese.
If you know of a good source for large piles of docs that have been accurately/naturally translated eng/ch/jp, I'm sure DeepL would be interested. As another poster pointed out, any deep learning NLP project boils down to quantity+quality of data. I'm assuming adversarial approaches don't work well in this context, but I'm not very familiar with NLP research.
Right but that's kind of the essence of machine translation. You won't always have high quality parallel training data for all languages so you have to find a way to thrive with low quality data.
Google has clearly made it their mission to solve that problem, and I'd say they've been rather successful.
I lived in Germany for a bit and Deepl is what everyone recommended over Google for professional translation. Google is great to figure out how to ask for your schnitzel at the shop, but Deepl is for when you want to make a deal.
I’m not sure about European English, but at least in American English, it’s not normal to say “I live in Germany since a bit”. If you don’t live there anymore, you could say “I lived in Germany for a bit”. If you still live there, you could say “I have lived in Germany for a bit”.
Based on some (limited) experience with people whose L1 is German speaking English, I suspect (without knowing any German myself) that this is a typical construction a native German speaker would use when trying to form a "for a bit" phrase in English. tl;dr, I think it was a joke.
I have used it to translate several academic write-ups, including proposals and papers, from English to German. It works like a charm. It also translates official German letters from banks and government into English pretty well.
OK, some observations. It looks like they used the UN official documents as a part of their corpus, so it translates regular news from Russian into English almost perfectly. I was actually stunned how good the translation was.
But once you step away from it, quality goes down. I tried translating random pieces of Russian literature and it makes obvious mistakes. It can't even manage the structure of sentences, never mind word choice.
Translations from English are also bad. For example, it translated "I never felt that the translation is off" as "I never felt that the translation is turned off".
At the firm where I did my internship, we have to use German in all communication. I use DeepL to check my emails or help me write speeches for presentations, etc. (I'm not a native speaker). The translator is wonderful!
I'll be blown away when it does near-perfect JP/EN translation (which it doesn't even seem to support). No machine translation has ever been close to being remotely good when it comes to JP/EN, including the ones developed by GAFAM.
I have been using it for 2 years, mainly for French->English and occasionally for English->French. I love it. It understands idioms very well and proposes excellent translations. Even for translating single words, it is far better than Google.
As far as I've seen, it doesn't support translating web pages other than by copy-pasting the text you want to translate... unless I'm missing something. So HN, tell me: how do I get to that functionality, if it's there?
I was hoping to use them for an application, but their API pricing is orders of magnitude higher than Google's and Microsoft's. I guess they must be focused on the web application primarily.
Sometimes after writing a text I decide to throw it into DeepL for shits and giggles and the translations are pretty much as good as native every time.
The demo of https://www.mozilla.org/de/ shows what I consider to be a big missed opportunity in the approach: it only suggests a machine translation, and not to switch to the English page, which is probably a better course of action.
Specifically, the page is marked up with such tags as these:
I would prefer the browser to first suggest “do you want to switch to the English (British) version of this page?”
Sure, I may actually want a machine translation of the text I see on page—the content may be different on the other language’s page; but if the page indicates alternatives are available, I’d prefer to try that first.
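A minimal sketch of the "suggest the existing language version first" behaviour, assuming the tags in question are of the standard `<link rel="alternate" hreflang="..." href="...">` form. The parser class, the helper function, and the example.org markup are hypothetical illustrations, not mozilla.org's real markup or any browser's actual logic.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class AlternateLinkParser(HTMLParser):
    """Collect <link rel="alternate" hreflang="..." href="..."> entries."""

    def __init__(self) -> None:
        super().__init__()
        self.alternates: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        lang, href = a.get("hreflang"), a.get("href")
        if "alternate" in rels and lang and href:
            self.alternates[lang.lower()] = href


def suggest_alternate(html: str, preferred: List[str]) -> Optional[str]:
    """Return the URL of the page version best matching the user's languages, if any."""
    parser = AlternateLinkParser()
    parser.feed(html)
    parser.close()
    for pref in (p.lower() for p in preferred):
        if pref in parser.alternates:  # exact match, e.g. "en-gb"
            return parser.alternates[pref]
        primary = pref.split("-")[0]
        for tag, href in parser.alternates.items():
            if tag.split("-")[0] == primary:  # same base language, e.g. "en-us" vs "en-gb"
                return href
    return None  # no alternate: only then fall back to offering machine translation


# Hypothetical markup for illustration.
SAMPLE = """<html lang="de"><head>
<link rel="alternate" hreflang="en-GB" href="https://example.org/en-GB/">
<link rel="alternate" hreflang="fr" href="https://example.org/fr/" />
</head><body>...</body></html>"""
```

A user preferring `en-US` would be offered `https://example.org/en-GB/` here, while a user preferring Japanese gets `None`, at which point a machine translation prompt makes sense.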
That's so true! Semantic annotations in the HTML header are a missed opportunity for web standards. I don't mean Dublin Core (https://www.dublincore.org/, which somehow declined with SEO engineering), but relational data:
Of these six examples, modern desktop Firefox/Chrome only surfaces the search engine (offering to add it when you click in the search/omni box). RSS died a long death (Firefox dropped its built-in feed indicator a few months ago). All the other information was only ever displayed by the good old Opera browser, which had a dedicated toolbar for relational data and even overloaded the browser's prev/next navigation buttons with the prev/next links declared in the page header.
Before you say that Google has world-class ML engineers and Mozilla doesn't stand a chance of competing, consider that the quality of the startup DeepL's translator is comparable to, and possibly even better than, that of giants like Google or Microsoft. So this seems like one of those problems where small companies can still compete with GAFAM-scale ones.
So much of machine translation is the quality of your parallel corpus and your preprocessing. I've got a theory that Google actually has too much data, and maybe too much imperfect data. Also, DeepL came out of Linguee (https://en.wikipedia.org/wiki/Linguee), who appear to have a massive number of parallel texts... I really believe whoever has the (well-curated) texts has the (NMT) world.
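To make "well-curated" concrete, here is a minimal sketch of a few common parallel-corpus cleaning heuristics: drop pairs with an empty side, pairs where the "translation" is just a copy of the source, pairs whose lengths differ too much, and exact duplicates. The function name and the 2.0 length-ratio threshold are assumptions for illustration; this is not a description of what Google, DeepL, or Linguee actually do.

```python
from typing import Iterable, List, Tuple


def clean_parallel_corpus(pairs: Iterable[Tuple[str, str]],
                          max_ratio: float = 2.0) -> List[Tuple[str, str]]:
    """Drop empty, untranslated, badly length-mismatched, and duplicate sentence pairs."""
    seen = set()
    kept: List[Tuple[str, str]] = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side missing
        if src == tgt:
            continue  # likely an untranslated copy
        shorter, longer = sorted((len(src), len(tgt)))
        if longer / shorter > max_ratio:
            continue  # lengths too different to be translations of each other
        if (src, tgt) in seen:
            continue  # exact duplicate
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

Filters like these trade recall for precision: they throw away some genuine pairs in exchange for a corpus with far fewer misaligned ones.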
As an aside, I don't know who needs to know this, but we are really lucky right now in that we now have tools that enable pretty much anyone to train a language model and translate with it, not just giant or small companies. It's not necessarily going to be any good, but the tools are all right there for us plebes and it's pretty fun.
Google is now just a sluggish tech bully and only shows world-beating performance through acquisitions (such as, sadly, their acquisition of DeepMind).
The Google of 1999 - that a lot of us thrilled to - is long dead.
I am in the translation business, and although we do not use MT so far for the verticals we focus on, perhaps what shocks me most is DeepL's Alexa 90-day trend [0]. It is so smooth it seems faked.
It says "[t]ranslate from any language", but doesn't seem to support Chinese or Japanese at all.
To me, translating other Western languages to English is never a problem. I can use any half-decent online translation services and the result is totally understandable.
When you're just trying to read stuff to understand it, "understandable" is good enough. I'll be waiting for a looong time for CN/JP/KR translation to reach anywhere close to Not Utterly Incoherent.
Me too, me too. Knowing a little bit of Chinese, I can see that this is not an easy job for an AI. Too much context-dependent stuff. Or maybe there are simply not enough bilingual documents for training. That's the great advantage of the European Union :)
"Translate from any language" in the context of deepl.com website means "detect language".
AFAIK, only the devil can translate from any language (not even Google Translate can translate from any language, in spite of the evilness attributed to Google).
All the Google ML experts in the world don't matter if Google Translate isn't staffed (no idea if it is or isn't). It's quite possible that DeepL has more engineers and ML experts working on the problem than Google does. Google's deep pockets and easily transferable roster do let it play catch-up very quickly, though, if anyone with influence thinks DeepL has gotten too far ahead.
Deliverable is funding/contract language. At month 3 any bigger project bringing different partners together is still in a very early coordination stage.
It should be noted that this project is funded by the European Union. You can find all funding details, reports and publications at https://cordis.europa.eu/project/rcn/219608/factsheet/en (the project started in January 2019, so there aren't many reports yet).
It is amazing that they work so closely with Firefox, so the project's results will be useful not only to European citizens but to people all over the world.
I think all the "hidden in the cloud" services, like what Google Photos does, translation, well, basically every service Google has, must move to "edge computing" in the next few years. The hardware side is already powerful enough, even in cheap Android smartphones; we just lack better dedicated software.
Basically an Android phone without many Google services, although some, like Google identity and authentication, are useful and online by nature, and you cannot get rid of them easily. And Google services come as a pack...
Google is doing that already. They recently released on-device realtime live video captioning for example.
There will always be a performance gap between multi-gigabyte server ML models and client-side ones, though, and it's up to users how they weigh the privacy vs. performance tradeoff.
Except that would allow people to reverse engineer the code, which I think is one of the main reasons a lot of services are still in the cloud.
Thank you Mozilla! The translation capabilities are the only reason I still have Chrome installed on my phone and laptop. And they are vital to me, since I live in a country whose language I still don't speak.
I wonder if this explains why Mozilla unexpectedly remotely disabled Page Translator add-on yesterday after previously permitting it to be distributed as a side-load for two years. https://github.com/jeremiahlee/page-translator/issues/26
Likely? I’m inclined to believe that there’s something wrong with your addon, rather than Firefox being anticompetitive about an extension that doesn’t compete with a feature that doesn’t exist yet.
When Mozilla rejected it from being distributed on AMO years ago because it included Google and Microsoft translation libraries externally, they said it could be distributed as a side load, just not on AMO. It hasn’t been updated in a year. The policy position changed yesterday.
Bergamot has been around for months at least (for example, the JD for the Bergamot-Mozilla collaboration was posted before this May [1]) and was only advertised a few days ago. It is most likely a coincidence, given that Google and Microsoft have every right to complain to Mozilla. Or, in the reverse direction, they may have complained because of public articles about Bergamot.
I made a post roughly two hours ago on the Firefox Reddit about your add-on getting disabled (a poster there provided me with an excellent workaround), and now come back to HN and this is the top post.
I really really hope this is a coincidence vs. Mozilla having some sort of tiff with Google and us being in the crossfire.
When implementing this please take into account that sometimes people are ok with reading content in another language.
I don't use Chrome that much, so maybe it's a configuration thing I never looked into, but the nagging bar asking me if I want to translate every single English page is very annoying. I'm not a native English speaker, but I'm comfortable enough to never need the page translated.
Nice. Firefox has been missing this for a while, one of the reasons I often end up switching back to Chrome as it does the translation in-place / in the DOM rather than using a Google Translate proxy frame like Firefox addons do.
Some years ago, when Google Translate was released, I made a large web app compatible with it so I didn't have to translate everything manually. Then Google crippled it. One of the cool features was that you could chat with other users and it would be translated in real time. I'm now planning to add translations to another app, but if it could be done automatically in Firefox, like with Google Translate when it first came out, that would be so awesome. It would save me a lot of work.
I have a habit of recommending FF to my colleagues. We're based in Germany, and many of us are non-natives. On-page translation is the most common reason they stick to Chrome.
I hope when this is included in a stable release, it'll convince some of them to make the shift.
I'm having trouble finding which languages this supports. I am mostly in need of eastern Asian languages like Chinese, Korean and Japanese. Will this support them?
I second this. Bergamot's website mentions Europe ("...increases the uptake of language technologies in Europe...") but nothing about Asia, which isn't very encouraging.
We're funded by the EU, and the languages we're contractually obliged to launch are select EU languages. However, if successful, we hope Mozilla and the community will build more languages. We are aware that CJKV is an important market.
This is great. I hate relying on the translation services of Microsoft and Google, but I’ve been using them since they’re typically more convenient to use than separate applications.
I'm glad this isn't cloud-based. I keep logged out of google as a rule, and at least the last time I checked you had to be logged in to use Google Translate.
FWIW, DeepL has restrictions on IP depending on how often you translate things. I've had to connect to various VPNs to continue a conversation without having to login.
It is a good translation tool (and I've used several: Google, Bing), and sure, it's accurate and all, but DeepL doesn't compare when it comes to conversational translation, voice-to-text, etc. For simple copy-and-paste translation, it does well enough.
Wow, I didn't think that people actually use on-site-translation.
I know that translation became a lot better (using deepl [0] personally, check it out if you haven't heard of it!) in recent years, but on-site-translation always seemed quite goofy to me.
I wonder if this caters more to a broader (older) audience, making Firefox more attractive to less tech-savvy folks? Seems like a critical feature for "your grandparents' computer", doesn't it?
I'm not sure why you'd come to the conclusion that nobody uses on-site translation. Have you never visited a non-English site that you wanted to read/navigate but couldn't? Even if the translation isn't perfect, or even great, it's better than understanding 0 words and it's more than enough to navigate a page or to capture the essence of an article. It isn't a daily occurrence for me, but it happens often enough that I'm glad that this is happening. Switching to Chrome just for that was annoying.
Being based in Europe, I use on-site translation all the time when looking at things in other countries near me - not everything is translated into English.
Honestly I hope Mozilla will allow me to disable onsite translation to the point where I wouldn't know it's available.
I'm sure it's a nice feature for lots of people, but I never want to see it. It's just another annoyance that I don't wish to deal with. Dealing with multiple languages is still something browsers and websites struggle with. Yes, I know I have a Danish IP, and my language settings is British English, but just give me the original Swedish version of the content it's FINE.
A much more useful feature, for my use case, would be language detection for input fields, so I don't have to switch between dictionaries to get the right spellchecker.
> Yes, I know I have a Danish IP, and my language settings is British English, but just give me the original Swedish version of the content it's FINE.
Given these settings, how would you expect the browser to know you want to see the Swedish content? You have explicitly indicated that you want British English (if available).
Funny. I have the opposite problem. I have a German IP, my OS/region/locale is set to American English, and sites regularly give me the German version even if I don't want/need it and even if I've previously been to the site and set it to English. MDN is the biggest offender for me right now because I use it nearly every day, and every single time I go to the damn site it's in German and I don't want it in German.
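A minimal sketch of the server-side behaviour this commenter is asking for: honour the browser's Accept-Language header (ordered by its q-values) and only fall back to a geo-IP guess when none of the preferred languages is available. The function names and the `geo_default` parameter are hypothetical; this is not a description of how MDN or any particular site actually decides.

```python
from typing import List


def parse_accept_language(header: str) -> List[str]:
    """Return language tags ordered by q-value (ties keep header order); q=0 entries are dropped."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # malformed weight: treat as "not acceptable"
        if q > 0:
            ranked.append((-q, position, tag.strip().lower()))
    ranked.sort()  # highest q first, then original order
    return [tag for _, _, tag in ranked]


def negotiate(accept_language: str, available: List[str], geo_default: str) -> str:
    """Pick a site language from the browser's preferences; use the geo-IP guess only as a last resort."""
    by_lower = {lang.lower(): lang for lang in available}
    for tag in parse_accept_language(accept_language):
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # "en-us" -> "en"
        if primary in by_lower:
            return by_lower[primary]
    return geo_default
```

With a German IP (`geo_default="de"`) and a browser sending `en-US,en;q=0.9`, `negotiate` picks the English version.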
Simple: give me a popup the first time it is not clear what language I want content in. Put a "don't ask me again" tickbox on this popup.
I will click "No, keep content language", tick "don't ask me again", and be super happy.
If you really want me to like you, put a tickbox in the settings so I can save this as default into my other browsers (that don't use cookies / don't save data)
I would expect/hope setting intl.accept_languages to include "da,sv,en" to be enough to make it stop translating those (in about:config or through the GUI)
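A minimal sketch of the rule this comment hopes for: do not offer translation when the page's base language is already in the user's comma-separated language list (the same shape as an `intl.accept_languages` value such as "da,sv,en"). The function is hypothetical and is not Firefox's actual logic; treating an unknown page language as "don't offer" is a design assumption chosen to avoid nagging.

```python
def should_offer_translation(page_lang: str, accept_languages: str) -> bool:
    """Offer translation only when the page's base language is not one the user listed."""
    if not page_lang or not page_lang.strip():
        return False  # language unknown: stay quiet rather than nag
    readable = {
        tag.strip().lower().split("-")[0]  # compare base languages: "en-GB" -> "en"
        for tag in accept_languages.split(",")
        if tag.strip()
    }
    return page_lang.strip().lower().split("-")[0] not in readable
```

With "da,sv,en" configured, a Swedish page stays untouched while a Polish one would trigger the offer.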
I have no way of judging the correctness of any given translation of a language I can't already read, so I would prefer that the browser doesn't do any automatic translation. Personally I would prefer to not be able to read the content of a site, compared to getting a translation that may be wrong, even if it's clearly indicated that it's an automated translation.
Because it's very much site dependent what language I want. I don't expect the browser to transmit ANY kind of language information to the site.
For instance, I never want the Danish version of Wikipedia, but I do want the Danish version of a Danish news site, or the Swedish version of a Swedish website. There's no real way to indicate those preferences to the browser, so it would be best if it never tried to deal with language.
I have some "advanced" language settings, but they are almost always ignored. For example google search ignores it and uses some kind of geolocation to decide what languages I want my search results from.
I do not see the association between elder audiences and on-site translation. Let's say you are a techie and you understand zero Polish yet you got a job in Poland and you want to buy some equipment for your computer from https://www.x-kom.pl/ - how would you do it without on-site translation?
I'm in that situation, and I pretty much use Chrome exclusively for on-site translation.
I use on-site machine translation a lot. In fact, I even use it for languages I do understand, though less often, to reduce my mental overhead. Of course, you always have to remember that machine translation can go off, and that both common sense and domain knowledge (including even the tiniest bit of knowledge about the source language) are important.
Not having in-page translation is the main reason my friends won't switch to Firefox. It's also why I sometimes launch Chrome. Living in a foreign country, you often have to log in and fill in forms in a language you don't know. Being able to translate the whole page in place is extremely useful.
Bergamot has an entire work package devoted to helping you fill out forms in a language you don't speak, while being confident that your answers are correct. We're testing out concepts here: https://github.com/zouharvi/ptakopet .
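One generic way to build that kind of confidence is round-trip (back-)translation: translate your answer into the form's language, translate it back, and check that the result still says what you meant. The sketch below assumes that technique and is not a description of what ptakopet actually implements. The toy word-by-word translator, the lexicon, the Jaccard word-overlap score, and the 0.6 threshold are all hypothetical stand-ins for a real MT system and a real quality estimate.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]


def word_by_word(lexicon: Dict[str, str]) -> Translator:
    """Hypothetical toy translator: replace each lower-cased word via the lexicon."""
    def translate(text: str) -> str:
        return " ".join(lexicon.get(w, w) for w in text.lower().split())
    return translate


def round_trip_check(text: str, forward: Translator, backward: Translator,
                     threshold: float = 0.6) -> Tuple[str, str, float, bool]:
    """Translate, translate back, and score word overlap (Jaccard) with the original."""
    translated = forward(text)
    back = backward(translated)
    original_words = set(text.lower().split())
    back_words = set(back.lower().split())
    union = original_words | back_words
    score = len(original_words & back_words) / len(union) if union else 1.0
    return translated, back, score, score >= threshold


# Toy lexicons (hypothetical data).
EN_DE = {"my": "mein", "name": "name", "is": "ist", "anna": "anna"}
DE_EN = {v: k for k, v in EN_DE.items()}
```

For "My name is Anna", the forward pass gives "mein name ist anna", the back-translation gives "my name is anna", and the overlap score is 1.0, so the answer would be accepted.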
[1]: https://www.deepl.com/translator