Mozilla releases local machine translation tools as part of Project Bergamot (blog.mozilla.org)
490 points by Vinnl on June 2, 2022 | 123 comments


It passes the "Turkey" <=> "turkey" test: "In Turkey they sometimes eat turkey." => "In der Türkei essen sie manchmal Truthahn." :D

Super cool! Real-time translation, in the browser, running locally! And sure, not state of the art / on the level of deepl, but on the level of Google Translate, 2015ish, maybe? Amazing!


However, "Turkey is not a common food in Turkey." != "Die Türkei ist kein gemeinsames Essen in der Türkei."


Also, „common“ should be translated to „üblich“ instead of „gemeinsam“. „Gemeinsam“ is more like „collective“ as in „a collective effort“.


As usual, deepl doesn't disappoint.


Well, that's technically ambiguous in English too. I don't think many people are eating their own country ;) .


Hmm, is it ambiguous then? Seems there's only one interpretation


Well, there's what people would typically interpret it as. Then you have the translation here which is a correct interpretation, but not that informative.


Interestingly, Google fails on both of these sentences and always translates "Turkey" as the country. DeepL on the other hand gets it right in both cases.



They actually put an umlaut into the official name to really make sure it won't be used correctly internationally?


I was thinking about the ISO 3166 part, but it seems the standard already contains some names with special characters, e.g. Réunion. https://en.m.wikipedia.org/wiki/List_of_ISO_3166_country_cod...


Just reformulate:

"Türkiye quit being called Turkey cold turkey"


This is incredible and super important. For all the blunders of Mozilla in the last decade, they still have some great projects. I am also grateful to them for not scrapping Common Voice.


In the long run, I am a super huge fan of Mozilla and Firefox. I am using it right now. After a 10 year stint of using Chrome exclusively I now use Firefox as my main driver. Unfortunately I still need to keep chrome around for weird situations where the website developer only tested in Chrome (Yes this still exists. A shopping cart in a popular website - cough Home Depot cough cough - that recently failed me in Firefox worked in Chrome. I haven't tried in a couple weeks hopefully that is fixed.)


Also important, because it now seems that, at least in Germany, the "translate website" button is missing from Google Translate. From Switzerland I saw the button recently when I tried. I don't know whether that is because it can be used to reach censored (Russian) sites. My company blocks Google Translate anyway, probably for the same reason.


Try copying the url into the translator's text field. It's how I've been using it for years.


Are you sure?

Under Google Übersetzer I see three buttons: Text, Dokumente, Websites


Yes, at least last week. The Internet connection at work is routed through Germany (Paderborn). Last week the third button, "Websites", was missing. From home in Switzerland the button was there.

Also, if you entered "translate website", Google suggested Yandex in second place.


it's not their project, all they did was to write a form in JS

the whole project is a EU funded one, all done in the university of Edinburgh

https://cordis.europa.eu/project/id/825303

you giving full credit to Mozilla is dishonest, to say the least

it aligns to their past projects, including using Mullvad and slapping a Mozilla sticker on top of it to claim it as their own

also it is super funny to read this:

> H2020-EU.2.1.1. - INDUSTRIAL LEADERSHIP - Leadership in enabling and industrial technologies - Information and Communication Technologies (ICT) MAIN PROGRAMME

little do they know, EU never learn


All they did was ‘write a form in JS’???

> Our solution to that was to develop a high-level API around the machine translation engine, port it to WebAssembly, and optimize the operations for matrix multiplication to run efficiently on CPUs.


Hi, I am part of the team who developed this and the author of the article. You can ask me anything about it if you have questions.


Is this open source? I don't see a github link anywhere, and I'm not sure if the models are freely usable.

EDIT: maybe this is it: https://github.com/mozilla/firefox-translations-models

also some info here: https://github.com/mozilla/firefox-translations-training


Yes, the extension is being developed here [1] and the engine[2]/wasm[3] wrapper here.

The models and training pipeline are in the URLs you posted, and the evaluation of those models is hosted here [4]

[1] https://github.com/mozilla/firefox-translations [2] https://github.com/browsermt/bergamot-translator [3] https://github.com/browsermt/bergamot-translator/tree/main/w... [4] https://github.com/mozilla/firefox-translations-models/tree/...



Did you mean link to the source code? GitHub shouldn't be equated with open source—especially considering the core product, GitHub, is closed source.


Mozilla heavily uses GitHub to host repositories for basically all of their projects (with possibly just the exception of Firefox itself). Anyone familiar with Mozilla’s open source efforts knows they use GitHub heavily, so stating that I didn’t see the source on their preferred platform is perfectly reasonable.

Your comment is pedantic and unhelpful. It is effectively the same as overhearing someone asking another person for a Kleenex, then choosing to interject and lecture that person on the difference between Kleenex and tissue paper when the other person does have actual Kleenex-brand tissue. Yes, I know what open source is. Yes, Mozilla uses GitHub. I even provided links to relevant GitHub repositories by Mozilla once I found what I was looking for.


Thanks for the extension!

Are you planning on adding a “select some text → right click → Translate in a tooltip” feature? It'd be extremely useful for language learners.


Thanks to you for giving the extension a try! This is a very interesting feature, and I'd like to consider it. Would you mind filing an issue in the repo so we could track and discuss it here [1]?

[1] https://github.com/mozilla/firefox-translations/issues


Another user has posted a link to that issue: https://github.com/mozilla/firefox-translations/issues/358. I upvoted it instead of creating a new one.


+1. This was the first thing I tried to do and was surprised this feature doesn't exist. Most often, I don't encounter entire webpages in foreign languages, but rather small snippets of text.

It seems there is an open issue for this: https://github.com/mozilla/firefox-translations/issues/358


Hotkey + hover is very nice for words, fwiw.


Can you please add the list of languages to the description of the addon[1]? For a translation addon this is crucial information imho.

[1] https://addons.mozilla.org/en-US/firefox/addon/firefox-trans...


Just did it, thanks for the suggestion!


Thanks, really wonderful project!

I tried it quick and a few comments:

It looks like both the web page version and browser extension download from storage.googleapis.com, which is not ideal (obviously much better than the text to be translated being sent). IMO, this should be clarified in the extension description. Are there any plans for Mozilla to host this data? (Well, Amazon I guess to match addons.mozilla.org)

https://mozilla.github.io/translate/

I noticed the web page has Polish but not the extension. Is it not considered sufficiently high quality yet? (Looks like that might be the case from a quick test.)

As with others, I first tried to translate a bit of text not the whole page. Another super helpful feature likely already on a todo list is the ability to see alternative translation options.

Thanks again, this is great and I'm surprised it isn't a larger download. I am curious about any size/quality tradeoffs being made.

Also, playing with the web page a bit English->Spanish I see it does a whole text translation such that adding new sentences affects the translation of earlier ones, sometimes in very odd ways (with one combination of words it translated "Spanish->English" to "español-esplén"). It seems to sometimes produce "Spanish" words that don't seem to be actual words in Spanish as best I can tell and even according to its own Spanish->English. It seems like a way to indicate a section to be translated as a unit might be helpful.


> Are there any plans for Mozilla to host this data? (Well, Amazon I guess to match addons.mozilla.org)

I'll bring this internally for discussion.

> I noticed the web page has Polish but not the extension. Is it not considered sufficiently high quality yet? (Looks like that might be the case from a quick test.)

The Polish model finished training after we had the extension reviewed and signed for distribution, so that's why we could already integrate it in the website, which is controlled just by my team, and not in the extension yet, but we'll ship it in the next version which might come next week.

This feature you requested is something that I consider interesting and important, would you mind filing an issue in the repo so we could track it [1]?

We did not find any large quality issues after quantizing the models.

Regarding the neologisms in Spanish: yes, that is an issue the consortium is aware of and is working to remediate.

[1] https://github.com/mozilla/firefox-translations/issues
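To illustrate what quantizing a model involves, here is a minimal sketch. It assumes simple symmetric 8-bit quantization with one shared scale per tensor; the thread does not say which scheme the Bergamot models actually use, and the function names are made up for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] with one shared scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    # Round to the nearest step and clamp to the 8-bit range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats from the 8-bit integers."""
    return [q * scale for q in quantized]


# Toy "weights"; a real model has millions of these.
weights = [0.82, -1.27, 0.003, 0.5]
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight then needs one byte instead of four, which is where most of the size reduction would come from; whether quality holds up is an empirical question, as the reply above notes.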


Thanks! Looks like I picked a bad test case for Polish, I tried a few more and some look perfect (boring corporate text) and others around the same as Google Translate (I don't know Polish or Spanish and mostly looked at song lyrics). I don't know if specific examples are helpful, but it looks like it particularly has a tough time with "W poprzek wpław" and I think a few more issues in:

https://www.musixmatch.com/lyrics/Teskno/Woda-t%C4%99skno-gr...

I'll add a few feature requests to the issue tracker (if not there already) hopefully later today. Including a request to put that web page in the extension :).


Hi,

This is an awesome project, congratulations!

Could you share details about the machine translation engine that is used (or where to find out more about it)? Are there any plans to open source the extension code (with the WebAssembly optimizations that are mentioned in the article)?

Thanks.


A fork of marian-dev[1] is the underlying machine-translation engine:

- https://github.com/browsermt/marian-dev

Development of the higher-level code that wraps marian-dev to make it suitable for the browser extension happens at:

- https://github.com/browsermt/bergamot-translator

Some of the WebAssembly optimizations are available in bergamot-translator/marian-dev. The rest are in the Firefox source code. A starting point could be https://bugzilla.mozilla.org/show_bug.cgi?id=1720747.

Extension code is open-source, and linked already in other comments: - https://github.com/mozilla/firefox-translations

[1] https://github.com/marian-nmt/marian-dev


You can find the engine used here [1], the API built around it here [2] and its WASM port here [3] and the WebAssembly matrix multiplication optimizations are here [4]

[1] https://marian-nmt.github.io/

[2] https://github.com/browsermt/bergamot-translator

[3] https://github.com/browsermt/bergamot-translator/tree/main/w...

[4] https://github.com/mozilla/gecko-dev/tree/master/third_party...


At least the code parts seem to be on GitHub: https://github.com/browsermt


Hi! This is an amazing project and will be really useful! Thank you! I understand that the project is funded by EU so the focus is on European languages but are there any plans to add CJK or other languages ?


Yes, that's something we've been discussing internally and is being considered. In the meantime, please feel free to file an issue in the repo [1] so we could track it:

[1] https://github.com/mozilla/firefox-translations/issues


Chinese language is what I most commonly want to translate. Is there any planned support for this?


Yes, like I said above, that's something we've been discussing internally and is being considered. In the meantime, please feel free to file an issue in the repo [1] so we could track it: [1] https://github.com/mozilla/firefox-translations/issues


Hello, your tool is great! I have questions about the future of the project, is it planned to add languages in the future? The European Union grant ends in June and will the project continue to develop and add more languages in the future?


What can we do as users or contributors to help improve the accuracy of this extension? It's already amazing and I would love to see it get even better.


Thanks for using it! Best way currently is to keep using and reporting issues on [1]. You can see how the models are trained on [2] and file issues there too.

[1] https://github.com/mozilla/firefox-translations/issues [2] https://github.com/mozilla/firefox-translations-training


Google Translate code is present on many web sites to provide automatic translations of text. Could your translate code be uploaded to a server and embedded in web page to provide the same functionality?


I'm not aware of any actively maintained projects that give you this out of the box, but these two could be starting points for such a project.

Mozilla implemented a REST service based on (an earlier version of) bergamot-translator [1]. You could use that as a replacement for the WASM component in the addon's code.

I also know of some full-page translation demo code that uses the Python bindings of bergamot-translator [2]. That's basically a web proxy a la Google Translate.

Lastly, marian, the translation software that's being used, has a web server as well [3]. It does not support HTML though.

EDIT: see also my earlier comment for using it with Node or Python [4], which you could use to implement a simple web API.

[1] https://github.com/mozilla/translation-service

[2] https://github.com/jerinphilip/tagtransfer

[3] https://marian-nmt.github.io/docs/#web-server

[4] https://news.ycombinator.com/item?id=31599231
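As a rough idea of what such a "simple web API" could look like, here is a sketch using only the Python standard library. The `translate` function is a hypothetical stand-in, not a real bergamot-translator call, and the JSON field names (`text`, `from`, `to`) are assumptions made for this example rather than the interface of any of the projects linked above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def translate(text, source, target):
    """Hypothetical stand-in: replace with a call into whichever engine you use."""
    return f"[{source}->{target}] {text}"


def handle_request(body):
    """Turn a JSON request body (bytes) into an (HTTP status, JSON bytes) pair."""
    try:
        payload = json.loads(body)
        text = payload["text"]
        source = payload["from"]
        target = payload["to"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected JSON with text, from, to"}
        return 400, json.dumps(error).encode("utf-8")
    result = translate(text, source, target)
    return 200, json.dumps({"translation": result}).encode("utf-8")


class TranslateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8080):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), TranslateHandler).serve_forever()
```

Keeping the request logic in `handle_request` separates it from the HTTP plumbing, so the engine call can be swapped in and checked without starting a server.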


Thanks!


Sure, like I mentioned in the article, you can embed the engine and the models in any web page to be run in a browser with proper WebAssembly and SIMD support.

You can see an example of how we did it here [1] and test it here (I recommend using Firefox) [2]

That way you don't need a server and everything is processed in the browser, so no need of google translate, or any cloud service to have translations embedded in any website anymore.

[1] https://github.com/mozilla/translate [2] https://mozilla.github.io/translate


What is the dataset used for training the model? Where did the data come from?


All of them are freely available. Most of them through mtdata [1]. The exact list of the datasets is in the firefox-translations-training pipeline configuration file [2].

[1] https://pypi.org/project/mtdata/

[2] https://github.com/mozilla/firefox-translations-training/blo...


really great stuff! any plans for this on firefox mobile?


Yes, we are investigating how to support Firefox for Android.


The sooner we move AI to 127.0.0.1 the better, enough with The Cloud powerhouses.

Yes there’s work to be done, resilience, power efficiency, responsiveness, but it’s the right direction for everything that involves private computing.


Also the right direction if we intend to put this stuff into actual robots.


I've been using the extension [1] for a bit and, while it doesn't support too many languages, for the ones it does it's pretty cool to have it all running locally.

[1] https://addons.mozilla.org/firefox/addon/firefox-translation...


neat, but it looks like it was just released, so how were you using it before?

as an aside, pretty sad to see the project page, https://browser.mt/ , requiring not just javascript but specifically google connections to work. to 5 different google properties, no less.


I work at Mozilla, so got a sneak preview (and also the first bugs) :)

(Of course technically the work was out there in the open already, since it's Mozilla.)

Agreed about the Bergamot website. I suspect it's not by Mozilla, but I'll see if I can ask someone to take a look, as I don't think all those connections should be necessary.


is there a way to opt websites/URLS/ url roots in to automatic translations? currently the "auto translate this tab" works but you have to select it manually and for me at least on linux it makes the window more than 100% so i cannot use the right side of browser in this tab.

you know a simple setting "always translate this website" and an option to hide the bar


Do you know if this will be open-sourced, or if the repo is already available?


I think this is probably the source https://github.com/mozilla/firefox-translations

edit: and for the actual translations https://github.com/mozilla/bergamot-translator


awesome, thanks! i also suspect it's by the EU coalition behind bergamot, so probably beyond mozilla's jurisdiction, but it doesn't hurt to ask.


.mt huh thats a new one for me


Do you know what the pipeline looks like for new language pairs being added? This is really, really, really awesome

I'm also immediately curious about using it headless outside the browser


The training pipeline is also on Github! [1]

I was experimenting with running the wasm version of bergamot-translator (the translation engine used by the addon) in node [2].

However, if you want more performance, using the Python library [3] or the native C++ interface [4] gets you further because the wasm build is limited to a single thread and thus a blocking interface, and can't use all the processor specific optimisations that are in the native builds.

EDIT: Another option is using translateLocally [5], which is a Qt desktop app based on bergamot-translator. It has a native messaging API that is designed as a much faster alternative to the wasm build for browser extensions, but it can also be used from Python [6].

[1] https://github.com/mozilla/firefox-translations-training

[2] https://gist.github.com/jelmervdl/a4c8b6b92ad88a885e1cbd51c6...

[3] https://colab.research.google.com/drive/1AHpgewVJBFaupwAbZq0...

[4] https://github.com/browsermt/bergamot-translator/blob/main/a...

[5] https://github.com/XapaJIaMnu/translateLocally

[6] https://github.com/XapaJIaMnu/translateLocally/blob/master/s...
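For readers unfamiliar with native messaging: browsers exchange messages with the native application over stdin/stdout, each message being UTF-8 JSON preceded by a 32-bit length in native byte order. The sketch below shows only that framing. The fields in the example request are hypothetical and are not translateLocally's actual message schema; see the linked script [6] for that.

```python
import io
import json
import struct


def encode_message(message):
    """Frame a JSON-serializable object as: 4-byte native-endian length + UTF-8 JSON."""
    data = json.dumps(message).encode("utf-8")
    return struct.pack("=I", len(data)) + data


def read_message(stream):
    """Read one framed message from a binary stream; return None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack("=I", header)
    return json.loads(stream.read(length).decode("utf-8"))


# Hypothetical request fields, for illustration only.
request = {"command": "Translate", "text": "Hallo Welt"}
framed = encode_message(request)

# Round trip through an in-memory stream instead of a real process pipe.
decoded = read_message(io.BytesIO(framed))
```

In a real setup the stream would be the stdin/stdout pipe of the native application rather than an in-memory buffer.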


Looks lovely! Offline translations are very welcome in a world where the most important translation engines are also run by the world's biggest data hoarders.

Sadly, the extension either doesn't work on mobile or Mozilla couldn't be bothered to add it to the whitelist.


What I need from this is to be able to select text and just have it translated in a tooltip (or whatever.) This is what I'm using the Simple Translate Firefox add-on for but unfortunately it sends data to Google.


It would be nice to have something like that for desktop, but on mobile, iOS handles it amazingly.

You can select text almost anywhere (from browser to even from a screenshot/image; literally anywhere you are able to select text), and in a tooltip above the word, one of the few options is translate. I love the UX of it, as it is super intuitive and unobtrusive, and works pretty much instantaneously. It runs fully locally, no connection required. It slides a native OS pane over the page to show possible translations along with pronunciations and other extra info.

Sidenote: other features in that tooltip are pretty nifty too. Aside from the obvious copy/cut/paste/share, i found "look up" to be quite useful when i see a word I've not encountered before. It pulls another native OS pane that shows dictionary definitions and extra info like the wikipedia link. And the actual dictionary definitions are local too afaik.


Android had the same feature, assuming apps don't disable the tooltip. Selecting text on my phone brings a nice context menu for cut/copy/paste/search/translate/encrypt (that last one was added by OpenKeychain, a PGP app).

It doesn't come with a dictionary built in, but the search button becomes an online dictionary in a pinch. Any dictionary app could extend the menu to add a local dictionary, of course.


Android has this too. BUT:

When my phone is on portrait mode (almost always), I don't see the translate option until I tap on the three dots.

The translation isn't instant. It takes a second to show up, and then takes up the top of the screen.

I'd much prefer a UI similar to the Zhongwen Chrome extension.


> It takes a second to show up, and then takes up the top of the screen.

I think this could have to do with it not being local...


Yeah, Android has the same.


This is awesome, but...

> This set of requirements posed a number of technological challenges to the team: the translation engine was entirely written in programming languages that compile to native code. We needed a way to streamline the distribution of the project in order to avoid the overhead involved in providing builds compatible with all platforms supported by Firefox — that would be impracticable to scale and maintain.

Does Firefox really support so many different platforms and archs that CI builds are unrealistic?


(Former Mozilla employee, here)

I'm completely speculating, but it's probably a matter of not wanting to complicate iterating on the translation engine by introducing a bunch of cruft from the Firefox build system (which, though it uses GNU make under the hood, is very much bespoke and complicated).

Since the translation engine is intended to run on a product that hosts WASM, they might as well just build to that.


The upside of using WASM is that the extension itself can be easily ported to other browsers and platforms. The UI uses Firefox specific APIs but the parts that take the HTML from a page and push it through the translation engine would also work in any Chrome-based browser.

(Edit: also free sandboxing of a blob of C++ code that needs to handle arbitrary input from the web!)
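The "take the HTML from a page and push it through the translation engine" step can be pictured with a small sketch. This is not how the extension does it (that code is JavaScript and also deals with inline markup, batching and the live DOM); it only shows the basic idea of translating text nodes while leaving tags and script/style contents alone. The `translate` argument is any callable supplied by the caller, and comments and doctype declarations are simply dropped here.

```python
from html import escape
from html.parser import HTMLParser


class TextTranslator(HTMLParser):
    """Rebuild HTML, passing visible text through a translate callable."""

    SKIP = {"script", "style"}  # contents of these tags are copied unchanged

    def __init__(self, translate):
        super().__init__(convert_charrefs=True)
        self.translate = translate
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        # Re-emit the start tag exactly as written, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth or not data.strip():
            self.out.append(data)  # code or whitespace: leave untouched
        else:
            self.out.append(escape(self.translate(data), quote=False))


def translate_html(html, translate):
    parser = TextTranslator(translate)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)
```

For example, `translate_html("<p>Hello <b>world</b></p>", str.upper)` upper-cases the two text nodes and keeps the tags. Translating each text node separately loses sentence context across inline tags, which is one reason a real implementation needs more than this.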


The point here is that it is simpler to have just one NMT engine in WebAssembly, bundled in the extension, that runs on all platforms for free, instead of having to support and distribute multiple native builds of the engine that would be sideloaded and communicate with the extension via native messaging.

When the project started, for example, the engine was incompatible with ARM, which we could just get for free with WebAssembly. The performance penalty of doing that was minimal in the user's eyes, and so far we have received fewer than a handful of complaints about it.

Bundling the entire engine on Firefox was never an option for many obvious reasons.


Debian alone builds Firefox for 7 different CPU architectures (12 if you count outdated packages too).


It's unfortunate that it doesn't translate Japanese, and reading Japanese-only resources is a common hurdle in the retro game modding/development community.


Not to mention the icon for the extension is "あ⇆a".


This is awesome. I've been using https://translatelocally.com/ a bit, which is the same backend, and been amazed at how they managed to squeeze those neural net models, the Norwegian–English one that's downloaded by translateLocally is just 15M! That's less than Chrome transferred to Google about you while you were reading this comment.[citation needed]

Apparently Kenneth Heafield https://neural.mt/ of kenlm fame has been coordinating the project, also doing good work like organizing shared tasks on Efficient MT https://statmt.org/wmt21/efficiency-task.html


For i18n in my own projects, I typically use tools like gettext and involve lots of volunteers to do the translations. I might try out these neural machine translation tools to see how they fare. I also wonder if these machine translation tools are trained on a corpus of gettext datasets.


Excited for this, especially after Mozilla removed my and several other add-ons that simply loaded Google Translate's JavaScript library onto pages in 2019. https://www.jeremiahlee.com/posts/page-translator-is-dead/

Sadly, no support for Svenska (Swedish) to English yet.


Would be nice if desktop environments like GNOME/KDE and email clients like evolution/kmail could integrate this too.


Stuck at "Loading translation engine..." for a long time. Tried German and Spanish. Can't tell if it's downloading some model data or something's failed. I suggest some kind of progress indicator.


You might want to report that here, if it's not reported already: https://github.com/mozilla/firefox-translations/issues


Weird. I had a numeric progress indicator, and the model got downloaded in just a couple of seconds.


Awesome, tested the German model on dw.com, surprisingly fast and accurate.


I wonder why French is absent.

Meanwhile they have Persian which is not even in the EU.


French is being trained by the consortium and is part of the scope of the grant, so we should have it at some point in June and we'll ship an update as soon as we have it.


I wouldn't comment on the absence of French vis-a-vis other languages. It's just slightly surprising because English <-> French is honestly a very widely studied translation sub-task, with an enormous amount of parallel corpora available for training these models.


Well, there's this and the fact that it is the second most popular language in the European Union, which sponsors the project.


Should the EU languages get preferential treatment ?


"This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825303 ."


I love it. I wish it could translate Chinese to English.


This is amazing. Hope it can be adapted for Unix command line use.


Try https://translatelocally.com/ – it's the same backend, and should be usable in git HEAD from the cli: https://github.com/XapaJIaMnu/translateLocally/issues/51#iss...


Wow! Thanks!


Great that you contribute your own language pairs.


Can we use this on mobile?


The demo site works on mobile if you let it load the necessary content so if you're speaking from a web dev point of view: definitely.

As for the addon, on Android you'll need to install an unstable version of Firefox and configure a custom addon list in an addons.mozilla.org account that includes it so you can download it.

On iOS there isn't any option to download addons as far as I'm aware. On mobile Linux environments everything should work like on desktop.


I think the Firefox extension might not work on mobile because it hooks into some undocumented addon apis to draw that translation bar UI. Those might not be available on mobile.

The translation code itself should work on mobile. It's just some javascript & wasm (albeit with SIMD instructions not implemented in Safari's WASM vm…)


I just installed the extension on Fenix Nightly and indeed, it does not work.


You can't download any addon for Firefox on iOS because it's almost Safari, only looking a bit different. All browsers on iOS have to use WebKit so FF is not really FF here on iOS.


I know, but I don't know what iOS webkit does and doesn't allow. Injecting code and responding to events in a way that makes the most important webextension APIs work _sounds_ like it should be possible (though it would take lots of work).

Sad to hear there's no such feature on iOS. At least Android gets uBlock Origin in the stable channels. Hopefully this will change when Apple will eventually be forced to permit other browsers into the iOS app store.


EU is after Apple in terms of Appstore - they want to force Apple to allow different stores. Then FF for iOS (the "real" one) could be distributed this way.


Wow awesome!


[flagged]


How do you think translations work, exactly?


It’s what may be done to a text after the translation that makes me distrust it.

I speak a couple of languages, just in case you want to come back to your question regarding “how translations work”.


That's funny. I've just tried to translate "fuck you" to Russian and I got "трахать тебя" while Google Translate gives the more accurate "пошел на хуй".


Machine translations are as accurate as a trebuchet past 300 yards: just better than nothing. But they’re a great tool so long as the user is aware.


In my experience, DeepL is still the undefeated leader when it comes to translating Russian obscenity, heh.


Try “Russian warship, go fuck yourself!” instead. It should work better.


What’s wrong with using cloud without sending any user id with the request?


"The telescreen received and transmitted simultaneously. Any sound that Winston made, above the level of a very low whisper, would be picked up by it; moreover, so long as he remained within the field of vision which the metal plaque commanded, he could be seen as well as heard. There was of course no way of knowing whether you were being watched at any given moment."


"How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all the time."


There's likely far more identifiable information in the actual text than a user ID provides.


Cloud assumes a constant, reliable internet connection, which is not the reality in most of the world. (Nor is it always desirable.)


If the data never leaves your device, then a third-party service never gets the opportunity to leak or misuse it. This is far more private.

How many stories have you heard about breaches due to accidentally mis-configured logging in web services? Also in the news lately was Twitter misusing 2fa phone numbers for advertising purposes.


What if what I'm trying to translate is sensitive information in itself?


In the current status quo you either make use of an API by identifying yourself with a key or a browser session that is fingerprintable in a gazillion ways. There is no such thing as "not sending user ID" or if there is, it has a totally negligible reach.


If local translation can help me use a website without a query to some cloud server... who needs the cloud? No backend that will experience downtime, and someday be decommissioned. No money sink of cloud processing pressuring the product to advertise or monetise in unscrupulous ways.

I'm sure cloud processing is better in many ways. But if this is "good enough" I'd rather just do it all locally.



