It passes the "Turkey" <=> "turkey" test: "In Turkey they sometimes eat turkey." => "In der Türkei essen sie manchmal Truthahn." :D
Super cool! Real-time translation, in the browser, running locally! And sure, not state of the art / on the level of DeepL, but on the level of Google Translate, 2015ish, maybe? Amazing!
Well, there's what people would typically interpret it as. Then you have the translation here which is a correct interpretation, but not that informative.
Interestingly, Google fails on both of these sentences and always translates "Turkey" as the country. DeepL on the other hand gets it right in both cases.
This is incredible and super important. For all the blunders of Mozilla in the last decade, they still have some great projects.
I am also grateful to them for not scrapping Common Voice.
Overall, I am a huge fan of Mozilla and Firefox; I am using it right now. After a 10-year stint of using Chrome exclusively, I now use Firefox as my daily driver. Unfortunately I still need to keep Chrome around for odd situations where the website developer only tested in Chrome. (Yes, this still happens. A shopping cart on a popular website - cough Home Depot cough cough - recently failed for me in Firefox but worked in Chrome. I haven't tried in a couple of weeks; hopefully it has been fixed.)
Also important because, at least in Germany, the "translate website" button now seems to be missing from Google Translate. From Switzerland I still saw the button recently when I tried. I don't know if it is because people use it to reach censored (Russian) sites. My company blocks Google Translate anyway, probably for the same reason.
Yes, at least last week. Internet traffic from work is routed through Germany (Paderborn). Last week the third button, "Websites", was missing. From home in Switzerland the button was there.
Also, if you entered "translate website", Google suggested Yandex in second place...
you giving full credit to Mozilla is dishonest, to say the least
it aligns with their past projects, including using Mullvad and slapping a Mozilla sticker on top of it to claim it as their own
also it is super funny to read this:
> H2020-EU.2.1.1. - INDUSTRIAL LEADERSHIP - Leadership in enabling and industrial technologies - Information and Communication Technologies (ICT) MAIN PROGRAMME
> Our solution to that was to develop a high-level API around the machine translation engine, port it to WebAssembly, and optimize the operations for matrix multiplication to run efficiently on CPUs.
Mozilla heavily uses GitHub to host repositories for basically all of their projects (possibly with the exception of Firefox itself). Anyone familiar with Mozilla’s open source efforts knows this, so stating that I didn’t see the source on their preferred platform is perfectly reasonable.
Your comment is pedantic and unhelpful. It is effectively the same as overhearing someone asking another person for a Kleenex, then choosing to interject and lecture that person on the difference between Kleenex and tissue paper when the other person does have actual Kleenex-brand tissue. Yes, I know what open source is. Yes, Mozilla uses GitHub. I even provided links to relevant GitHub repositories by Mozilla once I found what I was looking for.
Thank you for giving the extension a try! This is a very interesting feature, and I'd like to consider it. Would you mind filing an issue in the repo so we can track and discuss it there [1]?
+1. This was the first thing I tried to do and was surprised this feature doesn't exist. Most often, I don't encounter entire webpages in foreign languages, but rather small snippets of text.
It looks like both the web page version and browser extension download from storage.googleapis.com, which is not ideal (obviously much better than the text to be translated being sent). IMO, this should be clarified in the extension description. Are there any plans for Mozilla to host this data? (Well, Amazon I guess to match addons.mozilla.org)
I noticed the web page has Polish but not the extension. Is it not considered sufficiently high quality yet? (Looks like that might be the case from a quick test.)
As with others, I first tried to translate a bit of text, not the whole page. Another super helpful feature, likely already on a to-do list, is the ability to see alternative translations.
Thanks again, this is great and I'm surprised it isn't a larger download. I am curious about any size/quality tradeoffs being made.
Also, playing with the web page a bit (English->Spanish), I see it translates the whole text at once, so adding new sentences affects the translation of earlier ones, sometimes in very odd ways (with one combination of words it translated "Spanish->English" to "español-esplén"). It sometimes seems to produce "Spanish" words that aren't actual Spanish words as far as I can tell, even according to its own Spanish->English translation. It seems like a way to mark a section to be translated as a unit might be helpful.
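One way to get that "translate as a unit" behavior, sketched under the assumption that the engine can be called as a plain string-to-string function (`translate` here is a hypothetical parameter, not the extension's actual API): split the input on blank lines and translate each block in a separate call, so text added to one block cannot change how another block is translated.

```python
from typing import Callable


def translate_by_units(text: str, translate: Callable[[str], str]) -> str:
    """Translate each blank-line-separated unit on its own.

    Each call to `translate` only ever sees one unit, so text added to one
    unit cannot influence how another unit is translated.
    """
    units = text.split("\n\n")
    # Leave empty units (from runs of blank lines) untouched.
    return "\n\n".join(translate(u) if u.strip() else u for u in units)


if __name__ == "__main__":
    # str.upper stands in for a real translation call.
    print(translate_by_units("hola mundo\n\nadios", str.upper))
```

The trade-off is that each unit loses the context of its neighbors, which the whole-text approach presumably uses to resolve ambiguity.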
> Are there any plans for Mozilla to host this data? (Well, Amazon I guess to match addons.mozilla.org)
I'll bring this up internally for discussion.
> I noticed the web page has Polish but not the extension. Is it not considered sufficiently high quality yet? (Looks like that might be the case from a quick test.)
The Polish model finished training after we had the extension reviewed and signed for distribution. That's why we could already integrate it into the website, which is controlled only by my team, but not yet into the extension. We'll ship it in the next version, which might come out next week.
I consider the feature you requested interesting and important. Would you mind filing an issue in the repo so we can track it [1]?
We did not find any large quality issues after quantizing the models.
Regarding the Spanish neologisms: yes, the consortium is aware of this issue and is working to fix it.
Thanks! Looks like I picked a bad test case for Polish. I tried a few more, and some look perfect (boring corporate text) while others are about on par with Google Translate (I don't know Polish or Spanish and mostly looked at song lyrics). I don't know if specific examples are helpful, but it seems to have a particularly tough time with "W poprzek wpław", and I think there are a few more issues in:
I'll add a few feature requests to the issue tracker (if not there already) hopefully later today. Including a request to put that web page in the extension :).
Could you share details about the machine translation engine that is used (or where to find out more about it)? Are there any plans to open source the extension code (with the WebAssembly optimizations mentioned in the article)?
You can find the engine used here [1], the API built around it here [2], its WASM port here [3], and the WebAssembly matrix multiplication optimizations here [4].
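For anyone wondering what "optimizing matrix multiplication for CPUs" typically involves: a common approach is to quantize weights and inputs to 8-bit integers, multiply with integer accumulation, and scale the result back to floats. The pure-Python sketch below shows only that general idea; it is not the code in [4], the function names are hypothetical, and the symmetric per-tensor scaling is an assumption chosen for simplicity. Real kernels use SIMD instructions and int32 accumulators.

```python
def choose_scale(matrix):
    """Symmetric per-tensor scale so the largest magnitude maps to 127."""
    peak = max(abs(x) for row in matrix for x in row)
    return peak / 127 if peak else 1.0


def quantize(matrix, scale):
    """Map floats to the int8 range by dividing by `scale`, rounding, clamping."""
    return [[max(-127, min(127, round(x / scale))) for x in row] for row in matrix]


def int8_matmul(a, b):
    """Approximate a (m x k) @ b (k x n) using int8 operands."""
    sa, sb = choose_scale(a), choose_scale(b)
    qa, qb = quantize(a, sa), quantize(b, sb)
    k, n = len(b), len(b[0])
    out = []
    for row in qa:
        # Integer multiply-accumulate (int32 in a real kernel; Python ints here).
        acc = [sum(row[p] * qb[p][j] for p in range(k)) for j in range(n)]
        # Dequantize: multiplying by both scales maps back to float units.
        out.append([v * sa * sb for v in acc])
    return out


def float_matmul(a, b):
    """Reference float matrix multiplication for comparison."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


if __name__ == "__main__":
    a = [[0.5, -1.0], [2.0, 0.25]]
    b = [[1.0, 0.0], [-0.5, 3.0]]
    print("int8 :", int8_matmul(a, b))
    print("float:", float_matmul(a, b))
```

The int8 result differs slightly from the float one because of rounding; that small loss is the trade-off for using narrower, faster integer arithmetic and smaller models.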
Hi! This is an amazing project and will be really useful! Thank you!
I understand that the project is funded by the EU, so the focus is on European languages, but are there any plans to add CJK or other languages?
Yes, that's something we've been discussing internally and it is being considered. In the meantime, please feel free to file an issue in the repo [1] so we can track it:
Yes, like I said above, that's something we've been discussing internally and it is being considered. In the meantime, please feel free to file an issue in the repo [1] so we can track it:
[1] https://github.com/mozilla/firefox-translations/issues
Hello, your tool is great! I have some questions about the future of the project. Are there plans to add more languages? The European Union grant ends in June; will the project continue to develop and add languages after that?
Thanks for using it! Best way currently is to keep using and reporting issues on [1]. You can see how the models are trained on [2] and file issues there too.
Google Translate code is present on many websites to provide automatic translation of text. Could your translation code be uploaded to a server and embedded in a web page to provide the same functionality?
I'm not aware of any actively maintained projects that give you this out of the box, but these two could be starting points for such a project.
Mozilla implemented a REST service based on (an earlier version of) bergamot-translator [1]. You could use that as a replacement for the WASM component in the addon's code.
I also know of some full-page translation demo code that uses the Python bindings of bergamot-translator [2]. That's basically a web proxy à la Google Translate.
Lastly, Marian, the translation software being used, has a web server as well [3]. It does not support HTML, though.
EDIT: see also my earlier comment for using it with Node or Python [4], which you could use to implement a simple web API.
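As a starting point for such a web API, here is a minimal sketch using only Python's standard library. The translation call is a hypothetical placeholder (`fake_translate`, which just reverses the string) rather than the real bergamot-translator bindings, whose exact API I'm not reproducing here; the `/translate` route, port 8080, and the `{"text": ...}` JSON shape are likewise assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def fake_translate(text: str) -> str:
    """Hypothetical placeholder; swap in a real translation engine call."""
    return text[::-1]


def handle_translate(body: bytes, translate: Callable[[str], str]) -> bytes:
    """Parse {"text": ...} and return {"translation": ...} as JSON bytes."""
    payload = json.loads(body)
    return json.dumps({"translation": translate(payload["text"])}).encode("utf-8")


class TranslateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/translate":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            response = handle_translate(self.rfile.read(length), fake_translate)
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing "text", or a non-object payload.
            self.send_error(400, "expected a JSON object with a 'text' field")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


if __name__ == "__main__":
    # Bind to localhost only, so the text never leaves the machine.
    HTTPServer(("127.0.0.1", 8080), TranslateHandler).serve_forever()
```

With the server running, `curl -X POST -d '{"text": "Hallo"}' http://127.0.0.1:8080/translate` should return `{"translation": "ollaH"}` from the placeholder. Keeping the request handling in a plain function makes it easy to test without starting a server.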
Sure, like I mentioned in the article, you can embed the engine and the models in any web page to be run in a browser with proper WebAssembly and SIMD support.
You can find an example of how we did it here [1] and try it out here [2] (I recommend using Firefox).
That way you don't need a server: everything is processed in the browser, so you no longer need Google Translate or any other cloud service to embed translations in a website.
All of them are freely available. Most of them through mtdata [1]. The exact list of the datasets is in the firefox-translations-training pipeline configuration file [2].
The sooner we move AI to 127.0.0.1, the better. Enough with The Cloud powerhouses.
Yes, there’s work to be done on resilience, power efficiency, and responsiveness, but it’s the right direction for everything that involves private computing.
I've been using the extension [1] for a bit and, while it doesn't support too many languages, for the ones it does it's pretty cool to have it all running locally.
neat, but it looks like it was just released, so how were you using it before?
as an aside, pretty sad to see the project page, https://browser.mt/ , requiring not just javascript but specifically google connections to work. to 5 different google properties, no less.
I work at Mozilla, so got a sneak preview (and also the first bugs) :)
(Of course technically the work was out there in the open already, since it's Mozilla.)
Agreed about the Bergamot website. I suspect it's not by Mozilla, but I'll see if I can ask someone to take a look, as I don't think all those connections should be necessary.
Is there a way to opt websites, URLs, or URL roots in to automatic translation? Currently "auto translate this tab" works, but you have to select it manually, and for me at least (on Linux) it makes the window wider than 100%, so I cannot use the right side of the browser in that tab.
Something like a simple "always translate this website" setting, plus an option to hide the bar.
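Purely as a hypothetical sketch (not the extension's code), an "always translate this website" list could be matched against the current tab's URL like this. The function name and the root format (scheme, host, optional path prefix) are my own assumptions.

```python
from typing import Iterable
from urllib.parse import urlsplit


def should_auto_translate(url: str, roots: Iterable[str]) -> bool:
    """Return True if `url` lies under one of the opted-in URL roots.

    A root matches when scheme and host are equal and the root's path is a
    whole-segment prefix of the URL's path. Ports and query strings are ignored.
    """
    target = urlsplit(url)
    for root in roots:
        r = urlsplit(root)
        # urlsplit lowercases the scheme, and .hostname is lowercased too.
        if (target.scheme, target.hostname) != (r.scheme, r.hostname):
            continue
        prefix = r.path.rstrip("/")
        # An empty prefix means "the whole site".
        if not prefix:
            return True
        # Compare whole segments so "/news" does not match "/newsletter".
        if target.path == prefix or target.path.startswith(prefix + "/"):
            return True
    return False


if __name__ == "__main__":
    roots = ["https://example.de/news"]
    print(should_auto_translate("https://example.de/news/today", roots))
    print(should_auto_translate("https://example.de/newsletter", roots))
```

Matching on whole path segments rather than a raw string prefix avoids accidentally opting in sibling paths that merely share a prefix.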
I was experimenting with running the wasm version of bergamot-translator (the translation engine used by the addon) in Node [2].
However, if you want more performance, using the Python library [3] or the native C++ interface [4] gets you further because the wasm build is limited to a single thread and thus a blocking interface, and can't use all the processor specific optimisations that are in the native builds.
EDIT: Another option is using translateLocally [5], which is a Qt desktop app based on bergamot-translator. It has a native messaging API that is designed as a much faster alternative to the wasm build for browser extensions, but it can also be used from Python [6].
Looks lovely! Offline translations are very welcome in a world where the most important translation engines are also run by the world's biggest data hoarders.
Sadly, the extension either doesn't work on mobile or Mozilla couldn't be bothered to add it to the whitelist.
What I need from this is to be able to select text and just have it translated in a tooltip (or whatever.) This is what I'm using the Simple Translate Firefox add-on for but unfortunately it sends data to Google.
It would be nice to have something like that for desktop, but on mobile, iOS handles it amazingly.
You can select text almost anywhere (from the browser to a screenshot or image; literally anywhere you are able to select text), and in a tooltip above the word, one of the few options is Translate. I love the UX of it: it is super intuitive and unobtrusive, and works pretty much instantaneously. It runs fully locally, no connection required. It slides a native OS pane over the page to show possible translations along with pronunciations and other extra info.
Sidenote: other features in that tooltip are pretty nifty too. Aside from the obvious copy/cut/paste/share, I found "Look Up" to be quite useful when I see a word I haven't encountered before. It pulls up another native OS pane that shows dictionary definitions and extra info like a Wikipedia link. And the actual dictionary definitions are local too, AFAIK.
Android has the same feature, assuming apps don't disable the tooltip. Selecting text on my phone brings up a nice context menu for cut/copy/paste/search/translate/encrypt (that last one was added by OpenKeychain, a PGP app).
It doesn't come with a dictionary built in, but the search button becomes an online dictionary in a pinch. Any dictionary app could extend the menu to add a local dictionary, of course.
> This set of requirements posed a number of technological challenges to the team: the translation engine was entirely written in programming languages that compile to native code. We needed a way to streamline the distribution of the project in order to avoid the overhead involved in providing builds compatible with all platforms supported by Firefox — that would be impracticable to scale and maintain.
Does Firefox really support so many different platforms and archs that CI builds are unrealistic?
I'm completely speculating, but it's probably a matter of not wanting to complicate iterating on the translation engine by introducing a bunch of cruft from the Firefox build system (which, though it uses GNU make under the hood, is very much bespoke and complicated).
Since the translation engine is intended to run on a product that hosts WASM, they might as well just build to that.
The upside of using WASM is that the extension itself can be easily ported to other browsers and platforms. The UI uses Firefox specific APIs but the parts that take the HTML from a page and push it through the translation engine would also work in any Chrome-based browser.
(Edit: also free sandboxing of a blob of C++ code that needs to handle arbitrary input from the web!)
The point here is that it is simpler to ship a single NMT engine compiled to WebAssembly, bundled in the extension, that runs on all platforms for free, than to support and distribute multiple native builds of the engine that would be sideloaded and communicate with the extension via native messaging.
For example, when the project started, the engine was incompatible with ARM, and WebAssembly gave us ARM support for free. The performance penalty was minimal from the user's perspective, and so far we have received fewer than a handful of complaints about it.
Bundling the entire engine into Firefox was never an option, for many obvious reasons.
It's unfortunate that it doesn't translate Japanese, and reading Japanese-only resources is a common hurdle in the retro game modding/development community.
This is awesome. I've been using https://translatelocally.com/ a bit, which uses the same backend, and I've been amazed at how they managed to squeeze those neural net models down: the Norwegian–English one that translateLocally downloads is just 15 MB! That's less than Chrome transferred to Google about you while you were reading this comment.[citation needed]
For i18n in my own projects, I typically use tools like gettext and rely on lots of volunteers to do the translations. I might try out these neural machine translation tools to see how they fare. I also wonder whether these machine translation tools are trained on a corpus of gettext datasets.
Stuck at "Loading translation engine..." for a long time. Tried German and Spanish. I can't tell if it's downloading model data or something has failed. I suggest adding some kind of progress indicator.
French is being trained by the consortium and is within the scope of the grant, so we should have it at some point in June, and we'll ship an update as soon as we have it.
I wouldn't comment on the absence of French vis-a-vis other languages. It's just slightly surprising because English <-> French is honestly a very widely studied translation sub-task, with an enormous amount of parallel corpora available for training these models.
The demo site works on mobile if you let it load the necessary content so if you're speaking from a web dev point of view: definitely.
As for the addon, on Android you'll need to install an unstable version of Firefox and configure a custom addon list in an addons.mozilla.org account that includes it, so that you can download it.
On iOS there isn't any option to download addons as far as I'm aware. On mobile Linux environments everything should work like on desktop.
I think the Firefox extension might not work on mobile because it hooks into some undocumented addon APIs to draw that translation bar UI. Those might not be available on mobile.
The translation code itself should work on mobile. It's just some JavaScript & wasm (albeit with SIMD instructions not implemented in Safari's WASM VM…)
You can't download any addon for Firefox on iOS because it's essentially Safari with a different look. All browsers on iOS have to use WebKit, so FF isn't really FF on iOS.
I know, but I don't know what iOS webkit does and doesn't allow. Injecting code and responding to events in a way that makes the most important webextension APIs work _sounds_ like it should be possible (though it would take lots of work).
Sad to hear there's no such feature on iOS. At least Android gets uBlock Origin in the stable channel. Hopefully this will change when Apple is eventually forced to permit other browsers into the iOS App Store.
The EU is going after Apple over the App Store; they want to force Apple to allow alternative stores. Then FF for iOS (the "real" one) could be distributed that way.
That's funny. I've just tried to translate "fuck you" to Russian and I got "трахать тебя" (literally "to fuck you", in the sexual sense), while Google Translate gives the more accurate "пошел на хуй" (roughly "fuck off").
"The telescreen received and transmitted simultaneously. Any sound that Winston made, above the level of a very low whisper, would be picked up by it; moreover, so long as he remained within the field of vision which the metal plaque commanded, he could be seen as well as heard. There was of course no way of knowing whether you were being watched at any given moment."
"How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all the time."
If the data never leaves your device, then a third-party service never gets the opportunity to leak or misuse it. This is far more private.
How many stories have you heard about breaches due to accidentally misconfigured logging in web services? Also in the news lately was Twitter misusing 2FA phone numbers for advertising purposes.
In the current status quo you either use an API by identifying yourself with a key, or a browser session that is fingerprintable in a gazillion ways. There is no such thing as "not sending a user ID", or if there is, it has a totally negligible reach.
If local translation can help me use a website without a query to some cloud server... who needs the cloud? No backend that will experience downtime, and someday be decommissioned. No money sink of cloud processing pressuring the product to advertise or monetise in unscrupulous ways.
I'm sure cloud processing is better in many ways. But if this is "good enough" I'd rather just do it all locally.