DeepL is different in my opinion. They always focused on machine learning for languages.
They must have acquired fantastic data for their models, especially given the business language and professional translations they focus on.
They keep your intended message intact and just refine it, like post-editing a book. Grammarly and other tools force you to sound the way they think is best.
DeepL shows, in my opinion, how much more useful a model trained for specific uses is.
Also, Google Translate is really not a particularly good translator. It's the most widely known, but as far as translators go it's pretty poor.
DeepL is a step up, and modern LLMs are even better. There's some data here[0], if you're curious - DeepL is beaten by 24B models, and dramatically beaten by Sonnet / Opus / https://nuenki.app/translator .
I made a tool which translates sentences as you browse, for immersion[0]. I solved this by giving the model a code (specifically, "483") to return in any refusal. Then, if I detect that in the output, I fail over to another model+provider.
I also have a few heuristics (e.g. "I can't translate" in many different languages) to detect if it deviates from that.
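For anyone curious, here's a minimal sketch of that sentinel-plus-failover idea. It isn't the actual Nuenki code: the prompt wording, the phrase list, and the names `build_prompt`, `looks_like_refusal` and `translate_with_failover` are all made up for illustration, and each provider is assumed to be a plain function that takes a prompt and returns text.

```python
from typing import Callable, Optional, Sequence

REFUSAL_CODE = "483"  # sentinel the prompt asks the model to return when it refuses

# Hypothetical fallback phrases; a real list would cover many more languages.
REFUSAL_PHRASES = (
    "i can't translate",
    "i cannot translate",
    "ich kann das nicht übersetzen",
)


def build_prompt(sentence: str, target_lang: str) -> str:
    """Ask for a translation and tell the model how to signal a refusal."""
    return (
        f"Translate the following sentence into {target_lang}. "
        f"If you refuse for any reason, reply with only {REFUSAL_CODE}.\n\n"
        f"{sentence}"
    )


def looks_like_refusal(output: str) -> bool:
    """True if the output contains the sentinel or a known refusal phrase."""
    text = output.strip().lower()
    # Caveat: a sentence that genuinely contains "483" would be a false positive.
    if REFUSAL_CODE in text:
        return True
    return any(phrase in text for phrase in REFUSAL_PHRASES)


def translate_with_failover(
    sentence: str,
    target_lang: str,
    providers: Sequence[Callable[[str], str]],
) -> Optional[str]:
    """Try each provider in order; return the first non-refusal output."""
    prompt = build_prompt(sentence, target_lang)
    for call in providers:
        try:
            output = call(prompt)
        except Exception:
            continue  # treat provider errors like refusals and move on
        if not looks_like_refusal(output):
            return output.strip()
    return None  # every provider refused or failed
```

The sentinel check catches the well-behaved refusals cheaply, and the phrase list is only there for the cases where the model ignores the instruction and refuses in prose.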
I've ended up doing a lot of research into LLM translation, because my language learning tool (https://nuenki.app) uses it a lot.
I built something kinda similar, and made it open source. It picks the top x models based on my research, translates with them, then has a final judge model critique, compare, and synthesise a combined best translation. You can try it at https://nuenki.app/translator if you're interested, and my data is at https://nuenki.app/blog
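Roughly, the translate-then-judge flow looks like the sketch below. This is a simplified illustration rather than the open source implementation: the prompts are invented, `hybrid_translate` is a hypothetical name, and the models are assumed to be plain prompt-to-text functions passed in a dict ordered best first.

```python
from typing import Callable, Dict

Model = Callable[[str], str]  # assumed interface: prompt in, text out


def hybrid_translate(
    text: str,
    target_lang: str,
    translators: Dict[str, Model],
    judge: Model,
    top_n: int = 3,
) -> str:
    """Translate with the top_n models, then ask a judge to merge the results."""
    # Assumes the dict is ordered best first, so slicing picks the top models.
    chosen = list(translators.items())[:top_n]

    candidates = []
    for _name, model in chosen:
        candidates.append(model(f"Translate into {target_lang}:\n{text}").strip())

    # Number the candidates so the judge can refer to them in its critique.
    listing = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, start=1))
    judge_prompt = (
        f"Source text:\n{text}\n\n"
        f"Candidate translations into {target_lang}:\n{listing}\n\n"
        "Critique and compare the candidates, then reply with only a single "
        "combined best translation."
    )
    return judge(judge_prompt).strip()
```

In practice the translator calls would run concurrently, and the judge's critique would need to be separated from its final answer; both are left out here to keep the shape of the idea visible.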
Very neat, love how there’s a formality level selection! Google Translate has such bad tendencies to use very formal language (at least when translating into Thai) that it’s almost useless in real life. Some English to Thai examples I tried so far have been quite natural.
I assumed Google errs on the side of formality because being informal in an inappropriate context is worse than being too formal for someone who is obviously not a native speaker. Not for Thai in particular, just in general.
Also very interesting! Excellent research design and presentation, too.
Your results accord with my own (much less systematic) tests of the translation of short texts by reasoning models. The issue becomes more fuzzy with the translation of longer texts, where quality is more difficult to evaluate objectively. I'll drop you an email with some thoughts.
I spent quite a while doing that, and I still kinda do. I quite frequently add new features for a single person! The problem is that 25 people who are happy with the product and likely to share it is still only 25 people, and there aren't many features left to implement - the product is pretty mature.
I could improve the definitions, but that'd cost huge amounts of money in order to use proper dictionaries rather than Wiktionary, or I could add support for multiple languages at once, which two people have asked for but which would require a lot of dev time, and after that there isn't really much left to change.
And I've tried asking where I can find people, and they generally suggest "subreddits" (most ban self-promotion, and I've already posted on the ones that don't), and "discords" (all of them are very hostile to self-promotion).
I think the problem might be due to my landing page? People are far more likely to convert if I've already explained it to them. But I'm not sure how to convert a text explanation into the landing page form.
To be clear, the main product is translating from English to the language you're learning - not the other way around - so that you immerse yourself while you browse.
The hybrid translator is kinda a side thing based on my research into LLM translation quality.
It's a bit more complicated than using the best LLM, because it combines the results of the best ones, but yeah, that's broadly how it works. I made that part open source, anyway - I'm not trying to sell the hybrid translator, just use it as a marketing tool.
But I guess this comes back to the fact that I think my landing page might explain it poorly. I'm just not sure how to explain "It finds English sentences in webpages, filters them down to the ones at your difficulty level, then translates them into your target language, so you're immersed while you browse" in the form of an ultra-low-attention-span landing page.
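In code terms, the pipeline described in that sentence is roughly the sketch below. It's only meant to show the three steps: the sentence splitter is naive, the difficulty score (average word length) is a stand-in I made up and not how the product actually scores sentences, and `translate` is assumed to be any function from English text to the target language.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def difficulty(sentence: str) -> float:
    """Placeholder difficulty score: average word length (an assumption)."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def immerse(
    text: str,
    max_difficulty: float,
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (original, translated) pairs for sentences at or below the level."""
    pairs = []
    for sentence in split_sentences(text):
        if difficulty(sentence) <= max_difficulty:
            pairs.append((sentence, translate(sentence)))
    return pairs
```

Everything above the learner's level is simply left in English, which is what keeps the page readable while you browse.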
AI progress has also made high quality language translation a lot cheaper. When I started https://nuenki.app last year, the options were exorbitantly priced DeepL for decent quality low latency translation or Sonnet for slightly cheaper, much slower, but higher quality translation.
Now, just a year later, DeepL is beaten by open models served by https://groq.com for most languages, and Claude 4 / GPT-4.1 / my hybrid LLM translator (https://nuenki.app/translator) produce practically perfect translations.
LLMs are also better at critiquing translations than producing them, but pre-thinking doesn't help at all, which is just fascinating. Anyway, it's a really cool topic that I'll happily talk at length about! They've made so much possible. There's a blog on the website, if anyone's curious.
Interesting. I'd be interested in testimonials alongside the options for promotion/putting it in the newsletter/etc. E.g. how many people actually read that newsletter?
It's interesting to see the Guardian using Fahrenheit ("50 degrees hotter") here. Unless it really is an entire 50 degrees C? I suppose that's plausible.
It's co-published with an American group, though, and they also use 50.