Sorry to sound so negative, but it seems something halfway between an April Fool and some Kickstarter "scam" (possibly in good conscience, but still).
Both voice recognition and translation are HARD problems, for which even companies with tons of resources and talent like Apple and Google were able to provide only sketchy and partial solutions.
Not to mention issues like battery life and connectivity (hard to believe this device would be able to work without a permanent connection to cloud servers doing the heavy lifting, like Siri does), etc...
I'm also skeptical. This is a problem where it's fairly easy to get 80% there but nearly impossible to get 100% there. Most speech recognition apps have trouble with thick accents. From there, imagine stumbling speech or people who change their thoughts in the middle of a sentence. Then there's the lag between input and output - you often can't translate a phrase or sentence until you know the whole thing and then you're adding computer computation time onto it.
I think this type of technology is inevitable, but I think we're still 50 years away from it being ubiquitous.
Never mind thick accents. What about languages that have tens, even hundreds, of dialects?
Slovenian, for instance, has only 2,000,000 speakers. Yet there are 56 officially classified dialects. [1] They all vary in important ways, sometimes to the point that a speaker from one dialect has trouble understanding a speaker from another.
French is another language with hundreds of dialects. [2] My girlfriend is French, but when she goes to the Caribbean, she has trouble following conversation because their French is so much different than hers.
English makes for a wonderful example too. Not only does it have hundreds of dialects, there are at least four different Englishes: UK, US, Australian, and African-American Vernacular. They're starting to show signs of splitting into separate languages; they already have differences in vocabulary and grammar.
Then you can add slang on top of all this.
Oh and to add to all this confusion: In most European countries people learn the standardized form of their native language almost as if it was a foreign language.
I don't think we'll ever get to 100% accuracy with a tool like this. Even humans themselves can't do 100%.
In South Texas, I once encountered a man speaking English with a very strong Acadian accent. I had to listen to him speak for several minutes before I even realized that what he was speaking was in fact English. Accents and dialects are incredibly complex.
Exactly. Given how differently languages structure sentences (even something as simple as where most Romance languages place adjectives relative to the noun and article), a word-for-word real-time translation into a language like English would sound like someone with a head injury.
A phrase like "Did you see the big red truck? It was driving very fast" would be:
¿Viste el gran camión rojo? Conducía muy rápido.
In real time this would be in English:
See the big truck red? Driving very fast.
That's just scratching the surface with the most basic communication. There's simply no way this product can work with any real accuracy.
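To make the word-order problem concrete, here is a toy word-for-word substitution in Python (the lexicon is invented for illustration; real systems use statistical or neural models, not lookup tables):

```python
# A toy word-for-word "translator": each Spanish word is swapped for an
# English one in place, so Spanish noun-adjective order ("camión rojo")
# leaks straight into the English output.
lexicon = {
    "viste": "see", "el": "the", "gran": "big",
    "camión": "truck", "rojo": "red",
    "conducía": "driving", "muy": "very", "rápido": "fast",
}

def word_by_word(sentence: str) -> str:
    return " ".join(lexicon.get(w, w) for w in sentence.lower().split())

print(word_by_word("Viste el gran camión rojo"))  # see the big truck red
print(word_by_word("Conducía muy rápido"))        # driving very fast
```

Fixing the output requires knowing the full phrase before emitting anything, which is exactly the latency problem described above.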
I can understand your cynicism, but I think this kind of product is possible. Maybe not perfect, but our consumer technology is already mostly there, albeit not yet unified. I'll explain by addressing some of your concerns:
> "Both voice recognition and translation are HARD problems, for which even companies with tons of resources and talent like Apple and Google were able to provide only sketchy and partial solutions."
I've recently been buying a lot of packages from Japan and found Google's camera translate to be surprisingly good. Bordering on sci-fi awesome.
As for voice recognition, that's something that's been consumer technology for a while. Smart TVs, mobile phones, home automation assistants, etc. It's pretty common technology now. Granted the results aren't 100%, they are still generally quite accurate - with the obvious caveats: depending on a person's accent and background noise.
> "No to mention issues like battery life"
That's a pretty minor point these days on devices with no LCD displays.
> connectivity (hard to believe this device would be able to work without a permanent connection to cloud servers doing the heavy lifting, like Siri does)
They do say it depends on a smartphone app, so my assumption was it works in a similar way to the Pebble watch. ie it's essentially just a bluetooth speaker where the phone will push the data to it.
>> "Both voice recognition and translation are HARD problems, for which even companies with tons of resources and talent like Apple and Google were able to provide only sketchy and partial solutions."
> I've recently been buying a lot of packages from Japan and found Google's camera translate to be surprisingly good. Bordering on sci-fi awesome.
OCR of short sentences from invoices and the like, and fluid recognition of often grammatically incorrect speech, are problems of a very different magnitude. OCR is easy compared with recognition of actual spoken speech, especially if it is supposed to run in real time.
> As for voice recognition, that's something that's been consumer technology for a while. Smart TVs, mobile phones, home automation assistants, etc. It's pretty common technology now. Granted the results aren't 100%, they are still generally quite accurate - with the obvious caveats: depending on a person's accent and background noise.
And they simply suck for any language other than English. It's easier to talk to my devices in English than in German, and this despite my poor pronunciation.
>> "No to mention issues like battery life"
> That's a pretty minor point these days on devices with no LCD displays.
not at that size. the power cell would have to be tiny
>They do say it depends on a smartphone app, so my assumption was it works in a similar way to the Pebble watch. ie it's essentially just a bluetooth speaker where the phone will push the data to it.
which would mean that they'd need bluetooth for permanent connectivity. this would drain the battery even more.
> "OCR of short sentences from invoices or similar and fluid speech recognition of often grammatically incorrect sentences are problems of a very different magnitude. OCR is easy in comparison of actually spoken speech recognition -- especially if it is supposed to run in real time"
OCR isn't comparable to speech recognition. Period. The problem needs to be broken into two parts: 1) speech recognition, 2) translation. I was addressing those parts separately. However you're right that translating natural spoken language, which is full of grammatical errors and other nuances, would be a greater challenge than short passages of official documents.
> "and they simply suck for any language than english. its easier to talk to my devices in english than in german. and this is despite my poor pronunciations"
I'll have to take your word for it. But suffice to say there are several competing speech recognition engines, some better than others. So it might be that your devices use an engine that's poor at recognising German. Although equally you might be using one of the better engines. Hard to say from this high level overview. However the fact that they do work against your pronunciation of English is promising as it at least demonstrates how well these engines can cope with different accents - which is one of the hardest stumbling blocks in building speech recognition.
> "not at that size. the power cell would have to be tiny"
Indeed but we've already seen tiny power cells in watches and other modern gadgets. Given the earpiece only really needs a speaker, bluetooth receiver, some kind of DAC, and a battery; half the device could be the battery.
Just to be clear, I'm not suggesting this thing would have a great battery life. But I still wouldn't be surprised if it lasts a few hours - which is still comparable to some smartphones (sadly).
> "which would mean that they'd need bluetooth for permanent connectivity. this would drain the battery even more."
I did say it would need bluetooth. But realistically bluetooth doesn't suck that much power. The speaker would easily be a heavier power draw.
To summarise: there are obviously going to be difficulties - clear technical challenges - in building this kind of device. But I do think the technology is now at a stage where we can at least prototype something. And that was the crux of my argument. The earpiece might not be a perfect gadget, but gadgets that are first to market are seldom perfect.
Meanwhile, Bragi's wireless headphones have a microphone for pass-thru audio, and bluetooth connectivity. Coupled with a smartphone, I'm curious why an app for this doesn't already exist?
Jibbigo was doing decent voice-to-voice translation entirely locally back in the iPhone 3GS days. You still had all the usual pitfalls of voice recognition and machine translation, but it was good enough to be useful.
Your FAQ link isn't loading for me (hugged to death?) but I watched the video on the page HN links to here and I see nothing in it that's infeasible. As with any voice recognition and machine translation product, you'll probably need to be patient and tolerant of many errors, and I wouldn't be surprised if they gloss over that, but it could still be useful.
Edit: the FAQ finally loaded. I don't see anything particularly remarkable in there. There are no promises for battery life, speed, or accuracy. The basic premise of offline voice-to-voice translation is totally doable and has been done.
All the technology to do this already exists. I have speech-to-text on my Pebble watch -- it goes up to a server, is processed, and is returned all relatively quickly. So you combine that with text translation tools that have existed for years and then text-to-speech.
I'm sure it's really just a specialized bluetooth headset, an app, and bunch of cloud services purchased from other vendors.
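That division of labour can be sketched in a few lines. The three stages below are stubs standing in for real cloud services (the phrasebook entry is invented); the point is only that the earpiece itself contributes nothing but audio in and audio out:

```python
# Sketch of the pipeline: speech-to-text -> text translation -> text-to-speech.
# Each stage is a stub; a real app would call a cloud STT service, a
# translation API, and a TTS engine, with the headset acting as a dumb
# bluetooth microphone/speaker.

def speech_to_text(audio: bytes) -> str:
    # Stub: pretend the cloud STT service recognised this phrase.
    return "where is the station"

def translate_text(text: str, src: str, dst: str) -> str:
    # Stub: a one-entry phrasebook standing in for a translation API.
    phrasebook = {("en", "fr", "where is the station"): "où est la gare"}
    return phrasebook.get((src, dst, text), text)

def text_to_speech(text: str) -> bytes:
    # Stub: a real TTS engine would synthesise audio samples.
    return text.encode("utf-8")

def translate_utterance(audio: bytes, src: str = "en", dst: str = "fr") -> bytes:
    return text_to_speech(translate_text(speech_to_text(audio), src, dst))

print(translate_utterance(b"<mic samples>").decode("utf-8"))  # où est la gare
```

Chaining the stages is trivial; the quality of each individual stage is where all the difficulty discussed in this thread lives.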
At one point, they had an app that did live-translation. But Facebook bought the tech and took it off the market. https://en.wikipedia.org/wiki/Jibbigo
Today, they are riding the deep-learning wave to get ever better/faster recognition and translation. So huge improvements are to be expected.
[1] The professor, Alex Waibel, also has a chair at CMU.
Realtime is not possible (right now). You always have to wait for the voice recognition to finish, and only then can you translate.
In real life, when you listen to a person, you do it simultaneously and, even more important, you anticipate what the other person will say. Right now machines can't do that - they don't even come close.
The promo video wasn't true real time either, though. They just had the translation service run after each speaker had finished. So I think by "real time" they just mean "automatically".
It's not wet and squirmy like a babel fish, but I fear the consequences of it being able to fully translate product/sales-speak into engineer-speak and vice-versa.
Like Douglas Adams wrote:
"Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation."
I'm skeptical we'll be seeing this device. I guess it'll mainly depend on whether they get funded, and even then I only see this as a short-lived gimmick. The machine translation software just isn't good enough. It's indispensable to me, but for real-time translation it's simply not there yet. Since they have to license the speech recognition and machine translation technologies, in the end they'll just be competing with bluetooth headsets that are suited for picking up voices in conversations, and they'll simply draw the shortest straw.
Well, the regular bluetooth headsets won't do the job just yet: you want the microphone to be directional and forward-facing to keep it simple, preferably adaptable, but that's more complex. Regular bluetooth headsets are aimed at picking up your own voice, so you'd want something more hearing-aid-like.
It's hard to imagine that it will work smoothly enough to be useful, at least right away; it seems more like an interesting, gimmicky gadget. I'm looking forward to trying it regardless, assuming they succeed in releasing it (due September).
edit: Slightly misleading on their website. It says "Signup on the waitlist and be entered to win a FREE pair!" When you sign up, it then says they are giving away one per month, and that you can get one entry into the prize draw by doing certain things such as tweeting about them or liking them on FB - but that I have 0 entries for signing up to the waitlist. Meh, I would have signed up for updates anyway, and don't particularly care about a contest I'd likely not win; it's just the principle of misleading wording.
Reminds me of x.ai. You sign up, they put you on a huge "waitlist" that moves slowly, and periodically remind you that you can jump big chunks of that list by spamming people about the product on social media.
Asking people to whore themselves on social media is becoming a new startup marketing strategy. I hate stuff like this.
I have a bad enough time getting Siri to understand my Aussie accent. Throw in something as inaccurate as Google Translate and you'll have... A pile of garbage really.
I desperately need something to translate Japanese into my ear, and I'm probably just good enough to relay my point back without the other party needing an eng->jp version, so I would be a prime customer for this gadget. But the technological accuracy is just not yet there in either speech recognition or translation.
I'm from Scotland, and on the rare occasions I need to phone a US voice-activated phone line (e.g. Expedia.com's), I have to put on a ridiculous American accent for the system to understand me.
The Japanese version will be pretty terrible, I'd imagine. Too many homophones and ambiguous wording. "sounandayo!" = そうなんだよ "that's right!" = 遭難だよ "it's an accident!"
The pitch on those is quite different, and there is obviously context to rely on too. Whether the software can do all that is a question, but something like Siri does OK with Japanese speech recognition (at least not a lot worse than with other languages, from what I can tell).
I live in Montreal and my biggest gripe with Siri is her not being bilingual! There are so many street names or place names here that are French and when I want to send a message to a francophone friend I can only use handsfree mode in English.
Imagine that in Northern Ontario/Val d'Or French, which still has a vowel system that's almost dead in the south now. The grammar's not as different now as it was when I was a kid (that's more than a half-century ago) but that's not quite "standard Canadian French" either, let alone continental French.
I think making the device is the easy part; the hard part is doing the translation right in most cases, which both Google and Bing have been unable to do so far. The Google Translate app already has a conversation mode where two people with different languages can talk, but if it worked well, nothing would have prevented them from making such an earpiece already.
Even if Google Translate worked well for text (and it doesn't), translating text may be easier than translating the spoken word. When we speak, more things are left unsaid, because we can pick them up from nonverbal cues or from context. We also don't produce nice, grammatically-correct, complete sentences, we produce fragments instead, and make mistakes, and go back and correct things. And we slur things in ways that make it difficult for voice recognition to understand us. We also use intonation, timing, and so on in speech, which a voice recognition-translation-text to speech roundtrip probably destroys. We use less standardised, colloquial language which can be highly specific to groups of people, both in vocabulary and grammar.
Real-time translation of speech is difficult even for interpreters. They have to be aware of the context (in some cases specifically prepared beforehand, depending on the topic) and make assumptions based on human experience. You'd need to make a computer that can do that, too.
I suspect that a babelfish with current technology will work fine if you're listening to a prepared speech from a political leader, and poorly in actual social situations if people speak normally.
They are something like 95% right, 1 wrong in 20[], which is pretty often. But in a face-to-face conversation, with so much non-verbal communication going on, it may be more than enough. Very different from cold, isolated translation online.
It's certainly far ahead of having no language, or trying to look up a phrase book.
BTW It's like "word lens" for audio https://en.m.wikipedia.org/wiki/Word_Lens Google bought and made it free. I've tried it, and it works, it's very cool. But I don't know if it's actually that useful, or even gets used much, in the field.
[] EDIT sorry, those figures were just for speech recognition!
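For what it's worth, even a 95% per-word recognition rate erodes quickly at the sentence level, before any translation errors are stacked on top. A quick back-of-envelope (assuming, unrealistically, independent per-word errors):

```python
# If each word is recognised correctly with probability 0.95 and errors
# were independent, the chance of a whole sentence surviving untouched
# drops fast with sentence length.
per_word = 0.95
for length in (5, 10, 20):
    print(length, round(per_word ** length, 3))
# 5 0.774
# 10 0.599
# 20 0.358
```

So at "1 wrong in 20", roughly two out of three twenty-word sentences would contain at least one recognition error, which is consistent with the comments above about needing context and tolerance.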
> They are something like 95% right, 1 wrong in 20
Which set of languages is this stat about? For example, trying English with Hindi (they don't support it right now), though Google Translate is impressive, there are still a lot of mistakes (many more than 1 in 20). For English to Spanish, I assume the error rate would be lower in comparison.
It isn't that much better. Admittedly I am only learning Spanish, but I get odd or incorrect translations far more often than 1 in 20, and because I am a beginner, I don't throw really difficult cases at it.
This is also text-only use, so it ignores things like accents (was that a p, v or b?) which can make the whole thing even more difficult.
For other languages, like my native Slovenian, it is even more ridiculous.
I'd like to see more audio augmented reality systems. Translation, taking baby cry down a notch, amplifying speech directed at you, autofiguring out best moment to tell you things you might need to know.
With a baby, everything can go badly. I wasn't thinking about shutting out the baby's cry to get some sleep (sleeping with headphones?) but rather making it a little less intrusive when it screams in your face and you are already sufficiently aware that it does. Also, I was thinking about capping the volume of the cry at some level (80dB?), not reducing it proportionately.
Besides, I'm not sure how the number of babies harmed because of not crying loud enough compares to the number of babies that were hurt because they cried too loud.
The loudness of baby cries is probably delicately fine-tuned for older times, when the incentive to feed your baby had to overcome the desire to feed yourself a bit more.
I think modern humans have enough other incentives to keep their babies alive and healthy.
Disclaimer: I have no children. I am not significantly bothered by other people's babies crying.
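The cap-at-80dB idea is essentially a hard limiter rather than a volume knob. A toy sketch on raw samples (values invented; a real limiter would smooth its gain over an envelope instead of clipping each sample):

```python
# Two ways to make a cry quieter: scale every sample (volume knob) vs.
# cap only the samples above a ceiling (hard limiter). The limiter leaves
# quiet sounds untouched, which is the behaviour described above.
def volume_knob(samples, gain=0.5):
    return [s * gain for s in samples]

def hard_limiter(samples, ceiling=0.8):
    return [max(-ceiling, min(ceiling, s)) for s in samples]

quiet_and_loud = [0.1, 0.5, 1.0]
print(volume_knob(quiet_and_loud))   # [0.05, 0.25, 0.5] - everything quieter
print(hard_limiter(quiet_and_loud))  # [0.1, 0.5, 0.8] - only the peak capped
```

The limiter keeps normal conversation and quiet fussing audible while taking only the painful peaks down, which is exactly the distinction being drawn against proportional reduction.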
If you wanted to shut out your baby crying while you sleep, you wouldn't use rigid earphones with partial baby-cry noise cancellation. You'd just buy flexible, comfy earplugs for a few cents, and you can do that today.
You are right that lawyers scare the crap out of innovators though.
But in seriousness as well: my first kid cried a lot, and it's not an on/off switch - even if you're hugging and cuddling and comforting them, they will still cry for ten minutes, because that's what kids do. It mostly just makes your job more difficult. My second son had the lungs of an opera singer, and the noise made it physically unbearable for me to be near him when he needed help. I mean literally painful to my ears. There were more than a few times when I had to leave him alone until his crying calmed down or stopped, because I couldn't do anything for him.
Like others said, it's more about attenuating the volume when you already know what they need but they keep crying. My firstborn was colicky. There were long periods where he would cry loudly and all I could do was hold and rock him. I bought gun-range ear muffs.
I've always thought it'd be great to have a mask/muzzle that turns a baby's cry into laughing or something else more pleasant while seated next to them on a plane.
It's just about tricking your senses to make your life more pleasant - the same way you do with a comfy chair, woven clothes, warming the air around you beyond what's needed for survival, taking a warm bath, or wearing sunglasses or tinted glasses. Better tech just gives you more comfort.
I would argue that buying an ergonomic chair goes way beyond tricking your senses. There are actual mechanical forces that support your body weight in ways more conscious of your anatomic needs; e.g. the chair is simply better, no trickery needed.
Wearing sunglasses could be seen either way. You are definitely dimming the raw luminous stimuli that reach your eyes, but you could argue that you are not dulling your senses, just filtering out a part of the spectrum with a low signal-to-noise ratio, so your senses are enhanced as a result. Technology can be seen as an extension of human capacity. That's what tools are all about.
This thing in the ear, though, neither gives you beyond-human capacities nor helps you recover regular capacities you lost through whatever circumstances. It just lets you pay for the privilege of not having to develop those capacities to begin with.
How good are cell phone microphones? Could I hack up a similar device with my regular headset plugged into my phone and the phone picking up the speech and translating it and reading it back to me (Google Translate)? So basically phone/app doing all the work and using whatever headphone is available.
I suppose the hard parts are identifying what originates from a person and the actual speech to text and back?
So far it seems like it could be built from existing blocks. It doesn't have to be great and can gradually improve... a worthy hack-around project imo
I haven't checked in on CMU's Sphinx in a while but that was a speech-speech system I looked at for a bit. What are the state of the art building blocks for such a project? Would I want speech-text-speech (I assume this is easier?) or directly speech-speech?
I just don't see this working very well. The current language translation tools just are not good enough to support a proper conversation. Translating languages via technology is a hard problem and it's far from solved. Coupled with this fundamental problem is identifying speech accurately; something Siri fails to do regularly with my voice. If this thing gets built I am guessing it will be a novelty product.
Just the fact that they are listing their product on Indiegogo instead of Kickstarter signals to me that they probably got turned away when trying to publish their project on Kickstarter, which is trying hard to distance itself from these moonshot projects that are likely to disappoint consumers. If you don't have a really serious working prototype now, Kickstarter is super-duper hesitant to let you on their platform.
Could they trademark (it wouldn't be copyright) a term that they themselves copied from a work of fiction they don't own the copyright to, given that the context (i.e. what it does) isn't different to that in the work of fiction?
Yes - and yes, trademark, not copyright; pre-coffee conflation. Consider the Nexus line of phones, lifted directly from PKD's Do Androids Dream. Trademark is about area of applicability - I bet "babelfish" is trademarked for translation services and products.
This seems like a scam, or at best a marketing lie. Voice recognition is still an experimental field that's very difficult to get right. Even today's professional voice recognition software requires some voice training. There's no way this thing can accurately pick up speech in a random environment.
And given that most voice recognition software requires some decent hardware (think your phone or your PC), I doubt it's going to fit into that tiny earpiece with today's hardware specs.
This isn't necessarily possible for any two random languages. Not every language has a Subject-Verb-Object order like English, and even when one does, each language has its own quirks in how words are ordered. For example, in Spanish "un pez grande" = "a big fish", but if you translate in real time it comes out as "a fish big". Of course, an interpreter can reword it as "a fish, which is big, ..." but this is not always possible.
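A simple fix-up rule can patch that particular case; the trouble is that every such rule is language-pair specific and purely local. A toy illustration (part-of-speech tags supplied by hand here; a real system would need a tagger, and the rule still misses longer-range reorderings):

```python
# Toy post-edit for Spanish noun-adjective order: swap adjacent
# (NOUN, ADJ) pairs in a literally-translated, POS-tagged sentence.
def reorder(tagged):
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return [word for word, _ in out]

# "un pez grande" translated word-for-word, then reordered:
literal = [("a", "DET"), ("fish", "NOUN"), ("big", "ADJ")]
print(" ".join(reorder(literal)))  # a big fish
```

Accumulating hand-written rules like this for every construction in every language pair is exactly the approach that statistical and neural translation replaced, because it never scales to real speech.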
Too good to be true. Seems like they are just trying to get some media traction to convince some VC for some funding or creating a footprint for themselves in the tech fraternity.
Yes, I will be a cynical hypocrite when it hits the market with seamless performance and I purchase it afterwards.
Real life scenario:
French chick: Bonjour. es tu perdu?
Scrawny-geek: (All blushing and shit...Is she interested in me? put on "groundbreaking" Pilot earbuds)
FC: excusez moi mister!!
SG: gives hand gesture to wait
10 seconds time lapse
Analog audio signals are converted to digital, and the data buffer is sent to the smartphone via BT. The phone pre-processes the raw data and sends it to a Watson-Siri-Echo-Now-esque ultra system - signal processor, then NLP, then translator - which sends it back to the phone; the phone does some more voodoo on it, sends it back to the Pilot via BT, and the Pilot plays it to its user.
SG: Oh. I am sorry mademoiselle. I am just trying to find direction to the bathroom.
FC: lost and confused expressions appear as if she did not understand one word of English
SG: punching the air in frustration..wish this device could play the translation back to me in French
Considering the variety of French accents - Paris, regional France, Africa, Caribbean, Canadian, South East Asian, and guys like me in Texas and more - I'm rather skeptical this unit will work as seamlessly as the marketing video advertises. Just a thought. C'est mignonne, vraiment.
I remember reading an "interactive" book as a kid (a book with numbered short episodes, and at the end of each one you are presented with a couple of choices how the story would continue, it was awesome) where the good guy was sent back in time by the bad guy to the wild west where he had to convince the indians to trust him and fight with him against the same bad guy, who liked to slaughter them for fun, or die with them. In order to understand the natives, he had some gadget implanted in his brain that translated their language directly into his own. I remember thinking "yeah right, never gonna happen." :)
This is probably far away from being able to translate languages perfectly, but still. The future came pretty fast.
Just wondering: is this anything more than an in-ear bluetooth headset connected to an app that uses Google Translate as its backend?
Because if so, they would maybe have more success selling the app, which you could use with the headphones you already own, making user acquisition a bit smoother than requiring the purchase of a bluetooth headset... I'd guess more people would give less money for an app that gets updated than for a headset that could potentially break...
Unless there is some technology in the headset itself so it can work on its own without the app connected to it. Didn't see anything like that mentioned here though.
We have a brilliant entrepreneur coming up with a revolutionary idea no one has had before (lol), sleek renders and good-looking models, a price point already set - all that and... absolutely ZERO tech behind it.
I always imagine everyone using tech like this, each slowly sliding into their own slang; then something goes wrong, the devices die, and no one understands the slang language each person now uses.
To a certain extent, this is equivalent to "imagine everyone's glasses fell off and broke".
Sure, these devices are more complex and only a handful of people know how they work, but by the time they spread through an entire civilization and we'd adapted to the point of being unable to communicate without them... they'd be as robust as modern-day glasses and you'd have repair services in every mall.
Maybe, maybe not, but I agree that your version of the future is just as possible, if not more so.
That said, the more complex things get, the more likely it is that something will be missed, an error made, or something left unaccounted for because it has never been seen before. On the bright side, intentional disruption rarely has this sort of impact.
Nothing new here. As the article mentions, it's the Babelfish from the Hitchhiker's Guide to the Galaxy. I've wanted to build one myself since I read the book, but the fundamental problem of machines translating languages is still a very hard problem. This thing won't work.
About a year ago someone tried hiring me to do this. At least to do research and report back with recommendations.
On one hand I think we've never been closer to being able to create this kind of product. On the other, it's still rife with complexity and edge cases. Reality is messy.
If you type Finnish into Google Translate, the results in English are pretty bad.
Furthermore, Google Translate only manages the "written" Finnish language... which nobody actually speaks. Everyone uses "spoken" Finnish, which Google Translate doesn't handle at all.
I wonder how many other languages have this property.
This product indeed sounds too good to be true but we'll probably have something that delivers within 10 years. Add that to a virtual assistant and the future could be very different. Automatthew's Friend by Stanislaw Lem explores one possible scenario.
I must admit that I'm highly skeptical. Show me a working app for my phone that does voice translation, and then I'll believe that something like this works.
"The creator says he came up with the idea when he met a French girl." and just like that he came up with the whole concept of real time language translation, something the world has never even considered before /snark
Applying the Principle of Charity—i.e. choosing to respond to the strongest interpretation of a statement—would have been helpful here.
There are multiple interpretations of a phrase like that. Choosing an obviously idiotic one leads to snarky internet comments, something we need less of on HN. Choosing the strongest one might lead you to post nothing or it might trigger a thoughtful response. Either way we get better signal/noise ratio, which is what we're hoping for here.
Yeah, this is what I came here to say as well. And while we're at it, I got this great idea that we should cure cancer. The idea came to me when I saw someone sick with cancer. Next stop: patent office.
For business, yes. For private use? No thank you - you'd lose a lot of knowledge, understanding, and mental skill by not trying to learn/understand a new language.
The FAQ is simply "too good to be true".
http://www.waverlylabs.com/2016/05/faqs-about-early-bird-and...