Right now, we have a set of “industry presets” with preloaded keywords and context for different industries (for example, GPU, LLM, and GPT for AI).
Over time, we want our users to build on these preset terms, for example by automatically adding terms mentioned in their meetings. The challenge is how to add terms that may have been mispronounced or that the LLM may have mixed up. I think having the context of their conversations, along with the base documents for those conversations, could definitely help.
We are also excited about real-time translation combined with voice cloning (imagine your favorite K-pop stars speaking your language in their own voices!)
This is actually something we explored previously. The tech is there, but we weren't sure about the user experience, especially in terms of latency.
I think it’s similar to Netflix subtitles: some people prefer subtitles and dislike voice-overs, while others opt for the dubbed versions.
I also believe that as the meeting progresses, it feels more natural and participants become more aware of the translator. (Interestingly, they often start speaking more clearly and using fuller sentences, just as they would with human interpreters!)
Thanks for your comment. I hope you give it a try!
I agree with your observation about "Interpreter" vs. "Translator."
When we first started this project, we referred to it as an "interpreter." However, after speaking with human interpreters and considering their feedback, we settled on "real-time translation." We might have left some traces of the old name on the internet, though.
As with everything, there are both advantages and limitations to text-based translations. Here are a few:
Limitations:
- Some people may find it challenging to follow gestures and expressions while reading.
- In more one-way scenarios, such as presentations and webinars, hearing the speaker’s voice often feels more natural.
Advantages:
- Many users actually prefer text because it allows them to hear the speaker’s original voice and pick up on nuances.
- A written record enables post-meeting summaries and makes it possible to use AI to repurpose transcripts into other materials, such as blog posts, custom user manuals, JIRA notes, and more.
- Voice-to-voice translation also has technical constraints: it currently tends to be turn-based rather than real-time (streaming), which is not ideal for a back-and-forth exchange of ideas.
That said, we are excited to see how TTS and STT technologies evolve, and we look forward to experimenting with “interpretation” in the future!
My batchmate was saying the same thing, actually.
He was trying to use only Hindi for the demo's sake, and he found it quite difficult to explain his product without falling back on English.
We do our best to handle language switching. For example, when people talk about biology, almost half of each sentence is made up of English terms, and Cuckoo does pretty well in that context too!