The pitch on those is quite different, and there is obviously context to rely on too. I mean, whether the software can do all that is another question, but something like Siri does OK with Japanese speech recognition (at least not a lot worse than with other languages, from what I can tell).