I'm deaf. Something close to standard Canadian English is my native language. Most native English speakers claim my speech is unmarked but I think they're being polite; it's slightly marked as unusual and some with a good ear can easily tell it's because of hearing loss.
Using the accent guesser, I have a Swedish accent. Danish and Australian English follow as a close tie.
It's not just the AI. Non-native speakers of English often think I have a foreign accent, too. Often they guess at English or Australian. Like I must have been born there and moved here when I was younger, right? I've also been asked if I was Scandinavian.
Interestingly I've noticed that native speakers never make this mistake. They sometimes recognize that I have a speech impediment but there's something about how I talk that is recognized with confidence as a native accent. That leads me to the (probably obvious) inference that whatever it is that non-native speakers use to judge accent and competency, it is different from what native speakers use. I'm guessing in my case, phrase-length tone contour. (Which I can sort of hear, and presumably reproduce well, even if I have trouble with the consonants.)
AI also really has trouble with transcribing my speech. I noticed that as early as the '90s with early speech recognition software. It was completely unusable. Even now AI transcription has much more trouble with me than with most people. Yet aside from a habit of sometimes mumbling, I'm told I speak quite clearly, by humans.
> AI also really has trouble with transcribing my speech. I noticed that as early as the '90s with early speech recognition software. It was completely unusable.
I don't know what your transcription use cases are, but you may be able to get an improvement by fine-tuning Whisper. This would require about $4 in training costs[1], and a dataset with 5-10 hours of your labeled (transcribed) speech, which may be the bigger hurdle[2].
1. 2000 steps took me 6 hours on an A100 on Colab, fine-tuning openai/whisper-large-v3 on 12 hours of data. I can share my notebook/script with you if you'd like.
2. I am working on a PWA that makes it simple for humans to edit initial, automated transcriptions with mistakes, so the corrected dataset can be fed back into the pipeline for fine-tuning, but it's not ready yet.
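To make the "5-10 hours of labeled speech" hurdle concrete, here is a minimal sketch for tracking how much transcribed audio has been collected so far. The `Clip` record, its field names, and the 5-hour default target are assumptions for illustration, not part of any Whisper tooling; clips with an empty transcript are treated as not yet labeled.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Clip:
    # Hypothetical record for one recording in the fine-tuning dataset.
    audio_path: str
    transcript: str  # empty string means "not transcribed yet"
    seconds: float   # duration of the recording


def labeled_hours(clips: Iterable[Clip]) -> float:
    """Total duration, in hours, of clips that have a non-empty transcript."""
    return sum(c.seconds for c in clips if c.transcript.strip()) / 3600


def remaining_hours(clips: Iterable[Clip], target_hours: float = 5.0) -> float:
    """Hours of labeled speech still needed to reach the target (never negative)."""
    return max(0.0, target_hours - labeled_hours(clips))


if __name__ == "__main__":
    clips = [
        Clip("a.wav", "first corrected transcript", 1800.0),
        Clip("b.wav", "second corrected transcript", 5400.0),
        Clip("c.wav", "", 3600.0),  # recorded but not yet transcribed
    ]
    print(f"labeled: {labeled_hours(clips):.1f} h")
    print(f"still needed: {remaining_hours(clips):.1f} h")
```

Only the transcribed clips count toward the total, since unlabeled audio cannot be used for supervised fine-tuning.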
I'm also deaf, and I took 14 years of speech therapy. I grew up in Alabama. The only way you would know I'm from the South is because of the pin-pen merger[1]. Otherwise, you'd think I grew up in the American Midwest, due to how my speech therapy went. Almost nobody picks up on it, unless they are linguists that already knew about the pin-pen merger.
I’m aware of the merger, but I literally can’t hear a difference between the words. I certainly pronounce them the same way.
I also think merry-marry-Mary are all pronounced identically. The only way I can conceive of a difference between them is to think of an exaggerated Long Island accent, which, yeah, I guess is what makes it an accent.
That's exactly what the pin-pen merger is! As you know, it's not limited to pin/pen, and hearing ability (in my case, profound hearing loss) is not related to the ability to hear the difference. I don't understand the linguistics, but my very bad understanding is that there's actual brain chemistry here that means that you _can't_ hear the difference because you never learned it, never spoke it, and you pronounce them the same.
My partner is from the PNW and she pronounces "egg" as "ayg" (like "ayyyy-g") but when I say "egg" she can't hear the difference between what I'm saying and what she says. And she has perfect hearing. But she CAN hear the difference between "pin" and "pen", and she gets upset when I say them the same way. lol
But yeah, that's one of the things that makes accents accents. It's not just the sounds that come out of our mouths but the way we hear things, too. Kinda crazy. :)
When I was listening to some of the samples on the page you linked (pronunciation of “when”), it really seemed to me like the difference they were highlighting was how much the “h” was pronounced. Even knowing what I was listening for, it was very much like my brain was just refusing to recognize the vowel sound distinction. So I think you must be right about it being a matter of basic brain chemistry.
In the example of the reverse pen/pin merger (HMS Pinafore) on that page, I couldn’t hear “penafore” to save my life. Fascinating stuff.
I used to think of the movie “Fargo” and think “haha comical upper midwestern accents.” And then at some point I realized that the characters in “No Country for Old Men” probably must sound similarly ridiculous to anyone whose grandparents and great grandparents didn’t all speak with a deep, rural West Texas accent - which mine did, so watching the movie it just seemed completely natural for the place and time at a deeply subconscious level.
They are the same phoneme for me in US Eastern suburbia; the only difference is a subtle shift in the length that you drag it out. "merry" is faster than "marry", which is sometimes but not always faster than "Mary". Most UK accents seem to drag the proper name out an additional beat, and for some of them there's a slight pitch shift that sounds like "ma-ery", at its most extreme in Ireland (this is one early shibboleth by which I recognized Irish people before I really picked up on the other parts of the accent).
As someone with a German accent, to me the difference between merry and marry is the same as between German e (in this case ɛ in IPA) and ä (æ in IPA). Those two sounds are extremely close, but not quite the same. According to the Oxford dictionary, that distinction holds in British English, while it shows the same pronunciation (ɛ) for both in American English.
Wow, I'm not deaf, but almost everything you mentioned applies to me too. I've never met anyone else who has experienced this before, yet all of your following points apply exactly to me:
> standard Canadian English is my native language
> Most native English speakers claim my speech is unmarked
> Non-native speakers of English often think I have a foreign accent, too. Often they guess at English or Australian. Like I must have been born there and moved here when I was younger, right?
> They sometimes recognize that I have a speech impediment but there's something about how I talk that is recognized with confidence as a native accent.
At least 2 or 3 times a year, someone asks me if I'm British, but my parents and I were born in Canada, and I've never even been to England, so I'm not really sure why some people think that I have a British accent. Interestingly, the accent checker guesses that my accent is:
American English 89%
Australian English 3%
French 3%
I was born in Brooklyn, to Yiddish speaking parents and Yiddish was my first language. I now spend half my time in California and half in Israel. The accent checker said 80% American English, 16% Spanish, and 4% Brazilian Portuguese. In Israel they ask if I’m Russian when I speak Hebrew. In the US, people ask where I’m from all the time because my accent—and especially my grammar—is odd. The accent checker doesn’t look for grammatical oddities but that’s where a lot of my “accent” comes from.
Yep, I'm also deaf (since age 6), went through a lot of speech therapy, and have a very pronounced deaf accent. I live in the midwestern US (specifically, Ohio) and at least once a year I get asked where I'm from - England being the most common guess, but I've also had folks ask if I'm Scottish or Australian.
AI struggles massively with my accent. I've gotten the best results out of Whisper Large v2 and even that is only perhaps 60% accurate. It's been on my todo list to experiment with using LLMs to try to clean it up further - mostly so I can do things like dictate blog post outlines to my phone on long car rides - but I haven't had as much time as I'd like to mess around with it.
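For anyone who wants to put a number on "perhaps 60% accurate", a common measure is word error rate (WER): the word-level edit distance between a hand-corrected reference and the machine transcript, divided by the number of reference words. Below is a minimal sketch in plain Python; the function name is made up for illustration, and the only normalization assumed is lowercasing and whitespace splitting (punctuation is left as-is).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the hat sat on mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Scoring the raw transcript and an LLM-cleaned version against the same reference would show whether the cleanup step actually helps. Note that WER can exceed 1.0 when the hypothesis contains many extra words.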
Hearing different things, as it were.